Fwd: Re: [OpenAFS] openafs OSD

2013-06-17 Thread Hartmut Reuter
Sorry, this mail should also have gone to the list!

 Original Message 
Subject: Re: [OpenAFS] openafs OSD
Date: Thu, 30 May 2013 10:27:14 +0200
From: Hartmut Reuter reu...@rzg.mpg.de
To: Staffan Hämälä s...@ltu.se

Staffan,

as I have told you, we are migrating from TSM-HSM to HPSS in order to get rid of
TSM-HSM. Five years ago we had a really bad experience with TSM-HSM running with
GPFS: a so-called reconcile job had started in the background and had removed
millions of files from the TSM database. It never really became clear what
caused this behaviour. We had a lot of work restoring everything from dumps, also
because this job had run for two or three days before it was detected, so new files
already migrated to tape were not in the restored database...

In 2007 I wrote an interface to dCache, which cannot be mounted as a file system
and can only be used through library calls. I used the same technique later when I
wrote the interface to HPSS, which also works by calls to a shared library. Today
the rxosd loads a shared interface library when started with the appropriate
parameters, and this library then itself contains the calls to the HPSS shared
library. This generic interface could also be used for other non-mountable archive
systems such as TSM. Three years ago I was interested in doing that for TSM, but I
never got the necessary documentation about the TSM library, so nothing happened.
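
As a rough illustration of the idea only - the structure and all names below are
made up, not the actual rxosd interface - loading such a back-end library in C
looks roughly like this:

    /* Illustrative sketch only -- not the real rxosd code.
     * A server loads an archive back-end (HPSS, dCache, ...) from a
     * shared library chosen at start-up and calls it through a table. */
    #include <dlfcn.h>
    #include <stdio.h>

    /* hypothetical operations an archive back-end would provide */
    struct archive_ops {
        int (*init)(const char *config);
        int (*read_file)(const char *path, char *buf, long len);
        int (*write_file)(const char *path, const char *buf, long len);
    };

    static void *backend_handle;

    /* Load e.g. "libosd_hpss.so" and look up its exported "osd_get_ops"
     * entry point -- both names are assumptions, not real symbols. */
    struct archive_ops *load_backend(const char *libname)
    {
        struct archive_ops *(*get_ops)(void);

        backend_handle = dlopen(libname, RTLD_NOW);
        if (!backend_handle) {
            fprintf(stderr, "dlopen %s failed: %s\n", libname, dlerror());
            return NULL;
        }
        get_ops = (struct archive_ops *(*)(void))
                      dlsym(backend_handle, "osd_get_ops");
        if (!get_ops) {
            fprintf(stderr, "missing symbol osd_get_ops: %s\n", dlerror());
            return NULL;
        }
        return get_ops();
    }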

A general problem with AFS-OSD is that I am the only person who knows the code
and who feels responsible for keeping it running. I am 68 years old and now
retired, but still in good health and willing to work on it; however, I think
someone else should start to do this work in order to make this project run
longer than another few years. I learnt this lesson 20 years ago when we got
MR-AFS from PSC (Pittsburgh Supercomputing Center) and half a year later the
whole group of people who had worked on it left PSC. Willing or not, I became
the only developer of MR-AFS for the next 15 years, until I replaced it by
AFS-OSD. There are many sites which are in principle interested in AFS-OSD, but
they do not deploy it because they don't see long-term support for it. Even my
own site RZG has planned a migration back to standard OpenAFS in the next years
unless someone else appears to take over the support.

Hartmut

Staffan Hämälä wrote:
 Hi,
 
 Yes, I'm interested in having a look. I'm primarily checking out what can be
 done right now, and will start the real work on this after the summer.
 
 I heard from Ragge (Anders Magnusson, a former colleague of mine) that he had
 talked with you at the AFS conference a few years ago. Apparently, you were
 interested in using the TSM API to connect to TSM's archive function as a
 backend. Do you know if any work has been done on this?
 
 If there isn't such a function today, would it be difficult to use TSM's
 archive function as a backend?
 
 As Harald says on the list, the archive function would be less complex. I had
 thought we would use TSM-HSM for this, but if there is a better option
 available, we should have a look at this as well. I would prefer to use TSM to
 handle the tapes, as our tape robot is currently connected only to TSM. But, 
 the
 TS3500 is capable of connecting to several applications at the same time, so
 that's not really a requirement. But I would like to have all tapes handled by
 TSM. :-)
 
 We currently use the archive function in TSM quite extensively. Our AFS monthly /
 yearly backups are archived, as well as volumes for inactive users, etc.
 
 I'm also interested to know why you are migrating from TSM-HSM to HPSS as a
 backend?
 
 /Staffan
 
 On 2013-05-27 17:10, Hartmut Reuter wrote:
 Hello Staffan,

 AFS-OSD is still alive and in use at two sites: RZG and PSI. At RZG we are
 using it with HPSS and TSM-HSM as HSM back-ends. We are in the process of
 copying all data from TSM-HSM over into HPSS and hope to get rid of TSM-HSM
 by the end of this year. Presently our HPSS already contains more than
 8 million AFS files with 729 TB of data.

 RZG's cell ipp-garching.mpg.de is also in the process of migrating from
 AFS-OSD 1.4 to AFS-OSD 1.6.

 The current source in git://github.com/hwr/openafs-osd.git is based on 
 openafs
 1.6.2.

 Unlike HPSS, TSM-HSM does not require special interface routines because you
 can access the file system directly with POSIX calls. We are using GPFS along
 with TSM-HSM, but I suppose you use other file systems as well.

 If you are interested have a look at the current version and feel free to ask
 more.

 Hartmut Reuter



 Staffan Hämälä wrote:
 Hi,

 What's the current status of AFS-OSD? I've found a few presentations from 
 2009
 and 2010, but nothing more recent.

 I haven't found anything about OSD and openafs 1.6.

 Is there any, more recent, information on how to implement openafs/OSD to
 connect to TSM's HSM module?

 /Staffan
 LTU Sweden

Re: [OpenAFS] Re: bos blockscanner

2013-01-26 Thread Hartmut Reuter

When OpenAFS started in 2000 I pushed a lot of extensions into it to make it
easier to maintain MR-AFS. I think at that time we were already the last site
using it. Over the following years the MR-AFS source code could be reduced to
the very specific parts on the server side, while thanks to these extensions the
plain OpenAFS code could be used for all other subdirectories.

The scanner was a stand-alone program which traversed the volume metadata to
find out which files needed to get a copy elsewhere and which files were
eligible for wiping (removal from disk). In extreme situations it could be
helpful to stop the scanner...

There are many other remains of MR-AFS in different OpenAFS source files which
could safely be removed now that MR-AFS has been out of service for five years.

MR-AFS was finally shut down in 2008. Since then we have been running AFS/OSD,
which has all the features MR-AFS had and some more. I am working on the project
to bring the AFS/OSD changes into the official OpenAFS source code.

-Hartmut

Andrew Deason wrote:
 On Fri, 25 Jan 2013 19:50:41 +0100 (CET)
 Thorsten Alteholz open...@alteholz.de wrote:
 
 the command 'bos help' says something about 'blockscanner' and
 'unblockscanner'.  These are supposed to start something like
 '/usr/afs/bin/scanner -block'. But I did not find anything about such a
 scanner. Can anybody please shed some light on this? Is this a new
 feature for the future?
 
 No, it is very old. I believe that is there to accommodate installations
 with MR-AFS. MR-AFS is for using AFS with HSM systems but is not freely
 available or open source or anything; you can google around for what
 little information exists about it. I'm not sure if it's still in use.
 Those rpcs hard code the commands they run, ugh...
 
 I don't have any familiarity with it, but from what I've seen in the
 OpenAFS source, I think the 'scanner' process is something that shuffled
 data between disk and tape. Normally it runs continuously or something,
 but those commands allow you to temporarily disable or enable
 migrations.
 


-- 
-
Hartmut Reuter  e-mail  reu...@rzg.mpg.de
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


[OpenAFS] Re: [OpenAFS-announce] OpenAFS 1.6.2 release candidate 3 available

2013-01-24 Thread Hartmut Reuter

You can't build openafs-1.6.2pre3 outside the source tree!

Build ends with

gcc  -O -I/home/hwr/tarfiles/openafs-1.6.2pre3/amd64_sles11/src/config
-I/home/hwr/tarfiles/openafs-1.6.2pre3/amd64_sles11/include
-I../../../src/libafsauthent -I.  -DAFS_PTHREAD_ENV -pthread -D_REENTRANT
-D_LARGEFILE64_SOURCE -c ../../../src/libafsauthent/../volser/vsutils.c
../../../src/libafsauthent/../volser/vsutils.c:44:20: fatal error: volser.h:
Datei oder Verzeichnis nicht gefunden
compilation terminated.
make[3]: *** [vsutils.o] Fehler 1
make[3]: Leaving directory
`/home/hwr/tarfiles/openafs-1.6.2pre3/amd64_sles11/src/libafsauthent'
make[2]: *** [libafsauthent] Fehler 2
make[2]: Leaving directory `/home/hwr/tarfiles/openafs-1.6.2pre3/amd64_sles11'
make[1]: *** [build] Fehler 2
make[1]: Leaving directory `/home/hwr/tarfiles/openafs-1.6.2pre3/amd64_sles11'
make: *** [all] Fehler 2
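
For reference, an out-of-tree (separate object directory) build of this kind is
set up in the usual autoconf way, roughly as follows; the directory name is
taken from the log above and the configure options are omitted:

    cd /home/hwr/tarfiles/openafs-1.6.2pre3
    mkdir amd64_sles11
    cd amd64_sles11
    ../configure        # plus the usual options for your platform
    make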


-Hartmut

Stephan Wiesand wrote:
 The OpenAFS 1.6 Release Managers announce that release candidate
 1.6.2pre3 has been tagged in the OpenAFS source repository, available
 at:
 
git://git.openafs.org/openafs.git
 
 as tag: openafs-stable-1_6_2pre3 .
 
 Source files and available binaries can be accessed via the web at:
 
http://www.openafs.org/release/openafs-1.6.2pre3.html
 
 or
 
http://dl.openafs.org/dl/candidate/1.6.2pre3/
 
 or via AFS at:
 
UNIX: /afs/grand.central.org/software/openafs/candidate/1.6.2pre3/
UNC: \\afs\grand.central.org\software\openafs\candidate\1.6.2pre3\
 
 Among many fixes and enhancements, this release candidate includes
 support for Linux kernels up to 3.7, OS X 10.8 and recent Solaris releases.
 
 This is believed to be very close to 1.6.2 final. The Kerberos related
 changes mentioned in the last announcement are not yet ready, and will
 probably be part of the next stable release.
 
 Please assist us by deploying this release and providing positive or
 negative feedback. Bug reports should be filed to openafs-b...@openafs.org .
 Reports of success should be sent to openafs-info@openafs.org .
 
 Paul Smeddle and Stephan Wiesand, 1.6 Branch Release Managers
 for the OpenAFS Release Team
 ___
 OpenAFS-announce mailing list
 openafs-annou...@openafs.org
 https://lists.openafs.org/mailman/listinfo/openafs-announce
 


-- 
-
Hartmut Reuter  e-mail  reu...@rzg.mpg.de
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Running OpenAFS on top of GPFS?

2012-11-30 Thread Hartmut Reuter
One of the big advantages of GPFS is that it is a fast cluster filesystem. With
normal AFS you cannot make use of this feature because the fileserver doesn't
share its partitions with anyone else. So the fileserver would use GPFS just as
its local filesystem instead of XFS or something else. GPFS is good for large
files, while many files in AFS are small. In our cell with 800 TB and 200
million files, 92% of the files are smaller than 1 MB, which is a reasonable
block size for GPFS.

However, for some years a special version of AFS called AFS/OSD has existed
which allows large files to be stored in object storage. This object storage
consists of servers running a program called rxosd. The idea is to keep the
small files in the fileserver's partition where the volume resides and have the
large files in object storage. Now the point why GPFS is of special interest
here: the GPFS used by the rxosd could be shared by all the compute nodes in a
cluster, and the modified AFS client would allow users on these compute nodes to
read and write data from and to the AFS files located inside the GPFS rxosd
partition directly, with nearly the native GPFS speed (200-300 MB/s depending on
the network being used). Users outside the cluster would see the files as normal
AFS files and access them with the normal, lower transfer rate of AFS. I gave a
talk about this some years ago:

Embedded Filesystems (Direct Client Access to Vice Partitions), talk at the
AFS & Kerberos Best Practice Workshop 2009, Stanford, which you can download
from http://www.rzg.mpg.de/~hwr/Stanford.pdf

If you want to know more about this, feel free to contact me.

Hartmut Reuter

Craig Strachan wrote:
 Dear All,
 
 The Central Computing Service at Edinburgh University is introducing a new
 University-wide filesystem intended for research-based data. We in
 Informatics have been asked about the possibility of using some of this
 new file space to either expand our existing cell or (more likely) set up a
 new cell for the whole University to use. Unfortunately, this new research
 file system is based on GPFS and so this would involve us running AFS on top
 of GPFS.
 
 Does anyone on this list have experience of running AFS on top of GPFS which
 they would be willing to share with us? Failing that, would anyone like to
 make an educated guess as to the problems we are likely to encounter if we
 try this?
 
 Any advice would be appreciated,
 
 Craig. --- Craig Strachan, Computing Officer, School of Informatics,
 University of Edinburgh
 
 
 
 
 


-- 
-
Hartmut Reuter  e-mail  reu...@rzg.mpg.de
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Using AFS dB clone

2012-01-24 Thread Hartmut Reuter


If you want to avoid that your secondary db server with the lowest IP address
ever becomes sync site, it's correct to use -clone for it. However, in case your
main server goes down you will lose the sync site until it is back up, if you
don't have another non-clone db server.
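
For the case described below - adding a brand-new machine purely as a clone -
the addition looks roughly like this (host names are placeholders, and the
database instances on the new machine still have to be created and started as
usual; every server's CellServDB has to know about it as well):

    bos addhost <existing-dbserver> <new-clone-server> -clone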


-Hartmut Reuter

Tom Mukunnemkeril wrote:

I currently have just one AFS database server and was considering setting up
another machine to just be a clone for back up purposes.


I was looking at the documentation for bos addhost and it indicates under the
-clone option that this should be used with caution.

Are there any issues I need to watch out for to set this machine as a clone.
Additionally, this machine has a lower IP Address and I don't want it to be
considered a sync site.

I'm currently running openafs 1.6.0 on my client/servers and linux kernel
2.6.38.8 on Slackware 13.37 machines (64 bit).

Bos addhost Page: http://docs.openafs.org/Reference/8/bos_addhost.html


Based on discussion about 1.4.x quorum election:
https://lists.openafs.org/pipermail/openafs-info/2011-October/037050.html

Tom Mukunnemkeril

___ OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info



--
-
Hartmut Reuter  e-mail  reu...@rzg.mpg.de
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Doubts about OpenAFS implementation in a company

2011-05-18 Thread Hartmut Reuter

Stanisław Kamiński wrote:

First of all, hi to everyone - it's my first own topic here :-)

I'm working for a company of ~1000 people, with three offices in Poland and
three others in bordering countries. OpenAFS was introduced about 6 years ago,
when the company was quite a bit smaller, and the guy that did this left
no documentation and some of his design decisions are making me scratch
my head - that's part of the reason I'm writing this.

Other things that are important:
- about 2/3 of users work on Linux (CentOS) workstations, and their
homedirs are served from AFS
- 1/3 are Windows users
- Polish offices are connected using at least 10 Mbit symmetric links,
but the offices abroad might have much less. In one particular example,
the link is asymmetric 10/1 Mbit (d/u)
- there is a single AFS cell covering all the offices
- every office has its own db and fileserver (Debian 5/6)
- we rely on our partner to assign IP address space for us - the net result
is that the weakest-link location (10/1) has the lowest IP and there is
_nothing_ we can do about it

The last thing causes Ubik elections to constantly choose the server
located on the weakest link as sync site.


This can be changed by making the slow database server with the lowest IP
address a clone. A clone can never become sync site and its votes do not count.

Use 'bos removehost dbserver slowdbserver' on all dbservers, then
'bos addhost dbserver slowdbserver -clone' on all dbservers, and restart the
database instances everywhere.
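
A sketch of that sequence with placeholder names; <dbserver> stands for each of
your database servers in turn, and the instance names depend on what actually
runs in your cell:

    bos removehost <dbserver> <slow-dbserver>
    bos addhost <dbserver> <slow-dbserver> -clone
    bos restart <dbserver> ptserver vlserver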




Also, we quite often have to move user volumes between different offices
- we've got quite a bit of rotation between them, say some 10-20 ppl per
week.

Now, I've been assigned to improve AFS performance in any way possible.
It was very bad; then I changed server parameters to tune it to the large
server options - that yielded an enormous speedup, but I still believe I can
get much more from the system.

There are two things that are, ahem, not as fast as one would like. The
worse one is directory traversal - moving between levels of directories
can take 5-10 seconds (on a workstation with 1 Gbit link to AFS server
in its location). The other one is the upload/download speed itself -
last time I measured, windows client d/u was 2/5 MB/s - I think I can
get more than that.

As I'm currently making my way through Managing AFS by Richard
Campbell, I'm not yet fully up-to-speed on OpenAFS inner workings and
such. Right now I only want to ask: is the design of our AFS system
correct? Or did the guy introducing it make some short-sighted
projections which don't hold water in the current environment (as
described). I'm talking here about single-cell design - although I'm not
sure it's easy to move volumes between different cells.

Another thing I'm worried about: can it be that having the sync site on the
slowest uplink causes everything to slow down? Is there any way to get
some measurements for this?

Thanks for reading all of this and not falling asleep :-) And waiting
for you comments,
Stan
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info



--
-
Hartmut Reuter  e-mail  reu...@rzg.mpg.de
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Large files with 1.6.0pre2

2011-03-04 Thread Hartmut Reuter

Ryan C. Underwood wrote:


I am having trouble copying a large file (6GB) from a volume located on
one server to a volume located on another server.  After about 2GB
(2147295232 bytes to be exact), the volume gets offlined and marked
needs salvage.  I have reproduced this reliably several times.  This
large file and the volume it sits on was created with 1.4.x, while the
destination volume was created with 1.6.x, has something changed with
large file support on 32-bit builds perhaps?



What does the VolserLog on the receiving side say? There were at least some
architectures (AIX) where you explicitly need to allow large files with ulimit.
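
For example (a shell sketch; run in the environment the volserver is started
from, and how to make it permanent depends on the platform):

    ulimit -f              # show the current maximum file size
    ulimit -f unlimited    # lift the limit before starting the volserver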


-
Hartmut Reuter  e-mail  reu...@rzg.mpg.de
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Problem with Off-line volumes...unable to bring On-line

2011-01-24 Thread Hartmut Reuter
Looks like a crash of the salvager. The SalvageLog should normally end with the
summary line for the RW volume. Are there any core files in /usr/afs/logs?
If not, make sure the ulimit for core file size isn't set to 0 and retry.
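
For example (in the environment the bosserver/salvager is started from):

    ulimit -c              # show the current core file size limit
    ulimit -c unlimited    # allow core dumps, then retry the salvage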


You could also run the salvager by hand under gdb to see why it crashes. You
then need to add the -debug flag to prevent it from forking, e.g.


gdb /usr/afs/bin/salvager
...
(gdb) run /vicepb 536871656 -debug


Good luck,
Hartmut

McKee, Shawn wrote:

Hi Everyone,

I am having a problem with one of my OpenAFS file servers. About ½ of
the volumes are “Off-line” and I am unable to bring them online. First
some system info and then I will list problem details and what I have tried.

The system is running Scientific Linux 5.5/x86_64 (basically CentOS 5.5
64-bit). The openafs rpms are:

[atums2:~]# rpm -qa | grep openafs

openafs-kpasswd-1.4.12-6.cern

openafs-client-1.4.12-6.cern

kernel-module-openafs-2.6.18-194.3.1.el5-1.4.12-5.cern

openafs-1.4.12-6.cern

kernel-module-openafs-2.6.18-194.8.1.el5-1.4.12-5.cern

openafs-krb5-1.4.12-6.cern

kernel-module-openafs-2.6.18-238.1.1.el5-1.4.12-6.cern

openafs-server-1.4.12-6.cern

The version of ‘e2fsprogs’ is 1.39

The system has an ext3 1TB partition for AFS:

[atums2:~]# df /vicepb

Filesystem 1K-blocks Used Available Use% Mounted on

/dev/sda1 1007931664 635382472 321349196 67% /vicepb

The system has 931 volumes and only 470 are On-line while 461 are Off-line:

[atums2:~]# vos listvol atums2

Total number of volumes on server atums2 partition /vicepb: 931

chamber.OLD_eml4a07 536872814 RW 8634169 K Off-line

chamber.OLD_eml4a07.readonly 536872815 RO 8634169 K On-line

chamber.OLD_eml4a09 536872817 RW 702642 K Off-line

chamber.OLD_eml4a09.readonly 536872818 RO 702642 K On-line

…

Total volumes onLine 470 ; Total volumes offLine 461 ; Total busy 0

I have run ‘bos salvage’ on the partition multiple times. I have
restarted the system. I have run a force fsck.ext3 check on the
underlying partition (no problems found). Only RW volumes are Off-line.
All RO volumes are On-line. There are a few RW volumes On-line (8 out of
469) but the rest won’t come On-line.

Here is a particular volume which is Off-line:

[atums2:~]# vos examine chdata.sn

chdata.sn 536871656 RW 598 K Off-line

atums2.cern.ch /vicepb

RWrite 536871656 ROnly 0 Backup 0

MaxQuota 1000 K

Creation Fri May 26 04:02:49 2006

Copy Wed Oct 11 12:35:42 2006

Backup Sun Jun 11 00:30:10 2006

Last Access Fri Jan 7 16:38:32 2011

Last Update Wed Apr 4 15:29:42 2007

0 accesses in the past day (i.e., vnode references)

RWrite: 536871656 ROnly: 536871657 RClone: 536871657

number of sites - 3

server atums1.cern.ch partition /vicepi RO Site -- Old release

server atums2.cern.ch partition /vicepb RW Site -- New release

server atums2.cern.ch partition /vicepb RO Site -- New release

Try to bring online:

[atums2:~]# vos online -server atums2 -partition /vicepb -id chdata.sn

The FileLog shows:

Sun Jan 23 22:57:03 2011 GetBitmap: addled vnode index in volume
chdata.sn; volume needs salvage

Sun Jan 23 22:57:03 2011 VAttachVolume: error getting bitmap for volume
(/vicepb//V0536871656.vol)

Try to Salvage:

[atums2:~]# bos salvage atums2 /vicepb chdata.sn

Starting salvage.

bos: salvage completed

The SalvageLog shows:

[atums2:~]# tail /usr/afs/logs/SalvageLog

@(#) OpenAFS 1.4.12 built 2010-12-13 1928681 19919656

01/23/2011 22:58:19 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager
/vicepb 536871656)

01/23/2011 22:58:19 2 nVolumesInInodeFile 64

01/23/2011 22:58:19 CHECKING CLONED VOLUME 536871657.

01/23/2011 22:58:19 chdata.sn.readonly (536871657) updated 04/04/2007 15:29

01/23/2011 22:58:19 Partially allocated vnode 2 deleted.

Try again:

[atums2:~]# vos online -server atums2 -partition /vicepb -id chdata.sn


FileLog has the same message:

Sun Jan 23 22:59:05 2011 GetBitmap: addled vnode index in volume
chdata.sn; volume needs salvage

Sun Jan 23 22:59:05 2011 VAttachVolume: error getting bitmap for volume
(/vicepb//V0536871656.vol)

Salvage attempt again:

[atums2:~]# bos salvage atums2 /vicepb chdata.sn

Starting salvage.

bos: salvage completed

[atums2:~]# tail /usr/afs/logs/SalvageLog

@(#) OpenAFS 1.4.12 built 2010-12-13 1928681 19919656

01/23/2011 23:00:07 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager
/vicepb 536871656)

01/23/2011 23:00:07 2 nVolumesInInodeFile 64

01/23/2011 23:00:07 CHECKING CLONED VOLUME 536871657.

01/23/2011 23:00:07 chdata.sn.readonly (536871657) updated 04/04/2007 15:29

01/23/2011 23:00:07 Partially allocated vnode 2 deleted.

Same result as if the prior salvage didn’t do anything. This is exactly
what happens on other volumes I have tried to bring online.

So how would I fix this? Any suggestions for how to get the rest of
these volumes On-line?

Let me know if you need further details. Thanks,

Shawn




--
-
Hartmut Reuter  e-mail  reu

Re: [OpenAFS] Locked Volume

2010-12-20 Thread Hartmut Reuter

Angela Hilton wrote:

Hi

I don't normally make requests of the lists but my colleague (Owen leBlanc)
is on leave.

I have a volume that has become locked.  I'm not certain of the reason for
this or even how to trace the problem.  There are actually 2 locked volumes
volume.name and volume.name.readonly.  There is a cron job that usually runs
to release the read/write to the read-only (I've paused this for the time
being).

I realise that I can issue vos unlock -id volume.name BUT, I am unsure if
there are any potential problems that this could cause.

Can anyone offer any advice?


In this case I normally do a 'vos examine' for the volume and then a
'vos status' for the fileserver on which 'vos examine' showed the RW volume
lives. If that server doesn't show any activity on behalf of the volume, I think
it's safe to unlock it. Such a situation may be caused by a volserver crash, a
killed vos command, or a network outage during a vos command.
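
A sketch of that check with a placeholder volume name:

    vos examine volume.name          # note the server holding the RW volume
    vos status <that-fileserver>     # look for active volser transactions
    vos unlock volume.name           # only if no transaction involves the volume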


Hartmut Reuter



TIA Angela

Thanks Angela -- -- Angela Hilton Web Infrastructure Coordinator
Infrastructure Applications IT Services Division The University of
Manchester G50 Kilburn Building, Oxford Rd Manchester M13 9PL

t: +44(0)161 275 8335 e: angela.hil...@manchester.ac.uk
___ OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info



--
-
Hartmut Reuter  e-mail  reu...@rzg.mpg.de
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Proposed changes for server log rotation

2010-12-03 Thread Hartmut Reuter

chas williams - CONTRACTOR wrote:

On Thu, 02 Dec 2010 22:22:11 -0500
Michael Meffiemmef...@sinenomine.net  wrote:


The key point is that currently some sites may be relying on
weekly restarts and the current rename from FileLog to FileLog.old
to avoid filling a disk partition. I think a more sensible approach
in long term, for sites that choose to log to regular files,
is to just let the server append, and let the modern log rotate
tool of your choice deal with the log rotation.


perhaps the absolute minimum would be to implement a signal that causes
the log files to be closed and reopened just like a restart.  this
could be issued weekly via bosserver to emulate the restart behavior.
people want new behavior like syslog, would need opt in and change
command line params (eventually switch to this as the default).


This signal already exists: kill -HUP is executed by a cron job at our site each
midnight to close all server logs. If you set -mrafsStyleLogs you also get nice
date and time suffixes instead of .old.
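
For example, /etc/crontab entries along these lines (a sketch only; the process
names and the pkill path may differ on your systems, and -mrafsStyleLogs is a
server start-up option set in the bos configuration):

    # close and reopen all server logs at midnight
    0 0 * * * root /usr/bin/pkill -HUP fileserver
    0 0 * * * root /usr/bin/pkill -HUP volserver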


___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info



--
-
Hartmut Reuter  e-mail  reu...@rzg.mpg.de
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Re: bonnie++ on OpenAFS

2010-11-23 Thread Hartmut Reuter

Simon Wilkinson wrote:



Yep, this is what's happening in the trace Achim provided, too. Every 4k
we write the chunk. I'm not sure how that's possible unless something is
closing the file a lot, or the cache is full of stuff we can't kick out.



Actually, it's entirely possible. Here's how it all goes wrong...

When the cache is full, every call to write results in us attempting to
empty the cache. On Linux the page cache means that we only call write
once for each 4k chunk. However, our attempts to empty the cache are a
little pathetic. We just attempt to store all of the chunks of the file
currently being written back to the fileserver. If it's a new file there
is only one such chunk - the one that we are currently writing. As
chunks are much larger than pages, and when a chunk is dirty we flush
the whole thing to the server, this is why we see repeated writes of the
same data. The process goes something like this:

*) Write page at 0k, dirties first chunk of file.
*) Discover cache is full, flush first chunk (0-1024k) to the file server
*) Write page at 4k, dirties first chunk of file
*) Cache is still full, flush first chunk to file server
*) Write page at 8k, dirties first chunk of file

... and so on.

The problem is that we don't make good decisions when we decide to flush
the cache. However, any change to flush items which are less active will
be a behaviour change - in particular, on a multi-user system it would
mean that one user could break write-on-close for other users simply by
filling the cache.


The problem here is that afs_DoPartialWrite is called with each write. Normally
it returns without doing anything, but if the percentage of dirty chunks is too
high it triggers a background store. However, this can happen multiple times
before the background job starts executing. Therefore I introduced in AFS/OSD a
new flag bit, CStoring, which is switched on when the background task is
submitted and switched off when it is done. During that time no new background
stores are scheduled for this file.
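
The idea is simply a per-file "store already queued" guard. As a rough sketch in
C with generic names - this is not the actual OpenAFS/AFS-OSD code, where the
bit lives in the vcache state flags:

    /* Illustrative sketch of the CStoring-style guard, not the real code. */
    #define F_STORING 0x1                /* "background store in flight" bit */

    struct file_state {
        int flags;                       /* protected by the cache lock */
        int dirty_percent;               /* share of dirty chunks for this file */
    };

    extern void queue_background_store(struct file_state *f);  /* hypothetical */

    /* Called on every write: schedule at most one background store per file. */
    void maybe_schedule_store(struct file_state *f)
    {
        if (f->dirty_percent < 75)       /* threshold is illustrative */
            return;                      /* nothing to do yet */
        if (f->flags & F_STORING)        /* a store is already queued/running */
            return;
        f->flags |= F_STORING;           /* cleared by the worker when done */
        queue_background_store(f);
    }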


Hartmut


Cheers,

Simon.

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info



--
-
Hartmut Reuter  e-mail  reu...@rzg.mpg.de
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Re: bonnie++ on OpenAFS

2010-11-22 Thread Hartmut Reuter

Achim Gsell wrote:


On Nov 23, 2010, at 12:15 AM, Simon Wilkinson wrote:



On 22 Nov 2010, at 23:06, Achim Gsell wrote:


3.) But if I first open 8 files and - after this is done - start writing
to these files sequentially, the problem occurs. The difference to 1.)
and 2.) is, that I have these 8 open files while the test is running.
This simulates the putc-test of bonnie++ more or less:


AFS is a write-on-close filesystem, so holding all of these files open
means that it is trying really hard not to flush any data back to the
fileserver. However, at some point the cache fills, and it has to start
writing data back. In 1.4, we make some really bad choices about which data
to write back, and so we end up thrashing the cache. With Marc Dionne's
work in 1.5, we at least have the ability to make better choices, but
nobody has really looked in detail at what happens when the cache fills, as
the best solution is to avoid it happening in the first place!


Sounds reasonable. But I have the same problem with a 9GB disk-cache, a 1GB
disk-cache, 1GB mem-cache and a 256kB mem-cache: I can write 6 GB pretty fast,
then performance drops to below 3 MB/s ...


We are always using memcache with only 64 or 256 MB, but I have seen this
problem, too. I think it's on the server side: today's servers have a lot of
memory and the data are written into the buffer cache first. Only when the
buffers reach the limit does the operating system start to really sync them out
to the disks. With this huge amount of buffered data you regularly see the
performance going down for some time. I suppose that during the sync the
fileserver's writes are hanging.
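
A possible mitigation on a Linux fileserver - a suggestion added here, not
something from the original discussion - is to let the kernel start writeback
earlier instead of collecting gigabytes of dirty data first:

    # start background writeback at 5% dirty memory, block writers at 20%
    sysctl -w vm.dirty_background_ratio=5
    sysctl -w vm.dirty_ratio=20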


Hartmut



So long

Achim


___ OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info



--
-
Hartmut Reuter  e-mail  reu...@rzg.mpg.de
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] pts: Permission denied ; unable to create user admin

2010-10-19 Thread Hartmut Reuter


You are probably creating a new cell, right?
If so, you may run 'bos setauth localhost -authrequired off -localauth' on your
database server(s).

If you do that in a living cell it's a security risk, of course.

With 'bos setauth <machine> -authrequired on' you can then go back to secure
mode.
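
Put together, the sequence for a fresh cell looks roughly like this (the host
name is a placeholder):

    bos setauth <dbserver> -authrequired off -localauth
    pts createuser -name admin -noauth
    bos setauth <dbserver> -authrequired on -localauth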


Hartmut Reuter

fosiul alam wrote:

Hi everyone,
I need help to solve this issue.
I am following bellow link to afs server

http://docs.openafs.org/QuickStartUnix/ch02s15.html

I believe I followed every step,
but when I try to create the user

  pts createuser -name admin -noauth
pts: Permission denied ; unable to create user admin


I can't create the user admin.
What am I missing?

I was comparing with another website, where I saw a few commands that are
different from the OpenAFS website,
for example:
http://redflo.de/tiki-index.php?page=Configure+AFS+Server

bos create dopey.redflo.de buserver simple /usr/lib64/openafs/buserver -cell redflo.de -noauth
bos create dopey.redflo.de ptserver simple /usr/lib64/openafs/ptserver -cell redflo.de -noauth
bos create dopey.redflo.de vlserver simple /usr/lib64/openafs/vlserver -cell redflo.de -noauth

but the OpenAFS documentation says:

# ./bos create <machine name> buserver simple /usr/afs/bin/buserver -noauth

# ./bos create <machine name> ptserver simple /usr/afs/bin/ptserver -noauth

# ./bos create <machine name> vlserver simple /usr/afs/bin/vlserver -noauth


So the OpenAFS website does not use the -cell <cellname> parameter...


Anyway, I know whatever is on the OpenAFS website has to be true.

So can anyone please tell me what I am doing wrong?
Thanks for your help and patience.

Fosiul







--
-
Hartmut Reuter  e-mail  reu...@rzg.mpg.de
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


[OpenAFS] Re: [OpenAFS-devel] 1.6 and post-1.6 OpenAFS branch management and schedule

2010-06-16 Thread Hartmut Reuter
Russ Allbery wrote:
 I'm aware of the following (largish) things that we want to deprecate or
 remove:
 
 * --enable-fast-restart and --enable-bitmap-later are earlier attempts to
   solve the problem that is solved in a more complete way by demand
   attach.  Demand attach will be available in 1.6 but not enabled by
   default.  These two options will conflict with demand-attach; in other
   words, you won't be able to enable either of them and demand attach at
   the same time.
 
   At the point at which we make demand attach the default, rather than
   optional behavior, I believe we should remove the code for these two
   flags.  I think that should be for either 1.10 or 2.0 based on
   experience with running 1.6 in production.  In the meantime, please be
   aware that most of the developers don't build with those flags by
   default and the code is not heavily tested.
 
   This code is not enabled by default, so if you're not compiling yourself
   and passing those flags to configure, you're not using this and don't
   need to worry about it.
 

Without --enable-fast-restart, after a fileserver crash the salvager used to
salvage all volumes in all partitions before the fileserver started.
On large fileservers this could take hours, and sometimes the salvager ran out
of memory and crashed itself, still leaving volumes unattachable.

With the Demand Attach Fileserver (DAFS) this initial salvage is not necessary
any more; however, each volume which was not cleanly detached before gets
salvaged in the background. This is a nice feature which allows the most
in-demand volumes to come up soon, I hope, but salvaging will still take hours
because it's the same amount of work that has to be done.

When I looked into the SalvageLog after a fileserver or machine crash I found
that, apart from the increment of the next uniquifier, almost never did anything
important happen. Therefore I wrote, many years ago, the code to skip the
automatic salvage and sent it to OpenAFS in 2001.

Right now I am working on the integration of rxosd into 1.5.74 (which represents
more or less the current git master). I enabled again for us the possibility to
have both options (fast restart and demand attach) in parallel, because my
feeling is that a crash of a heavily used large fileserver with only demand
attach will still be a pain for a rather long time.
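
For completeness, these are configure switches; --enable-fast-restart and
--enable-bitmap-later are named in the quoted mail above, while the exact name
of the demand-attach switch is an assumption here and may differ between
releases:

    ./configure --enable-fast-restart --enable-bitmap-later \
                --enable-demand-attach-fs    # demand-attach flag name assumed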

Hartmut

-
Hartmut Reuter  e-mail  reu...@rzg.mpg.de
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Crash in volserver when restoring volume from backup.

2009-08-26 Thread Hartmut Reuter
Anders Magnusson wrote:
 Hi,
 
 I have a problem that I need some advice on how to go on with.
 
 I have a volume dump file, but when trying to read it back volserver
 crashes.
 
 The dump was generated under 1.4.8, and the volserver segv appears with
 both 1.4.8 and 1.4.11.
 
 VolserLog.old says:
 Wed Aug 26 14:56:34 2009 Starting AFS Volserver 2.0
 (/usr/afs/bin/volserver -p 16)
 Wed Aug 26 15:00:12 2009 1 Volser: CreateVolume: volume 537998421
 (students.waqazi-4) created
 
 BosLog says:
 Wed Aug 26 15:00:14 2009: fs:vol exited on signal 11
 
 Any hints where to go from here?  I can provide the dump file on
 request, but since it's a
 student home directory I don't want it to be public.
 
 -- Ragge

Attach the volserver with gdb before running the command. Then you may
be able to see where it crashes and why.

You could also try to analyze the dump file with dumptool, which is built
in the subdirectory src/tests.

Hartmut

 
 
 ___
 OpenAFS-info mailing list
 OpenAFS-info@openafs.org
 https://lists.openafs.org/mailman/listinfo/openafs-info


-- 
-
Hartmut Reuter  e-mail  reu...@rzg.mpg.de
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Crash in volserver when restoring volume from backup.

2009-08-26 Thread Hartmut Reuter
Anders Magnusson wrote:
 Hartmut Reuter wrote:
 Anders Magnusson wrote:
  
 Hi,

 I have a problem that I need some advice on how to go on with.

 I have a volume dump file, but when trying to read it back volserver
 crashes.

 The dump was generated under 1.4.8, and the volserver segv appears with
 both 1.4.8 and 1.4.11.

 VolserLog.old says:
 Wed Aug 26 14:56:34 2009 Starting AFS Volserver 2.0
 (/usr/afs/bin/volserver -p 16)
 Wed Aug 26 15:00:12 2009 1 Volser: CreateVolume: volume 537998421
 (students.waqazi-4) created

 BosLog says:
 Wed Aug 26 15:00:14 2009: fs:vol exited on signal 11

 Any hints where to go from here?  I can provide the dump file on
 request, but since it's a
 student home directory I don't want it to be public.

 -- Ragge
 

 Attach the volserver with gdb before running the command. Then you may
 be able to see where it crashes and why.

   
 Done:
 # gdb /usr/afs/bin/volserver
 GNU gdb Fedora (6.8-27.el5)
 Copyright (C) 2008 Free Software Foundation, Inc.
 License GPLv3+: GNU GPL version 3 or later
 http://gnu.org/licenses/gpl.html
 This is free software: you are free to change and redistribute it.
 There is NO WARRANTY, to the extent permitted by law.  Type show copying
 and show warranty for details.
 This GDB was configured as x86_64-redhat-linux-gnu...
 (gdb) symb
 Discard symbol table from `/usr/afs/bin/volserver'? (y or n) y
 No symbol file now.
 (gdb) help file
 Use FILE as program to be debugged.
 It is read for its symbols, for getting the contents of pure memory,
 and it is the program executed when you use the `run' command.
 If FILE cannot be found as specified, your execution directory path
 ($PATH) is searched for a command of that name.
 No arg means to have no executable file and no symbols.
 (gdb) help symb
 Load symbol table from executable file FILE.
 The `file' command can also load symbol tables, as well as setting the file
 to execute.
 (gdb) symb volserver.debug
 Reading symbols from /usr/afs/bin/volserver.debug...done.
 (gdb) attach 30720
 Attaching to program: /usr/afs/bin/volserver, process 30720
 Reading symbols from /lib64/libpthread.so.0...done.
 [Thread debugging using libthread_db enabled]
 [New Thread 0x2ab6a9d3be90 (LWP 30720)]
 [New Thread 0x4dd02940 (LWP 30740)]
 [New Thread 0x4d301940 (LWP 30739)]
 [New Thread 0x4c900940 (LWP 30738)]
 [New Thread 0x4beff940 (LWP 30737)]
 [New Thread 0x4b4fe940 (LWP 30736)]
 [New Thread 0x4aafd940 (LWP 30735)]
 [New Thread 0x4a0fc940 (LWP 30734)]
 [New Thread 0x496fb940 (LWP 30733)]
 [New Thread 0x48cfa940 (LWP 30732)]
 [New Thread 0x482f9940 (LWP 30731)]
 [New Thread 0x478f8940 (LWP 30730)]
 [New Thread 0x46ef7940 (LWP 30729)]
 [New Thread 0x464f6940 (LWP 30728)]
 [New Thread 0x45af5940 (LWP 30727)]
 [New Thread 0x450f4940 (LWP 30726)]
 [New Thread 0x446f3940 (LWP 30725)]
 [New Thread 0x43cf2940 (LWP 30724)]
 [New Thread 0x432f1940 (LWP 30723)]
 [New Thread 0x428f0940 (LWP 30722)]
 [New Thread 0x41eef940 (LWP 30721)]
 Loaded symbols for /lib64/libpthread.so.0
 Reading symbols from /lib64/libresolv.so.2...done.
 Loaded symbols for /lib64/libresolv.so.2
 Reading symbols from /lib64/libc.so.6...done.
 Loaded symbols for /lib64/libc.so.6
 Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
 Loaded symbols for /lib64/ld-linux-x86-64.so.2
 0x003c13c0a899 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
 (gdb) c
 Continuing.
 [New Thread 0x4e703940 (LWP 30748)]
 
 Program received signal SIGSEGV, Segmentation fault.
 [Switching to Thread 0x4aafd940 (LWP 30735)]
 0x003c13078d60 in strlen () from /lib64/libc.so.6
 (gdb) bt
 #0  0x003c13078d60 in strlen () from /lib64/libc.so.6
 #1  0x00430092 in afs_vsnprintf (p=0x4aafc3ba 4BF+0, avail=999,
fmt=value optimized out, ap=0x4aafc7c0) at ../util/snprintf.c:395
 #2  0x00416a60 in vFSLog (
format=0x467838 1 Volser: ReadVnodes: IH_CREATE: %s - restore
 aborted\n,

This is the message the volserver wanted to write into the VolserLog.
Unfortunately afs_error_message(errno) didn't return a usable string, so
it came to a crash.

However, if you repeat that experiment you can type 'up 4' to get to
dumpstuff.c:1214 and then do

print *vnode
and
print vnodeNumber

to see which vnode it is.

A possible reason why IH_CREATE could fail is that you already tried
this so many times that in the linktable of the volume all tags for this
vnode number are already in use.

Because of the crashes the volserver didn't remove the remains of the
unsuccessful restores.

args=0x4) at ../util/serverLog.c:135
 #3  0x0042550e in Log (
format=0x1311e7ec Address 0x1311e7ec out of bounds) at
 ../vol/common.c:41
 #4  0x0040e60c in RestoreVolume (call=value optimized out,
avp=0x2c03e780, incremental=value optimized out,
cookie=value optimized out) at ../volser/dumpstuff.c:1214
 #5  0x004067f4 in VolRestore (acid=0x41ff510,
atrans=value optimized out, aflags=1, cookie=0x4aafd000

Re: [OpenAFS] Resilience

2009-06-02 Thread Hartmut Reuter
Wheeler, JF (Jonathan) wrote:
 One of our (3) AFS servers has a mounted read-write volume which must be
 available 24x7 to our batch system.  The server is as resilient is we
 can make it, but still it may fail outside normal working hours for some
 reason.  For technical reasons related to the software installed on the
 volume it is not possible to use read-only volumes mounted from our
 other servers (the software must be installed and served from the same
 directory name), so I have devised the following plan in the event of a
 failure: 
 
 a) create read-only volumes on the other 2 servers, but do not mount
 them; use vos release whenever the software is updated
 b) in the event of a failure of server1 (which has the rw volume), drop
 the existing mount and mount one of the read-only volumes (we can live
 with the read-only copy whilst server1 is being repaired/replaced) in
 its place.
 
 Can anyone see problems with that scenario ?  We could use vos
 convertROtoRW; how would that affect the process ?

The problem with convertROtoRW is that a dying fileserver doesn't send
callbacks to the clients as would happen when you move the RW volume to
another place. So you will have to do an fs checkvol on all clients to
make sure they don't wait forever for the broken server but instead use
the newly created RW volume. Our backup strategy is completely based on
the possibility to do convertROtoRW. Cron jobs on the batch workers do
the fs checkvol once in a while...
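
A sketch of the two pieces (names and the schedule are placeholders):

    # on an admin machine, after the fileserver holding the RW volume is lost
    vos convertROtoRW <surviving-server> <partition> volume.name

    # cron entry on each batch worker
    */15 * * * * /usr/bin/fs checkvolumes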

Hartmut
 
 Jonathan Wheeler 
 e-Science Centre 
 Rutherford Appleton Laboratory
 
 


-- 
-
Hartmut Reuter  e-mail  reu...@rzg.mpg.de
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] solaris 10 versions supporting inode fileservers

2009-05-13 Thread Hartmut Reuter
David R Boldt wrote:
 
 We use Solaris 10 SPARC exclusively for our AFS servers.
 After upgrading to 1.4.10 from 1.4.8 we had a very few
 volumes that started spontaneously going off-line, recovering,
 and then going off-line again until they needed to be salvaged.
 
 Hearing that this might be related to inode, we moved these
 volumes to a set of little use fileservers that were running
 namei at 1.4.10. It made no discernible difference.
 
 Two volumes in particular accounted for 90% of our off-line
 volume issues.
 
 FileLog:
 Mon Apr 27 10:56:09 2009 Volume 2023867468 now offline, must be salvaged.
 Mon Apr 27 10:56:15 2009 Volume 2023867468 now offline, must be salvaged.
 Mon Apr 27 10:56:15 2009 Volume 2023867468 now offline, must be salvaged.
 Mon Apr 27 10:56:22 2009 fssync: volume 2023867469 restored; breaking
 all call backs
 (restored vol above being R/O for R/W in need of salvage)

That's interesting: I saw similar behavior on some of our volumes, however,
with AFS/OSD fileservers. I then made the ViceLog messages more eloquent
and found out that this always happened when IH_OPEN failed.
This can fail if the handle in the vnode is missing. To prevent that I
added some lines in VGetVnode_r, when an already existing vnode structure
is found, to check whether the handle is in place and, if not, do a new
IH_INIT (and write a message into the log). I found about 100 cases per
day in our cell, but not all of them would have ended in taking the
volume off-line, because in many cases the handle would never have been
used (all the GetStatus RPCs). Since then I have never again seen volumes
going off-line.

Hartmut
 
 Both of the volumes most frequently impacted have content
 completely rewritten roughly every 20 minutes while being on
 an automated replication schedule of 15 minutes. One of them
 25MB, the other 95MB, both at about 80% quota.
 
 We downgraded just the fileserver binary to 1.4.8 on all of
 our servers and have not seen a single off-line message in
 36 hours.
 
 
 -- David Boldt
 dbo...@usgs.gov


-- 
-
Hartmut Reuter  e-mail  reu...@rzg.mpg.de
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?

2009-03-22 Thread Hartmut Reuter
Todd DeSantis wrote:
 Hi Rainer - Hi Hartmut ::
 

 Yes, of course, but what error changed the MaxVolumeId in the vlserver
 is still completely unclear. BTW, we also had a giant jump in the volume
 ids some years ago, but fortunately it was not big enough to reach the
 sign bit.

 
 The MaxVolumeId can be changed several ways, via a vos restore and I
 believe a vos syncvldb or syncserv.
 
 Most likely, the initial jump was via the vos restore command.
 
 [src] vos restore -h
 Usage: vos restore -server machine name -partition partition name
 -name nam
 e of volume to be restored [-file dump file] [-id volume ID]
 [-overwrite a
 bort | full | incremental] [-cell cell name] [-noauth] [-localauth]
 [-verbose
 ] [-timeout timeout in seconds ] [-help]
 
 If you use the [-id volume ID] and have a typo in the volume ID, the
 volumeID for the volume will be out of normal sequence and this will set
 the MaxVolumeID to this large number.
 
 Also, I believe that a vos syncvldb or syncserv will check the volumeIDs
 it is playing with and will check it against the MaxVolumeID and raise
 MaxVolumeID if necessary.
 
 I think when we saw this happen to an AFS cell, we gave the customer a
 tool to reset the MaxVolumeID to a more manageable number and they
 restored the volumes and gave them lower IDs.
 
 Thanks
 
 Todd DeSantis
 
Thank you Todd,

when this happened the first time I hex-edited a copy of the vldb and reset
the maxVolumeId. Then, having seen that the database version was still
the same, I just copied my modified database over the actual one. But the
second time it happened I already had so many volumes with high numbers
that I gave up. Since then we live with these numbers...

I had always suspected ubik of having produced the jump, but what you say
about vos restore or vos sync looks much more probable.

Hartmut

-- 
-
Hartmut Reuter  e-mail  reu...@rzg.mpg.de
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?

2009-03-20 Thread Hartmut Reuter
Jeffrey Altman wrote:
 Giovanni Bracco wrote:
 I want to point out that in the past the issue of volumes with too large IDs
 also emerged in our cell (enea.it). At that time (2002) we still had Transarc
 AFS, and the support provided us with a patched AFS version able to
 operate with volumes having too large IDs.

 Before migrating to OpenAFS we had to recover the normal AFS behaviour, and the
 procedure we used at that time (2005) was described at the
 AFS & Kerberos Best Practices Workshop 2005 in Pittsburgh:
 http://workshop.openafs.org/afsbpw05/talks/VirtualAFScell_Bracco_Pittsburgh2005.pdf.

 At that time the reason for the initial problem was not clear. Do I have to
 assume that it has now been identified?

 Giovanni
 
 This just goes to show that giving a talk at a workshop is not
 equivalent to submitting a bug report to openafs-b...@openafs.org.
 If this issue had been submitted to openafs-bugs, it would have been
 addressed a long time ago.   The problem is quite obvious.  Some of the
 volume id variables are signed and others are unsigned.  A volume id is
 a volume id and the type used to represent it must be consistent.

Yes, of course, but what error changed the MaxVolumeId in the vlserver
is still completely unclear. BTW, we also had a giant jump in the volume
ids some years ago, but fortunately it was not big enough to reach the
sign bit.

Hartmut

 
 Jeffrey Altman
 
 
 ___
 OpenAFS-info mailing list
 OpenAFS-info@openafs.org
 https://lists.openafs.org/mailman/listinfo/openafs-info


-- 
-
Hartmut Reuter  e-mail  reu...@rzg.mpg.de
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?

2009-03-19 Thread Hartmut Reuter

What does the VolserLog on the source server say?

-Hartmut
McKee, Shawn wrote:
 Hi Everyone,
 
 I am having a problem trying to 'vos move' volumes after
 losing/restoring an AFS file server.   The server that was lost has
 been restored on new hardware.  The old RW volumes were moved to
 other servers (convertROtoRW) and now I want to use the 'vos move'
 command to move them back.
 
 Here is what happens (I have tokens as 'admin'.  Linat07 is the
 current RW home for OSGWN and Linat08 is the new server):
 
 vos move OSGWN linat07 /vicepf linat08 /vicepg -verbose
 Starting transaction on source volume 536874901 ... done
 Allocating new volume id for clone of volume 536874901 ... done
 Cloning source volume 536874901 ... done
 Ending the transaction on the source volume 536874901 ... done
 Starting transaction on the cloned volume 2681864210 ...
 Failed to start a transaction on the cloned volume 2681864210
 Volume not attached, does not exist, or not on line
 vos move: operation interrupted, cleanup in progress...
 clear transaction contexts
 Recovery: Releasing VLDB lock on volume 536874901 ... done
 Recovery: Accessing VLDB.
 move incomplete - attempt cleanup of target partition - no guarantee
 Recovery: Creating transaction for destination volume 536874901 ...
 Recovery: Unable to start transaction on destination volume 536874901.
 Recovery: Creating transaction on source volume 536874901 ... done
 Recovery: Setting flags on source volume 536874901 ... done
 Recovery: Ending transaction on source volume 536874901 ... done
 Recovery: Creating transaction on clone volume 2681864210 ...
 Recovery: Unable to start transaction on source volume 536874901.
 Recovery: Releasing lock on VLDB entry for volume 536874901 ... done
 cleanup complete - user verify desired result

 [linat08:local]# vos examine 2681864210
 Could not fetch the entry for volume number 18446744072096448530 from VLDB
 
 I am assuming the large cloned volume ID is causing the problem as
 opposed to an inability to create a cloned volume.  I can make
 replicas on linat08 for existing volumes without a problem.
 
 NOTE: The hex representations of the cloned volume from the move
 attempt above and the 'vos examine':
 
 [linat08:local]# 2681864210 = 0x 9FDA0012 [linat08:local]#
 18446744072096448530 = 0x 9FDA0012
 
 Any suggestions?   This seems like a 64 vs 32 bit issue.
 
 Here is the information on servers and versions:
 
 We have 3 AFS DB servers:
   Linat02 - RHEL5/x86_64  -  OpenAFS 1.4.7
   Linat03 - RHEL4/i686    -  OpenAFS 1.4.6
   Linat04 - RHEL5/x86_64  -  OpenAFS 1.4.7

 We have 3 AFS file servers:
   Linat06 - RHEL4/x86_64  -  OpenAFS 1.4.6
   Linat07 - RHEL4/x86_64  -  OpenAFS 1.4.6
   Linat08 - RHEL5/x86_64  -  OpenAFS 1.4.8
 
 Info on OSGWN volume:
 
 [linat08:~]# vos examine OSGWN
 OSGWN                            536874901 RW     505153 K  On-line
     linat07.grid.umich.edu /vicepf
     RWrite  536874901 ROnly 18446744072096448530 Backup 0
     MaxQuota        200 K
     Creation    Tue Mar  3 03:43:06 2009
     Copy        Mon Dec  3 16:39:21 2007
     Backup      Never
     Last Update Sat Feb 21 15:18:05 2009
     0 accesses in the past day (i.e., vnode references)

     RWrite: 536874901   ROnly: 536874902
     number of sites -> 2
        server linat07.grid.umich.edu partition /vicepf RW Site
        server linat06.grid.umich.edu partition /vicepe RO Site
 
 Let me know if there is other info required to help resolve this.
 
 Thanks,
 
 Shawn McKee University of Michigan/ATLAS Group 
 ___ OpenAFS-info mailing
 list OpenAFS-info@openafs.org 
 https://lists.openafs.org/mailman/listinfo/openafs-info


-- 
-
Hartmut Reuter  e-mail  reu...@rzg.mpg.de
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Connection timed out?

2009-03-10 Thread Hartmut Reuter
Robbert Eggermont wrote:
 L.S.,
 
 We are evaluating OpenAFS for use with 50 clients. One of the tests is a
 kernel build on 50 clients at the same time.
 
 During this test we encounter 'Permission denied' errors, which seem to
 coincide with 'kernel: afs: failed to store file (110)' entries in
 /var/log/messages. 110=Connection timed out. The fileserver is busy but
 responsive, about 25 builds (out of 50) complete normally.
 
 We are running 1.4.8 client & server, kernel 2.6.18 64-bits. Currently
 all server processes run on the same server. Fileserver settings:
 /usr/afs/bin/fileserver -p 128 -b 512 -l 3072 -s 3072 -vc 3072 -cb 65536
 -busyat 1536 -rxpck 1024 -nojumbo
 
 What are we doing wrong (except for the way we test;-))?
 
 Regards,
 
 Robbert
 

My feeling is that the famous new (with 1.4.8) idleDead mechanism plays a
role here. It would be interesting to know whether the same happens on
1.4.7 clients or not.

Hartmut




Re: [OpenAFS] move RO volume

2009-02-02 Thread Hartmut Reuter

Chaz Chandler wrote:

You could do it another way: vos dump from source partition and vos restore to
destination.  But generally the easiest way is addsite/remsite.

Note that vos remsite is the command to remove an RO volume, not vos remove.


This is not true:

vos remsite lets the vldb forget about the RO, but it doesn't remove it 
from the partition.


If you end up with the same RO on more than one partition on your server 
only the first one (depending on the order in which the partitions get 
attached) will come on-line. The other one(s) remain off-line.
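
To actually remove an RO copy from a particular partition (and drop its
VLDB site entry at the same time), vos remove is the command; for example,
with placeholder names:

vos remove -server someserver -partition /vicepa -id somevolume.readonly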


Hartmut


The order doesn't matter, but you would vos addsite the volume on the 
destination
partition and (optionally) vos remsite the volume on the source partition.

It's generally best to keep your RW and RO volume on the same partition if disk 
space is
an issue.

Also, since you can have more than one RO volume per server, depending on what 
you're
trying to accomplish you may not even need to do this.



-Original Message-
From: l...@lwilke.de
Sent: Mon, 2 Feb 2009 11:03:55 +0100
To: openafs-info@openafs.org
Subject: [OpenAFS] move RO volume

Hi,

Just a quick question, is it correct, that it is not possible to move a
RO
volume from one partition to another partition on the *same* server
without
doing a vos remove, addsite, release?
Using OAFS 1.4.7.

Thanks

  --lars
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info




___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info



--
-
Hartmut Reuter  e-mail  reu...@rzg.mpg.de
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


[OpenAFS] Re: [OpenAFS-devel] interface for vos split

2009-01-08 Thread Hartmut Reuter


Why wouldn't vos do the same thing that fs getfid currently does to get
the vnode?



fs getfid is in the cache manager.  Since vos is user-space, it
doesn't have access to the same routines.  An RPC to the volserver was
suggested as the best way to handle that.  My follow-up question on #3
is primarily to ask if that's how we want it handled, and if we want
to expose that interface via a vos command.  In other words, would a
command like:

vos getvnode -volume $vol -relative_path path

be useful.


Of course I had thought a little about the best user interface when 
designing the current syntax of vos split. In order to not mix 
volserver interfaces and cache manager/fileserver interfaces I decided 
to do it the most simple way. You always can write a wrapper running on 
a machine with AFS mounted like this:


#!/bin/sh
#
#  usage: split <volume> <newvolume> <splitpath>
#
#  fs getfid prints the FID of <splitpath>; the second dot-separated
#  field is the vnode number of the directory that vos split needs.
vnode=`fs getfid "$3" | awk -F\. '{print $2}'`

vos split "$1" "$2" "$vnode"


I think before reinventing a cache manager within the volserver this is 
a much easier approach.


-Hartmut






It should, of course, double-check that the directory name given is within
the volume that one is splitting.



Definitely.




--
-
Hartmut Reuter  e-mail  reu...@rzg.mpg.de
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] AFS without Kerberos headache

2008-12-21 Thread Hartmut Reuter

Harald Barth wrote:

In fact what I need ideally is a file system like NFS just with the
added features needed to use it in a Metropolitan Network setup, i.e.
local caching of files.



As an added feature, I hope you want to have control who wrote a file.



AFS seems to do this in a good way, but Kerberos is a constant annoyance
to it. I do have machines that generate simulation data and have to work
for weeks. If I like to do this with the current OpenAFS setup, I'll
have to log in once a day and refresh the damn Kerberos token :-(.



You can have longer timed tickets and tokens. You can save tickets in
keytabs. If your hosts have keytabs, you can use them to generate
tickets from.

You can have system:anyuser write if you want to mimic NFS ;)


And you can create pts groups based on IP addresses and give such a
group permissions in the ACL. That's less horrible than giving
system:anyuser write access. But after you have done this you have to
wait quite a while (typically 2 hours) until the fileserver has
re-evaluated those IP groups before they work.
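
A minimal sketch of how such an IP-based group could be set up (all names,
paths and addresses here are only examples):

pts createuser -name 192.168.10.21              # host entry, keyed by IP address
pts creategroup -name simhosts -owner admin
pts adduser -user 192.168.10.21 -group simhosts
fs setacl -dir /afs/mycell/simdata -acl simhosts write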

Hartmut


Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info



--
-
Hartmut Reuter  e-mail  reu...@rzg.mpg.de
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Add new device to the cell

2008-12-07 Thread Hartmut Reuter

Jesus arteche wrote:

hey,

Someone knows how can i add a new device to the same volumme?, i mean... 
I had /afs/mycell/my_directory...created and mounted in /vicepa...and I 
added a hd in /vicepb...and I want to mount IN 
/afs/mycell/my_directory...so if my 1º hd was 90GB and my 2º hd is 
90GB...the result would be /afs/mycell/my_directory 180Gb


thanks

You can't. You can create a new volume in the new partition and mount
that volume inside your other one. Typically the data in a cell consists
of thousands of volumes mounted as a tree.
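
For example (server, partition and volume names are placeholders; if the
parent volume is replicated, create the mount point in the RW path and
release the parent afterwards):

vos create myserver vicepb my.newvolume -maxquota 0
fs mkmount /afs/mycell/my_directory/extra my.newvolume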


Hartmut
-
Hartmut Reuter  e-mail  [EMAIL PROTECTED]
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] replicas and mount points

2008-08-04 Thread Hartmut Reuter

Vladimir Konrad wrote:

Hello,

When I was creating mount points in our cell, I did not ask specifically for 
-rw (read/write)
mount point.

Not understanding (at the beginning) how the afs client works (preference for 
read only volumes), after
adding replicas and releasing the volumes, the clients could not write any-more.

Is there a way (other than re-doing the mount points with -rw) to make client 
prefer the read/write
volumes (when replicas exist)?


No, as long as all volumes in the path have RO-replicas



Also, is it true, that if I specify -rw to fs mkmount, this will stick (not 
change) and the mount
point will remain read/write even when replication sites for the volume exist?


It's true: mounted with -rw you always get the RW volume.



Kind regards,

Vlad

Please access the attached hyperlink for an important electronic communications 
disclaimer: http://www.lse.ac.uk/collections/secretariat/legal/disclaimer.htm
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info



--
-
Hartmut Reuter  e-mail  [EMAIL PROTECTED]
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] OpenAFS groups distribution under windows client

2008-08-01 Thread Hartmut Reuter


This has nothing to do with Windows. Generally the membership of a user in
groups is evaluated only once, when the connection to the fileserver gets
established. A new token enforces a new connection. So whenever you add
someone to a group or remove someone from a group, that takes effect
only after the user has re-authenticated (unless he didn't have any
connection to the fileserver before).
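
On a Unix client the effect can be seen like this (user and group names are
only examples):

pts adduser -user someuser -group somegroup     # done by the administrator
# then, on the user's client:
unlog                                           # drop the old token and connection
aklog                                           # get a fresh token from the Kerberos TGT
tokens                                          # verify the new token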


Hartmut

Lars Schimmer wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi!

Just to ask/be sure:
User a is online under windows, OpenAFS client 1.5.51 and got a token,
browsing the OpenAFS filespace.
User a try to access a directory without the propper right, got no
access and mourn at me.
I set the User a into the correct group to access that directory.
But even after 1h or 2h, User a still cannot access that directory.

But if User a destroy token 10 min after I added him to the right group
and obtain a new token, he could access the dir right afterwards.

How long does it take under windows til the right group information is
distributed?
Or is this a bug?

MfG,
Lars Schimmer
- --
- -
TU Graz, Institut für ComputerGraphik  WissensVisualisierung
Tel: +43 316 873-5405   E-Mail: [EMAIL PROTECTED]
Fax: +43 316 873-5402   PGP-Key-ID: 0x4A9B1723
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info



--
-
Hartmut Reuter  e-mail  [EMAIL PROTECTED]
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Serious trouble, mounting /afs, ptserver, database rebuilding

2008-07-23 Thread Hartmut Reuter

kanou wrote:

My logs on the second machine tell me:

== /var/log/openafs/FileLog.old ==
Wed Jul 23 19:03:37 2008 File server starting
Wed Jul 23 19:03:37 2008 afs_krb_get_lrealm failed, myserver2.
Wed Jul 23 19:03:37 2008 VL_RegisterAddrs rpc failed; will retry  
periodically (code=5376, err=4)



Code 5376 means "no quorum elected". Are you sure your database servers
are all running?


Try "udebug <server> 7002" for the ptserver
and "udebug <server> 7003" for the vldb.

Wed Jul 23 19:03:37 2008 Couldn't get CPS for AnyUser, will try again  
in 30 seconds; code=267275.


== /var/log/openafs/SalvageLog ==
07/23/2008 19:08:27 SALVAGING OF PARTITION /vicepa COMPLETED

and aklog gives me:
aklog: Couldn't get hrf.uni-koeln.de AFS tickets:
aklog: Cannot contact any KDC for requested realm while getting AFS  
tickets


damn! i did not do anything on that second one!



Just to make sure you're working on the correct file:
As I understand you first deleted the file /var/lib/openafs/db/prdb.DB0.
This file was then probably recreated when you restarted the ptserver.
Run this command on the backupfile you made first (or better on a  
copy of the backup file).


T/Christof

From: [EMAIL PROTECTED] [openafs-info- [EMAIL PROTECTED] 
On Behalf Of kanou [EMAIL PROTECTED]

Sent: Wednesday, July 23, 2008 6:46 PM
To: openafs-info@openafs.org
Subject: Re: [OpenAFS] Serious trouble, mounting /afs, ptserver,  
database rebuilding


Thanks for your answer.
Well I found the file prdb_check. It doesnt print any errors. Only
thing I can find is with
./prdb_check -database /var/lib/openafs/db/prdb.DB0 -uheader -verbose
this line:
Ubik header size is 0 (should be 64)

So there are no errors! I can start the server and everything runs
fine but the machine wont mount /afs!
kanou

Am 23.07.2008 um 17:26 schrieb Steven Jenkins:


On Wed, Jul 23, 2008 at 10:51 AM, kanou [EMAIL PROTECTED] wrote:


Hello,
well, there is a file called db_verify.c in the folder
/usr/src/modules/openafs/ptserver but I don' know how to build it.



If I recall correctly, db_verify gets renamed to 'prdb_check' during
the install, so you should check for the existence of that file.

If you can't find it, you'll need to build it from source code: the
directions on the AFSLore wiki are a good place to start:

http://www.dementia.org/twiki/bin/view/AFSLore/HowToBuildOpenAFSFromSource 



If you have problems building openafs-stable-1_4_x, you could get
openafs-stable-1_4_7 instead, as that is the latest official release.

Once you have built the tree, src/ptserver/db_verify should get  built,
so you can simply copy it out of the source tree for your use.  If it
doesn't get built automatically for you, you can cd into src/ptserver
and do a 'make db_verify' manually.

Also, feel free to ask for help here  or on the irc channel.

Steven Jenkins
End Point Corporation
http://www.endpoint.com/



___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info



___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info



--
-
Hartmut Reuter  e-mail  [EMAIL PROTECTED]
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] vos dump (different afs server versions)

2008-04-14 Thread Hartmut Reuter

Vladimir Konrad wrote:

Hello,

Is it possible to dump volumes on debian woody and restore them on
debian etch?

Vlad

Please access the attached hyperlink for an important electronic communications 
disclaimer: http://www.lse.ac.uk/collections/secretariat/legal/disclaimer.htm
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info

All OpenAFS fileservers should be interoperable. Only if you have some
servers with large-file support and others without it may you be unable
to move volumes from those with large-file support to the others.
The operating system, hardware, endianness, and whatever else should make no
difference.


Hartmut
-
Hartmut Reuter  e-mail  [EMAIL PROTECTED]
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Speed difference between OpenAFS 1.4.x on Debian and CentOS

2008-04-07 Thread Hartmut Reuter

Michał Droździewicz wrote:

Derrick Brashear pisze:


Not what I expected. When you self-compiled 1.4.6 on Debian, I assume
you downloaded a tarfile from OpenAFS and did ./configure; make, yes?
What options, if any, to configure?


I've build a debian package using default debian options (1.4.6) and 
I've compiled from source with no options for ./configure except from 
--prefix
In both cases the result was the same - slow speed around 8-12MiB 
(copying from local disk to AFS structure)





Are you sure your network interface is used in GBit/s mode with Debian 
and not just 100MBit-mode?


This could easily explain the low throughput.

Hartmut
-
Hartmut Reuter  e-mail  [EMAIL PROTECTED]
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Help needed for recovery of data of inode fileserver (Solaris 10 x86)

2008-04-04 Thread Hartmut Reuter

Jeffrey Altman wrote:

Hartmut Reuter wrote:


So what is the value of 'class' if not vLarge?


As you can see from that line above it's vSmall:

[6] DistilVnodeEssence(rwVId = 536870912U, class = 1, ino =
  21977313U, maxu = 0x8046bc4), line 3175 in vol-salvage.c

So there might be really some thing wrong with the SmallVnodeFile, but 
to do an AssertionFailed is not the best way to repair it!



What the AssertionFailed means is that no one has written code to
deal with a case where this error has occurred.   It can't be
fixed with Salvager until someone writes the missing code.


Of course, but for the user it might be better to skip handling of this 
error and to continue with the next vnode. So he could get back at least 
the damaged volume and copy whatever is still accessible.


So John, ifdef line 3175 and recompile. If this was a single bad vnode 
your volume may come online again, otherwise it's probably lost anyway.
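
Assuming the context shown in the dbx listing further down in this thread,
the one-off hack would look roughly like this (only for this salvage
attempt, not a proper fix):

	if (vnode->type == vDirectory) {
#if 0	/* temporarily disabled so the salvager can continue past the bad vnode */
	    assert(class == vLarge);
#endif
	    vip->inodes[vnodeIndex] = VNDISK_GET_INO(vnode);
	}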


Hartmut


___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info



--
-
Hartmut Reuter  e-mail  [EMAIL PROTECTED]
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Help needed for recovery of data of inode fileserver (Solaris 10 x86)

2008-04-04 Thread Hartmut Reuter

Jeffrey Altman wrote:

Hartmut Reuter wrote:


Jeffrey Altman wrote:


Hartmut Reuter wrote:


So what is the value of 'class' if not vLarge?


As you can see from that line above it's vSmall:

[6] DistilVnodeEssence(rwVId = 536870912U, class = 1, ino =
  21977313U, maxu = 0x8046bc4), line 3175 in vol-salvage.c

So there might be really some thing wrong with the SmallVnodeFile, 
but to do an AssertionFailed is not the best way to repair it!




What the AssertionFailed means is that no one has written code to
deal with a case where this error has occurred.   It can't be
fixed with Salvager until someone writes the missing code.



Of course, but for the user it might be better to skip handling of 
this error and to continue with the next vnode. So he could get back 
at least the damaged volume and copy whatever is still accessible.


So John, ifdef line 3175 and recompile. If this was a single bad vnode 
your volume may come online again, otherwise it's probably lost anyway.


Hartmut



I disagree.   The reason that assert is there is that continuing
will cause more damage to the data.  We do not know based upon
the available data whether this is a single bad vnode or whether
perhaps the wrong file is being referenced for the SmallVnodeFile.

What is known is that one vnode, perhaps the first vnode examined
has completely valid data except for the fact that it is in the
wrong file.

There are several issues that are worth pursuing here.  Especially 
because whatever the problem is has begun occurring on multiple machines:


1. what is the actual damage that has taken place?

2. can the damage be corrected?

3. can the damage be avoided in the first place?  What is the cause?

Jeffrey Altman


Of course we should not remove the assert() forever, but just for the 
test of this volume which otherwise probably will be lost anyway.


In MR-AFS we had a -nowrite option to do just a dry-run. I admit that 
it's a lot work to implement this, but some times it is very helpful.


Hartmut



___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info



--
-
Hartmut Reuter  e-mail  [EMAIL PROTECTED]
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Help needed for recovery of data of inode fileserver (Solaris 10 x86)

2008-04-04 Thread Hartmut Reuter

Jeffrey Altman wrote:

Hartmut Reuter wrote:


Jeffrey Altman wrote:


Hartmut Reuter wrote:


So what is the value of 'class' if not vLarge?


As you can see from that line above it's vSmall:

[6] DistilVnodeEssence(rwVId = 536870912U, class = 1, ino =
  21977313U, maxu = 0x8046bc4), line 3175 in vol-salvage.c

So there might be really some thing wrong with the SmallVnodeFile, 
but to do an AssertionFailed is not the best way to repair it!




What the AssertionFailed means is that no one has written code to
deal with a case where this error has occurred.   It can't be
fixed with Salvager until someone writes the missing code.



Of course, but for the user it might be better to skip handling of 
this error and to continue with the next vnode. So he could get back 
at least the damaged volume and copy whatever is still accessible.


So John, ifdef line 3175 and recompile. If this was a single bad vnode 
your volume may come online again, otherwise it's probably lost anyway.


Hartmut



I disagree.   The reason that assert is there is that continuing
will cause more damage to the data.  We do not know based upon
the available data whether this is a single bad vnode or whether
perhaps the wrong file is being referenced for the SmallVnodeFile.

What is known is that one vnode, perhaps the first vnode examined
has completely valid data except for the fact that it is in the
wrong file.

There are several issues that are worth pursuing here.  Especially 
because whatever the problem is has begun occurring on multiple machines:


1. what is the actual damage that has taken place?

2. can the damage be corrected?

3. can the damage be avoided in the first place?  What is the cause?

Jeffrey Altman


Of course we should not remove the assert() forever, but just for the
test of this volume which otherwise probably will be lost anyway.

In MR-AFS we had a -nowrite option to do just a dry-run. I admit that
it's a lot work to implement this, but some times it is very helpful.

I just saw that -nowrite also exists in OpenAFS; only the bos command
claims it would be possible only in MR-AFS. So one could at least run
the salvager under the debugger with -nowrite.
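
Following John's original invocation (quoted further down in this thread),
that would be roughly:

(dbx) run /vicepa -debug -nowrite -parallel 1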


Hartmut



___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info



--
-
Hartmut Reuter  e-mail  [EMAIL PROTECTED]
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] vos convertROtoRW requires salvage ?

2008-04-03 Thread Hartmut Reuter
eastside.cs 83 % vos listvldb root.cell
/usr/afsws/etc/vos: Connection timed out
eastside.cs 84 % /usr/afs/bin/vos listvldb root.cell

root.cell 
RWrite: 536870915 ROnly: 536870916 
number of sites - 2

   server solomons.cs.uwm.edu partition /vicepa RO Site  -- Not released
   server eastside.cs.uwm.edu partition /vicepa RW Site 
eastside.cs 85 % /usr/afs/bin/vos addsite eastside a root.cell

Added replication site eastside /vicepa for volume root.cell
eastside.cs 86 % /usr/afs/bin/vos release root.cell
Released volume root.cell successfully
eastside.cs 87 % fs checkv  
usage: /usr/openwin/bin/xfs [-config config_file] [-port tcp_port]

eastside.cs 88 % /usr/afsws/bin/fs checkv
/usr/afsws/bin/fs: Connection timed out
eastside.cs 89 % /usr/afs/bin/fs checkv
All volumeID/name mappings checked.
eastside.cs 90 % /usr/afsws/bin/fs checks
All servers are running.
eastside.cs 91 % vos listvldb root.cell

root.cell 
RWrite: 536870915 ROnly: 536870916 
number of sites - 3
   server solomons.cs.uwm.edu partition /vicepa RO Site 
   server eastside.cs.uwm.edu partition /vicepa RW Site 
   server eastside.cs.uwm.edu partition /vicepa RO Site 
___

OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info



--
-
Hartmut Reuter  e-mail  [EMAIL PROTECTED]
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Help needed for recovery of data of inode fileserver (Solaris 10 x86)

2008-04-03 Thread Hartmut Reuter

Jeffrey Altman wrote:

John Tang Boyland wrote:


OK I compiled the salvager with debugging and without optimization.

filip# /opt/SUNWspro/bin/dbx salvager.debug
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.5' in 
your .dbxrc

Reading salvager.debug
Reading ld.so.1
Reading libresolv.so.2
Reading libsocket.so.1
Reading libnsl.so.1
Reading libintl.so.1
Reading libdl.so.1
Reading libc.so.1
(dbx) run /vicepa -debug -parallel 1
Running: salvager.debug /vicepa -debug -parallel 1 (process id 3491)

[after three hours, I pressed return]

Thu Apr  3 14:14:20 2008: Assertion failed! file vol-salvage.c, line 
3175.

signal ABRT (Abort) in __lwp_kill at 0xfee21157
0xfee21157: __lwp_kill+0x0007:  jae  __lwp_kill+0x15[ 
0xfee21165, .+0xe ]

Current function is AssertionFailed
   48   abort();
(dbx) where
  [1] __lwp_kill(0x1, 0x6), at 0xfee21157   [2] _thr_kill(0x1, 0x6), 
at 0xfee1e8c9   [3] raise(0x6), at 0xfedcd163   [4] abort(0x804694a, 
0x47f52c8c, 0x6854, 0x70412075, 0x33202072, 0x3a343120), at 
0xfedb0ba9 =[5] AssertionFailed(file = 0x808b724 vol-salvage.c, 
line = 3175), line 48 in assert.c
  [6] DistilVnodeEssence(rwVId = 536870912U, class = 1, ino = 
21977313U, maxu = 0x8046bc4), line 3175 in vol-salvage.c
  [7] SalvageVolume(rwIsp = 0x9ab0130, alinkH = 0x9ac0de8), line 3346 
in vol-salvage.c
  [8] DoSalvageVolumeGroup(isp = 0x9ab0130, nVols = 1), line 2104 in 
vol-salvage.c
  [9] SalvageFileSys1(partP = 0x80bacd8, singleVolumeNumber = 0), line 
1357 in vol-salvage.c
  [10] SalvageFileSys(partP = 0x80bacd8, singleVolumeNumber = 0), line 
1192 in vol-salvage.c

  [11] handleit(as = 0x80a9340), line 687 in vol-salvage.c
  [12] cmd_Dispatch(argc = 6, argv = 0x80aaba8), line 902 in cmd.c
  [13] main(argc = 5, argv = 0x8047650), line 845 in vol-salvage.c
(dbx) up
Current function is DistilVnodeEssence
 3175   assert(class == vLarge);
(dbx) list 3170,3180
 3170   vep->type = vnode->type;
 3171   vep->author = vnode->author;
 3172   vep->owner = vnode->owner;
 3173   vep->group = vnode->group;
 3174   if (vnode->type == vDirectory) {
 3175       assert(class == vLarge);
 3176       vip->inodes[vnodeIndex] = VNDISK_GET_INO(vnode);
 3177   }
 3178   }
 3179   }
 3180   STREAM_CLOSE(file);



So what is the value of 'class' if not vLarge?


As you can see from that line above it's vSmall:

   [6] DistilVnodeEssence(rwVId = 536870912U, class = 1, ino =
 21977313U, maxu = 0x8046bc4), line 3175 in vol-salvage.c

So there might really be something wrong with the SmallVnodeFile, but
doing an AssertionFailed is not the best way to repair it!


Hartmut
-
Hartmut Reuter  e-mail  [EMAIL PROTECTED]
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Max partition and Volume size

2008-04-01 Thread Hartmut Reuter

Rich Sudlow wrote:

I wanted to double check - is the max partition and volume size
both 2 TB?

Are there any plans on increasing the partition size
in the near future?

Thanks,

Rich



Partition size doesn't matter too much any more when you use OpenAFS +
object storage (see the best practice workshop in May). But maximum volume
size right now is limited by the int32 used to count the volume's blocks and
also by the disk quota. If you don't set a disk quota you may even today
end up with bigger volumes. Our biggest has 6.3 TB, but the number of
blocks you see with vos examine is modulo 4 TB, of course.

I admit: this is not what we wanted, but it happened.
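
As an illustration of that wraparound (the block count below is only
back-computed from the vos examine output, not an exact figure):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    uint64_t true_kblocks = 6861535702ULL;   /* roughly 6.39 TB in 1K blocks */

    /* the block counter in the volume header is a 32-bit signed int,
       so only the value modulo 4 TB survives ...                      */
    int64_t shown = (int64_t)(true_kblocks & 0xffffffffULL);
    if (shown > INT32_MAX)                   /* ... and it is printed as signed */
        shown -= (int64_t)1 << 32;

    printf("%" PRId64 " K\n", shown);        /* -1728398890 K, as vos examine shows */
    return 0;
}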

Hartmut

~: vos ex aug-shotfiles.archive
aug-shotfiles.archive 536892900 RW -1728398890 K  On-line
afs4.bc.rzg.mpg.de /vicepx  23830 files
RWrite  536892900 ROnly  536892901 Backup  0
MaxQuota  0 K, osd flag1
CreationFri Oct 11 10:43:06 1996
CopyWed Feb  6 11:32:01 2008
Backup  Never
Last Update Tue Apr  1 10:13:09 2008
160693 accesses in the past day (i.e., vnode references)

RWrite: 536892900 ROnly: 536892901
number of sites - 3
   server afs4.bc.rzg.mpg.de partition /vicepx RW Site
   server afs16.rzg.mpg.de partition /vicepx RO Site
   server afs4.bc.rzg.mpg.de partition /vicepx RO Site
~:vos traverse afs4 -id aug-shotfiles.archive

servers:
afs4


File Size RangeFiles  %  run % Data %  run %

  0  B -   4 KB  240   1.01   1.01  399.045 KB   0.00   0.00
  4 KB -   8 KB9   0.04   1.04   50.339 KB   0.00   0.00
  8 KB -  16 KB   31   0.13   1.17  395.975 KB   0.00   0.00
 16 KB -  32 KB   79   0.33   1.511.851 MB   0.00   0.00
 32 KB -  64 KB   87   0.37   1.873.821 MB   0.00   0.00
 64 KB - 128 KB   92   0.39   2.268.427 MB   0.00   0.00
128 KB - 256 KB  105   0.44   2.70   17.480 MB   0.00   0.00
256 KB - 512 KB  137   0.57   3.27   53.902 MB   0.00   0.00
512 KB -   1 MB  189   0.79   4.07  143.429 MB   0.00   0.00
  1 MB -   2 MB 1281   5.38   9.441.816 GB   0.03   0.03
  2 MB -   4 MB 1154   4.84  14.283.210 GB   0.05   0.08
  4 MB -   8 MB 1567   6.58  20.868.989 GB   0.14   0.22
  8 MB -  16 MB 1647   6.91  27.77   18.796 GB   0.29   0.50
 16 MB -  32 MB 1869   7.84  35.61   42.181 GB   0.64   1.15
 32 MB -  64 MB 2161   9.07  44.68   96.362 GB   1.47   2.62
 64 MB - 128 MB 1997   8.38  53.06  181.517 GB   2.77   5.40
128 MB - 256 MB 3272  13.73  66.79  606.521 GB   9.27  14.66
256 MB - 512 MB 3754  15.75  82.551.319 TB  20.66  35.32
512 MB -   1 GB 2261   9.49  92.041.496 TB  23.42  58.74
  1 GB -   2 GB 1898   7.96 100.002.635 TB  41.26 100.00

Totals:23830 Files6.389 TB

Storage usage:
---
1 local_disk965 files 226.115 MB
  arch. Osd 4 raid63898 objects13.796 GB
  arch. Osd 5 tape22858 objects 6.387 TB
Osd 8 afs16-a47 objects32.475 GB
Osd10 afs4-a260 objects   206.205 GB
  arch. Osd13 hsmgpfs   510 objects   463.637 GB
Osd19 mpp-fs10-a  8 objects 7.143 GB
Osd32 afs8-a 31 objects21.075 GB
---
Total 28577 objects 7.115 TB

Data without a copy:
---
if !replicated: 1 local_disk965 files 226.115 MB
  arch. Osd 4 raid6   4 objects 3.625 MB
  arch. Osd 5 tape18413 objects 5.879 TB
Osd10 afs4-a  3 objects 1.682 GB
---
Total 19385 objects 5.881 TB
~:



--
-
Hartmut Reuter  e-mail  [EMAIL PROTECTED]
phone+49-89-3299-1328
fax  +49-89-3299-1301
RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Lost RW Volume Recovery?

2008-02-20 Thread Hartmut Reuter

Robert Sturrock wrote:

Hi all.

I'm not sure how, but we have lost the RW volume for our cell.user (a
structural volume under which live user home areas).  After a bit of
searching, I found this thread that describes a possible recovery
method involving dump/restoring from an RO and then salvaging:

http://www.openafs.org/pipermail/openafs-info/2002-December/007228.html

I tried this method and it _seemed_ to work, but I'm still having
problems accessing the volume after remounting it.  A quick rundown on
what I did:


$ vos dump cell.user.readonly > cell.user.dump

$ vos restore hermes2 a cell.user -verbose < cell.user.dump
Restoring volume cell.user Id 536870918 on server hermes2.its.unimelb.edu.au partition /vicepa .. done

Updating the existing VLDB entry
--- Old entry ---

cell.user 
	ROnly: 536870919 
	number of sites - 2
	   server hermes1.its.unimelb.edu.au partition /vicepa RO Site 
	   server telos.its.unimelb.edu.au partition /vicepa RO Site 
--- New entry ---


cell.user 
	RWrite: 536870918 ROnly: 536870919 
	number of sites - 3
	   server hermes1.its.unimelb.edu.au partition /vicepa RO Site 
	   server telos.its.unimelb.edu.au partition /vicepa RO Site 
	   server hermes2.its.unimelb.edu.au partition /vicepa RW Site 
Restored volume cell.user on hermes2 /vicepa


$ bos salvage hermes2 a cell.user
Starting salvage.
bos: salvage completed

$ bos salvage hermes2 a cell.user
Starting salvage.
bos: salvage completed

.. but now the problem is as follows:

   $ fs mkmount /afs/.athena.unimelb.edu.au/user cell.user

   [ so far, so good .. but .. ]

   $ ls -ld user
   ls: user: Connection timed out

   $ ls -l
   total 14
   drwxrwxrwx 5 root root 2048 Nov 14 15:36 arch
   drwxrwxrwx 5 root root 2048 Feb 19 09:33 devlp
   drwxrwxrwx 2 root root 2048 Jan 23 14:44 group
   drwxrwxrwx 3 root root 2048 Oct 25 12:15 project
   drwxrwxrwx 4 root root 2048 Oct 15 11:13 pub
   drwxrwxrwx 2 root root 2048 Feb 20 12:38 tmp
   ?- ? ??   ?? user
   drwxrwxrwx 2 root root 2048 Oct  4 21:06 www  


Any pointers as to where I go from here?

The only thing I can think of is that there may be some caching going on
which in some way is still looking for the old RW volume.

One alternative might be to vos convertROtoRW, but I suspect that would
leave me with the same problem to solve.

Regards,

Robert.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info
You need an "fs checkvol" on the client, because the disappearance of the
old volume didn't trigger the callbacks needed to provoke a new VLDB lookup
on the clients. You have the same problem after a "vos convertROtoRW".


Hartmut

--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Rebuild vldb?

2007-12-08 Thread Hartmut Reuter

Jerry Normandin wrote:

Hi

My directories in /afs/dafca.com/home
Are giving me a connection timed out for accessing most of them. Before 
this happened access to vldb through vos was slowing down.  Will a sync 
fix this or any ideas?


"Connection timed out" looks like one of your fileservers could be down.
If the volume weren't in the VLDB you would get "no such device".
Only then can a sync between fileserver and VLDB help.

If you do "vos listvldb <volume name>", does it give the correct answer?

-Hartmut
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Server crash

2007-12-07 Thread Hartmut Reuter

Steve Devine wrote:
We had a server crash this morning and I after bringing it back up I am 
unable to get a vos listvol back from it. Or it is taking a very long 
time. These partitions are greater than 150 G . Its been running for 30 
minutes now and nothing back yet.
The server (Version 1.4.2 ) is compiled with fast restart and I am 
trying to see what vols are Off-line. Is there any other way to find out 
what vols are off line?

/sd


What does the FileLog say?

-Hartmut
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Migration of DB servers

2007-11-30 Thread Hartmut Reuter

Xavier Canehan wrote:

We'll have a complete site shutdown next week. I'm planning to use this
opportunity to physically replace 2 afsdb servers. Same name and IP but host
and OpenAFS version upgrades.

Considering that clients and fileservers will be down, I guess that I'll
just have to pass through the steps of managing the change on the afsdb
servers.

Am I right or is there some tracking of DB hosts by the fileservers that I
should care of ?


Should be ok. The fileservers don't have any tracking of DB servers.
As long as you don't change the IP addresses, nobody should see any
difference.


Hartmut



Thanks,

Xavier



--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] 1.4.5 namei on solaris 9 sparc requires AlwaysAttach for vice partitions

2007-11-29 Thread Hartmut Reuter

I also was today surprised when I started the freshly compiled 1.4.5
fileserver on Solaris and it didn't attach any partition.

There was a change between 1.4.4 and 1.4.5 in favour of zfs, but a
unfortunately broken:

/* Ignore non ufs or non read/write partitions */
if ((strcmp(mnt.mnt_fstype, "ufs") != 0)
    || (strncmp(mnt.mnt_mntopts, "ro,ignore", 9) == 0))
#else
    (strcmp(mnt.mnt_fstype, "ufs") != 0)
#endif
    || (strncmp(mnt.mnt_mntopts, "ro,ignore", 9) == 0))
    continue;

was changed to

/* Ignore non ufs or non read/write partitions */
/* but allow zfs too if we're in the NAMEI environment */
if (
#ifdef AFS_NAMEI_ENV
    (((!strcmp(mnt.mnt_fstype, "ufs") &&
       strcmp(mnt.mnt_fstype, "zfs")))
     || (strncmp(mnt.mnt_mntopts, "ro,ignore", 9) == 0))
    continue;
    }
#else
    continue;
#endif



The ! in front of the first strcmp in the new version makes it ignore
exactly the ufs partitions. Just remove it!
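
With the ! removed, the intended check would read roughly like this (a
sketch of the intent, not necessarily the exact patch that went into later
releases):

    /* skip partitions that are neither ufs nor zfs, or that are read-only */
    if (
#ifdef AFS_NAMEI_ENV
	((strcmp(mnt.mnt_fstype, "ufs") &&
	  strcmp(mnt.mnt_fstype, "zfs"))
	 || (strncmp(mnt.mnt_mntopts, "ro,ignore", 9) == 0))
#else
	((strcmp(mnt.mnt_fstype, "ufs") != 0)
	 || (strncmp(mnt.mnt_mntopts, "ro,ignore", 9) == 0))
#endif
	)
	continue;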

Hartmut

Jason Edgecombe wrote:

Hi all,

In my sordid saga to get a Sun fibre channel array working well with
AFS, I found the following:

When I upgraded the server to 1.4.5 namei, the fileserver would not
mount the /vicep? partitions without doing a touch
/vicep?/AlwaysAttach first. These are dedicated partitions on separate
hard drives.

I'm using a source-compiled openafs on solaris 9 sparc. openafs was
compiled with the following options:
CC=/opt/SUNWspro/bin/cc YACC=yacc -vd ./configure \
  --enable-transarc-paths \
  --enable-largefile-fileserver \
  --enable-supergroups \
  --enable-namei-fileserver \
  --with-krb5-conf=/usr/local/krb5/bin/krb5-config

We're using MIT kerberos 1.4.1 on the clients & fileservers with a 1.6.x KDC

# mount | grep vicep
/vicepa on /dev/dsk/c0t0d0s6
read/write/setuid/intr/largefiles/logging/xattr/onerror=panic/dev=1d80006
on Thu Nov 29 13:03:15 2007
/vicepd on /dev/dsk/c0t3d0s6
read/write/setuid/intr/largefiles/logging/xattr/onerror=panic/dev=1d80016
on Thu Nov 29 13:03:15 2007
/vicepc on /dev/dsk/c0t2d0s6
read/write/setuid/intr/largefiles/logging/xattr/onerror=panic/dev=1d8001e
on Thu Nov 29 13:03:15 2007
/vicepb on /dev/dsk/c0t1d0s6
read/write/setuid/intr/largefiles/xattr/onerror=panic/dev=1d8000e on Thu
Nov 29 13:03:15 2007

# grep vicep /etc/vfstab
/dev/dsk/c0t0d0s6   /dev/rdsk/c0t0d0s6  /vicepa ufs 3  
yes -
/dev/dsk/c0t1d0s6   /dev/rdsk/c0t1d0s6  /vicepb ufs 3  
yes -
/dev/dsk/c0t2d0s6   /dev/rdsk/c0t2d0s6  /vicepc ufs 3  
yes -


#cat SalvageLog
@(#) OpenAFS 1.4.5 built  2007-11-28
11/29/2007 09:52:59 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager)
11/29/2007 09:52:59 No file system partitions named /vicep* found; not
salvaged

Does anyone know why this would be happening?

Thanks,
Jason
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info



--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] File systems on Linux, again.

2007-11-29 Thread Hartmut Reuter

Smith, Matt wrote:

After the recent thread openafs upgrade from 1.4.1 to 1.5.7, and a
review of a thread[1] from July, I'm wondering if there is a definitive
recommendation for which file system to use on Linux AFS file servers.
Ext3, XFS, JFS, something else?

Thanks all,
-Matt

[1] http://www.openafs.org/pipermail/openafs-info/2007-July/026798.html


We have been using exclusively xfs for many years. It is performant and you
can enlarge partitions on the fly by doing lvextend and xfs_growfs.
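
For example (volume group, logical volume and size are placeholders):

lvextend -L +200G /dev/vg00/vicepb
xfs_growfs /vicepb          # xfs grows the mounted filesystem online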


Hartmut
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Best practice: inode or namei fileserver?

2007-11-13 Thread Hartmut Reuter

Jason Edgecombe wrote:

Hi all,

We are currently running inode-based fileservers on solaris 9.

I stumbled across the fact that solaris 9 -9/05HW makes logging the
default on UFS. I know that the AFS finode-based fileserver cannot work
with a logging filesystem.

Does the namei filesystem play nice with logging filesystems?


Yes


Going forward, which format is recommended, inode or namei?


Namei has another advantage: if you salvage a single volume it's not
necessary to read all inodes, but only those pseudo-inodes (file names)
under the subdirectory belonging to the volume group. This is much faster.

An overhead for traversing the AFSIDat tree to open a file certainly exists,
but I suppose it is negligible compared to the advantages.


-Hartmut


I'm wondering if I should slowly migrate to namei.

Thanks,
Jason
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info



--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] VL_RegisterAddrs rpc failed (code=5376, err=22)

2007-09-26 Thread Hartmut Reuter

[EMAIL PROTECTED] wrote:

cannot get around this problem - see error messages from FileLog:

Tue Sep 25 16:15:21 2007 File server starting
Tue Sep 25 16:15:21 2007 afs_krb_get_lrealm failed, using freakout.de.
Tue Sep 25 16:15:21 2007 VL_RegisterAddrs rpc failed; will retry periodically 
(code=5376, err=22)
Tue Sep 25 16:15:21 2007 Set thread id 14 for FSYNC_sync
Tue Sep 25 16:15:21 2007 VInitVolumePackage: beginning single-threaded 
fileserver startup
Tue Sep 25 16:15:21 2007 VInitVolumePackage: using 1 thread to attach volumes 
on 1 partition(s)
Tue Sep 25 16:15:21 2007 Partition /vicepa: attaching volumes
Tue Sep 25 16:15:22 2007 Partition /vicepa: attached 4 volumes; 0 volumes not 
attached
Tue Sep 25 16:15:22 2007 Set thread id 15 for 'FiveMinuteCheckLWP'
Tue Sep 25 16:15:22 2007 Set thread id 16 for 'HostCheckLWP'
Tue Sep 25 16:15:22 2007 Set thread id 17 for 'FsyncCheckLWP'
Tue Sep 25 16:15:22 2007 Getting FileServer name...
Tue Sep 25 16:15:22 2007 FileServer host name is 'bongo'
Tue Sep 25 16:15:22 2007 Getting FileServer address...
Tue Sep 25 16:15:22 2007 FileServer bongo has address 192.168.7.68 (0x4407a8c0 
or 0xc0a80744 in host byte order)
Tue Sep 25 16:15:22 2007 File Server started Tue Sep 25 16:15:22 2007

i tried to follow the tips from different articles using NetInfo and 
NetRestrict - none worked.

i tried to check code=5376, err=22 from the source-code but i cannot get
enough information to find the source of the problem.


use translate_et 5376 :

~: translate_et 5376
5376 (u).0 = no quorum elected
~:

So your database servers didn't have a quorum when the fileserver
started. Registration of a fileserver requires a write operation on the
vlserver, which is only possible on the sync site. Without a quorum the
sync site can't be elected...


Do a "udebug <database-server> 7003" against all your database servers
to find out what happens!

Hartmut Reuter


Please help i'm lost on this one.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info



--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Tuning openafs write speed

2007-08-23 Thread Hartmut Reuter

Kai Moritz wrote:

Hi folks!

I would like to try tuning the speed of my openafs installation, but
the only information I could google is this rather old thread 
(http://www.openafs.org/pipermail/openafs-info/2003-June/009753.html)

and the hint to use a big cache-partition.

For comparison I've created files with random data and different size
(1MB, 2MB, 4MB, 8MB, 16MB, 32MB, 64MB and 128MB) on my local disk. I
copied them into AFS and then I copied then to the same disk on the
same host via scp (without compression). I've done that 10 times and
computed the average. For the 1MB file AFS ist slightly faster then
scp (factor 0,89). For the 2 and the 4MB file AFS needs about 1,4 of
the time scp needs. For the 8, 16 and 32MB the factor is about 2,7
and for the 64 and the 128MB file it is about 3,3.

I've already tried bigger cache-partitions, but it does not make a
difference. Are there tuning parameters, which tell the system a
threshold for the size of files, beyond which data won't be written
to the cache?

Greetings Kai Moritz


What are your data rates in MB/s?
If you are on a fast network (Gbit Ethernet, Inifiband ...) a disk cache
may be remarkably slower than the network. In this case memory cache can 
help.


Another point is chunk size. The default (64 KB) is bad for reading,
where each chunk is fetched in a separate RPC. With disk cache, bigger
chunks (1 MB) can be recommended anyway. For a memory cache of, say, 64
MB, 1 MB chunks would limit the number of chunks to only 64, which is
certainly too low.


Here ramdisks can help because many of the chunks are filled with short 
contents, such as directories and symbolic links. The additional 
overhead to go through the filesystem layer may be less than what you 
can earn from bigger chunks. With ramdisk 1 MB chunks aren't too bad.
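
For example, a memory-cache client with 1 MB chunks might be started with
afsd options along these lines (the numbers are just an example and have to
fit the machine's RAM):

afsd -memcache -blocks 262144 -chunksize 20 -stat 15000 -daemons 6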


Hartmut

--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Tuning openafs write speed

2007-08-23 Thread Hartmut Reuter

Kai Moritz wrote:

What are your data rates in MB/s?


scp says: 4.6MB/s


That isn't great either. So maybe you have some other problems in your network?

When I do an scp of a 100 MB file to my laptop I get ~8 MB/s, and in
parallel a remote rsync is running with about another 0.7 MB/s in both
directions (rsyncd and AFS).


So I normally get the full 100 Mbit/s bandwidth when I write into AFS or 
read from AFS: = 10 MB/s.







If you are on a fast network (Gbit Ethernet, Inifiband ...) a disk cache
may be remarkably slower than the network. In this case memory cache can 
help.


I haven't tried that yet, becaus in the file /etc/openafs/afs.conf of
my Debian Etch installation there is a comment that says:

# Using the memory cache is not recommended.  It's less stable than the disk
# cache and doesn't improve performance as much as it might sound.


Here we are using memcache without problems in our Linux clusters and on
the high-performance AIX Power 4/5 machines.


It's my special OpenAFS-1.4.4 with OSD support, which is expected to
arrive soon in the OpenAFS CVS. But I suppose the normal
OpenAFS-1.4.4 should also work without problems with memcache.


Hartmut



Another point is chunk size. The default (64 KB) is bad for reading 
where each chunk is fetched in a separate RPC. with disk cache bigger 
chunks (1 MB) can be recommanded, anyway. For memory cache of, say, 64 
MB you would limit the number of chunks to only 64 which is certainly 
too low.



With automatically chosen values, writing a 128 MB file in AFS takes
about 44-45 seconds.
On that machine I have a 3 GB cache.
With the following options, which I have taken from an example in a Debian
config file, writing the 128 MB file takes about 48 seconds :(

-chunksize 20 -files 8 -dcache 1 -stat 15000 -daemons 6 -volumes 500 -rmtsys

Greetings kai



--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Full disk woes

2007-07-06 Thread Hartmut Reuter

Steve Devine wrote:

I committed the cardinal sin of letting a server partition fill up.
I have tried vos remove and vos zap .. I can't get rid of any 
vols.Volume management fails on this machine.
Its the old style (non namei) fileserver. It doesn't seem like I can 
just rm the V#.vol can I?

Any help?

Removing the small V#.vol files doesn't help; they are really only
76 bytes long.

What happens if you do a vos remove or a vos zap?
Do the volumes go away while the free space stays as low as before?

This can happen if you only removed readonly and backup volumes, which
typically can free only the space used by their metadata, while the space
used by their files and directories is shared between them and the RW
volume. But, of course, you don't want to remove your RW volumes.
Maybe, once you have removed all RO and BK volumes, you have enough free
space for the temporary volume being created when you try to move your
smallest RW volume to another partition/server.

There is also a -live option for the vos move command which should do
the move without creating a clone. I suppose it has been written for
such cases.
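
Usage would be roughly (volume, server and partition names are
placeholders, and the vos binary has to be new enough to know the option):

vos move myvolume oldserver /vicepa newserver /vicepb -live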


Good luck,
Hartmut
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Full disk woes

2007-07-06 Thread Hartmut Reuter

Steve Devine wrote:

Hartmut Reuter wrote:


Steve Devine wrote:


I committed the cardinal sin of letting a server partition fill up.
I have tried vos remove and vos zap .. I can't get rid of any
vols.Volume management fails on this machine.
Its the old style (non namei) fileserver. It doesn't seem like I can
just rm the V#.vol can I?
Any help?



To remove the small V#.vol files doesn't help, they are really
only 76 bytes long.

What happens if you do a vos remove or a vos zap?


both commands fail. Even when I use force.



What does the VolserLog say?





Go the volumes away and the free space seems as low as before?

This can happen, if you only removed readonly and backup volumes which
typically can free only the space used by their metadata while the
space used by their files and directories is shared between them and
the RW volume. But, of course, you don't want to remove your RW-volumes.
May be, if you have removed all RO- and BK- volumes you have enough
free space for the temporary volume being created when you try to move
your smallest RW-volume to another partition/server.

There is also a -live option for the vos move command which should
doe the move without creating a clone. I suppose it has been written
for such cases.

Good luck,
Hartmut
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
  phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info







--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Full disk woes

2007-07-06 Thread Hartmut Reuter


I tried a

/afs/ipp/backups: vos listvldb 1938590434 -cell msu.edu
vsu_ClientInit: Could not get afs tokens, running unauthenticated.

svc.ml.mdsolids.31
RWrite: 1938590433ROnly: 1938590434RClone: 1938590434
number of sites - 3
   server afsfs7.cl.msu.edu partition /vicepa RW Site
   server afsfs9.cl.msu.edu partition /vicepa RO Site  -- Old release
   server afsfs7.cl.msu.edu partition /vicepa RO Site  -- New release
/afs/ipp/backups:

and found out that it's your machine afsfs9.cl.msu.edu which is causing the
trouble. Then I did a "vos status" against this machine, which did not respond.

"rxdebug afsfs9.cl.msu.edu 7005" shows a lot of connections in state
"precall" with source ports != 7005. That means you have a lot of vos
commands running somewhere. Those you should stop first! Then perhaps
restart your fileserver to get rid of the old transactions, and then
hopefully everything is OK again.


Hartmut

Steve Devine wrote:

Hartmut Reuter wrote:


Steve Devine wrote:


Hartmut Reuter wrote:



Steve Devine wrote:



I committed the cardinal sin of letting a server partition fill up.
I have tried vos remove and vos zap .. I can't get rid of any
vols.Volume management fails on this machine.
Its the old style (non namei) fileserver. It doesn't seem like I can
just rm the V#.vol can I?
Any help?



To remove the small V#.vol files doesn't help, they are really
only 76 bytes long.

What happens if you do a vos remove or a vos zap?


both commands fail. Even when I use force.



What says the VolserLog?




Go the volumes away and the free space seems as low as before?

This can happen, if you only removed readonly and backup volumes which
typically can free only the space used by their metadata while the
space used by their files and directories is shared between them and
the RW volume. But, of course, you don't want to remove your
RW-volumes.
May be, if you have removed all RO- and BK- volumes you have enough
free space for the temporary volume being created when you try to move
your smallest RW-volume to another partition/server.

There is also a -live option for the vos move command which should
doe the move without creating a clone. I suppose it has been written
for such cases.

Good luck,
Hartmut
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
 phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info







Lot of lines like this ..
Fri Jul  6 10:05:18 2007 trans 3811071 on volume 1938590434 is older
than 29730 seconds
Fri Jul  6 10:05:48 2007 trans 3811072 on volume 1937192577 is older
than 28530 seconds
Fri Jul  6 10:05:48 2007 trans 3811071 on volume 1938590434 is older
than 29760 seconds
Fri Jul  6 10:06:18 2007 trans 3811072 on volume 1937192577 is older
than 28560 seconds
Fri Jul  6 10:06:18 2007 trans 3811071 on volume 1938590434 is older
than 29790




--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] vos listvol and vos exam

2007-06-28 Thread Hartmut Reuter

Steve Devine wrote:

We have switched some of our servers over to binaries compiled with fast
restart.
This morning we had to bring a server down for maint and when we brought
it back up several of the vols were in need of salvage.
I ran vos listvol thinking I would get a list of vols that were offline.
Instead it showed all vols as On-line yet I had to salvage over a dozen
that I have found so far.
I get waiting for busy volume errors.
Any advice on how I can locate vols on a fileserver that need salvaging,
short of just biting the bullet and salvaging the whole partition?

/sd


There should be messages in the FileLog such as

VAttachVolume: Error attaching volume ; volume needs salvage; 
error=XXX
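A quick way to pull those out of the log (a sketch; /usr/afs/logs is the usual
location, adjust if your installation differs):

   grep -i "needs salvage" /usr/afs/logs/FileLog
   grep VAttachVolume /usr/afs/logs/FileLog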


-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] vos listvol and vos exam

2007-06-28 Thread Hartmut Reuter

Steve Devine wrote:



Well, from what I can see I am not sure I can count on the FileLog. Is
there any way to make the fileserver keep more than FileLog and
FileLog.old?
/sd



If you start the fileserver with -mrafslogs it renames old FileLogs to
FileLog.date-time, for instance FileLog.20070627235901
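A sketch of how to check and change this (the flag belongs on the fileserver
command line of the fs bnode; exact handling of BosConfig varies between sites):

   bos status <server> fs -long        # show the current fileserver command line
   # add -mrafslogs to the fileserver parameters in BosConfig (or recreate the bnode), then
   bos restart <server> fs -localauth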

-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] My salvager was cored by my volume.

2007-06-28 Thread Hartmut Reuter

Harald Barth wrote:

Yesterday I had a server crash after a HW-RAID box decided to go out
for lunch without even trying to have a reason. I then restarted with
fast-restart and salvaged everything. First pass with
orphans ignore:


+ /usr/openafs/bin/bos salvage -server ruffe -partition a -volume 
pdc.vol.module -showlog -orphans ignore -localauth
Starting salvage.
bos: salvage completed
SalvageLog:
@(#) OpenAFS 1.4.4 built  2007-04-25 
06/27/2007 20:07:27 STARTING AFS SALVAGER 2.4 (/usr/openafs/libexec/openafs/salvager /vicepa 537045984 -orphans ignore)
06/27/2007 20:07:28 2 nVolumesInInodeFile 64 
06/27/2007 20:07:28 CHECKING CLONED VOLUME 537045986.

06/27/2007 20:07:28 pdc.vol.module.backup (537045986) updated 06/01/2005 14:10
06/27/2007 20:07:28 SALVAGING VOLUME 537045984.
06/27/2007 20:07:28 pdc.vol.module (537045984) updated 06/01/2005 14:10
06/27/2007 20:07:28 totalInodes 3019
06/27/2007 20:07:29 dir vnode 451: ??/.. (vnode 449): unique changed from 6629 
to 11697 -- deleted
06/27/2007 20:07:29 dir vnode 455: ??/.. (vnode 453): unique changed from 6631 
to 7491 -- deleted
06/27/2007 20:07:29 Vnode 449: link count incorrect (was 2, now 1)
06/27/2007 20:07:29 Vnode 453: link count incorrect (was 9, now 8)
06/27/2007 20:07:29 Found 2 orphaned files and directories (approx. 4 KB)
06/27/2007 20:07:29 Salvaged pdc.vol.module (537045984): 3012 files, 25862 block

Second pass with orphans attach:

+ /usr/openafs/bin/bos salvage -server ruffe -partition a -volume 
pdc.vol.module -showlog -orphans attach -localauth
Starting salvage.
bos: salvage completed
SalvageLog:
@(#) OpenAFS 1.4.4 built  2007-04-25 
06/28/2007 15:57:26 STARTING AFS SALVAGER 2.4 (/usr/openafs/libexec/openafs/salvager /vicepa 537045984 -orphans attach)
06/28/2007 15:57:27 2 nVolumesInInodeFile 64 
06/28/2007 15:57:27 CHECKING CLONED VOLUME 537045986.

06/28/2007 15:57:27 pdc.vol.module.backup (537045986) updated 06/01/2005 14:10
06/28/2007 15:57:27 SALVAGING VOLUME 537045984.
06/28/2007 15:57:27 pdc.vol.module (537045984) updated 06/01/2005 14:10
06/28/2007 15:57:27 totalInodes 3019
06/28/2007 15:57:28 The dir header alloc map for page 0 is bad.
06/28/2007 15:57:28 Directory bad, vnode 451; salvaging...
06/28/2007 15:57:28 Salvaging directory 451...
06/28/2007 15:57:28 Checking the results of the directory salvage...
06/28/2007 15:57:28 The dir header alloc map for page 0 is bad.
06/28/2007 15:57:28 Directory bad, vnode 455; salvaging...
06/28/2007 15:57:28 Salvaging directory 455...
06/28/2007 15:57:28 Checking the results of the directory salvage...
06/28/2007 15:57:28 Salvage volume group core dumped!

How unhappy is my volume or my salvager and where is that core?

Yes, I can access the volume and no, it is not written very often.

[EMAIL PROTECTED] /afs/pdc.kth.se/pdc/vol/module/3.1.6 $ ls
amd64_fc3  i386_fc3  ia64_deb30  man  rs_aix43
bini386_rh9  initmodulefiles  src
[EMAIL PROTECTED] /afs/pdc.kth.se/pdc/vol/module/3.1.6 $ fs lq .
Volume Name   Quota  Used %Used   Partition
pdc.vol.module5 25862   52% 69%  


# vos exa pdc.vol.module -local
pdc.vol.module537045984 RW  25862 K  On-line
ruffe.pdc.kth.se /vicepa 
RWrite  537045984 ROnly  0 Backup  537045986 
MaxQuota  5 K 
CreationFri May 16 10:20:22 2003

CopyWed May  2 21:42:08 2007
Backup  Thu Jun 28 02:18:52 2007
Last Update Wed Jun  1 14:10:44 2005
4874 accesses in the past day (i.e., vnode references)

RWrite: 537045984 Backup: 537045986 
number of sites - 1
   server ruffe.pdc.kth.se partition /vicepa RW Site 


Tips and tricks how to proceed?


The best would certainly be to find out why and where it core-dumped.
Compile the salvager with -g and without -O and run it under gdb with
-debug (so that it does not fork), or gdb the core file.
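A rough sketch of both variants, using the salvager path from the log above
(volume ID from this thread; the core file path is hypothetical):

   gdb /usr/openafs/libexec/openafs/salvager
   (gdb) run /vicepa 537045984 -orphans attach -debug

   # or, with the core file, wherever your system writes it:
   gdb /usr/openafs/libexec/openafs/salvager /path/to/core
   (gdb) bt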


Hartmut


Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info



--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Salvaging an RO-Volume

2007-06-13 Thread Hartmut Reuter


You need to specify the RW-volumeId for salvage even if there is no RW 
volume in the partition!
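A sketch of that, reusing the RO ID from the session below (look up the RW ID
of the volume group first if you don't know it):

   vos listvldb 536877628                      # shows the RWrite ID of the volume group
   bos salvage [fileserver] a <RW-volume-ID> -showlog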


Hartmut

Frank Burkhardt wrote:

Hi,

a broken RO-volume resides on one of my fileserver:

 $ vos listvol [fileserver] a
 [...]
  Could not attach volume 536877628 
 
 Total volumes onLine 352 ; Total volumes offLine 1 ; Total busy 0



I don't need it, so I want to remove it:

 # vos remove [heilbutt] a 536877628

 Transaction on volume 536877628 failed
Volume needs to be salvaged


Volume needs to be salvaged
 Error in vos remove command.
 Volume needs to be salvaged

Ok - let's salvage it:

 # bos salvage [fileserver] a 536877628 -showlog
 Starting salvage.
 bos: salvage completed
 SalvageLog:
 @(#) OpenAFS 1.4.4 built  2007-04-23
 06/13/2007 09:10:23 STARTING AFS SALVAGER 2.4 (/usr/lib/openafs/salvager 
/vicepa 536877628)
 06/13/2007 09:10:23 536877628 is a read-only volume; not salvaged

That doesn't work :-( . What is the best way to handle this?

Regards,

Frank
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info



--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Illegal instruction

2007-05-07 Thread Hartmut Reuter

Karen L Eldredge wrote:


We are trying to configure our initial AFS server on AIX 5.3, and we are
running bos addkey server -kvno 0 -cell cellname -noauth, and it
will ask for the password for afs twice.  After entering the
password the second time, it core dumps and gives the Illegal
instruction message.  Could you help us figure out what we are doing
wrong?  We have tried several different things without success.


Sometimes these errors disappear if you switch off optimization and
switch on -g during your build. Do a make clean and edit 
src/config/Makefile.config to remove all -O and to add -g to XCFLAGS.
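Roughly like this (a sketch; the exact variable layout of
src/config/Makefile.config differs between OpenAFS releases):

   make clean
   # in src/config/Makefile.config change a line like
   #    XCFLAGS= ... -O ...
   # into
   #    XCFLAGS= ... -g ...
   make && make dest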


Hartmut



Karen Eldredge
PSD (AIX/Linux System Support)
AIX Certified Advanced Technical Expert
Certified System Expert - Enterprise Technical Support
Linux Professional Institute Certified
External:  303-924-5767 Tie line:  263-5767  Internal: 5-5767
[EMAIL PROTECTED]



--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] How to replicate files on different machines

2006-12-15 Thread Hartmut Reuter

[EMAIL PROTECTED] wrote:

Hi.

I'm using OpenAFS 1.4.2 on Fedora 5. I want to replicate file(s) on 2
machines (both Fedora 5). How could this be achieved? Do I need to
install OpenAFS server on both the machines, and if this is the
requirement, how could the servers be synchronized?


Yes, you need a fileserver on the second machine and you need to define
the replication site for each volume by using vos addsite.
The synchronisation is achieved by vos release for the volume.
This doesn't happen automatically; you have to start some script via cron.
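A minimal sketch (volume and server names are placeholders):

   vos addsite -server fs2.example.com -partition /vicepa -id myvolume
   vos release myvolume -verbose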




Right now I'm facing another issue. I have installed the server on the 1st
machine and client on 2nd machine (both Fedora 5). I have given the
cell information for the server on 2nd machine in
/usr/vice/etc/CellServDB, CellServDB.dist and ThisCell.

However, when I start the client, the cell under /afs/ is not
displayed as a directory.

# ls -l /afs/ total 0 ?- 0 root root 0 Jan  1  1970
ps2750.pspl.co.in #

Hence I could not do any further file operations. Am I missing
something?


Did you start the afsd on the client with the option -dynroot?
Do you have a volume root.cell in your cell?
If both are true and your CellServDB and ThisCell information are correct
you should see your cell.

If you don't start afsd with -dynroot you need a volume root.afs and
inside it a mountpoint for your root.cell under the name of your cell.
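For the non-dynroot case, a sketch of the usual mount points (assuming your cell
is called example.com; run this with admin tokens on a client that already sees
root.afs):

   fs mkmount /afs/example.com root.cell -cell example.com
   fs mkmount /afs/.example.com root.cell -cell example.com -rw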





Thanks and Regards, Shailesh Joshi 
___ OpenAFS-info mailing
list OpenAFS-info@openafs.org 
https://lists.openafs.org/mailman/listinfo/openafs-info



--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Volume numbers

2006-12-04 Thread Hartmut Reuter

Jakub Witkowski wrote:

Hello!


From my observation of AFS volume ID numbers, it appears that they are
always large, pseudo-random numbers unique to a given cell. I'd like to
ask if there are any lower/upper bounds to that value? Can volume number
zero exist? Or is that perhaps a special case?


The volume numbers in a new cell start with what the vldb database
initialization has put in as the starting number. This is 0x20000000,
or decimal 536870912.

Hartmut


Thank you,

Jakub Witkowski.

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info



--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] disaster recovery

2006-09-20 Thread Hartmut Reuter

Dimitris Zilaskos wrote:


Hi,

One 1.3.87 Linux fileserver died today. After a reboot, the
filesystem check on vicepa spat out numerous errors, it fixed them,
filling lost+found with data, and then after salvage I ended up with
half the volumes missing or corrupted.
I had one backup a few days old which I used to restore the
volumes. I also have a copy of the /vicepa contents from yesterday,
when the server started to behave strangely. Is there a way to use the
/vicepa contents in order to access certain files/directories?
Unfortunately I do not have a copy of the db files.



The db-files do not matter.

If you have a copy of your /vicepa with correct modebits, ownership,
and group settings for the files you may use this instead of your old 
/vicepa.


It is possible to tar/untar vicep-partitions and to use them again
afterwards. If you do that on another fileserver, you should stop the
corrupted one, start the new one, and do a vos syncvldb newserver in
order to update the volume location database. This will overwrite the
location of each volume found on the new server. If this doesn't work,
try vos syncserv newserver (I never understood which one of those
does what, but one of them does the job).
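A sketch of the two commands (the server name is a placeholder; -verbose shows
which VLDB entries get rewritten):

   vos syncvldb -server newserver.example.com -verbose
   vos syncserv -server newserver.example.com -verbose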


You will then probably need an fs checkvolumes on the client so that it
picks up the new location.


You should also think about having RO-volumes of your RW-volumes on
other servers in the future. Then you can easily do a vos convertROtoRW
... to get a working cell again.
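For reference, a sketch of that conversion (names are placeholders; the command
operates on the RO copy sitting on the given server and partition):

   vos convertROtoRW -server roserver.example.com -partition /vicepa -id myvolume
   vos release myvolume        # recreate RO copies once new sites are set up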


Good luck!
Hartmut




Cheers,
--
 



Dimitris Zilaskos

Department of Physics @ Aristotle University of Thessaloniki , Greece
PGP key : http://tassadar.physics.auth.gr/~dzila/pgp_public_key.asc
  http://egnatia.ee.auth.gr/~dzila/pgp_public_key.asc
MD5sum  : de2bd8f73d545f0e4caf3096894ad83f  pgp_public_key.asc
 


___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info



--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Re: help, salvaged volume won't come back online, is it corrupt? [trimmed log]

2006-09-13 Thread Hartmut Reuter

Juha Jäykkä wrote:

I was wonderin, somewhat off the topic, about the threat posed by this
issue.

Suppose, I lost the root vnode of a replicated volume. What happens to
the replicas? Are they still fine (except perhaps for those on the same
site as the rw volume), or does this corruption destroy all replicas as
well? This is quite important to know for us...

-Juha

In my understanding you can easily propagate this error to your readonly
replicas by vos releasing the corrupt volume. The volserver on the
receiving side would remove any data not mentioned in the dump stream.


In this case you had better do a vos convertROtoRW on the RO site as
soon as possible to regain a valid RW-volume.


Hartmut
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Re: help, salvaged volume won't come back online, is it corrupt? [trimmed log]

2006-09-13 Thread Hartmut Reuter

Juha Jäykkä wrote:

In my understanding you can easily propagate this error to your
readonly replicas by vos releasing the corrupt volume. The volserver
on the receiving side would remove any data not mentioned in the dump
stream.



This is frightening. Can I actually vos release the corrupt volume? From
the posts on the list, I'd gather it cannot even be attached - how could
it be released, then?


If it can't be attached anymore, probably not. But I don't know whether 
it really won't come online after the salvager has thrown away the root 
directory!





In this case you had better do a vos convertROtoRW on the RO site as
soon as possible to regain a valid RW-volume.



Except that I'm unlikely to notice the corruption before it's released,
which happens automatically. Sounds like we need to change our backup
policy...



The best way to prevent the salvager from corrupting volumes is not to
run it automatically. If you configure your OpenAFS with
--enable-fast-restart then the fileserver will not salvage
automatically after a crash. So if, after a crash, you find volumes which
couldn't be attached, you salvage them with bos salvage server partition
volume and examine the SalvageLog. I suppose that in the case where it
throws the root directory away you will see something in the log.
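A sketch of that workflow (placeholders in angle brackets;
--enable-fast-restart is the configure option mentioned above):

   ./configure --enable-fast-restart [your other options]
   # build and install, then after a crash salvage only what is broken:
   bos salvage -server <server> -partition /vicepa -volume <volume> -showlog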


Hartmut


-Juha




--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Building OpenAFS 1.4.1 on SLES 9.0

2006-08-09 Thread Hartmut Reuter

David Werner wrote:



The configure-script issues still a warning:

Cannot determine sys_call_table status. assuming it isn't exported

I hope i don't need to patch the kernel.
Does anyone know something about it?  In March and April there
were some messages on this list regarding sys_call_table issues,
but I don't really know whether they apply in my case.


We have been running a client derived from OpenAFS 1.4.1 on SLES 9
without problems for a long time.


In /var/log/messages a start looks like:

Jul 10 09:11:10 mpp-fs10 syslogd 1.4.1: restart.Jul 10 09:11:12 mpp-fs10 
kernel: libafs: module not supported by Novell, setting U taint flag.
Jul 10 09:11:12 mpp-fs10 kernel: libafs: module license 
'http://www.openafs.org/dl/license10.html' taints kernel.
Jul 10 09:11:12 mpp-fs10 kernel: libafs: no version for sys_close 
found: kernel tainted.
Jul 10 09:11:12 mpp-fs10 kernel: Found system call table at 0xc03a7a00 
(scan: close+wait4)
Jul 10 09:11:12 mpp-fs10 kernel: Starting AFS cache scan...found 0 
non-empty cache files (0%%).


Hartmut

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info



--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Re: About partition size limitations again

2006-07-21 Thread Hartmut Reuter

Adam Megacz wrote:

Serge Torres [EMAIL PROTECTED] writes:


A 2 Tb partition worked without a glitch.



Out of curiosity, what is the largest file you've tried creating?


Largest files here are about 80 GB.

Hartmut




One of the labs here at Berkeley asked if AFS could handle
creation/access of their 200gb simulation data files and I couldn't
find any anecdotal evidence.

Aside from vos_move/vos_dump not being very useful with 200gb
volumes I can't see a problem with this

  - a

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info



--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


[OpenAFS] rxkad patch

2006-05-17 Thread Hartmut Reuter
Here once again my patch for rxkad to allocate only as much space as 
necessary for the security object and not always 12K.

This patch is based on the 1.4.1 version.
For some unknown reason my 1st patch didn't make it into the CVS and 
stable releases.


Hartmut
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
--- private_data.h.orig 2003-07-16 01:16:42.345588002 +0200
+++ private_data.h  2005-12-16 11:52:19.527178509 +0100
@@ -48,15 +48,17 @@
     afs_int32 ipAddr;          /* or an approximation to it */
 };
 
+#define PDATA_SIZE(l) (sizeof(struct rxkad_cprivate) - MAXKTCTICKETLEN + (l))
+
 /* private data in client-side security object */
 struct rxkad_cprivate {
     afs_int32 kvno;            /* key version of ticket */
-    afs_int32 ticketLen;       /* length of ticket */
+    afs_int16 ticketLen;       /* length of ticket */
+    rxkad_type type;           /* always client */
+    rxkad_level level;         /* minimum security level of client */
     fc_KeySchedule keysched;   /* the session key */
     fc_InitializationVector ivec;      /* initialization vector for cbc */
     char ticket[MAXKTCTICKETLEN];      /* the ticket for the server */
-    rxkad_type type;           /* always client */
-    rxkad_level level;         /* minimum security level of client */
 };
 
 /* Per connection client-side info */
--- rxkad_client.c.orig 2006-02-28 01:19:20.107241106 +0100
+++ rxkad_client.c  2006-04-25 09:41:37.955757683 +0200
@@ -181,7 +181,7 @@
     struct rx_securityClass *tsc;
     struct rxkad_cprivate *tcp;
     int code;
-    int size;
+    int size, psize;
 
     size = sizeof(struct rx_securityClass);
     tsc = (struct rx_securityClass *)rxi_Alloc(size);
@@ -189,15 +189,15 @@
     tsc->refCount = 1;         /* caller gets one for free */
     tsc->ops = &rxkad_client_ops;
 
-    size = sizeof(struct rxkad_cprivate);
-    tcp = (struct rxkad_cprivate *)rxi_Alloc(size);
-    memset((void *)tcp, 0, size);
+    psize = PDATA_SIZE(ticketLen);
+    tcp = (struct rxkad_cprivate *)rxi_Alloc(psize);
+    memset((void *)tcp, 0, psize);
     tsc->privateData = (char *)tcp;
     tcp->type |= rxkad_client;
     tcp->level = level;
     code = fc_keysched(sessionkey, tcp->keysched);
     if (code) {
-       rxi_Free(tcp, sizeof(struct rxkad_cprivate));
+       rxi_Free(tcp, psize);
        rxi_Free(tsc, sizeof(struct rx_securityClass));
        return 0;               /* bad key */
     }
@@ -205,7 +205,7 @@
     tcp->kvno = kvno;          /* key version number */
     tcp->ticketLen = ticketLen;        /* length of ticket */
     if (tcp->ticketLen > MAXKTCTICKETLEN) {
-       rxi_Free(tcp, sizeof(struct rxkad_cprivate));
+       rxi_Free(tcp, psize);
        rxi_Free(tsc, sizeof(struct rx_securityClass));
        return 0;               /* bad key */
     }
--- rxkad_common.c.orig 2006-02-28 01:19:20.361083608 +0100
+++ rxkad_common.c  2006-04-25 09:43:04.572665345 +0200
@@ -68,7 +68,7 @@
 #include <strings.h>
 #endif
 #endif
-
+#include <afs/afsutil.h>
 #endif /* KERNEL */
 
 #include <des/stats.h>
@@ -311,7 +311,8 @@
     tcp = (struct rxkad_cprivate *)aobj->privateData;
     rxi_Free(aobj, sizeof(struct rx_securityClass));
     if (tcp->type & rxkad_client) {
-       rxi_Free(tcp, sizeof(struct rxkad_cprivate));
+       afs_int32 psize = PDATA_SIZE(tcp->ticketLen);
+       rxi_Free(tcp, psize);
     } else if (tcp->type & rxkad_server) {
        rxi_Free(tcp, sizeof(struct rxkad_sprivate));
     } else {


Re: [OpenAFS] Problem with vos move

2006-04-21 Thread Hartmut Reuter

Wheeler, JF (Jonathan) wrote:

I have attempted to move a large (about 80 Gb) AFS volume from one
partition to another using the command:

vos move VOLUME MACHINE i MACHINE l

I left this running overnight and found the following error messages
when I checked this morning:

Failed to move data for volume 536871896
   rxk: sealed data inconsistent
vos move: operation interrupted, cleanup in progress...
clear transaction contents
FATAL: VLDB access error: abort cleanup
cleanup complete - user verify desired result

The current situation is:

1. The volume is on-line on the source partition, but the volume (VLDB
entry) is locked.  Here is the result of the command vos listvol
MACHINE i:

Total number of volumes on server wallace partition /vicepi: 1 
bfactory.vol2 536871896 RW   83886080 K On-line


Total volumes onLine 1 ; Total volumes offLine 0 ; Total busy 0

and the result of the command vos listvldb bfactory.vol2:

bfactory.vol2 
RWrite: 536871896 
number of sites - 1
   server wallace.cc.rl.ac.uk partition /vicepi RW Site 
Volume is currently LOCKED


2. The volume is off-line on the destination partition (same size and
same volume number).  Here is the result of the command vos listvol
MACHINE l:

Total number of volumes on server wallace partition /vicepl: 1 
bfactory.vol2 536871896 RW   83886080 K Off-line


Total volumes onLine 0 ; Total volumes offLine 1 ; Total busy 0

Please note that the server is running IBM/Transarc version 3.6 (though
it may not matter in this case).

My questions are:

a) What went wrong ?
b) What chance is there that the volumes are identical ?  In other
words, is it possible that I can complete the move manually ?
c) Is there any way to compare the 2 volumes?

Any help would be very much appreciated


I suppose your token expired over night.

The vos move command first creates a clone of the volume, then dumps this
clone over to the new partition. This probably takes a long time, and in
this phase your token expired.
Then the still off-line volume on the sink side should have been updated
by an incremental dump of the RW-volume on the source side.
This either could not happen any more because of the expired token, or it
happened on behalf of the source-side volserver without any new RPC
from your vos command. But then the VLDB must be updated by a ubik call
from your vos command, and that one failed. The next RPC would have
brought the volume on the sink side on-line, and other RPCs would have
removed the volume, the clone, and the backup.

If the RW-volume has not been modified since the beginning of the vos move,
the off-line version on the sink side should be complete.
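One hedged way to clean up and retry, based on the listings above (verify with
vos listvol/listvldb first; alternatively, if the RW has not changed, the
off-line copy could be finished instead of removed; running vos on a server
machine with -localauth avoids the token-lifetime problem):

   vos unlock bfactory.vol2
   vos zap wallace l 536871896        # remove the off-line copy on /vicepl
   vos move bfactory.vol2 wallace i wallace l -verbose -localauth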


Hartmut Reuter




Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory
(cell rl.ac.uk)
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info



--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Inode file server on Linux?

2006-04-12 Thread Hartmut Reuter

Frederic Gilbert wrote:

Hi,

I have seen in the 1.4.0 announcement:


Cache chunk locking has been refined, and native operating
system vnodes on Linux and MacOS X are now supported.


This has nothing to do with the fileserver, only with the client.
Unfortunately the term vnode is used in a different sense on the client
and on the server: while on the client the vnode is a structure in the
kernel, on the server a vnode is an entry in either the small or the large
vnode file of the volume (and of course also a structure in the
fileserver's memory).


Hartmut



When I read this, I understood that the inode file server was now
supported on Linux. But I don't see any other reference to this, so
maybe I am wrong?
So, is it possible to use the inode file server with Linux/ext2, and if 
so how can I activate it?


Thanks,
Frederic Gilbert.


___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info



--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] VOS commands

2006-02-07 Thread Hartmut Reuter

Juha Jäykkä wrote:

I was wondering, what do the commands vos changeloc and vos
convertROtoRW actually do? The vos help on these is rather scarce and
openafs.org docs have nothing at all. Obviously new additions. Are they
documented somewhere? Don't tell me in the defunct wiki. =) Luckily it's
on its way back up...

-Juha



vos convertROtoRW can be used if you have lost the partition where the
RW-volume was but still have an RO-volume somewhere else. In this case
you can convert the RO-volume into the new RW-volume. This is much faster
than a vos dump | vos restore because it only changes some fields in
the volinfo file and renames some files in /vicepx/AFSIDat//special


This is part of a backup strategy:

The mount points for our home directories are always made with the -rw 
option. We then have in the same partition a RO-volume and another one 
on a different server. The first RO-volume keeps the time needed for the 
reclone during vos release short and doesn't really consume much disk 
space because only changed files and some metadata files are duplicated.

The remote RO-volumes are our real backup system.

Even if you have lost a TB partition you can be back again in production 
after half an hour.
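A sketch of that layout for one home volume (all names are placeholders):

   fs mkmount /afs/.example.com/u/alice user.alice -rw
   vos addsite -server fs1.example.com -partition /vicepa -id user.alice   # same partition as the RW
   vos addsite -server fs2.example.com -partition /vicepb -id user.alice   # remote backup copy
   vos release user.alice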


Hartmut
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] AFS clients and IP address 192.168.67.1

2006-01-04 Thread Hartmut Reuter

Wheeler, JF (Jonathan) wrote:

Whilst investigating a network problem, one of our network gurus noticed
that our AFS client systems are sending packets to IP address
192.168.67.1 (I confirmed this by using tcpdump).  Issuing the command:

fs getserverprefs -numeric | grep 192

gave the reply:

192.168.67.1   40006

Please would someone explain what is happening here; is it related to
dynroot ?  I expect that it is a FAQ.


No. Probably one of the fileservers contacted by this client also has a
secondary private interface which is not masked by a NetInfo or
NetRestrict file.
The server preference of 40006 indicates it is a fileserver not
belonging to your own network, so it is some work to find out where it
belongs.

If you do a vos listaddrs -cell XXX for all cells in your CellServDB
you should be able to identify this server.
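A hedged sketch of doing that in one go over the client's CellServDB (cell names
are the lines starting with '>'; path as on a typical Unix client):

   for cell in $(sed -n 's/^>\([^ #]*\).*/\1/p' /usr/vice/etc/CellServDB); do
       echo "== $cell"
       vos listaddrs -cell "$cell" -noauth
   done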


Hartmut



Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory (AFS cell rl.ac.uk)
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info



--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] TSM and AFS

2005-09-27 Thread Hartmut Reuter

Tony Derry wrote:
Is anyone using Tivoli Storage Manager with AFS filesystems?  This is a 
terribly slow solution and we havnt been able to find an acceptable 
alternative.  We are using a P650 as a server and LTO2 tape drives.  We 
recently did a restore of 20 GB that took 27 hours.  Non AFS data can be 
restored at about 50 GB per hour.


Any ideas would really be appreciated.  Thanks.

t.


We also use TSM for file-based backup. This, however, is used only
to restore old versions of single files, not in case of broken
filesystems. With today's huge /vicep-partitions we use RO-volumes on
separate servers as our backup solution. These RO-volumes can be converted
to RW-volumes within a few minutes.


Hartmut




Anthony Derry, Manager
University of Maryland, OIT
Enterprise Storage Services
Building 224, Rm. 1317
College Park, Md. 20742
[EMAIL PROTECTED] 301-405-3059


___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info



--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Unusable empty partition

2005-06-01 Thread Hartmut Reuter

Derek Atkins wrote:

Last I checked, Vice Partitions were only allowed a one
character name, meaning /vicepcy is an invalid Vice
Partition.  Unless it's been changed recently, you can
only have /vicepa through /vicepz


That's not true: for a long time now you can have up to 255 partitions,
that is from /vicepa to /vicepju or /vicepjv.


Hartmut



-derek

Hans-Gunther Borrmann [EMAIL PROTECTED] writes:



Hello,

I have one partition on a server, which is empty:
Total number of volumes on server localhost partition /vicepcy: 0 
Total volumes onLine 0 ; Total volumes offLine 0 ; Total busy 0


But I cannot move any volumes to this partition:
[EMAIL PROTECTED]:~ vos move www.natscan.kirsche7 servera by serverb cy 
-verbose
Starting transaction on source volume 537085072 ... done
Cloning source volume 537085072 ... done
Ending the transaction on the source volume 537085072 ... done
Starting transaction on the cloned volume 537091168 ... done
Creating the destination volume 537085072 ... done
Dumping from clone 537091168 on source to volume 537085072 on 
destination ...Failed to move data for the volume 537085072

  VOLSER: Problems encountered in doing the dump !
vos move: operation interrupted, cleanup in progress...
clear transaction contexts
access VLDB
move incomplete - attempt cleanup of target partition - no guarantee
cleanup complete - user verify desired result

The VolserLog shows:
Wed Jun  1 10:18:03 2005 VAttachVolume: Failed to open /vicepcy/V0537085072.vl 
(errno 2)
Wed Jun  1 10:18:03 2005 1 Volser: CreateVolume: volume 537085072 
(www.natscan.kirsche7) created

unable to allocate inode: File exists
Wed Jun  1 10:18:03 2005 1 Volser: ReadVnodes: Restore aborted
Wed Jun  1 10:18:03 2005 1 Volser: Delete: volume 537085072 deleted

and df -k gives:

[EMAIL PROTECTED]:logs]# df -k /vicepcy
Filesystem1024-blocks  Free %UsedIused %Iused Mounted on
/dev/vicepcy262144000 202566836   23% 2191 1% /vicepcy

which means that about 60 GB are occupied.
What to do? The server is a namei server. My idea is therefore to simply 
remove all files and directories except 
./lost+found

./Lock
./Lock/vicepcy
./AFSIDat
./AFSIDat/README

Will this be safe?



--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] can not change a backup or readonly volume

2005-04-28 Thread Hartmut Reuter
You are certainly in your readonly tree.
What does fs lq show?
To get into the RW path do a
cd /afs/.dma_lab.ecs.syr.edu/usr/bob
then it will work. Later do a vos release for the volume where you
created the mount point.

Hartmut
[EMAIL PROTECTED] wrote:
hi,
one step forward, 

At least the dot in front of the cell name could be a showstopper. Do
you need the -cell option at least? Mine vos create works without.

vos create works;
[EMAIL PROTECTED]/afs/computer_lab.edu/usr/bob% klog admin
Password:
[EMAIL PROTECTED]/afs/computer_lab.edu/usr/bob% vos create -server 
addedserver.edu -partition /vicepe -name addedserver-afs
Volume 536871114 created on partition /vicepe of addedserver.edu
[EMAIL PROTECTED]/afs/computer_lab.edu/usr/bob% vos listvol -server 
addedserver.edu
Total number of volumes on server addedserver.edu partition /vicepe: 1
addedserver-afs536871114 RW  2 K On-line
Total volumes onLine 1 ; Total volumes offLine 0 ; Total busy 0
-
but this command doesn't work ...
[EMAIL PROTECTED]/afs/dma_lab.ecs.syr.edu/usr/bob% fs mkm -dir 
/afs/computer_lab.edu/added-afs -vol 536871114
fs: You can not change a backup or readonly volume

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info

--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Please help! Error in server synchronisation

2005-03-14 Thread Hartmut Reuter
[EMAIL PROTECTED] wrote:
Hello everybody,
I've got two servers running AFS. Both of them are running the fileserver and
database processes. But there is a problem whenever I want to synchronize the
VLDB with the vos syncvldb command.
The error it shows is always the no quorum elected error.
I've read a lot about no quorum elected errors in the archives of this
mailing list. Now I've already set up one server as NTP server and the other as a client, so
I think it isn't the time synchronisation problem of Ubik.
Is there someone who has experience with this error, or someone who can help me?
Big THX
Greetz 
Have you restarted the database servers on the older machine after adding
the new host with bos addhost? If not, the old node doesn't know about
the new node.

What do udebug node1 7003 and udebug node2 7003 say?
Hartmut
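A sketch of both steps (instance names are the usual ones; check bos status on
your machines):

   udebug node1 7003        # ubik state of the VLDB service on each db server
   udebug node2 7003
   bos restart node1 vlserver ptserver -localauth    # restart the db processes after bos addhost
   bos restart node2 vlserver ptserver -localauth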

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info

--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Please help! Error in server synchronisation

2005-03-14 Thread Hartmut Reuter
[EMAIL PROTECTED] wrote:
What say rxdebug node1 7003 -version and rxdebug node2 7003 -version?

Sorry, should be udebug instead of rxdebug!!

rxdebug SCM 7003 gives:
Trying 10.1.202.20 (port 7003):
Free packets: 634, packet reclaims: 4, calls: 1094, used FDs: 6
not waiting for packets.
0 calls waiting for a thread
16 threads are idle
Done.
rxdebug secondserver 7003 gives:
Trying 10.1.202.6 (port 7003):
Free packets: 634, packet reclaims: 0, calls: 385, used FDs: 6
not waiting for packets.
0 calls waiting for a thread
16 threads are idle
Done.
Hope someone can help me, because this problem is taking too much time :-(
THX for your response!
Greetz


___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info

--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] afs partition on nfs ?

2005-03-11 Thread Hartmut Reuter
HM wrote:

Hi all,
I'm just starting the AFS adventure; I checked the archives but couldn't
find anything about this.
I would like to mount_nfs the vicepx partition on a NAS. I was wondering
if this is possible at all?
Do I need to use special builds etc.? (like using
--enable-namei-fileserver)
With --enable-namei-fileserver it should be no problem to use an
NFS-mounted partition as a vicep-partition. I would recommend using the
namei server anyway.

Hartmut

Thanks,
Hans

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info

--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Unable to move volumes

2005-03-11 Thread Hartmut Reuter
Hans-Gunther Borrmann wrote:
On Wednesday 09 March 2005 17:21, Derrick J Brashear wrote:
On Wed, 9 Mar 2005, Hans-Gunther Borrmann wrote:
On the destination the VolserLog contains:
Wed Mar  9 16:52:11 2005 VAttachVolume: Failed to open
/vicepcp/V0537085440.vl (errno 2)
Wed Mar  9 16:52:11 2005 1 Volser: CreateVolume: volume 537085440
(usr.md0) created
unable to allocate inode: File exists
Namei or inode? I'm unsure why the file would exist, but if it's namei, I
suppose you could use a syscall tracer and see what's getting EEXIST, I'd
be curious to hear what you have that's in the way.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info

Sorry. I forgot the VolserLog:
Fri Mar 11 14:38:57 2005 VAttachVolume: Failed to open /vicepcp/V0537085534.vl 
(errno 2)
Fri Mar 11 14:38:57 2005 1 Volser: CreateVolume: volume 537085534 
(usr.sperling) created
unable to allocate inode: File exists
Fri Mar 11 14:38:57 2005 1 Volser: ReadVnodes: Restore aborted
Fri Mar 11 14:38:57 2005 1 Volser: Delete: volume 537085534 deleted 

Gunther
If you look into
/vicepcp/AFSIDat/S=/SNo+U/special
is there anything? If so, remove the subtree
/vicepcp/AFSIDat/S=/SNo+U
and - if it exists - /vicepcp/V0537085534.vl
and try again. The message unable to allocate inode: File exists looks
like some volume special file from an earlier attempt is still around.

Hartmut
--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Second AFS-server problems!

2005-03-08 Thread Hartmut Reuter
You forgot to tell what problems you have! How can we guess?
Hartmut
[EMAIL PROTECTED] wrote:
Hello Everyboby,
First maybe a little introduction because this is my first mail to
this mailing list. I'm a 23 jear old student from Belgium and since
this year an official Linux user. For this last year all the students
have to do a big project, my project is to do research about
clustering filesystems. Now I am trying to get an OpenAFS system
running, I have succeeded in getting one SCM up and running. But I'm
having trouble with the second server. I hope someone can help me
with this problem because I'm running out of time. The purpose is to
get the exact same server as the first running. To set up the first,
I used the Gentoo Manual for AFS. I allmost read the whole manual on
OpenAFS while trying to get the second server up and running, but
still I didn't succeed. Hope someone can help a bit closer to my
objective and give me the wright commands to get the second server up
and running and synchronizing with the first.
Thx
Loretto
___ OpenAFS-info mailing
list OpenAFS-info@openafs.org 
https://lists.openafs.org/mailman/listinfo/openafs-info

--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] AFS+Large file support

2005-03-08 Thread Hartmut Reuter
The current unstable release 1.3.79 has large file support on client and 
server side. At least if you compile from source you can configure with 
--enable-largefile-support and that compiles the fileserver with large 
file support. I am not sure whether this is possible only with the so 
called namei-fileserver (which stores the data visible in the /vicep 
partition) or also with the classical interface which hides the files in 
the /vicep partition.

Hartmut
Manel Euro wrote:
Hello,
I am studying the implementation of OpenAFS in the company I work for.
I have been reading about OpenAFS' large file support, and the fact that
AFS has a 2 GB file size limit is a problem for my implementation.

We have several processes that need to use files larger than 7 GB and
produce large log files.

I have read that support for large files and volumes is planned but is
not yet in progress.

Is there any way I can add support for large files with the latest stable
OpenAFS version for Linux? Or do I need to wait for the next versions
with large file support?

Regards,
FM
_
Express yourself instantly with MSN Messenger! Download today - it's 
FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info

--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Re: fs process doesn't exit until I send a signal 9

2005-02-26 Thread Hartmut Reuter
Mike Polek wrote:
On Thu, 24 Feb 2005, Derrick J Brashear wrote:
  On Thu, 24 Feb 2005, Gabe Castillo wrote:
 I had to shut down one of my AFS servers to replace a disk. When 
I issue a
   bos shutdown command, all the processes seem to shutdown, except 
for   the fs
   process. When I vos status the server, it says that the 
fileserver has been
   disabled, and the sub-status is in the process of shutting down. 
Is there

 Wait while it breaks callbacks. you can watch the status in
 /usr/afs/logs/FileLog
---
For what it's worth, I have servers that have thousands of volumes
on each partition. (Ok... maybe a poor design choice, but I didn't
know the single threaded volume server would be an issue when I did
the design...) After 30 minutes, the bosserver assumes that the
fileserver isn't going to stop, and does a kill -9 to stop it.
I'm pretty sure it's just because of the sheer number of volumes
to unmount.
1) Is there an easy way to change the timeout value? I'm not sure
   yet if it's faster to do the kill -9 one minute into the shutdown
   and just let the salvager do its thing, or if it's better to
   let the shutdown take an hour. I can say that it would be helpful
   to have an emergency procedure that won't corrupt volumes for when
   the shutdown is triggered by a power failure. :-)
I think it's insane if the shutdown takes that long. There must be a
problem with your clients, perhaps switched-off PCs, so that the callback
breaking has to wait for timeouts. Writing the volume information to disk
should never take that long even if you have 1 volumes on a server.
If you have compiled with --enable-fast-restart you can kill your 
fileserver after a while (after all active RPCs have finished) and the 
only disadvantage at restart may be that the uniquifier is too low.

2) I noticed that in the 1.3 branch, the volume server is multi-
   threaded. (THANK YOU!!!) Does anybody know how this affects
   shutdown/startup time? Should I still be looking for a way to
   reduce the number of volumes on a server?
The volserver has nothing to do with the time needed by the fileserver
to shut down. The volserver only does volume operations such as move,
backup or release.


3) I've seen references to a NoSalvage option. Is that also new
   in 1.3? or is it some sort of patch? Anybody have a really good
   way of dealing with lots of volumes on a server? We currently
   have almost 60T of storage, and it's growing. I like the idea
   of having things well organized into finite volumes... it works
   for our setup.
Is your NoSalvage option the same as --enable-fast-restart? If so, I
introduced this to avoid hours of salvaging after a crash. My experience
was that the log almost never contained a real error message. I think
it's better to let the fileserver automatically take a volume off-line
when it detects an inconsistency than to have to wait hours for a restart.

Hartmut

Any assistance is appreciated.
Thanks,
Mike
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info

--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] AIX client panics machine if ThisCell is invalid

2005-02-11 Thread Hartmut Reuter
Have you started afsd with -afsdb ? This at least crashed my client 
under AIX 5.2 regularly.

Hartmut
Ben Staffin wrote:
So ... I discovered the hard way that (on AIX at least) if you put a
non-existent cellname in /usr/vice/etc/ThisCell and start the AFS
client, the system will panic as soon as afsd is started.  Weird, huh?
Pretty obscure error condition, but still ... kernel panic?  Yow.

--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] releasing volumes automatically

2005-02-10 Thread Hartmut Reuter
Marco Spatz wrote:
Hi,
I've set up OpenAFS on a 4-machine cluster and have made replicas
of the users' home volumes. I know that, generally, this is a bad idea,
but I want to use this as a backup the users can access without my help
(I've mounted the readonly volumes on another mountpoint).
And now I want the AFS system to release these volumes at night, but I
don't know how to do this. I thought about writing a cron job, but I don't
know how to gain access as admin to get the rights to execute 'vos
release'.
Is there any possibility to tell OpenAFS to release certain (or all)
changed volumes at a certain time? It would be a great help.
On our fileserver machines we run cron jobs under root which use
vos release xxx -localauth
Hartmut
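A minimal sketch of such a cron job (the list file and its path are
hypothetical; it must run as root on a server machine so that -localauth works):

   #!/bin/sh
   # nightly-release.sh - release every volume named in the (hypothetical) list file
   for vol in $(cat /usr/afs/local/release.list); do
       vos release "$vol" -localauth || echo "release of $vol failed" >&2
   done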

Thanks for your help,
Marco
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info

--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] crash on AIX 5.2

2005-01-11 Thread Hartmut Reuter
I am in the process of tracking down all differences between my good 
version and 1.3.77.

I am now not very distant from 1.3.77, and at least one problem seems
to be the new code in afs_pioctl.c for get and set tokens along with
the huge ticket size introduced for compatibility with Active Directory.
Keeping the old ticket size and the old code for tokens in afs_pioctl.c
results in a fairly stable client. At least I can get a token, make 
clean in the openafs-tree and make dest without crashing the system.
This is certainly not enough testing for putting it into production,
but a hint where the problem may be hidden.

Hartmut
Michael Niksch wrote:
  I have compiled 1.3.77 under AIX 5.1 and see the same problem. In my
  case the machine crashes after getting a token. It seems to work
  before.
I am seeing that same problem with 1.3.77 on all AIX 4.3.3, AIX 5.1, and 
AIX 5.2, both 32 and 64 bit kernel. In contrast to previous versions of 
1.3, I can at least load the kernel extensions now. Obtaining a token 
with 1.3.77 'klog' from a kaserver causes a core dump, and trying to use 
a token obtained with 1.2.10 'klog' results in a system crash.

Yes, I understand that 'klog' and 'kaserver' are considered more or less 
deprecated, and we are in fact planning to change to a Kerberos 5 setup, 
but this migration will take us quite some time to complete. Also, we 
will need continued interoperability with legacy IBM Transarc AFS cells 
for quite some time to come. So it would be great if 'kaserver' support 
wasn't silently dropped out of OpenAFS already at this point.

So far, I haven't been able to compile ANY version of OpenAFS client to 
work on AIX 5. The server code might be less problematic as it doesn't 
involve kernel extensions, but I am also still running 1.2.11 server 
binaries compiled on AIX 4.3.3 on my AIX 5.2 servers.


--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] crash on AIX 5.2

2005-01-11 Thread Hartmut Reuter
Jeffrey Altman wrote:
Hartmut Reuter wrote:
I am in the process of tracking down all differences between my good 
version and 1.3.77.

I am now not very distant from 1.3.77, and at least one problem seems
to be the new code in afs_pioctl.c for get and set tokens along with
the huge ticket size introduced for compatibility with Active Directory.
Keeping the old ticket size and the old code for tokens in afs_pioctl.c
results in a fairly stable client. At least I can get a token, make 
clean in the openafs-tree and make dest without crashing the system.
This is certainly not enough testing for putting it into production,
but a hint where the problem may be hidden.

Hartmut

We know the problem is in the set/get token code on AIX.  More then
likely the stack is too small to support a 12000 byte object and it
is getting blown away on AIX.  The question is:
  * where is this object that is located on the stack?
If you can find that, then you will have solved the bug.
Does not look like stack overflow. The crash always happens in xmalloc1:
(0) f
pvthread+00A500 STACK:
[006021F0]xmalloc1+0007AC (0200, F1E00C22E000,
   , F1E00C22E000, 0400, F1E03B964269,
   0002, 003E4338 [??])
[00606B70]xmalloc+000208 (??, ??, ??)
[08E41978]afs_osi_Alloc+5C (??)
[08EBC6DC]afs_HandlePioctl+0003D4 (, 800C5608800C5608,
   F0002FF3A400, , F0002FF3A438)
[08EC74F8]afs_syscall_pioctl+000294 (, 800C5608800C5608,
   2FF21FC0, )
[08E46000]syscall+0001A0 (00140014, ,
   800C5608800C5608, 2FF21FC02FF21FC0, , 2E6D70672E6D7067,
   00800080)
[08E45DB8]lpioctl+50 (, 800C5608800C5608,
   2FF21FC0, )
[379C]sc_msr_2_point+28 ()
Not a valid dump data area @ 2FF21CF0
(0)
So probably some storage on the kernel heap was overwritten.
Hartmut
Jeffrey Altman


--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] SuSE 9.2: anyone?

2005-01-04 Thread Hartmut Reuter
Derek Atkins wrote:
Sergio Gelato [EMAIL PROTECTED] writes:

* Sensei [2004-12-23 16:54:08 +0100]:
Has anyone got AFS working on suse 9.2 using their afs client? I had to 
fix a script (it searched for kernel module libafs, actually the one 
shipped with suse is called kafs) but anyway, afsd isn't starting:
Is kafs the OpenAFS implementation or something else?

kafs is the (broken) Linux Kernel AFS implementation.  It is not
OpenAFS, does not work with OpenAFS' afsd, and if you build kafs you
will lose badly.
-derek
SuSE 9.2 has the openafs client in its distribution.
After running YOU (YaST Online Update) you should also get a working version.
You should run it with -memcache to avoid the bug which fills
up the cache partition; I think it's not corrected there.
I replaced in /etc/sysconfig/afs-client the following:
XXLARGE=-stat 4000 -daemons 6 -volumes 256 -fakestat -blocks 262144
XLARGE=-stat 3600 -daemons 5 -volumes 196 -fakestat -blocks 133072
LARGE=-stat 2800 -daemons 5 -volumes 128 -fakestat -blocks 65536
MEDIUM=-stat 2000 -daemons 3 -volumes 70 -fakestat -blocks 32768
SMALL=-stat 300 -daemons 2 -volumes 50 -fakestat -blocks 16384
MEMCACHE=yes
You may also add -chunksize 20 if you have enough memory, because it 
makes the client much faster (otherwise it does an RPC for every 8 KB
it reads).
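For example, appending -chunksize 20 to the size class you actually use
could look like this (just a sketch based on the MEDIUM line above; keep
your other values unchanged):

  MEDIUM=-stat 2000 -daemons 3 -volumes 70 -fakestat -blocks 32768 -chunksize 20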

Hartmut
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] memcache or diskcache on ramdisk

2004-12-02 Thread Hartmut Reuter
EC wrote:
Hi,
Is using a memory cache for afsd 'better' (faster, more stable, etc.) than
a disk cache on a RAMDISK or tmpfs?
EC. 
My experience with 1.3.74 is that memcache is really fast. We reach 70 MB/s
for write and 48 MB/s for read of an 8 GB file, which is about 20 MB/s faster
than with ramdisk. This was on SuSE SLES-9 with kernel 2.6.5-7.

With the 2.6 kernel there is still a problem where the disk cache fills up,
which makes the use of a ramdisk nearly impossible.

Of course a disk cache still makes sense if your network is slower than
your local disk. But in production environments such as blade centres the
network is typically much faster than the internal disk.

Hartmut

___
OpenAFS-info mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-info

--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Question about RO/RW Volumes...

2004-10-13 Thread Hartmut Reuter
Use the -rw option of fs mkm to force use of the RW volume.
We do the same: all user volumes are mounted with -rw and have two
RO copies, one in the same partition to make the reclone fast and
another one on another fileserver as a backup in case the first
partition gets lost.
We also have another tree where the RO volumes are mounted to
allow users to get back their files from yesterday.
The automatic release of volumes that have changed is done in
a cron job on each fileserver machine during the night.
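A sketch of that setup (volume, server, and partition names here are made up):

  fs mkm /afs/.cell.example/user/jdoe user.jdoe -rw
  vos addsite fs1 a user.jdoe    # RO site in the same partition (fast reclone)
  vos addsite fs2 a user.jdoe    # RO site on another fileserver
  vos release user.jdoe          # run nightly from cron for changed volumes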
Hartmut
Lars Schimmer wrote:
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1
Hi!
Just a question about RO/RW copies.
We have set up 3 volumes for every user (home, work, ftp) and few others
with CVS, svn, data,...
For easy backup we've made RO copies of nearly all volumes.
But now, with two database servers and all the RO copies, we run into a
problem we hadn't thought about before.
With the 2nd database server in the CellServDB, most machines use the RO
copies of the volumes. With some volumes (archive, cdonline) that's OK
for working (but hey, this data isn't really small to hold as an RO copy),
but with CVS, svn or home dirs, an RO copy mount isn't really nice.
How can we be sure to have RW access to these volumes?
It would be nice if OpenAFS would load-balance reads across all the RW and
RO volumes, but write only to the RW volume and then automatically release
this volume...
The only dirty solution I found is to mount the root.cell volume RW as
/afs/.url.to.domain to have guaranteed RW access to the volumes.
Cya
Lars
- --
- -
Technische Universität Braunschweig, Institut für Computergraphik
Tel.: +49 531 391-2109E-Mail: [EMAIL PROTECTED]
PGP-Key-ID: 0xB87A0E03
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Debian - http://enigmail.mozdev.org
iD8DBQFBbOo/VguzrLh6DgMRAjcjAKCZOu57oAGC4UCu7uiVgMCCjg5OnwCeP6hn
wLaX2jZOksBZfo7iA6bI+40=
=GIK6
-END PGP SIGNATURE-
___
OpenAFS-info mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-info

--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] can't mount afs on /afs

2004-09-09 Thread Hartmut Reuter
[EMAIL PROTECTED] wrote:
hi,
I've been trying to work this out for a couple of weeks with no success. I'm using
RedHat 9 and openafs 1.2.11. Everything goes OK until I install the client.
At bootup, and after starting the afs daemon, I always get
Can't mount AFS on /afs(22)
Lost contact with file server at 127.0.0.1 in cell ...

What does 'vos ex root.afs' say?
I guess that the VLDB contains the wrong address and your client therefore
tries to contact the fileserver 127.0.0.1, which is the client machine
itself. But there is no fileserver running there.

If the VLDB for some reason got 127.0.0.1 associated with this
volume, you may try
vos changeaddr 127.0.0.1 <correct ip address>
to solve the problem.
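For example (assuming, hypothetically, that the fileserver's real address
is 192.168.1.10):

  vos examine root.afs
  vos changeaddr 127.0.0.1 192.168.1.10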

Hartmut

I read about a problem with the hostname and FQDN but I have no idea what
to do. My hostname is set to fileserver, that's what hostname says, and
my /etc/hosts resolves fileserver to the local IP (but not to 127.0.0.1).
I read about it somewhere on the list but it didn't help me.
any clues?
thx a lot
claudio
___
OpenAFS-info mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-info

--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] ACLs not working on afs volumes! Help!

2004-08-19 Thread Hartmut Reuter
This is intended behaviour. It may be discussed whether it's really
a good idea, but the code in src/viced/afsfileprocs.c in the
routine Check_PermissionRights (line 835 ff) shows
    if (CallingRoutine == CHK_STOREACL) {
        if (!(rights & PRSFS_ADMINISTER)
            && !VolumeOwner(client, targetptr))
            return (EACCES);
    } else {
That means that if the client user is the owner of the volume (i.e. the
owner of the volume's root directory), he doesn't get EACCES.

-Hartmut
matt cocker wrote:
Hi
We are having a weird problem with some afs volumes in that if a user 
has had admin access to a volume and we remove admin access from the acl 
list for that user (or remove the user from the acl list completely) the 
user can just add themselves back. Is this intended behavior?

All our user volumes are prefixed with user. i.e user.username
We have tested other volumes but it only seems to be volumes the user 
has had full access to.

The problem (same for linux and windows)
$ fs listacl /afs/ec.auckland.ac.nz/users/t/ctcoc006
Access list for tcoc006 is
$ fs listacl /afs/.ec.auckland.ac.nz/users/t/c/tcoc006
Access list for /afs/.ec.auckland.ac.nz/users/t/c/tcoc006 is
$ ls /afs/ec.auckland.ac.nz/users/t/ctcoc006
ls: tcoc006: Permission denied
$ fs setacl -dir /afs/ec.auckland.ac.nz/users/t/c/tcoc006 -acl tcoc006 all
$ fs listacl /afs/.ec.auckland.ac.nz/users/t/c/tcoc006
Access list for /afs/.ec.auckland.ac.nz/users/t/c/tcoc006 is
Normal rights:
  tcoc006 rlidwka
$ fs listacl /afs/ec.auckland.ac.nz/users/t/c/tcoc006
Access list for tcoc006 is
Normal rights:
  tcoc006 rlidwka
We are looking into other affected volumes, but at the moment I just want 
to know whether we have misunderstood how ACLs work; users can't even 
view the ACLs of volume mount points that they don't have ACL entries for, 
i.e.

fs: You don't have the required access rights on 'tcle012'
Access list for tcoc006 is
Confused
Cheers
Matt


___
OpenAFS-info mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-info

--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Redundant cell

2004-07-15 Thread Hartmut Reuter
You don't need another cell, just another database server and - maybe - a
fileserver on your other floor. If the connection is cut off, the sync-site
database server (the one with the lowest IP address) will work as before,
and the other one should still be sufficient to answer read-only requests
such as volume location or pts membership lookups. Most cells are built with
this redundancy because it is one of the main features of AFS.
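A rough sketch of adding a second database server to an existing cell
(hostnames here are made up; tell every server machine about the new host,
update the clients' CellServDB, then restart the database server processes):

  bos addhost -server afsdb1.example.com -host afsdb2.example.com
  bos addhost -server afsdb2.example.com -host afsdb2.example.com
  bos restart -server afsdb1.example.com -instance ptserver vlserver
  bos restart -server afsdb2.example.com -instance ptserver vlserver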

-Hartmut
Sensei wrote:
Hi. I'm back and I have a question, maybe not so common. :)
I've built an openafs cell, on debian stable. It authenticates over
kerberos 5, and gains a token from openafs_session, so no kaserver and
no passwords anywhere other than kerberos db. Good it works. Now, my
question about it is: how to make it redundant?
We have a quite unreliable network. The server is on one floor and I'm
thinking about having a second server on the second floor. I need these
two cells to work cooperatively but ``independently'' of each other.
In other words, if the link between the two servers goes down, each
floor keeps authenticating and working. Login can work fine, even without
the home directory, which can reside on the other server. How can I do
this?
Do not bother about krb5. I'm dealing now with all the afs issues.

--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Volumes lost on fileserver / vldb post fs crash

2004-04-24 Thread Hartmut Reuter
Is this a namei fileserver? If so, you could cd as root into the
/vicep partitions and look at what remained. If the fileserver doesn't
report any volumes, probably the volume header files are gone, but not
necessarily all the data in AFSIDat.
You may do a du to see what remained.
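For example (a sketch; substitute your own partition names):

  du -sk /vicepa/AFSIDat/*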

What does the SalvageLog say? It should report any volumes the salvager
saw and why it deleted them.
Forrest D Whitcher wrote:
I had a fileserver crash yesterday with apparently bad consequences.

The volumes are no longer listed in the vldb (I have a listing of
ID's names etc but I'm not sure how much that helps.
command vos listvol fileserver gives:

Total number of volumes on server thing partition /vicepa: 0 

Total volumes onLine 0 ; Total volumes offLine 0 ; Total busy 0

Total number of volumes on server thing partition /vicepb: 0 

Total volumes onLine 0 ; Total volumes offLine 0 ; Total busy 0

Total number of volumes on server thing partition /vicepd: 0 

Total volumes onLine 0 ; Total volumes offLine 0 ; Total busy 0

After the crash I stopped, then restarted all services on the fs,
though I have not yet done any restart on the database server.
After restarting services I ran a salvage, which ran fairly quickly
(3-4 min. on two 4 GB and one 8 GB partitions on a K7/600 system). I fear
the speed with which this finished may well indicate the fs's view
of what volumes it houses is well and truly lost.
The vldb showed the correct entries for a few hours after this fs
crashed and restarted. I've tried to do the following to restore 
volumes that had been on the fileserver:

vos syncv fileserver /vicepd 536870970

So the question is, do I have any reasonable chance of recovering 
the currently invisible volumes on this fs?

If so, how should I be going about it?

thanks for any ideas

forrest
___
OpenAFS-info mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-info


--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] File Size Limitations

2004-03-10 Thread Hartmut Reuter
With the stable releases (which are not compiled with large file support)
the maximum file size is (2 GB - chunksize).
With the unstable 1.3.52 you have large file support on the client and (as
far as I know) on the server side. However, on the client side it's
available only for AIX (4.2 - 5.2), Linux 2.4, and Solaris >= 8 with a
64-bit kernel, unless someone else has done the work.
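As a worked example of that formula (assuming, hypothetically, a client
started with -chunksize 20, i.e. 1 MB chunks):

  maximum file size = 2 GB - 1 MB = 2048 MB - 1 MB = 2047 MB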

Hartmut



Penney Jr., Christopher (C.) wrote:
How exactly does the file size limitation show up in OpenAFS (ie. 2GB 
per file maximum)?  I've been told that the maximum file size depends on 
a couple of factors and I'm trying to figure out what they are.  Right 
now I have a test environment with a couple of Solaris 9 boxes (one the 
file server and one a client) and I'm seeing a 2GB per file size limitation.
 
   Chris
 


--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Questions, vol. 2.

2004-01-21 Thread Hartmut Reuter
Stephen Bosch wrote:
More questions!

-Volumes and volume sizes -- what do you use as a typical volume 
size/quota? The default is 5 Mb, which is ridiculously small (and points 
toward an assumption that AFS will be used largely for user home 
directories). What is too big? For example, I have just created a volume 
with a 4 Gb quota, as that will comfortably fit on a DVD-R.
We have many home-directory volumes with ~ 5 GB, but the larger a volume
is, the longer it takes to move it to another server or partition.


-Volume granularity -- at a minimum, a volume must correspond to one 
directory, correct? In other words, I can't concatenate volumes invisibly.
correct.

-Another partition question -- on a /vicepxx partition, where does the 
data actually reside?
If you have a namei fileserver (under Linux always), the data are under
/vicepxx/AFSIDat: there is a tree of subdirectories in which the data
belonging to a volume are kept in a common subtree.



-Unix/AFS user account synchronization: We have two existing 
workstations that are heavily used. These workstations will also use 
AFS, but we don't want to move their local home directories to the AFS 
cell. Do we have to? All the docs seemed geared to that, but all we want 
is an AFS cell where we can save critical data and then replicate it or 
back it up.
You don't have to synchronize uids and AFS ids. It's just nicer to see
the file ownership displayed correctly, because the uid is translated via
/etc/passwd.


The docs leave me with the understanding that a client workstation will 
treat the mounted AFS filespace the same as a mounted local disk. That 
is, a file owned by user ID 501 in AFS will appear the same as a file on 
 a local disk owned by user ID 501.

If I want to create a new user in the cell, does this mean that I have to

first create a user in AFS
create a user on the user's workstation with the same UID/GID as the new 
AFS user?
If you use uss to create the user, that may be true. But if you create
the ptserver entry by hand, you can give the AFS user his Unix uid by
specifying

pts createuser <name> -id <uid>
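For example (a hypothetical user name and uid):

  pts createuser -name jdoe -id 1001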



-Group IDs -- AFS uses negative group ID numbers. The Linux machines 
have no idea what to do with that -- they just read the group ID's as 0

-afs-modified login, etc. The documentation recommends using the afs 
modified login. In our case, that essentially means using pam for afs 
authentication, but as one poster has just pointed out, some 
applications like openssh don't always function properly with the afs 
pam module. What do you use in your installations? Is it better to just 
put klog in the login script?
We use pam and also a special slogin which transfers tokens from one 
machine to another.

-Hartmut

Thanks,

-Stephen-

___
OpenAFS-info mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-info


--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] blank view of a salvaged volume (offline-online) through its mount point

2004-01-19 Thread Hartmut Reuter
If this is a namei fileserver you can have a look into the
vicep partition. If you give me the number of the
RW volume I can tell you the path where the files belonging to this
volume can be found.

Then you can do a du there to find out how much of the data still exists.

If you are lucky, it's only the root directory of the volume that is gone.

You should also run volinfo or MR-AFS's traverser to find out which
files and directories are expected from the meta-data view of your volume.
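For example (a sketch with a made-up partition and volume id, run as root on
the fileserver; check 'volinfo -help' for the exact options of your release):

  volinfo -part /vicepa -volumeid 536871234 -vnode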

Hartmut

Hongliang Gai wrote:
Hi Hartmut,
  The dump + restore is done. I made a mount point in my home directory
for this new volume. But it still has nothing in it, even after I restarted
both the client and the server machine. All the other partitions on the server
are fine and never had this problem. I examined the volume, and it appears OK.
Any further hints?
Thanks,

-Hongliang

On Fri, 16 Jan 2004, Hartmut Reuter wrote:


What does the SalvageLog say?

Try to dump and restore the volume:
vos dump <volume> 0 | vos restore <server> <partition> <new volume>
If it's not an AFS bug, it could easily also be a bad disk!

Hartmut

Hongliang Gai wrote:

Hi All:
 I experienced a problem with afs 1.2.2a on Redhat 7.0. One of my
users suddenly could not access his afs home directory. After I examined the
afs server, I found his volume appeared offline, though its backup volume
was online. I followed the openafs doc to salvage the offline volume, and then
it was back online. However, after the user cds to his home directory, the
directory is empty. I tried remounting the volume to the directory. It does
say that the dir is the mount point of the volume. The size of the volume
shows as before, 3 GB. It looks like the volume is OK, but the user cannot
see it. How can I fix it?
Thanks in advance,

-Hongliang
___
OpenAFS-info mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-info


--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-info


--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] vos changeaddr -remove isnt removing an unreachable server

2003-03-25 Thread Hartmut Reuter
vos remsite afsb2 a tools.java.sun131

will do it. It only changes the database, without contacting the
server afsb2.
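If afsb2 has to be removed from many volumes, here is a rough, untested
sketch (verify the generated volume list by hand before running anything
like this):

  vos listvldb -server afsb2 -quiet | awk 'NF==1 {print $1}' > /tmp/vols-on-afsb2
  while read vol; do
      vos remsite afsb2 a "$vol"
  done < /tmp/vols-on-afsb2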

Hartmut

Tim Prendergast wrote:
It looks like all of our volumes have all of our afs servers listed. How
do I remove only the one from all of them so our releases work properly
again? I am trying to remove afsb2 from all of our volumes because our
releases all error out and do not complete to all the clients, citing
the fact that they cannot reach that server.
Example (we have 211 volumes):
tools.java.sun131 
RWrite: 536871140 ROnly: 536871443 Backup: 536871718 
number of sites - 4
   server afsp partition /vicepa RW Site 
   server afsp partition /vicepa RO Site 
   server afsb partition /vicepa RO Site 
   server afsb2 partition /vicepa RO Site 

Regards,
Tim Prendergast
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Hartmut Reuter
Sent: Monday, March 24, 2003 11:41 PM
To: Tim Prendergast
Cc: [EMAIL PROTECTED]
Subject: Re: [OpenAFS] vos changeaddr -remove isnt removing an
unreachable server
There is probably still a volume entry in the VLDB which points to this
server. Try 'vos listvldb -server <id>' to find out.

Hartmut

Tim Prendergast wrote:

Hi,



I am trying to remove a server from our afs cell, but upon issuing the


following:

vos changeaddr x.x.x.x -remove



I get the following response:

Could not remove server x.x.x.x from the VLDB

VLDB: volume Id exists in the vldb



This command is listed in the admin's guide as the way to remove 
obsolete servers. Since this system is no longer reachable and gone 
forever, how can I remove it effectively? Thanks in advance.



Regards,

Tim Prendergast








--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Upgrading Transarc servers

2003-01-30 Thread Hartmut Reuter

It depends on the architectures your fileservers are running on. For Linux
and NT Transarc had implemented the NAMEI interface, where you don't need
a special fsck. There everything should be OK.

On the other architectures I suggest also using the NAMEI interface, for
several reasons:

1) You get rid of the special fsck (which may be a problem with software
RAIDs etc.)

2) Salvage of single volumes is much faster because all files of a
volume group are under the same directory.

3) You can dump or tar and restore partitions, and you can see the files.

If you switch between the traditional mechanism and NAMEI you have to 
move the volumes by vos move because the NAMEI-fileserver does not 
understand the traditional partition and vice versa.
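A sketch of such a move (volume and server names here are hypothetical):

  vos move -id user.jdoe -fromserver oldfs -frompartition a -toserver newfs -topartition a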

If you keep using the traditional mechanism you should be able to just 
start the new binaries with the old partitions.

-Hartmut


Kevin Coffman wrote:
I think this is the case, but wanted to verify.

When upgrading from Transarc fileserver binaries to OpenAFS, there are 
no disk format changes?  Just swap out the binaries and go.  Correct?

Also, there is no need to change from the Transarc fsck program.  
Correct?

Thanks!
Kevin

___
OpenAFS-info mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-info


--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
	   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-

___
OpenAFS-info mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-info



Re: [OpenAFS] Question about Large Files

2002-12-18 Thread Hartmut Reuter
Sven Oehme wrote:


Has anybody ever created a file > 2 GB in an AFS volume? IBM AFS is not
able to do so.
Is this possible with a namei fileserver on a filesystem with large file
support?

Yes, we have files up to 20 GB

Presently this is possible only with the combination of MR-AFS servers 
and clients built from OpenAFS CVS. I know that rpi.edu is working on an 
OpenAFS fileserver with large file support. For details about the status
ask R. Lindsay Todd [EMAIL PROTECTED].

Hartmut Reuter


i have seen a lot of discussions already , but no clear answer ..

Sven



--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
	   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-

___
OpenAFS-info mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-info



Re: [OpenAFS] OpenAFS volumes filesystem

2002-11-12 Thread Hartmut Reuter

For the fileserver (with the NAMEI interface, which is obligatory on Linux)
you may take whatever you want. We are using reiserfs, other people use
ext3. ext2 has the disadvantage of a slow fsck if for some reason your
system should crash.

Hartmut

yam wrote:
Hello,

I'm starting up an OpenAFS installation, and I've arrived at my first
dilemma... What filesystem should I use for openafs volumes?

Ext2? Ext3? ReiserFS? XFS?

Any hints? Should I avoid any of the above? Is there better performance with
any of them? Is ext2 the only way to go?

Thanks in advance.

PS: I haven't found information about this anywhere.

/Yam


___
OpenAFS-info mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-info



--
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
	   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-

___
OpenAFS-info mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-info



Re: [OpenAFS] AIX 5.1

2002-09-10 Thread Hartmut Reuter


Not yet, but I am working on it and I am nearly through.

Hartmut

Mark Campbell wrote:
 Is AIX 5.1 currently supported?
 
 Thanks
 
 Mark
 
  
 


-- 
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-

___
OpenAFS-info mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-info



Re: [OpenAFS] 'vos dump VOLUME.backup' locks VOLUME, not VOLUME.backup!

2002-05-17 Thread Hartmut Reuter

Turbo Fredriksson wrote:
 
 I wrote a script to either back up a specific volume, or all
 volumes. It creates the backup volume 'VOLUME.backup', and mounts
 that on 'MOUNTPOINT/OldFiles'. But it locks the VOLUME, not the
 expected VOLUME.backup... And I can't seem to unlock it.
 
 'vos unlock VOLUME' doesn't seem to work, and neither does 'vos unlockvldb'...
 
 The idea is to back up the user's volume, but still allow rw access
 to the volume (in case a user is logged in). This is something I got from
 the 'manual', but it doesn't work as I had expected; what am I missing?
 
 It just released the lock; it seems to be timed, took about 10 minutes or so...

In the first step, namely cloning or recloning the backup volume, the RW
volume is busy and cannot be accessed.

The second step, namely dumping the backup volume, leaves the RW volume
on-line, so access to the RW volume should be possible.

The lock on the volume during both operations (clone/reclone and dump)
doesn't stop access to the RW volume. It is only necessary to prevent
concurrent volserver activities on this volume group.
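A minimal sketch of that two-step sequence (volume name and dump file are
hypothetical):

  vos backup user.jdoe                     # step 1: clone/reclone; RW briefly busy
  vos dump -id user.jdoe.backup -time 0 -file /backup/user.jdoe.dump
                                           # step 2: RW volume stays on-line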

Hartmut

 --
 munitions Legion of Doom iodine North Korea Cocaine World Trade Center
 fissionable bomb Khaddafi Cuba Serbian PLO explosion assassination
 Waco, Texas
 [See http://www.aclu.org/echelonwatch/index.html for more about this]
 ___
 OpenAFS-info mailing list
 [EMAIL PROTECTED]
 https://lists.openafs.org/mailman/listinfo/openafs-info

-- 
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301 
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-info



Re: [OpenAFS] 'vos dump VOLUME.backup' locks VOLUME, not VOLUME.backup!

2002-05-17 Thread Hartmut Reuter

Turbo Fredriksson wrote:
 
  Hartmut == Hartmut Reuter [EMAIL PROTECTED] writes:
 
 Hartmut In the first step namely cloning or recloning the backup
 Hartmut volume the RW volume is busy and cannot be accessed.
 
 It's a small volume (163Mb), but it locks for about 10 minutes...
 Shouldn't the lock be ONLY for the number of seconds it takes for
 'vos backup' to finish?

The busy and locked state ends when the vos backup command has
finished.

 
 Hartmut The second step namely dumping the backup volume lets the
 Hartmut RW-volume on-line, so access to the RW-volume should be
 Hartmut possible.
 
 That doesn't seem to happen on my system... OR, the lock while cloning
 the volume isn't released when the volume has finished cloning...

Are you sure you are really dumping the backup volume and not the RW volume?

Hartmut
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301 
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
___
OpenAFS-info mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-info



Re: [OpenAFS] CopyOnWrite failed - orphaned files

2002-03-05 Thread Hartmut Reuter
 orphaned files and directories (approx. 587713 KB)
 02/28/2002 16:02:30 Salvaged cs.usr0.naveen (536877621): 4785 files, 587714 blocks
 
 SalvageLog:
 @(#) OpenAFS 1.2.3 built  2002-02-01 
 02/28/2002 16:05:06 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager /vicepc 
536877621 -orphans attach)
 02/28/2002 16:05:07 SALVAGING VOLUME 536877621.
 02/28/2002 16:05:07 cs.usr0.naveen (536877621) updated 02/28/2002 15:46
 02/28/2002 16:05:07 Attaching orphaned directory to volume's root dir as 
__ORPHANDIR__.3.207280
 02/28/2002 16:05:07 Attaching orphaned directory to volume's root dir as 
__ORPHANDIR__.11.46709
 02/28/2002 16:05:07 Attaching orphaned directory to volume's root dir as 
__ORPHANDIR__.13.7
 [ similar lines deleted ]
 02/28/2002 16:05:07 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.7846.198967
 02/28/2002 16:05:07 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.9320.219854
 02/28/2002 16:05:07 Vnode 1: link count incorrect (was 2, now 45)
 02/28/2002 16:05:07 Vnode 3: link count incorrect (was 2, now 3)
 02/28/2002 16:05:07 Vnode 11: link count incorrect (was 1, now 2)
 02/28/2002 16:05:07 Vnode 13: link count incorrect (was 7, now 8)
 02/28/2002 16:05:07 Vnode 15: link count incorrect (was 2, now 3)
 [ similar lines deleted ]
 02/28/2002 16:05:07 Vnode 9320: link count incorrect (was 0, now 1)
 02/28/2002 16:05:07 Salvaged cs.usr0.naveen (536877621): 4785 files, 587715 blocks
 02/28/2002 16:05:07 The volume header file V0536877622.vol is not associated with 
any actual data (deleted)
 
 
 The FileLog says errno 4, which is Interrupted System Call.  Could this
 be a clue?
 
 We have a lot of older machines of many architectures here running older
 client software.  Could that cause this problem?
 
 Is there anything I can do to instrument the servers to help find the
 root cause of the problem?  
 
 
 Another problem that may or may not be related:
 
 On these same machines, the fileserver processes can get into a state such
 that a bos restart, shutdown or stop will not kill them and neither will a
 simple kill pid.  You must do a kill -9 on each of the fileserver
 processes to make them go away.  In the case of a bos restart, new
 fileserver processes are started without the old ones having been killed,
 the new processes fail, causing the salvager to start.  When the salvager
 finishes, the fileserver processes are started again and fail again,
 leading to an endless fileserver-salvager-fileserver-salvager cycle.
 This has led to us disabling the 4:00 AM Sunday restarts.
 
 
 Thanks in advance for your help.
 
   ---Bob.
 --
 Bob Hoffman, N3CVL  University of Pittsburgh   Tel: +1 412 624 8404
 [EMAIL PROTECTED] Department of Computer Science Fax: +1 412 624 8854
 ___
 OpenAFS-info mailing list
 [EMAIL PROTECTED]
 https://lists.openafs.org/mailman/listinfo/openafs-info
 

-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301 
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-

___
OpenAFS-info mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-info



Re: [OpenAFS] Duplicate special inodes in volume header

2001-11-20 Thread Hartmut Reuter

Martin Schulz wrote:
 
 Hartmut Reuter [EMAIL PROTECTED] writes:
   Is there any way to copy the RO over the RW to get a working RW
   volume again?
 
  You can copy the RO to a RW volume by
 
 vos dump 536870916 0 | vos restore <otherserver> <partition> <name> -id
 536870915 -overwrite full
 
 Nice idea, but does not work:
 
 $ vos restore iwrsun1 /vicepa -name root.afs -id 536870915 -file 
/tmp/working_root_afs_RO
 The volume root.afs 536870915 already exists in the VLDB
 Do you want to do a full/incremental restore or abort? [fia](a): f
 Volume exists; Will delete and perform full restore
 Restoring volume root.afs Id 536870915 on server iwrsun1.mathematik.uni-karlsruhe.de 
partition /vicepa ..Failed to start transaction on 536870915
 Volume needs to be salvaged
 Error in vos restore command.
 Volume needs to be salvaged
 
 The salvager should be able to rebuild the volume header in case it
 finds all the volume special files. I have no idea how it was possible
 to get duplicate special inodes into the header,
 
 Nor do I;
 
  but - after you have
  dumped the RO-volume - you could try to remove the volume-header and
  then rerun the salvager.
 
 Hmm. I got a little closer to the problem. In the early days of this
 installation, I had some problems with the first server and switched
 over to another platform. This could be the reason for the following:
 
 In fact, the output from vos listvol and the vos listvldb were not
 consistent.
 
 vldb showed me:
 
 root.afs
 RWrite: 536870915 ROnly: 536870916 Backup: 536870917
 
 but listvol tells:
 
 root.afs  536870912 RW  4 K On-line
 root.afs.backup   536870917 BK  4 K On-line
 root.afs.readonly 536870916 RO  5 K On-line
  Could not attach volume 536870915 
 

Did you do the restore to another server or to the one with the bad volume?
I don't know whether the volserver and/or the salvager are clever enough
to compare the names of the volumes they find. If so, the presence of
536870912 could be the reason why 536870915 with the same name could not be
attached.

In such a case you should use 'vos zap <server> <partition> <volume-id>'
to get rid of the wrong volume. Doing a remove by volume name is
dangerous because vos asks the VLDB to translate the volume name to the
volume id and then proceeds with that one (possibly removing the wrong
volume).
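For example, using the stray volume id from the listvol output above (server
and partition as in this thread):

  vos zap -server iwrsun1 -partition /vicepa -id 536870912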

One more reason to use the namei interface: you can easily see
what is really there and what is missing! But if you want to migrate to
namei you will have to move the volumes to another server, of course.

Hartmut Reuter
 
 By now, I was able to remove the 536870912 volume, and run vos
 syncvldb without errors. By now, the listvol does not mention this
 anymore but no other root.afs as well (the root.afs.backup and
 .readonly are shown and working, however).
 
 A try to restore as above yields:
 
 --
 Volume exists; Will delete and perform full restore
 Restoring volume root.afs Id 536870915 on server
 iwrsun1.mathematik.uni-karlsruhe.de partition /vicepa ..Failed to
 start transaction on 536870915
 Volume needs to be salvaged
 Error in vos restore command.
 --
 
 I cannot remove that volume:
 -
 $ vos remove iwrsun1 /vicepa 536870915
 Transaction on volume 536870915 failed
 Volume needs to be salvaged
 Error in vos remove command.
 -
 
 And trying to salvage yield the same messages as above.
 
   Any hints greatly appreaciated,
 
 Yours,
 --
 Martin Schulz [EMAIL PROTECTED]
 Uni Karlsruhe, Institut f. wissenschaftliches Rechnen u. math. Modellbildung
 Engesser Str. 6, 76128 Karlsruhe
 ___
 OpenAFS-info mailing list
 [EMAIL PROTECTED]
 https://lists.openafs.org/mailman/listinfo/openafs-info

-- 
-
Hartmut Reuter   e-mail [EMAIL PROTECTED]
   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)   fax   +49-89-3299-1301 
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-

___
OpenAFS-info mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-info


