Fwd: Re: [OpenAFS] openafs OSD
Sorry, this mail should also have gone to the list!

Original Message
Subject: Re: [OpenAFS] openafs OSD
Date: Thu, 30 May 2013 10:27:14 +0200
From: Hartmut Reuter reu...@rzg.mpg.de
To: Staffan Hämälä s...@ltu.se

Staffan,

as I have told you, we are migrating from TSM-HSM to HPSS in order to get rid of TSM-HSM. Five years ago we had a really bad experience with TSM-HSM running with GPFS: a so-called reconcile job had started in the background and removed millions of files from the TSM database. It never really became clear what caused this behaviour. We had much work to restore everything from dumps, also because this job ran for two or three days before it was detected, so new files already migrated to tape were not in the restored database...

In 2007 I wrote an interface to dcache, which cannot be mounted as a file system and can only be accessed through library calls. I used the same technique later when I wrote the interface to HPSS, which also works by calls to a shared library. Today the rxosd, when started with the appropriate parameters, loads a shared interface library which itself contains the calls to the HPSS shared library. This generic interface could also be used for other non-mountable archive systems such as TSM. Three years ago I was interested in doing that for TSM, but I never got the necessary documentation about the TSM library, and so nothing happened.

A general problem with AFS-OSD is that I am the only person who knows the code and who feels responsible for keeping it running. I am 68 years old and now retired, but still in good health and willing to work on it. Still, I think someone else should start to do this work in order to make this project run longer than another few years. I learnt this lesson myself 20 years ago, when we had got MR-AFS from PSC (Pittsburgh Supercomputer Center) and half a year later the whole group of people who had worked on it left PSC.
I became, willing or not, the only developer of MR-AFS for the next 15 years, until I replaced it by AFS-OSD.

There are many sites which are interested in AFS-OSD in principle, but do not deploy it because they don't see long-term support for it. Even my own site, RZG, has planned a migration back to standard openafs over the next years unless someone else appears to take over the support.

Hartmut

Staffan Hämälä wrote:

Hi,

Yes, I'm interested in having a look. I'm primarily checking out what can be done right now, and will start the real work on this after the summer.

I heard from Ragge (Anders Magnusson, a former colleague of mine) that he had talked with you at the AFS conference a few years ago. Apparently, you were interested in using the TSM API to connect to TSM's archive function as a backend. Do you know if any work has been done on this? If there isn't such a function today, would it be difficult to use TSM's archive function as a backend? As Harald says on the list, the archive function would be less complex.

I had thought we would use TSM-HSM for this, but if there is a better option available, we should have a look at that as well. I would prefer to use TSM to handle the tapes, as our tape robot is currently connected only to TSM. But the TS3500 is capable of connecting to several applications at the same time, so that's not really a requirement. But I would like to have all tapes handled by TSM. :-)

We currently use the archive function in TSM quite extensively. Our AFS monthly / yearly backups are archived, as well as volumes for inactive users, etc.

I'm also interested to know why you are migrating from TSM-HSM to HPSS as a backend?

/Staffan

On 2013-05-27 17:10, Hartmut Reuter wrote:

Hello Staffan,

AFS-OSD is still alive and in use at two sites: RZG and PSI. At RZG we are using it with HPSS and TSM-HSM as HSM back-ends. We are in the process of copying all data from TSM-HSM over into HPSS and hope to get rid of TSM-HSM by the end of this year.
At present our HPSS already contains 8 million AFS files with 729 TB of data. RZG's cell ipp-garching.mpg.de is also in the process of migrating from AFS-OSD 1.4 to AFS-OSD 1.6. The current source in git://github.com/hwr/openafs-osd.git is based on openafs 1.6.2.

Unlike HPSS, TSM-HSM does not require special interface routines because you can access the file system directly by POSIX calls. We are using GPFS along with TSM-HSM, but I suppose other file systems can be used as well.

If you are interested, have a look at the current version and feel free to ask more.

Hartmut Reuter

Staffan Hämälä wrote:

Hi,

What's the current status of AFS-OSD? I've found a few presentations from 2009 and 2010, but nothing more recent. I haven't found anything about OSD and openafs 1.6. Is there any more recent information on how to implement openafs/OSD to connect to TSM's HSM module?

/Staffan
LTU
Sweden

___ OpenAFS-info mailing list OpenAFS
Re: [OpenAFS] Re: bos blockscanner
When OpenAFS started in 2000 I pushed a lot of extensions into it to make it easier to maintain MR-AFS. I think at that time we were already the last site to use it. Over the next years the source code of MR-AFS could be reduced to the very specific parts on the server side, while with these extensions the plain openafs code could be used for all other subdirectories.

The scanner was a stand-alone program which traversed the volume metadata to find out which files needed to get a copy elsewhere and which files were eligible for wiping (removing from disk). In extreme situations it could be helpful to stop the scanner...

There are many other remains of MR-AFS in different source files of openafs which could safely be removed now that MR-AFS has been out of service for five years. MR-AFS was finally shut down in 2008. Since then we have been running AFS/OSD, which has all the features MR-AFS had and some more. I am working on the project to bring the AFS/OSD changes into the official openafs source code.

-Hartmut

Andrew Deason wrote:

On Fri, 25 Jan 2013 19:50:41 +0100 (CET) Thorsten Alteholz open...@alteholz.de wrote:

the command 'bos help' says something about 'blockscanner' and 'unblockscanner'. This shall start something like '/usr/afs/bin/scanner -block'. But I did not find anything about such a scanner. Can anybody please shed some light on this? Is this a new feature for the future?

No, it is very old. I believe that is there to accommodate installations with MR-AFS. MR-AFS is for using AFS with HSM systems but is not freely available or open source or anything; you can google around for what little information exists about it. I'm not sure if it's still in use. Those rpcs hard code the commands they run, ugh...

I don't have any familiarity with it, but from what I've seen in the OpenAFS source, I think the 'scanner' process is something that shuffled data between disk and tape.
Normally it runs continuously or something, but those commands allow you to temporarily disable or enable migrations.

--
Hartmut Reuter                  e-mail reu...@rzg.mpg.de
                                phone  +49-89-3299-1328
                                fax    +49-89-3299-1301
RZG (Rechenzentrum Garching)    web    http://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)

___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
[OpenAFS] Re: [OpenAFS-announce] OpenAFS 1.6.2 release candidate 3 available
You can't build openafs-1.6.2pre3 outside the source tree! The build ends with

gcc -O -I/home/hwr/tarfiles/openafs-1.6.2pre3/amd64_sles11/src/config -I/home/hwr/tarfiles/openafs-1.6.2pre3/amd64_sles11/include -I../../../src/libafsauthent -I. -DAFS_PTHREAD_ENV -pthread -D_REENTRANT -D_LARGEFILE64_SOURCE -c ../../../src/libafsauthent/../volser/vsutils.c
../../../src/libafsauthent/../volser/vsutils.c:44:20: fatal error: volser.h: No such file or directory
compilation terminated.
make[3]: *** [vsutils.o] Error 1
make[3]: Leaving directory `/home/hwr/tarfiles/openafs-1.6.2pre3/amd64_sles11/src/libafsauthent'
make[2]: *** [libafsauthent] Error 2
make[2]: Leaving directory `/home/hwr/tarfiles/openafs-1.6.2pre3/amd64_sles11'
make[1]: *** [build] Error 2
make[1]: Leaving directory `/home/hwr/tarfiles/openafs-1.6.2pre3/amd64_sles11'
make: *** [all] Error 2

-Hartmut

Stephan Wiesand wrote:

The OpenAFS 1.6 Release Managers announce that release candidate 1.6.2pre3 has been tagged in the OpenAFS source repository, available at:

git://git.openafs.org/openafs.git

as tag: openafs-stable-1_6_2pre3 .

Source files and available binaries can be accessed via the web at:

http://www.openafs.org/release/openafs-1.6.2pre3.html

or

http://dl.openafs.org/dl/candidate/1.6.2pre3/

or via AFS at:

UNIX: /afs/grand.central.org/software/openafs/candidate/1.6.2pre3/
UNC: \\afs\grand.central.org\software\openafs\candidate\1.6.2pre3\

Among many fixes and enhancements, this release candidate includes support for Linux kernels up to 3.7, OS X 10.8 and recent Solaris releases. This is believed to be very close to 1.6.2 final. The Kerberos related changes mentioned in the last announcement are not yet ready, and will probably be part of the next stable release.

Please assist us by deploying this release and providing positive or negative feedback. Bug reports should be filed to openafs-b...@openafs.org . Reports of success should be sent to openafs-info@openafs.org .
Paul Smeddle and Stephan Wiesand, 1.6 Branch Release Managers for the OpenAFS Release Team

___ OpenAFS-announce mailing list openafs-annou...@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-announce
Re: [OpenAFS] Running OpenAFS on top of GPFS?
One of the big advantages of GPFS is that it is a fast cluster filesystem. With normal AFS you cannot make use of this feature because the fileserver doesn't share its partitions with anyone else. So the fileserver would use GPFS just as its local filesystem instead of XFS or something else. GPFS is good for large files, while a great many files in AFS are small. In our cell, with 800 TB and 200 million files, 92 % of the files are smaller than 1 MB, which is a reasonable block size for GPFS.

However, for some years now there has been a special version of AFS called AFS/OSD which allows large files to be stored in object storage. The object storage consists of servers running a program called rxosd. The idea is to keep the small files in the fileserver's partition where the volume resides and to have the large files in object storage.

Now the point why GPFS is of special interest here: the GPFS used by the rxosd could be shared by all the compute nodes in a cluster, and the modified AFS client would allow users on these compute nodes to read and write data from and to the AFS files located inside the GPFS rxosd partition directly, with nearly the native GPFS speed (200-300 MB/s depending on the network being used). Users outside the cluster would see the files as normal AFS files and access them with the normal low transfer rate of AFS.

I gave a talk about this some years ago: Embedded Filesystems (Direct Client Access to Vice Partitions), talk at the AFS Kerberos Best Practice Workshop 2009, Stanford, 2009, which you can download from http://www.rzg.mpg.de/~hwr/Stanford.pdf

If you want to know more about this, feel free to contact me.

Hartmut Reuter

Craig Strachan wrote:

Dear All,

The Central Computing Service at Edinburgh University is introducing a new University wide filesystem intended for research based data.
We in Informatics have been asked about the possibility of using some of this new file space to either expand our existing cell or (more likely) set up a new cell for the whole University to use. Unfortunately, this new research file system is based on GPFS and so this would involve us running AFS on top of GPFS.

Does anyone on this list have experience of running AFS on top of GPFS which they would be willing to share with us? Failing that, would anyone like to make an educated guess as to the problems we are likely to encounter if we try this?

Any advice would be appreciated,

Craig.
---
Craig Strachan, Computing Officer, School of Informatics, University of Edinburgh
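The split Hartmut describes (small files stay in the fileserver's partition, large files go to object storage served by rxosd) can be pictured with a small sketch. This is only an illustration under stated assumptions: the 1 MiB threshold is borrowed from the GPFS block size mentioned above, and the function names are hypothetical. The thread does not describe how AFS/OSD actually decides where a file is placed.

```python
# Toy model of "small files in the fileserver partition, large files in
# object storage". Names and threshold are assumptions, not AFS/OSD code.
SIZE_THRESHOLD = 1024 * 1024  # 1 MiB, the GPFS block size mentioned in the thread


def choose_store(file_size, threshold=SIZE_THRESHOLD):
    """Return where a file of the given size would live in this toy model."""
    if file_size < 0:
        raise ValueError("file size must be non-negative")
    return "fileserver" if file_size < threshold else "object-storage"


def summarize(sizes):
    """Count how many files of the given sizes land in each store."""
    counts = {"fileserver": 0, "object-storage": 0}
    for size in sizes:
        counts[choose_store(size)] += 1
    return counts


if __name__ == "__main__":
    # A handful of made-up file sizes: three small files and one 5 GiB file.
    print(summarize([512, 4096, 900_000, 5 * 1024**3]))
```

With a size distribution like the one quoted for the RZG cell (92 % of files below 1 MB), most files would stay with the fileserver in such a model, while the few large ones would account for the bulk of the data in object storage.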
Re: [OpenAFS] Using AFS dB clone
If you want to avoid that your secondary db server with the lowest IP address ever becomes sync site, it's correct to use -clone for it. However, in case your main server goes down you will lose the sync site until it is back up, unless you have another non-clone db server.

-Hartmut Reuter

Tom Mukunnemkeril wrote:

I currently have just one AFS database server and was considering setting up another machine to just be a clone for backup purposes. I was looking at the documentation for bos addhost and it indicates under the -clone option that this should be used with caution. Are there any issues I need to watch out for to set this machine up as a clone? Additionally, this machine has a lower IP address and I don't want it to be considered a sync site. I'm currently running openafs 1.6.0 on my clients/servers and linux kernel 2.6.38.8 on Slackware 13.37 machines (64 bit).

Bos addhost page: http://docs.openafs.org/Reference/8/bos_addhost.html

Based on discussion about 1.4.x quorum election: https://lists.openafs.org/pipermail/openafs-info/2011-October/037050.html

Tom Mukunnemkeril
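The effect of -clone described in Hartmut's reply can be sketched with a deliberately simplified model. The assumptions are loud ones: the model only encodes the two statements from this thread (the lowest-address server tends to become sync site, and a clone never does) and ignores the real Ubik voting and quorum rules entirely. The class, function, and addresses are hypothetical.

```python
# Simplified model of sync-site choice; NOT the real Ubik election.
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class DbServer:
    address: str
    clone: bool = False  # True if the server was added with -clone
    up: bool = True


def pick_sync_site(servers) -> Optional[DbServer]:
    """Lowest-address running non-clone server, or None if there is none."""
    candidates = [s for s in servers if s.up and not s.clone]
    if not candidates:
        return None
    return min(candidates, key=lambda s: ipaddress.ip_address(s.address))


if __name__ == "__main__":
    main = DbServer("10.0.0.20")
    backup = DbServer("10.0.0.5", clone=True)  # lower address, but a clone
    print(pick_sync_site([main, backup]))      # the main server
    main.up = False
    print(pick_sync_site([main, backup]))      # None: no sync site
```

The second case is the caveat from the reply: with only one non-clone server, the sync site is lost while that server is down.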
Re: [OpenAFS] Doubts about OpenAFS implementation in a company
Stanisław Kamiński wrote:

First of all, hi to everyone - it's my first own topic here :-)

I'm working for a company of ~1000 people, with three offices in Poland and three others in bordering countries. OpenAFS was introduced about 6 years ago, when the company was quite a bit smaller. The guy who did this left no documentation, and some of his design decisions are making me scratch my head - that's part of the reason I'm writing this.

Other things that are important:
- about 2/3 of users work on Linux (CentOS) workstations, and their homedirs are served from AFS
- 1/3 are Windows users
- Polish offices are connected using at least 10 Mbit symmetric links, but the offices abroad might have much less. In one particular example, the link is asymmetric 10/1 Mbit (d/u)
- there is a single AFS cell covering all the offices
- every office has its own db and fileserver (Debian 5/6)
- we rely on our partner to assign IP address space for us - the net result is that the weakest link location (10/1) has the lowest IP and there is _nothing_ we can do about it

The last thing causes Ubik elections to constantly choose the server located on the weakest link as sync site.

This can be changed by making the slow database server with the lowest IP address a clone. A clone can never become sync site and its votes do not count. Use

bos removehost dbserver slowdbserver

for all dbservers and then

bos addhost dbserver slowdbserver -clone

on all dbservers, and restart the database instances everywhere.

Also, we quite often have to move user volumes between different offices - we've got quite a bit of rotation between them, say some 10-20 people per week.

Now, I've been assigned to improve AFS performance in any way possible. It was very bad; then I changed the server parameters to tune it with the large server options - that yielded an enormous speedup, but I still believe I can get much more from the system. There are two things that are, ahem, not as fast as one would like.
The worse one is directory traversal - moving between levels of directories can take 5-10 seconds (on a workstation with a 1 Gbit link to the AFS server in its location). The other one is the upload/download speed itself - last time I measured, windows client d/u was 2/5 MB/s - I think I can get more than that.

As I'm currently making my way through Managing AFS by Richard Campbell, I'm not yet fully up to speed on OpenAFS inner workings and such. Right now I only want to ask: is the design of our AFS system correct? Or did the guy introducing it make some short-sighted projections which don't hold water in the current environment (as described)? I'm talking here about the single-cell design - although I'm not sure it's easy to move volumes between different cells.

Another thing I'm worried about: can it be that having the sync site on the slowest uplink causes everything to slow down? Is there any way to get some measurements for this?

Thanks for reading all of this and not falling asleep :-) And waiting for your comments,

Stan
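The clone procedure from Hartmut's inline reply has to be repeated against every database server, which is easy to get wrong by hand. The sketch below only builds and prints the command lines in the positional form quoted in the reply; it executes nothing. The host names are placeholders, and the restart of the database instances is left as a manual step because the reply does not spell out the restart command.

```python
# Dry-run helper: prints the bos commands from the reply for each db server.
# Host names are hypothetical placeholders; nothing is executed.
def clone_commands(dbservers, slow_server):
    """Return the removehost/addhost command lines as lists of arguments."""
    commands = []
    # First remove the slow server from the server list of every db server ...
    for server in dbservers:
        commands.append(["bos", "removehost", server, slow_server])
    # ... then add it back everywhere as a clone.
    for server in dbservers:
        commands.append(["bos", "addhost", server, slow_server, "-clone"])
    return commands


if __name__ == "__main__":
    servers = ["db1.example.com", "db2.example.com", "slowdb.example.com"]
    for command in clone_commands(servers, "slowdb.example.com"):
        print(" ".join(command))
    # Afterwards the database instances still have to be restarted on all
    # db servers, as described in the reply.
```

Printing the commands first and reviewing them before running anything seems prudent in a live cell, since an inconsistent server list on one db server would be hard to spot afterwards.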
Re: [OpenAFS] Large files with 1.6.0pre2
Ryan C. Underwood wrote:

I am having trouble copying a large file (6GB) from a volume located on one server to a volume located on another server. After about 2GB (2147295232 bytes to be exact), the volume gets offlined and marked needs salvage. I have reproduced this reliably several times. This large file and the volume it sits on were created with 1.4.x, while the destination volume was created with 1.6.x. Has something changed with large file support on 32-bit builds perhaps?

What does the VolserLog on the receiving side say? There were at least some architectures (AIX) where you need to allow large files explicitly with ulimit.

-Hartmut Reuter
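Two quick checks follow from this exchange. The reported failure point lies just below 2^31 bytes, which is at least consistent with a 2 GiB limit somewhere in the path (it does not prove where), and Hartmut's hint about ulimit can be checked for the current process with Python's standard `resource` module. The sketch is Unix-only and the function names are made up for illustration; a check run from a login shell says nothing certain about the limits the server processes were started with.

```python
# Arithmetic on the reported byte count plus a file-size ulimit check.
import resource

OBSERVED_BYTES = 2147295232  # byte count from the report above
TWO_GIB = 2**31              # 2147483648


def bytes_below_2gib(n):
    """How far below the 2 GiB mark the given byte count lies."""
    return TWO_GIB - n


def file_size_unlimited():
    """True if the soft file-size limit of this process is unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_FSIZE)
    return soft == resource.RLIM_INFINITY


if __name__ == "__main__":
    print("bytes below 2 GiB:", bytes_below_2gib(OBSERVED_BYTES))
    print("file size unlimited:", file_size_unlimited())
```

The difference works out to 188416 bytes, i.e. 184 KiB short of 2 GiB.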
Re: [OpenAFS] Problem with Off-line volumes...unable to bring On-line
Looks like a crash of the salvager. The SalvageLog should end differently, with the summary line for the RW-volume. Are there any core files in /usr/afs/logs? If not, make sure ulimit for core file size isn't set to 0 and retry.

You could also run the salvager by hand under gdb to see why it crashes. You then need to add the -debug flag to prevent it from forking. E.g.

gdb /usr/afs/bin/salvager
...
(gdb) run /vicepb 536871656 -debug

Good luck,
Hartmut

McKee, Shawn wrote:

Hi Everyone,

I am having a problem with one of my OpenAFS file servers. About ½ of the volumes are “Off-line” and I am unable to bring them online. First some system info and then I will list problem details and what I have tried.

The system is running Scientific Linux 5.5/x86_64 (basically CentOS 5.5 64-bit). The openafs rpms are:

[atums2:~]# rpm -qa | grep openafs
openafs-kpasswd-1.4.12-6.cern
openafs-client-1.4.12-6.cern
kernel-module-openafs-2.6.18-194.3.1.el5-1.4.12-5.cern
openafs-1.4.12-6.cern
kernel-module-openafs-2.6.18-194.8.1.el5-1.4.12-5.cern
openafs-krb5-1.4.12-6.cern
kernel-module-openafs-2.6.18-238.1.1.el5-1.4.12-6.cern
openafs-server-1.4.12-6.cern

The version of ‘e2fsprogs’ is 1.39.

The system has an ext3 1TB partition for AFS:

[atums2:~]# df /vicepb
Filesystem  1K-blocks   Used       Available  Use%  Mounted on
/dev/sda1   1007931664  635382472  321349196  67%   /vicepb

The system has 931 volumes and only 470 are On-line while 461 are Off-line:

[atums2:~]# vos listvol atums2
Total number of volumes on server atums2 partition /vicepb: 931
chamber.OLD_eml4a07           536872814 RW 8634169 K Off-line
chamber.OLD_eml4a07.readonly  536872815 RO 8634169 K On-line
chamber.OLD_eml4a09           536872817 RW  702642 K Off-line
chamber.OLD_eml4a09.readonly  536872818 RO  702642 K On-line
…
Total volumes onLine 470 ; Total volumes offLine 461 ; Total busy 0

I have run ‘bos salvage’ on the partition multiple times. I have restarted the system. I have run a forced fsck.ext3 check on the underlying partition (no problems found).
Only RW volumes are Off-line. All RO volumes are On-line. There are a few RW volumes On-line (8 out of 469) but the rest won’t come On-line.

Here is a particular volume which is Off-line:

[atums2:~]# vos examine chdata.sn
chdata.sn 536871656 RW 598 K Off-line
    atums2.cern.ch /vicepb
    RWrite 536871656 ROnly 0 Backup 0
    MaxQuota 1000 K
    Creation Fri May 26 04:02:49 2006
    Copy Wed Oct 11 12:35:42 2006
    Backup Sun Jun 11 00:30:10 2006
    Last Access Fri Jan 7 16:38:32 2011
    Last Update Wed Apr 4 15:29:42 2007
    0 accesses in the past day (i.e., vnode references)

    RWrite: 536871656 ROnly: 536871657 RClone: 536871657
    number of sites - 3
    server atums1.cern.ch partition /vicepi RO Site -- Old release
    server atums2.cern.ch partition /vicepb RW Site -- New release
    server atums2.cern.ch partition /vicepb RO Site -- New release

Try to bring online:

[atums2:~]# vos online -server atums2 -partition /vicepb -id chdata.sn

The FileLog shows:

Sun Jan 23 22:57:03 2011 GetBitmap: addled vnode index in volume chdata.sn; volume needs salvage
Sun Jan 23 22:57:03 2011 VAttachVolume: error getting bitmap for volume (/vicepb//V0536871656.vol)

Try to Salvage:

[atums2:~]# bos salvage atums2 /vicepb chdata.sn
Starting salvage.
bos: salvage completed

The SalvageLog shows:

[atums2:~]# tail /usr/afs/logs/SalvageLog
@(#) OpenAFS 1.4.12 built 2010-12-13 1928681 19919656
01/23/2011 22:58:19 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager /vicepb 536871656)
01/23/2011 22:58:19 2 nVolumesInInodeFile 64
01/23/2011 22:58:19 CHECKING CLONED VOLUME 536871657.
01/23/2011 22:58:19 chdata.sn.readonly (536871657) updated 04/04/2007 15:29
01/23/2011 22:58:19 Partially allocated vnode 2 deleted.
Try again:

[atums2:~]# vos online -server atums2 -partition /vicepb -id chdata.sn

FileLog has the same message:

Sun Jan 23 22:59:05 2011 GetBitmap: addled vnode index in volume chdata.sn; volume needs salvage
Sun Jan 23 22:59:05 2011 VAttachVolume: error getting bitmap for volume (/vicepb//V0536871656.vol)

Salvage attempt again:

[atums2:~]# bos salvage atums2 /vicepb chdata.sn
Starting salvage.
bos: salvage completed

[atums2:~]# tail /usr/afs/logs/SalvageLog
@(#) OpenAFS 1.4.12 built 2010-12-13 1928681 19919656
01/23/2011 23:00:07 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager /vicepb 536871656)
01/23/2011 23:00:07 2 nVolumesInInodeFile 64
01/23/2011 23:00:07 CHECKING CLONED VOLUME 536871657.
01/23/2011 23:00:07 chdata.sn.readonly (536871657) updated 04/04/2007 15:29
01/23/2011 23:00:07 Partially allocated vnode 2 deleted.

Same result, as if the prior salvage didn’t do anything. This is exactly what happens on other volumes I have tried to bring online.

So how would I fix this? Any suggestions for how to get the rest of these volumes On-line? Let me know if you need further details.

Thanks,
Shawn
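Hartmut's first suggestion in this thread, making sure the core file size limit is not 0 so that a crashing salvager leaves a core file, can be checked with Python's standard `resource` module on Unix. The helper name below is made up for illustration, and the check only reports the limit of the process it runs in. The salvager started by `bos salvage` inherits its limits from the server environment, which may well differ from a login shell.

```python
# Check whether core dumps are possible under the current soft limit.
import resource


def core_dumps_enabled(limits=None):
    """True if the soft core-file limit is non-zero (unlimited counts as enabled).

    `limits` is a (soft, hard) pair; by default the limits of the current
    process are queried.
    """
    if limits is None:
        limits = resource.getrlimit(resource.RLIMIT_CORE)
    soft, _hard = limits
    return soft != 0


if __name__ == "__main__":
    if core_dumps_enabled():
        print("core files can be written under the current limit")
    else:
        print("core file size limit is 0: raise it before retrying the salvage")
```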
Re: [OpenAFS] Locked Volume
Angela Hilton wrote:

Hi

I don't normally make requests of the lists but my colleague (Owen leBlanc) is on leave. I have a volume that has become locked. I'm not certain of the reason for this or even how to trace the problem. There are actually 2 locked volumes, volume.name and volume.name.readonly. There is a cron job that usually runs to release the read/write to the read-only (I've paused this for the time being).

I realise that I can issue

vos unlock -id volume.name

BUT I am unsure if there are any potential problems that this could cause. Can anyone offer any advice?

In this case I normally do a 'vos examine' for the volume and then a 'vos status' for the fileserver that vos examine showed me the RW-volume lives on. If that server doesn't show any activity on behalf of the volume, I think it's safe to unlock the volume. Such a situation may be caused by a volserver crash, a killed vos command, or a network outage during a vos command.

Hartmut Reuter

TIA
Angela

Thanks
Angela
--
Angela Hilton
Web Infrastructure Coordinator
Infrastructure Applications
IT Services Division
The University of Manchester
G50 Kilburn Building, Oxford Rd
Manchester M13 9PL
t: +44(0)161 275 8335
e: angela.hil...@manchester.ac.uk
Re: [OpenAFS] Proposed changes for server log rotation
chas williams - CONTRACTOR wrote:

On Thu, 02 Dec 2010 22:22:11 -0500 Michael Meffie mmef...@sinenomine.net wrote:

The key point is that currently some sites may be relying on weekly restarts and the current rename from FileLog to FileLog.old to avoid filling a disk partition. I think a more sensible approach in the long term, for sites that choose to log to regular files, is to just let the server append, and let the modern log rotate tool of your choice deal with the log rotation.

perhaps the absolute minimum would be to implement a signal that causes the log files to be closed and reopened just like a restart. this could be issued weekly via bosserver to emulate the restart behavior. people who want new behavior like syslog would need to opt in and change command line params (eventually switch to this as the default).

This signal already exists: at our site a cron job executes kill -HUP each midnight to close all server logs. If you set mrafsStyleLogs you also get nice date and time suffixes instead of .old.

Hartmut Reuter
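The close-and-reopen-on-HUP pattern discussed here is a general one and can be sketched in a few lines. The code below is not OpenAFS code: the class, the file names, and the external rename step are assumptions made for illustration. It only shows why reopening the log on SIGHUP lets an outside tool (cron, logrotate) move the old file away while the server keeps running.

```python
# Generic "reopen the log on SIGHUP" pattern; illustrative, not OpenAFS code.
import os
import signal
import tempfile


class ReopenableLog:
    def __init__(self, path):
        self.path = path
        self.handle = open(path, "a")

    def write(self, line):
        self.handle.write(line + "\n")
        self.handle.flush()

    def reopen(self, signum=None, frame=None):
        """Close and reopen the log; usable directly as a signal handler."""
        self.handle.close()
        self.handle = open(self.path, "a")

    def close(self):
        self.handle.close()


def install_hup_handler(log):
    """Reopen the log whenever the process receives SIGHUP (Unix only)."""
    signal.signal(signal.SIGHUP, log.reopen)


def rotate_demo():
    """Write, rename the file as a rotation tool would, reopen, write again."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "FileLog")
        log = ReopenableLog(path)
        log.write("before")
        os.rename(path, path + ".old")  # what the external rotation does
        log.reopen()                    # what the SIGHUP handler would do
        log.write("after")
        log.close()
        with open(path + ".old") as old, open(path) as new:
            return old.read(), new.read()
```

In a long-running process one would call `install_hup_handler(log)` once at startup; a nightly cron job sending `kill -HUP` to the process then has the same effect as the explicit `reopen()` call in the demo.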
Re: [OpenAFS] Re: bonnie++ on OpenAFS
Simon Wilkinson wrote:

Yep, this is what's happening in the trace Achim provided, too. Every 4k we write the chunk. I'm not sure how that's possible unless something is closing the file a lot, or the cache is full of stuff we can't kick out.

Actually, it's entirely possible. Here's how it all goes wrong...

When the cache is full, every call to write results in us attempting to empty the cache. On Linux the page cache means that we only call write once for each 4k chunk. However, our attempts to empty the cache are a little pathetic. We just attempt to store all of the chunks of the file currently being written back to the fileserver. If it's a new file there is only one such chunk - the one that we are currently writing. As chunks are much larger than pages, and when a chunk is dirty we flush the whole thing to the server, this is why we see repeated writes of the same data.

The process goes something like this:

*) Write page at 0k, dirties first chunk of file.
*) Discover cache is full, flush first chunk (0-1024k) to the file server
*) Write page at 4k, dirties first chunk of file
*) Cache is still full, flush first chunk to file server
*) Write page at 8k, dirties first chunk of file
... and so on.

The problem is that we don't make good decisions when we decide to flush the cache. However, any change to flush items which are less active will be a behaviour change - in particular, on a multi-user system it would mean that one user could break write-on-close for other users simply by filling the cache.

The problem here is that afs_DoPartialWrite is called with each write. Normally it returns without doing anything, but if the percentage of dirty chunks is too high it triggers a background store. However, this can happen multiple times before the background job starts executing. Therefore I introduced in AFS/OSD a new flag bit CStoring, which is switched on when the background task is submitted and switched off when it's done.
And during that time no new background stores are scheduled for this file.

Hartmut

Cheers,
Simon.
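The effect of the CStoring flag described in Hartmut's reply can be illustrated with a toy model. Only the names CStoring and afs_DoPartialWrite come from the thread; everything else here (the classes, the dirty-fraction threshold, the counting) is an assumption for illustration and not the AFS/OSD client code. The model counts how many background stores get queued when many writes arrive before the background job has had a chance to run.

```python
# Toy model of "schedule at most one background store per file at a time".
class FileState:
    def __init__(self):
        self.storing = False  # stands in for the CStoring flag bit
        self.queued = 0       # number of background stores submitted


def do_partial_write(state, dirty_fraction, threshold=0.5, use_flag=True):
    """Called on every write; returns True if a background store was queued."""
    if dirty_fraction <= threshold:
        return False          # the normal case: nothing to do
    if use_flag and state.storing:
        return False          # a store for this file is already pending
    state.storing = True
    state.queued += 1
    return True


def background_store_done(state):
    """Called when the background store has finished."""
    state.storing = False


def simulate(writes, use_flag):
    """Issue `writes` writes with a full cache before any store completes."""
    state = FileState()
    for _ in range(writes):
        do_partial_write(state, dirty_fraction=0.9, use_flag=use_flag)
    return state.queued


if __name__ == "__main__":
    print("stores queued without flag:", simulate(256, use_flag=False))
    print("stores queued with flag:   ", simulate(256, use_flag=True))
```

Without the flag, every write above the threshold queues another store of the same chunk; with it, a new store can only be queued after `background_store_done` has cleared the flag.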
Re: [OpenAFS] Re: bonnie++ on OpenAFS
Achim Gsell wrote:

On Nov 23, 2010, at 12:15 AM, Simon Wilkinson wrote:

On 22 Nov 2010, at 23:06, Achim Gsell wrote:

3.) But if I first open 8 files and - after this is done - start writing to these files sequentially, the problem occurs. The difference to 1.) and 2.) is that I have these 8 open files while the test is running. This simulates the putc-test of bonnie++ more or less:

AFS is a write-on-close filesystem, so holding all of these files open means that it is trying really hard not to flush any data back to the fileserver. However, at some point the cache fills, and it has to start writing data back. In 1.4, we make some really bad choices about which data to write back, and so we end up thrashing the cache. With Marc Dionne's work in 1.5, we at least have the ability to make better choices, but nobody has really looked in detail at what happens when the cache fills, as the best solution is to avoid it happening in the first place!

Sounds reasonable. But I have the same problem with a 9GB disk-cache, a 1GB disk-cache, 1GB mem-cache and a 256kB mem-cache: I can write 6 GB pretty fast, then performance drops to 3MB/s ...

We are always using memcache with only 64 or 256 MB, but I have seen this problem, too. I think it's on the server side: today's servers have a lot of memory, and the data are written into the buffers first. Only when the buffers reach the limit does the operating system really start to sync them out to the disks. And with this huge amount of buffers you regularly see the performance going down for some time. I suppose that during the sync the fileserver's writes are hanging.
Hartmut

So long
Achim
Re: [OpenAFS] pts: Permission denied ; unable to create user admin
You are probably creating a new cell, right? If so, you may run

bos setauth localhost -authrequired off -localauth

on your database server(s). If you do that in a living cell it's a security risk, of course. With

bos setauth machine -authrequired on

you can then go back to the secure mode.

Hartmut Reuter

fosiul alam wrote:

Hi everyone,

I need help to solve this issue. I am following the link below to set up an afs server:

http://docs.openafs.org/QuickStartUnix/ch02s15.html

I believe I followed every step, but when I am trying to create a user

pts createuser -name admin -noauth
pts: Permission denied ; unable to create user admin

I can't create user admin. What am I missing?

I was comparing with another website, where I saw that a few commands are different from the openafs website. Example: http://redflo.de/tiki-index.php?page=Configure+AFS+Server

bos create dopey.redflo.de buserver simple /usr/lib64/openafs/buserver -cell redflo.de -noauth
bos create dopey.redflo.de ptserver simple /usr/lib64/openafs/ptserver -cell redflo.de -noauth
bos create dopey.redflo.de vlserver simple /usr/lib64/openafs/vlserver -cell redflo.de -noauth

but the openafs documentation says:

# ./bos create <machine name> buserver simple /usr/afs/bin/buserver -noauth
# ./bos create <machine name> ptserver simple /usr/afs/bin/ptserver -noauth
# ./bos create <machine name> vlserver simple /usr/afs/bin/vlserver -noauth

So the openafs website does not use the -cell cellname parameter. Anyway, I know that whatever is on the openafs website has to be true, so can anyone please tell me what I am doing wrong?
thanks for your help and patiences Fosiul * * -- - Hartmut Reuter e-mail reu...@rzg.mpg.de phone+49-89-3299-1328 fax +49-89-3299-1301 RZG (Rechenzentrum Garching)webhttp://www.rzg.mpg.de/~hwr Computing Center of the Max-Planck-Gesellschaft (MPG) and the Institut fuer Plasmaphysik (IPP) - ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
[OpenAFS] Re: [OpenAFS-devel] 1.6 and post-1.6 OpenAFS branch management and schedule
Russ Allbery wrote:

I'm aware of the following (largish) things that we want to deprecate or remove:

* --enable-fast-restart and --enable-bitmap-later are earlier attempts to solve the problem that is solved in a more complete way by demand attach. Demand attach will be available in 1.6 but not enabled by default. These two options will conflict with demand-attach; in other words, you won't be able to enable either of them and demand attach at the same time. At the point at which we make demand attach the default, rather than optional behavior, I believe we should remove the code for these two flags. I think that should be for either 1.10 or 2.0 based on experience with running 1.6 in production. In the meantime, please be aware that most of the developers don't build with those flags by default and the code is not heavily tested. This code is not enabled by default, so if you're not compiling yourself and passing those flags to configure, you're not using this and don't need to worry about it.

Without --enable-fast-restart, after a fileserver crash the salvager used to salvage all volumes in all partitions before the fileserver started. On large fileservers this could take hours, and sometimes the salvager ran out of memory and crashed itself, leaving some volumes still not attachable. With the Demand Attach Fileserver (DAFS) this initial salvage is no longer necessary; however, each volume which was not cleanly detached beforehand gets salvaged in the background. This is a nice feature which allows the most demanded volumes to come up soon, I hope, but salvaging will still take hours because it's the same amount of work that has to be done.

When I looked into the SalvageLog after a fileserver or machine crash I found that, except for the increment of the next uniquifier, hardly anything important ever happened. Therefore I wrote the code to skip the automatic salvage many years ago and sent it to OpenAFS in 2001.

Right now I am working on the integration of rxosd into 1.5.74 (which represents more or less the current git master). I re-enabled for us the possibility of having both options (fast restart and demand attach) in parallel, because my feeling is that a crash of a heavily used large fileserver with only demand attach will still be a pain for a rather long time.

Hartmut
Re: [OpenAFS] Crash in volserver when restoring volume from backup.
Anders Magnusson wrote:

Hi, I have a problem that I need some advice on how to go on with. I have a volume dump file, but when trying to read it back volserver crashes. The dump was generated under 1.4.8, and the volserver segv appears with both 1.4.8 and 1.4.11.

VolserLog.old says:
  Wed Aug 26 14:56:34 2009 Starting AFS Volserver 2.0 (/usr/afs/bin/volserver -p 16)
  Wed Aug 26 15:00:12 2009 1 Volser: CreateVolume: volume 537998421 (students.waqazi-4) created

BosLog says:
  Wed Aug 26 15:00:14 2009: fs:vol exited on signal 11

Any hints where to go from here? I can provide the dump file on request, but since it's a student home directory I don't want it to be public.

-- Ragge

Attach gdb to the volserver before running the command. Then you may be able to see where it crashes and why. You could also try to analyze the dump file with dumptool, which is built under the subdirectory src/tests.

Hartmut
Re: [OpenAFS] Crash in volserver when restoring volume from backup.
Anders Magnusson wrote:

Hartmut Reuter wrote:

Anders Magnusson wrote:

Hi, I have a problem that I need some advice on how to go on with. I have a volume dump file, but when trying to read it back volserver crashes. The dump was generated under 1.4.8, and the volserver segv appears with both 1.4.8 and 1.4.11.

VolserLog.old says:
  Wed Aug 26 14:56:34 2009 Starting AFS Volserver 2.0 (/usr/afs/bin/volserver -p 16)
  Wed Aug 26 15:00:12 2009 1 Volser: CreateVolume: volume 537998421 (students.waqazi-4) created

BosLog says:
  Wed Aug 26 15:00:14 2009: fs:vol exited on signal 11

Any hints where to go from here? I can provide the dump file on request, but since it's a student home directory I don't want it to be public.

-- Ragge

Attach the volserver with gdb before running the command. Then you may be able to see where it crashes and why.

Done:

  # gdb /usr/afs/bin/volserver
  GNU gdb Fedora (6.8-27.el5)
  Copyright (C) 2008 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law. Type show copying and show warranty for details.
  This GDB was configured as x86_64-redhat-linux-gnu...
  (gdb) symb
  Discard symbol table from `/usr/afs/bin/volserver'? (y or n) y
  No symbol file now.
  (gdb) help file
  Use FILE as program to be debugged.
  It is read for its symbols, for getting the contents of pure memory, and it is the program executed when you use the `run' command. If FILE cannot be found as specified, your execution directory path ($PATH) is searched for a command of that name. No arg means to have no executable file and no symbols.
  (gdb) help symb
  Load symbol table from executable file FILE.
  The `file' command can also load symbol tables, as well as setting the file to execute.
  (gdb) symb volserver.debug
  Reading symbols from /usr/afs/bin/volserver.debug...done.
  (gdb) attach 30720
  Attaching to program: /usr/afs/bin/volserver, process 30720
  Reading symbols from /lib64/libpthread.so.0...done.
  [Thread debugging using libthread_db enabled]
  [New Thread 0x2ab6a9d3be90 (LWP 30720)]
  [New Thread 0x4dd02940 (LWP 30740)]
  [New Thread 0x4d301940 (LWP 30739)]
  [New Thread 0x4c900940 (LWP 30738)]
  [New Thread 0x4beff940 (LWP 30737)]
  [New Thread 0x4b4fe940 (LWP 30736)]
  [New Thread 0x4aafd940 (LWP 30735)]
  [New Thread 0x4a0fc940 (LWP 30734)]
  [New Thread 0x496fb940 (LWP 30733)]
  [New Thread 0x48cfa940 (LWP 30732)]
  [New Thread 0x482f9940 (LWP 30731)]
  [New Thread 0x478f8940 (LWP 30730)]
  [New Thread 0x46ef7940 (LWP 30729)]
  [New Thread 0x464f6940 (LWP 30728)]
  [New Thread 0x45af5940 (LWP 30727)]
  [New Thread 0x450f4940 (LWP 30726)]
  [New Thread 0x446f3940 (LWP 30725)]
  [New Thread 0x43cf2940 (LWP 30724)]
  [New Thread 0x432f1940 (LWP 30723)]
  [New Thread 0x428f0940 (LWP 30722)]
  [New Thread 0x41eef940 (LWP 30721)]
  Loaded symbols for /lib64/libpthread.so.0
  Reading symbols from /lib64/libresolv.so.2...done.
  Loaded symbols for /lib64/libresolv.so.2
  Reading symbols from /lib64/libc.so.6...done.
  Loaded symbols for /lib64/libc.so.6
  Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
  Loaded symbols for /lib64/ld-linux-x86-64.so.2
  0x003c13c0a899 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  (gdb) c
  Continuing.
  [New Thread 0x4e703940 (LWP 30748)]
  Program received signal SIGSEGV, Segmentation fault.
  [Switching to Thread 0x4aafd940 (LWP 30735)]
  0x003c13078d60 in strlen () from /lib64/libc.so.6
  (gdb) bt
  #0 0x003c13078d60 in strlen () from /lib64/libc.so.6
  #1 0x00430092 in afs_vsnprintf (p=0x4aafc3ba 4BF+0, avail=999, fmt=<value optimized out>, ap=0x4aafc7c0) at ../util/snprintf.c:395
  #2 0x00416a60 in vFSLog (format=0x467838 "1 Volser: ReadVnodes: IH_CREATE: %s - restore aborted\n", args=0x4) at ../util/serverLog.c:135

This is the message the volserver wanted to write into the VolserLog. Unfortunately afs_error_message(errno) didn't return a usable string, which led to the crash.

However, if you repeat that experiment you can type "up 4" to get to dumpstuff.c:1214, and then you can do a "print *vnode" and "print vnodeNumber" to see which vnode it is. A possible reason why IH_CREATE could fail is that you have already tried this so many times that all tags for this vnode number in the link table of the volume are already in use. Because of the crashes, the volserver didn't remove the remains of the unsuccessful restores.

  #3 0x0042550e in Log (format=0x1311e7ec <Address 0x1311e7ec out of bounds>) at ../vol/common.c:41
  #4 0x0040e60c in RestoreVolume (call=<value optimized out>, avp=0x2c03e780, incremental=<value optimized out>, cookie=<value optimized out>) at ../volser/dumpstuff.c:1214
  #5 0x004067f4 in VolRestore (acid=0x41ff510, atrans=<value optimized out>, aflags=1, cookie=0x4aafd000
Re: [OpenAFS] Resilience
Wheeler, JF (Jonathan) wrote:

One of our (3) AFS servers has a mounted read-write volume which must be available 24x7 to our batch system. The server is as resilient as we can make it, but it may still fail outside normal working hours for some reason. For technical reasons related to the software installed on the volume it is not possible to use read-only volumes mounted from our other servers (the software must be installed and served from the same directory name), so I have devised the following plan in the event of a failure:

a) create read-only volumes on the other 2 servers, but do not mount them; use vos release whenever the software is updated
b) in the event of a failure of server1 (which has the rw volume), drop the existing mount and mount one of the read-only volumes in its place (we can live with the read-only copy whilst server1 is being repaired/replaced).

Can anyone see problems with that scenario? We could use vos convertROtoRW; how would that affect the process?

The problem with convertROtoRW is that a dying fileserver doesn't send callbacks to the clients, as would happen when you move the RW volume to another place. So you will have to do an fs checkvol on all clients to make sure they don't wait forever for the broken server, but instead use the newly created RW volume. Our backup strategy is completely based on the possibility of doing convertROtoRW. Cron jobs on the batch workers do the fs checkvol once in a while...

Hartmut

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory
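The failover Hartmut describes could be sketched roughly as below. This is only an illustration under assumptions: the helper names (run, promote_ro, refresh_client) and the example server, partition and volume names are made up, and the script defaults to a dry run that just prints the commands. Only vos convertROtoRW and fs checkvolumes (the long form of fs checkvol) are taken from the thread.

```shell
#!/bin/sh
# Dry-run helper: with DRY_RUN=1 (the default here) a command is only printed;
# with DRY_RUN=0 it is actually executed.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# promote_ro <server> <partition> <volume>   (hypothetical helper name)
# Turns the RO copy on a surviving server into the new RW volume.
promote_ro() {
    run vos convertROtoRW -server "$1" -partition "$2" -id "$3"
}

# To be run on every client, for example from cron: makes the cache manager
# discard its cached volume location information, so that it looks up the
# new RW site instead of waiting for the dead server.
refresh_client() {
    run fs checkvolumes
}
```

For example, promote_ro server2 /vicepa software prints the vos command that would be issued; setting DRY_RUN=0 runs it for real and requires administrative tokens.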
Re: [OpenAFS] solaris 10 versions supporting inode fileservers
David R Boldt wrote:

We use Solaris 10 SPARC exclusively for our AFS servers. After upgrading to 1.4.10 from 1.4.8 we had a very few volumes that started spontaneously going off-line, recovering, and then going off-line again until they needed to be salvaged. Hearing that this might be related to inode, we moved these volumes to a set of little-used fileservers that were running namei at 1.4.10. It made no discernible difference. Two volumes in particular accounted for 90% of our off-line volume issues.

FileLog:
  Mon Apr 27 10:56:09 2009 Volume 2023867468 now offline, must be salvaged.
  Mon Apr 27 10:56:15 2009 Volume 2023867468 now offline, must be salvaged.
  Mon Apr 27 10:56:15 2009 Volume 2023867468 now offline, must be salvaged.
  Mon Apr 27 10:56:22 2009 fssync: volume 2023867469 restored; breaking all call backs
(restored vol above being R/O for R/W in need of salvage)

That's interesting: I saw similar behavior on some of our volumes, however with AFS/OSD fileservers. I then made the ViceLog messages more verbose and found that this always happened when IH_OPEN failed. This can fail if the handle in the vnode is missing. To prevent that, I added some lines to VGetVnode_r which, when an already existing vnode structure is found, check whether the handle is in place and, if not, do a new IH_INIT (and write a message into the log). I found about 100 cases per day in our cell, but not all of them would have ended in taking the volume off-line, because in many cases the handle would never have been used (all the GetStatus RPCs). Since then I have never again seen volumes going off-line.

Hartmut

Both of the volumes most frequently impacted have content completely rewritten roughly every 20 minutes while being on an automated replication schedule of 15 minutes. One of them 25MB, the other 95MB, both at about 80% quota. We downgraded just the fileserver binary to 1.4.8 on all of our servers and have not seen a single off-line message in 36 hours.

-- David Boldt dbo...@usgs.gov
Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?
Todd DeSantis wrote:

Hi Rainer - Hi Hartmut ::

Yes, of course, but what error changed the MaxVolumeId in the vlserver is still completely unclear. BTW we also had a giant jump in the volume ids some years ago, but fortunately it was not big enough to reach the sign bit.

The MaxVolumeId can be changed in several ways: via a vos restore and, I believe, a vos syncvldb or syncserv. Most likely, the initial jump was via the vos restore command.

  [src] vos restore -h
  Usage: vos restore -server <machine name> -partition <partition name> -name <name of volume to be restored> [-file <dump file>] [-id <volume ID>] [-overwrite <abort | full | incremental>] [-cell <cell name>] [-noauth] [-localauth] [-verbose] [-timeout <timeout in seconds>] [-help]

If you use the [-id <volume ID>] option and have a typo in the volume ID, the volume ID for the volume will be out of normal sequence and this will set the MaxVolumeID to this large number. Also, I believe that a vos syncvldb or syncserv will check the volume IDs it is working with against the MaxVolumeID and raise MaxVolumeID if necessary. I think when we saw this happen to an AFS cell, we gave the customer a tool to reset the MaxVolumeID to a more manageable number and they restored the volumes and gave them lower IDs.

Thanks
Todd DeSantis

Thank you Todd, when this happened the first time I hex-edited a copy of the vldb and reset the maxVolumeId. Then, having seen that the database version was still the same, I just copied my modified database over the live one. But the second time it happened I already had so many volumes with high numbers that I gave up. Since then we have lived with these numbers... I had always suspected ubik of having produced the jump, but what you say, vos restore or vos sync, looks much more probable.

Hartmut
Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?
Jeffrey Altman wrote:

Giovanni Bracco wrote:

I want to point out that in the past the issue of volumes with too large an ID also emerged in our cell (enea.it). At that time (2002) we still had Transarc AFS, and support provided us with a patched AFS version able to operate with volumes having too large IDs. Before migrating to OpenAFS we had to recover the normal AFS behaviour, and the procedure we followed at that time (2005) was described at the AFS Kerberos Best Practices Workshop 2005 in Pittsburgh: http://workshop.openafs.org/afsbpw05/talks/VirtualAFScell_Bracco_Pittsburgh2005.pdf. At that time the reason for the initial problem was not clear. Do I have to assume that it has now been identified?

Giovanni

This just goes to show that giving a talk at a workshop is not equivalent to submitting a bug report to openafs-b...@openafs.org. If this issue had been submitted to openafs-bugs, it would have been addressed a long time ago. The problem is quite obvious. Some of the volume id variables are signed and others are unsigned. A volume id is a volume id and the type used to represent it must be consistent.

Yes, of course, but what error changed the MaxVolumeId in the vlserver is still completely unclear. BTW we also had a giant jump in the volume ids some years ago, but fortunately it was not big enough to reach the sign bit.

Hartmut

Jeffrey Altman
Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?
What does the VolserLog on the source server say?

-Hartmut

McKee, Shawn wrote:

Hi Everyone, I am having a problem trying to 'vos move' volumes after losing/restoring an AFS file server. The server that was lost has been restored on new hardware. The old RW volumes were moved to other servers (convertROtoRW) and now I want to use the 'vos move' command to move them back. Here is what happens (I have tokens as 'admin'. Linat07 is the current RW home for OSGWN and Linat08 is the new server):

  vos move OSGWN linat07 /vicepf linat08 /vicepg -verbose
  Starting transaction on source volume 536874901 ... done
  Allocating new volume id for clone of volume 536874901 ... done
  Cloning source volume 536874901 ... done
  Ending the transaction on the source volume 536874901 ... done
  Starting transaction on the cloned volume 2681864210 ...
  Failed to start a transaction on the cloned volume2681864210
  Volume not attached, does not exist, or not on line
  vos move: operation interrupted, cleanup in progress...
  clear transaction contexts
  Recovery: Releasing VLDB lock on volume 536874901 ... done
  Recovery: Accessing VLDB.
  move incomplete - attempt cleanup of target partition - no guarantee
  Recovery: Creating transaction for destination volume 536874901 ...
  Recovery: Unable to start transaction on destination volume 536874901.
  Recovery: Creating transaction on source volume 536874901 ... done
  Recovery: Setting flags on source volume 536874901 ... done
  Recovery: Ending transaction on source volume 536874901 ... done
  Recovery: Creating transaction on clone volume 2681864210 ...
  Recovery: Unable to start transaction on source volume 536874901.
  Recovery: Releasing lock on VLDB entry for volume 536874901 ... done
  cleanup complete - user verify desired result
  [linat08:local]# vos examine 2681864210
  Could not fetch the entry for volume number 18446744072096448530 from VLDB

I am assuming the large cloned volume ID is causing the problem, as opposed to an inability to create a cloned volume. I can make replicas on linat08 for existing volumes without a problem.

NOTE: The hex representations of the cloned volume ID from the move attempt above and from the 'vos examine':

  2681864210 = 0x9FDA0012
  18446744072096448530 = 0xFFFFFFFF9FDA0012

Any suggestions? This seems like a 64 vs 32 bit issue. Here is the information on servers and versions:

We have 3 AFS DB servers:
  Linat02 - RHEL5/x86_64 - OpenAFS 1.4.7
  Linat03 - RHEL4/i686 - OpenAFS 1.4.6
  Linat04 - RHEL5/x86_64 - OpenAFS 1.4.7

We have 3 AFS file servers:
  Linat06 - RHEL4/x86_64 - OpenAFS 1.4.6
  Linat07 - RHEL4/x86_64 - OpenAFS 1.4.6
  Linat08 - RHEL5/x86_64 - OpenAFS 1.4.8

Info on OSGWN volume:

  [linat08:~]# vos examine OSGWN
  OSGWN 536874901 RW 505153 K On-line
      linat07.grid.umich.edu /vicepf
      RWrite 536874901 ROnly 18446744072096448530 Backup 0
      MaxQuota 200 K
      Creation    Tue Mar 3 03:43:06 2009
      Copy        Mon Dec 3 16:39:21 2007
      Backup      Never
      Last Update Sat Feb 21 15:18:05 2009
      0 accesses in the past day (i.e., vnode references)

      RWrite: 536874901 ROnly: 536874902
      number of sites -> 2
         server linat07.grid.umich.edu partition /vicepf RW Site
         server linat06.grid.umich.edu partition /vicepe RO Site

Let me know if there is other info required to help resolve this.

Thanks,
Shawn McKee
University of Michigan/ATLAS Group
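The relation between the two numbers in the message above can be checked with a few lines of shell. This is only a sketch of the arithmetic: the function name sign_extend_hex is made up for illustration, and it assumes its argument is a volume ID that fits in 32 bits. A 32-bit ID with the top bit set becomes negative when stored in a signed 32-bit variable, and widening that to 64 bits fills the upper half with ones.

```shell
#!/bin/sh
# sign_extend_hex <32-bit volume id>   (illustrative helper, not an AFS tool)
# Prints the 64-bit pattern that results when the ID passes through a signed
# 32-bit variable and is then widened to 64 bits.
sign_extend_hex() {
    id=$1
    if [ "$id" -ge 2147483648 ]; then
        # Sign bit (2^31) is set: the upper 32 bits become all ones.
        printf '0xFFFFFFFF%08X\n' "$id"
    else
        # Sign bit is clear: the value is unchanged, upper 32 bits are zero.
        printf '0x00000000%08X\n' "$id"
    fi
}

sign_extend_hex 2681864210   # clone ID from the failed vos move
sign_extend_hex 536874901    # RW volume ID, below 2^31
```

The first call prints 0xFFFFFFFF9FDA0012, which is 18446744072096448530 in decimal, the number vos examine complained about. The second prints 0x0000000020000F95, so the RW volume ID survives the same treatment unchanged.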
Re: [OpenAFS] Connection timed out?
Robbert Eggermont wrote:

L.S.,

We are evaluating OpenAFS for use with 50 clients. One of the tests is a kernel build on 50 clients at the same time. During this test we encounter 'Permission denied' errors, which seem to coincide with 'kernel: afs: failed to store file (110)' entries in /var/log/messages. 110=Connection timed out. The fileserver is busy but responsive; about 25 builds (out of 50) complete normally.

We are running 1.4.8 client and server, kernel 2.6.18 64-bit. Currently all server processes run on the same server. Fileserver settings:

  /usr/afs/bin/fileserver -p 128 -b 512 -l 3072 -s 3072 -vc 3072 -cb 65536 -busyat 1536 -rxpck 1024 -nojumbo

What are we doing wrong (except for the way we test ;-))?

Regards,
Robbert

My feeling is that the famous idleDead mechanism, new with 1.4.8, plays a role here. It would be interesting to know whether the same happens on 1.4.7 clients or not.

Hartmut
Re: [OpenAFS] move RO volume
Chaz Chandler wrote:

You could do it another way: vos dump from the source partition and vos restore to the destination. But generally the easiest way is addsite/remsite. Note that vos remsite is the command to remove an RO volume, not vos remove.

This is not true: vos remsite makes the VLDB forget about the RO, but it doesn't remove it from the partition. If you end up with the same RO on more than one partition on your server, only the first one (depending on the order in which the partitions get attached) will come on-line. The other one(s) remain off-line.

Hartmut

The order doesn't matter, but you would vos addsite the volume on the destination partition and (optionally) vos remsite the volume on the source partition. It's generally best to keep your RW and RO volume on the same partition if disk space is an issue. Also, since you can have more than one RO volume per server, depending on what you're trying to accomplish you may not even need to do this.

-Original Message-
From: l...@lwilke.de
Sent: Mon, 2 Feb 2009 11:03:55 +0100
To: openafs-info@openafs.org
Subject: [OpenAFS] move RO volume

Hi,

Just a quick question: is it correct that it is not possible to move an RO volume from one partition to another partition on the *same* server without doing a vos remove, addsite, release? Using OAFS 1.4.7.

Thanks
--lars
[OpenAFS] Re: [OpenAFS-devel] interface for vos split
Why wouldn't vos do the same thing that fs getfid currently does to get the vnode?

fs getfid is in the cache manager. Since vos is user-space, it doesn't have access to the same routines. An RPC to the volserver was suggested as the best way to handle that. My follow-up question on #3 is primarily to ask if that's how we want it handled, and if we want to expose that interface via a vos command. In other words, would a command like:

  vos getvnode -volume $vol -relative_path path

be useful.

Of course I had thought a little about the best user interface when designing the current syntax of vos split. In order not to mix volserver interfaces and cache manager/fileserver interfaces, I decided to do it the simplest way. You can always write a wrapper, running on a machine with AFS mounted, like this:

  #!/bin/sh
  #
  # split volume newvolume splitpath
  #
  vnode=`fs getfid $3 | awk -F\. '{print $2}'`
  vos split $1 $2 $vnode

I think this is a much easier approach than reinventing a cache manager within the volserver.

-Hartmut

It should, of course, double-check that the directory name given is within the volume that one is splitting.

Definitely.
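One weakness of a wrapper like the one above is that splitting the whole fs getfid line on "." picks the wrong field as soon as the path or the cell name contains a dot. A slightly more defensive sketch is given below. It rests on an assumption that is not confirmed in this thread: that fs getfid prints the FID in parentheses as (volume.vnode.uniquifier). The function names vnode_from_fid_line and split_by_path are hypothetical, and the positional vos split arguments are simply carried over from the wrapper above.

```shell
#!/bin/sh
# Read one line of `fs getfid` output on stdin and print the vnode number.
# ASSUMPTION: the FID appears in the line as "(volume.vnode.uniquifier)".
vnode_from_fid_line() {
    sed -n 's/.*(\([0-9][0-9]*\)\.\([0-9][0-9]*\)\.\([0-9][0-9]*\)).*/\2/p'
}

# split_by_path <volume> <newvolume> <splitpath>   (hypothetical wrapper)
split_by_path() {
    vnode=$(fs getfid "$3" | vnode_from_fid_line)
    if [ -z "$vnode" ]; then
        echo "could not determine vnode for $3" >&2
        return 1
    fi
    vos split "$1" "$2" "$vnode"
}
```

Only the FID inside the parentheses is parsed, so dots elsewhere in the line do not matter. The check that the directory really lies inside the volume being split, as requested in the thread, is still left to the caller.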
Re: [OpenAFS] AFS without Kerberos headache
Harald Barth wrote:

In fact what I need ideally is a file system like NFS just with the added features needed to use it in a Metropolitan Network setup, i.e. local caching of files.

As an added feature, I hope you want to have control over who wrote a file.

AFS seems to do this in a good way, but Kerberos is a constant annoyance to it. I do have machines that generate simulation data and have to work for weeks. If I want to do this with the current OpenAFS setup, I'll have to log in once a day and refresh the damn Kerberos token :-(.

You can have longer-lived tickets and tokens. You can save tickets in keytabs. If your hosts have keytabs, you can use them to generate tickets from. You can give system:anyuser write access if you want to mimic NFS ;)

And you can create pts groups based on IP addresses and give such a group permissions in the ACL. That's less horrible than giving system:anyuser write access. But after you have done this you have to wait quite a while until the fileserver has re-evaluated those IP groups (typically 2 hours) before they work.

Hartmut

Harald.
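The IP-based group Hartmut mentions could be set up along the following lines. This is a sketch under assumptions, not a recipe from the thread: the address 192.0.2.17, the group name simhosts, the directory and the helper names are invented for illustration, and the script defaults to a dry run that only prints the pts and fs commands.

```shell
#!/bin/sh
# Dry-run helper: with DRY_RUN=1 (the default here) a command is only printed;
# with DRY_RUN=0 it is actually executed.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# grant_host_access <ip-address> <group> <directory>   (hypothetical helper)
grant_host_access() {
    # A machine entry is a pts entry whose name is an IP address.
    run pts createuser -name "$1"
    run pts creategroup -name "$2"
    run pts adduser -user "$1" -group "$2"
    # rlidwk = read, lookup, insert, delete, write, lock (no administer right).
    run fs setacl -dir "$3" -acl "$2" rlidwk
}

grant_host_access 192.0.2.17 simhosts /afs/example.org/sim/output
```

As noted in the thread, the fileserver may take a while (typically 2 hours) to re-evaluate such groups, so a freshly added host will not get access immediately.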
Re: [OpenAFS] Add new device to the cell
Jesus arteche wrote:

Hey,

Does anyone know how I can add a new device to the same volume? I mean: I had /afs/mycell/my_directory created and stored in /vicepa, and I added a hard disk as /vicepb, and I want to mount it in /afs/mycell/my_directory as well. So if my first disk was 90 GB and my second disk is 90 GB, the result would be a /afs/mycell/my_directory of 180 GB.

Thanks

You can't. You can create a new volume in the new partition and mount that volume inside your other one. Typically the data in a cell consists of thousands of volumes mounted as a tree.

Hartmut
Re: [OpenAFS] replicas and mount points
Vladimir Konrad wrote:

Hello,

When I was creating mount points in our cell, I did not ask specifically for -rw (read/write) mount points. Not understanding (at the beginning) how the AFS client works (preference for read-only volumes), after adding replicas and releasing the volumes, the clients could not write any more. Is there a way (other than re-doing the mount points with -rw) to make the client prefer the read/write volumes (when replicas exist)?

No, not as long as all volumes in the path have RO replicas.

Also, is it true that if I specify -rw to fs mkmount, this will stick (not change) and the mount point will remain read/write even when replication sites for the volume exist?

It's true: mounted with -rw you always get the RW volume.

Kind regards,
Vlad
Re: [OpenAFS] OpenAFS groups distrubution under windows client
This has nothing to do with Windows. Generally the membership of a user in groups is evaluated only once, when the connection to the fileserver gets established. A new token enforces a new connection. So whenever you add someone to a group or remove someone from a group, that has an effect only after the user has reauthenticated, unless he didn't have any connection to the fileserver before.

Hartmut

Lars Schimmer wrote:

Hi!

Just to ask/be sure: User a is online under Windows, OpenAFS client 1.5.51, has got a token and is browsing the OpenAFS filespace. User a tries to access a directory without the proper rights, gets no access and complains to me. I put user a into the correct group to access that directory. But even after 1h or 2h, user a still cannot access that directory. But if user a destroys the token 10 min after I added him to the right group and obtains a new token, he can access the directory right afterwards.

How long does it take under Windows until the right group information is distributed? Or is this a bug?

Best regards,
Lars Schimmer
TU Graz, Institut für ComputerGraphik WissensVisualisierung
Re: [OpenAFS] Serious trouble, mounting /afs, ptserver, database rebuilding
kanou wrote:

My logs on the second machine tell me:

== /var/log/openafs/FileLog.old ==
Wed Jul 23 19:03:37 2008 File server starting
Wed Jul 23 19:03:37 2008 afs_krb_get_lrealm failed, myserver2.
Wed Jul 23 19:03:37 2008 VL_RegisterAddrs rpc failed; will retry periodically (code=5376, err=4)

code 5376 means no quorum elected. Are you sure your database servers are all running? Try
udebug server 7002
for the ptserver and
udebug server 7003
for the vldb.

Wed Jul 23 19:03:37 2008 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=267275.

== /var/log/openafs/SalvageLog ==
07/23/2008 19:08:27 SALVAGING OF PARTITION /vicepa COMPLETED

and aklog gives me:

aklog: Couldn't get hrf.uni-koeln.de AFS tickets:
aklog: Cannot contact any KDC for requested realm while getting AFS tickets

Damn! I did not do anything on that second one!

Christof wrote:

Just to make sure you're working on the correct file: as I understand it, you first deleted the file /var/lib/openafs/db/prdb.DB0. This file was then probably recreated when you restarted the ptserver. Run this command on the backup file you made first (or better, on a copy of the backup file). T/Christof

From: [EMAIL PROTECTED] [openafs-info-[EMAIL PROTECTED] On Behalf Of kanou [EMAIL PROTECTED]
Sent: Wednesday, July 23, 2008 6:46 PM
To: openafs-info@openafs.org
Subject: Re: [OpenAFS] Serious trouble, mounting /afs, ptserver, database rebuilding

Thanks for your answer. Well, I found the file prdb_check. It doesn't print any errors. The only thing I can find, with
./prdb_check -database /var/lib/openafs/db/prdb.DB0 -uheader -verbose
is this line:
Ubik header size is 0 (should be 64)
So there are no errors! I can start the server and everything runs fine, but the machine won't mount /afs! kanou

On 23.07.2008 at 17:26, Steven Jenkins wrote:
On Wed, Jul 23, 2008 at 10:51 AM, kanou [EMAIL PROTECTED] wrote:

Hello, well, there is a file called db_verify.c in the folder /usr/src/modules/openafs/ptserver but I don't know how to build it.

If I recall correctly, db_verify gets renamed to 'prdb_check' during the install, so you should check for the existence of that file. If you can't find it, you'll need to build it from source code. The directions on the AFSLore wiki are a good place to start: http://www.dementia.org/twiki/bin/view/AFSLore/HowToBuildOpenAFSFromSource If you have problems building openafs-stable-1_4_x, you could get openafs-stable-1_4_7 instead, as that is the latest official release. Once you have built the tree, src/ptserver/db_verify should get built, so you can simply copy it out of the source tree for your use. If it doesn't get built automatically for you, you can cd into src/ptserver and do a 'make db_verify' manually. Also, feel free to ask for help here or on the irc channel.

Steven Jenkins End Point Corporation http://www.endpoint.com/

Hartmut
Re: [OpenAFS] vos dump (different afs server versions)
Vladimir Konrad wrote:

Hello, Is it possible to dump volumes on Debian woody and restore them on Debian etch? Vlad

All OpenAFS fileservers should be interoperable. Only if some of your servers have large file support and others do not may you be unable to move volumes from those with large file support to the others. The operating system, hardware, endianness, and so on should make no difference.

Hartmut
Re: [OpenAFS] Speed difference between OpenAFS 1.4.x on Debian and CentOS
Michał Droździewicz wrote:
Derrick Brashear wrote:

Not what I expected. When you self-compiled 1.4.6 on Debian, I assume you downloaded a tarfile from OpenAFS and did ./configure; make, yes? What options, if any, to configure?

I've built a Debian package using the default Debian options (1.4.6), and I've compiled from source with no options for ./configure except --prefix. In both cases the result was the same: slow speed, around 8-12 MiB/s (copying from local disk to the AFS structure).

Are you sure your network interface is running in Gbit/s mode under Debian and not just in 100 Mbit/s mode? This could easily explain the low throughput.

Hartmut
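The arithmetic behind this suspicion is short enough to spell out. The sketch below only converts a nominal link speed into an upper bound in MiB/s; it ignores protocol overhead, and the helper name `max_payload_mib_per_s` is made up for this example.

```python
def max_payload_mib_per_s(link_mbit_per_s):
    """Upper bound in MiB/s for a link speed given in Mbit/s (10**6 bit/s)."""
    bytes_per_s = link_mbit_per_s * 1_000_000 / 8
    return bytes_per_s / (1024 * 1024)


fast_ethernet = max_payload_mib_per_s(100)
gigabit = max_payload_mib_per_s(1000)

print(f"100 Mbit/s: at most {fast_ethernet:.1f} MiB/s")
print(f"1 Gbit/s:   at most {gigabit:.1f} MiB/s")
```

An observed 8-12 MiB/s sits just under the 100 Mbit/s ceiling, which is why a link negotiated at the lower speed is a plausible explanation.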
Re: [OpenAFS] Help needed for recovery of data of inode fileserver (Solaris 10 x86)
Jeffrey Altman wrote:
Hartmut Reuter wrote:

So what is the value of 'class' if not vLarge?

As you can see from that line above it's vSmall:

[6] DistilVnodeEssence(rwVId = 536870912U, class = 1, ino = 21977313U, maxu = 0x8046bc4), line 3175 in vol-salvage.c

So there might really be something wrong with the SmallVnodeFile, but an AssertionFailed is not the best way to repair it!

What the AssertionFailed means is that no one has written code to deal with a case where this error has occurred. It can't be fixed with Salvager until someone writes the missing code.

Of course, but for the user it might be better to skip handling of this error and continue with the next vnode. That way he could at least get the damaged volume back and copy whatever is still accessible. So John, ifdef out line 3175 and recompile. If this was a single bad vnode, your volume may come online again; otherwise it's probably lost anyway.

Hartmut
Re: [OpenAFS] Help needed for recovery of data of inode fileserver (Solaris 10 x86)
Jeffrey Altman wrote:
Hartmut Reuter wrote:
Jeffrey Altman wrote:
Hartmut Reuter wrote:

So what is the value of 'class' if not vLarge?

As you can see from that line above it's vSmall:

[6] DistilVnodeEssence(rwVId = 536870912U, class = 1, ino = 21977313U, maxu = 0x8046bc4), line 3175 in vol-salvage.c

So there might really be something wrong with the SmallVnodeFile, but an AssertionFailed is not the best way to repair it!

What the AssertionFailed means is that no one has written code to deal with a case where this error has occurred. It can't be fixed with Salvager until someone writes the missing code.

Of course, but for the user it might be better to skip handling of this error and continue with the next vnode. That way he could at least get the damaged volume back and copy whatever is still accessible. So John, ifdef out line 3175 and recompile. If this was a single bad vnode, your volume may come online again; otherwise it's probably lost anyway. Hartmut

I disagree. The reason that assert is there is that continuing will cause more damage to the data. We do not know, based upon the available data, whether this is a single bad vnode or whether perhaps the wrong file is being referenced for the SmallVnodeFile. What is known is that one vnode, perhaps the first vnode examined, has completely valid data except for the fact that it is in the wrong file. There are several issues that are worth pursuing here, especially because whatever the problem is has begun occurring on multiple machines:

1. What is the actual damage that has taken place?
2. Can the damage be corrected?
3. Can the damage be avoided in the first place? What is the cause?

Jeffrey Altman

Of course we should not remove the assert() forever, but just for the test of this volume, which otherwise will probably be lost anyway. In MR-AFS we had a -nowrite option to do just a dry run. I admit that it's a lot of work to implement this, but sometimes it is very helpful.

I just saw that -nowrite also exists in OpenAFS, only that the bos command claims it is possible only in MR-AFS. So one could at least run the salvager under the debugger with -nowrite.

Hartmut
Re: [OpenAFS] vos convertROtoRW requires salvage ?
eastside.cs 83 % vos listvldb root.cell
/usr/afsws/etc/vos: Connection timed out
eastside.cs 84 % /usr/afs/bin/vos listvldb root.cell

root.cell
    RWrite: 536870915 ROnly: 536870916
    number of sites - 2
       server solomons.cs.uwm.edu partition /vicepa RO Site -- Not released
       server eastside.cs.uwm.edu partition /vicepa RW Site
eastside.cs 85 % /usr/afs/bin/vos addsite eastside a root.cell
Added replication site eastside /vicepa for volume root.cell
eastside.cs 86 % /usr/afs/bin/vos release root.cell
Released volume root.cell successfully
eastside.cs 87 % fs checkv
usage: /usr/openwin/bin/xfs [-config config_file] [-port tcp_port]
eastside.cs 88 % /usr/afsws/bin/fs checkv
/usr/afsws/bin/fs: Connection timed out
eastside.cs 89 % /usr/afs/bin/fs checkv
All volumeID/name mappings checked.
eastside.cs 90 % /usr/afsws/bin/fs checks
All servers are running.
eastside.cs 91 % vos listvldb root.cell

root.cell
    RWrite: 536870915 ROnly: 536870916
    number of sites - 3
       server solomons.cs.uwm.edu partition /vicepa RO Site
       server eastside.cs.uwm.edu partition /vicepa RW Site
       server eastside.cs.uwm.edu partition /vicepa RO Site

--
Hartmut Reuter e-mail [EMAIL PROTECTED] phone +49-89-3299-1328 fax +49-89-3299-1301
RZG (Rechenzentrum Garching) web http://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the Institut fuer Plasmaphysik (IPP)
Re: [OpenAFS] Help needed for recovery of data of inode fileserver (Solaris 10 x86)
Jeffrey Altman wrote:
John Tang Boyland wrote:

OK I compiled the salvager with debugging and without optimization.

filip# /opt/SUNWspro/bin/dbx salvager.debug
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.5' in your .dbxrc
Reading salvager.debug
Reading ld.so.1
Reading libresolv.so.2
Reading libsocket.so.1
Reading libnsl.so.1
Reading libintl.so.1
Reading libdl.so.1
Reading libc.so.1
(dbx) run /vicepa -debug -parallel 1
Running: salvager.debug /vicepa -debug -parallel 1
(process id 3491)
[after three hours, I pressed return]
Thu Apr 3 14:14:20 2008: Assertion failed! file vol-salvage.c, line 3175.
signal ABRT (Abort) in __lwp_kill at 0xfee21157
0xfee21157: __lwp_kill+0x0007: jae __lwp_kill+0x15 [ 0xfee21165, .+0xe ]
Current function is AssertionFailed
   48   abort();
(dbx) where
  [1] __lwp_kill(0x1, 0x6), at 0xfee21157
  [2] _thr_kill(0x1, 0x6), at 0xfee1e8c9
  [3] raise(0x6), at 0xfedcd163
  [4] abort(0x804694a, 0x47f52c8c, 0x6854, 0x70412075, 0x33202072, 0x3a343120), at 0xfedb0ba9
=>[5] AssertionFailed(file = 0x808b724 "vol-salvage.c", line = 3175), line 48 in assert.c
  [6] DistilVnodeEssence(rwVId = 536870912U, class = 1, ino = 21977313U, maxu = 0x8046bc4), line 3175 in vol-salvage.c
  [7] SalvageVolume(rwIsp = 0x9ab0130, alinkH = 0x9ac0de8), line 3346 in vol-salvage.c
  [8] DoSalvageVolumeGroup(isp = 0x9ab0130, nVols = 1), line 2104 in vol-salvage.c
  [9] SalvageFileSys1(partP = 0x80bacd8, singleVolumeNumber = 0), line 1357 in vol-salvage.c
  [10] SalvageFileSys(partP = 0x80bacd8, singleVolumeNumber = 0), line 1192 in vol-salvage.c
  [11] handleit(as = 0x80a9340), line 687 in vol-salvage.c
  [12] cmd_Dispatch(argc = 6, argv = 0x80aaba8), line 902 in cmd.c
  [13] main(argc = 5, argv = 0x8047650), line 845 in vol-salvage.c
(dbx) up
Current function is DistilVnodeEssence
 3175   assert(class == vLarge);
(dbx) list 3170,3180
 3170   vep->type = vnode->type;
 3171   vep->author = vnode->author;
 3172   vep->owner = vnode->owner;
 3173   vep->group = vnode->group;
 3174   if (vnode->type == vDirectory) {
 3175       assert(class == vLarge);
 3176       vip->inodes[vnodeIndex] = VNDISK_GET_INO(vnode);
 3177   }
 3178   }
 3179   }
 3180   STREAM_CLOSE(file);

So what is the value of 'class' if not vLarge?

As you can see from that line above it's vSmall:

[6] DistilVnodeEssence(rwVId = 536870912U, class = 1, ino = 21977313U, maxu = 0x8046bc4), line 3175 in vol-salvage.c

So there might really be something wrong with the SmallVnodeFile, but an AssertionFailed is not the best way to repair it!

Hartmut
Re: [OpenAFS] Max partition and Volume size
Rich Sudlow wrote:

I wanted to double check - is the max partition and volume size both 2 TB? Are there any plans on increasing the partition size in the near future? Thanks, Rich

Partition size doesn't matter too much any more when you use OpenAFS + object storage (see best practice workshop in May). But the maximum volume size is currently limited by the int32 used to count the volume's blocks and also the disk quota. If you don't set a disk quota, you may even today end up with bigger volumes. Our biggest has 6.3 TB, but the number of blocks you see with vos examine is modulo 4 TB, of course. I admit: this is not what we wanted, but it happened.

Hartmut

~: vos ex aug-shotfiles.archive
aug-shotfiles.archive 536892900 RW -1728398890 K On-line
    afs4.bc.rzg.mpg.de /vicepx 23830 files
    RWrite 536892900 ROnly 536892901 Backup 0
    MaxQuota 0 K, osd flag 1
    Creation    Fri Oct 11 10:43:06 1996
    Copy        Wed Feb 6 11:32:01 2008
    Backup      Never
    Last Update Tue Apr 1 10:13:09 2008
    160693 accesses in the past day (i.e., vnode references)

    RWrite: 536892900 ROnly: 536892901
    number of sites - 3
       server afs4.bc.rzg.mpg.de partition /vicepx RW Site
       server afs16.rzg.mpg.de partition /vicepx RO Site
       server afs4.bc.rzg.mpg.de partition /vicepx RO Site

~: vos traverse afs4 -id aug-shotfiles.archive
servers: afs4

File Size Range      Files      %   run %         Data      %   run %
  0  B -   4 KB        240   1.01    1.01   399.045 KB   0.00    0.00
  4 KB -   8 KB          9   0.04    1.04    50.339 KB   0.00    0.00
  8 KB -  16 KB         31   0.13    1.17   395.975 KB   0.00    0.00
 16 KB -  32 KB         79   0.33    1.51     1.851 MB   0.00    0.00
 32 KB -  64 KB         87   0.37    1.87     3.821 MB   0.00    0.00
 64 KB - 128 KB         92   0.39    2.26     8.427 MB   0.00    0.00
128 KB - 256 KB        105   0.44    2.70    17.480 MB   0.00    0.00
256 KB - 512 KB        137   0.57    3.27    53.902 MB   0.00    0.00
512 KB -   1 MB        189   0.79    4.07   143.429 MB   0.00    0.00
  1 MB -   2 MB       1281   5.38    9.44     1.816 GB   0.03    0.03
  2 MB -   4 MB       1154   4.84   14.28     3.210 GB   0.05    0.08
  4 MB -   8 MB       1567   6.58   20.86     8.989 GB   0.14    0.22
  8 MB -  16 MB       1647   6.91   27.77    18.796 GB   0.29    0.50
 16 MB -  32 MB       1869   7.84   35.61    42.181 GB   0.64    1.15
 32 MB -  64 MB       2161   9.07   44.68    96.362 GB   1.47    2.62
 64 MB - 128 MB       1997   8.38   53.06   181.517 GB   2.77    5.40
128 MB - 256 MB       3272  13.73   66.79   606.521 GB   9.27   14.66
256 MB - 512 MB       3754  15.75   82.55     1.319 TB  20.66   35.32
512 MB -   1 GB       2261   9.49   92.04     1.496 TB  23.42   58.74
  1 GB -   2 GB       1898   7.96  100.00     2.635 TB  41.26  100.00
Totals:              23830 Files              6.389 TB

Storage usage:
---
            1 local_disk     965 files     226.115 MB
arch. Osd   4 raid6         3898 objects    13.796 GB
arch. Osd   5 tape         22858 objects     6.387 TB
      Osd   8 afs16-a         47 objects    32.475 GB
      Osd  10 afs4-a         260 objects   206.205 GB
arch. Osd  13 hsmgpfs        510 objects   463.637 GB
      Osd  19 mpp-fs10-a       8 objects     7.143 GB
      Osd  32 afs8-a          31 objects    21.075 GB
---
Total                      28577 objects     7.115 TB

Data without a copy:
---
if !replicated:
            1 local_disk     965 files     226.115 MB
arch. Osd   4 raid6            4 objects     3.625 MB
arch. Osd   5 tape         18413 objects     5.879 TB
      Osd  10 afs4-a           3 objects     1.682 GB
---
Total                      19385 objects     5.881 TB
~:
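The negative block count in the vos examine output above is what a wrapped signed 32-bit counter looks like. The following sketch only redoes that arithmetic; the assumption that the counter wrapped exactly once is mine, chosen because it lines up with the roughly 6.4 TB total reported by vos traverse. The helper `as_int32` is written for this example.

```python
def as_int32(n):
    """Interpret the low 32 bits of n as a signed 32-bit integer."""
    n &= 0xFFFFFFFF
    return n - 0x1_0000_0000 if n >= 0x8000_0000 else n


reported_kb = -1728398890             # value printed by vos examine above
low32_kb = reported_kb & 0xFFFFFFFF   # the same bits read as unsigned
candidate_kb = low32_kb + 2**32       # assumption: the counter wrapped once

print(low32_kb)                # unsigned reading of the 32-bit counter
print(candidate_kb / 2**30)    # size in units of 2**30 KB if it wrapped once
print(as_int32(candidate_kb))  # folds back to the reported negative value
```

Since one block is 1 KB, 2**32 blocks correspond to the "modulo 4 TB" mentioned in the reply.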
Re: [OpenAFS] Lost RW Volume Recovery?
Robert Sturrock wrote:

Hi all. I'm not sure how, but we have lost the RW volume for our cell.user (a structural volume under which user home areas live). After a bit of searching, I found this thread that describes a possible recovery method involving dump/restoring from an RO and then salvaging: http://www.openafs.org/pipermail/openafs-info/2002-December/007228.html I tried this method and it _seemed_ to work, but I'm still having problems accessing the volume after remounting it. A quick rundown on what I did:

$ vos dump cell.user.readonly cell.user.dump
$ vos restore hermes2 a cell.user -verbose cell.user.dump
Restoring volume cell.user Id 536870918 on server hermes2.its.unimelb.edu.au partition /vicepa .. done
Updating the existing VLDB entry
--- Old entry ---
cell.user
    ROnly: 536870919
    number of sites - 2
       server hermes1.its.unimelb.edu.au partition /vicepa RO Site
       server telos.its.unimelb.edu.au partition /vicepa RO Site
--- New entry ---
cell.user
    RWrite: 536870918 ROnly: 536870919
    number of sites - 3
       server hermes1.its.unimelb.edu.au partition /vicepa RO Site
       server telos.its.unimelb.edu.au partition /vicepa RO Site
       server hermes2.its.unimelb.edu.au partition /vicepa RW Site
Restored volume cell.user on hermes2 /vicepa
$ bos salvage hermes2 a cell.user
Starting salvage.
bos: salvage completed
$ bos salvage hermes2 a cell.user
Starting salvage.
bos: salvage completed

.. but now the problem is as follows:

$ fs mkmount /afs/.athena.unimelb.edu.au/user cell.user
[ so far, so good .. but .. ]
$ ls -ld user
ls: user: Connection timed out
$ ls -l
total 14
drwxrwxrwx 5 root root 2048 Nov 14 15:36 arch
drwxrwxrwx 5 root root 2048 Feb 19 09:33 devlp
drwxrwxrwx 2 root root 2048 Jan 23 14:44 group
drwxrwxrwx 3 root root 2048 Oct 25 12:15 project
drwxrwxrwx 4 root root 2048 Oct 15 11:13 pub
drwxrwxrwx 2 root root 2048 Feb 20 12:38 tmp
?- ? ?? ?? user
drwxrwxrwx 2 root root 2048 Oct 4 21:06 www

Any pointers as to where I go from here? The only thing I can think of is that there may be some caching going on which in some way is still looking for the old RW volume. One alternative might be to vos convertROtoRW, but I suspect that would leave me with the same problem to solve. Regards, Robert.

You need an fs checkvol on the client, because the disappearance of the old volume didn't trigger the callbacks needed to provoke a new VLDB lookup on the clients. You have the same problem after a vos convertROtoRW.

Hartmut
Re: [OpenAFS] Rebuild vldb?
Jerry Normandin wrote:

Hi. Most of my directories in /afs/dafca.com/home are giving me a "connection timed out" when I access them. Before this happened, access to the vldb through vos was slowing down. Will a sync fix this, or any ideas?

"Connection timed out" looks like one of your fileservers could be down. If the volume weren't in the vldb, you would get "no such device". Only in that case can a sync between fileserver and vldb help. If you do vos listvldb with the volume name, does it give the correct answer?

-Hartmut
Re: [OpenAFS] Server crash
Steve Devine wrote:

We had a server crash this morning, and after bringing it back up I am unable to get a vos listvol back from it. Or it is taking a very long time. These partitions are greater than 150 G. It's been running for 30 minutes now and nothing back yet. The server (version 1.4.2) is compiled with fast restart and I am trying to see what vols are off-line. Is there any other way to find out what vols are off-line? /sd

What does the FileLog say?

-Hartmut
Re: [OpenAFS] Migration of DB servers
Xavier Canehan wrote:

We'll have a complete site shutdown next week. I'm planning to use this opportunity to physically replace 2 afsdb servers. Same name and IP, but host and OpenAFS version upgrades. Considering that clients and fileservers will be down, I guess that I'll just have to go through the steps of managing the change on the afsdb servers. Am I right, or is there some tracking of DB hosts by the fileservers that I should take care of?

Should be ok. The fileservers don't keep any tracking of DB servers. As long as you don't change the IP addresses, nobody should see any difference.

Hartmut

Thanks, Xavier
Re: [OpenAFS] 1.4.5 namei on solaris 9 sparc requires AlwaysAttach for vice partitions
I too was surprised today when I started the freshly compiled 1.4.5 fileserver on Solaris and it didn't attach any partition. There was a change between 1.4.4 and 1.4.5 in favour of zfs, but unfortunately it is broken:

/* Ignore non ufs or non read/write partitions */
if ((strcmp(mnt.mnt_fstype, "ufs") != 0)
    || (strncmp(mnt.mnt_mntopts, "ro,ignore", 9) == 0))
#else
    (strcmp(mnt.mnt_fstype, "ufs") != 0)
#endif
    || (strncmp(mnt.mnt_mntopts, "ro,ignore", 9) == 0))
    continue;

was changed to

/* Ignore non ufs or non read/write partitions */
/* but allow zfs too if we're in the NAMEI environment */
if (
#ifdef AFS_NAMEI_ENV
    (((!strcmp(mnt.mnt_fstype, "ufs") && strcmp(mnt.mnt_fstype, "zfs"
    || (strncmp(mnt.mnt_mntopts, "ro,ignore", 9) == 0))
    continue;
}
#else
    continue;
#endif

The ! in front of strcmp in the new version makes it ignore exactly ufs. Just remove it!

Hartmut

Jason Edgecombe wrote:

Hi all, In my sordid saga to get a Sun fibre channel array working well with AFS, I found the following: When I upgraded the server to 1.4.5 namei, the fileserver would not mount the /vicep? partitions without doing a touch /vicep?/AlwaysAttach first. These are dedicated partitions on separate hard drives. I'm using a source-compiled openafs on Solaris 9 sparc.
openafs was compiled with the following options:

CC=/opt/SUNWspro/bin/cc YACC="yacc -vd" ./configure \
    --enable-transarc-paths \
    --enable-largefile-fileserver \
    --enable-supergroups \
    --enable-namei-fileserver \
    --with-krb5-conf=/usr/local/krb5/bin/krb5-config

We're using MIT kerberos 1.4.1 on the clients and fileservers with a 1.6.x KDC.

# mount | grep vicep
/vicepa on /dev/dsk/c0t0d0s6 read/write/setuid/intr/largefiles/logging/xattr/onerror=panic/dev=1d80006 on Thu Nov 29 13:03:15 2007
/vicepd on /dev/dsk/c0t3d0s6 read/write/setuid/intr/largefiles/logging/xattr/onerror=panic/dev=1d80016 on Thu Nov 29 13:03:15 2007
/vicepc on /dev/dsk/c0t2d0s6 read/write/setuid/intr/largefiles/logging/xattr/onerror=panic/dev=1d8001e on Thu Nov 29 13:03:15 2007
/vicepb on /dev/dsk/c0t1d0s6 read/write/setuid/intr/largefiles/xattr/onerror=panic/dev=1d8000e on Thu Nov 29 13:03:15 2007
# grep vicep /etc/vfstab
/dev/dsk/c0t0d0s6 /dev/rdsk/c0t0d0s6 /vicepa ufs 3 yes -
/dev/dsk/c0t1d0s6 /dev/rdsk/c0t1d0s6 /vicepb ufs 3 yes -
/dev/dsk/c0t2d0s6 /dev/rdsk/c0t2d0s6 /vicepc ufs 3 yes -
# cat SalvageLog
@(#) OpenAFS 1.4.5 built 2007-11-28
11/29/2007 09:52:59 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager)
11/29/2007 09:52:59 No file system partitions named /vicep* found; not salvaged

Does anyone know why this would be happening? Thanks, Jason
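The effect of the stray `!` is easiest to see in a small truth table. The sketch below is not the OpenAFS source: it models only the file-system-type half of the condition in Python, leaves out the `ro,ignore` test, and assumes the two `strcmp` calls were joined by `&&` (the operator is missing in the quoted text). The function names are invented for this example.

```python
def strcmp(a, b):
    """Mimic C strcmp: 0 when the strings are equal, non-zero otherwise."""
    return 0 if a == b else (-1 if a < b else 1)


def skipped_1_4_5(fstype):
    # Condition as quoted for 1.4.5: !strcmp(fstype, "ufs") && strcmp(fstype, "zfs")
    return (not strcmp(fstype, "ufs")) and bool(strcmp(fstype, "zfs"))


def skipped_without_bang(fstype):
    # The same condition with the '!' removed, as suggested in the reply.
    return bool(strcmp(fstype, "ufs")) and bool(strcmp(fstype, "zfs"))


for fs in ("ufs", "zfs", "nfs"):
    print(fs, skipped_1_4_5(fs), skipped_without_bang(fs))
```

With the `!` in place, ufs is the only type that gets skipped, which fits the report that ufs /vicep partitions were not attached without AlwaysAttach.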
Re: [OpenAFS] File systems on Linux, again.
Smith, Matt wrote:

After the recent thread "openafs upgrade from 1.4.1 to 1.5.7", and a review of a thread[1] from July, I'm wondering if there is a definitive recommendation for which file system to use on Linux AFS file servers. Ext3, XFS, JFS, something else? Thanks all, -Matt

[1] http://www.openafs.org/pipermail/openafs-info/2007-July/026798.html

We have been using XFS exclusively for many years. It performs well, and you can enlarge partitions on the fly with lvextend and xfs_growfs.

Hartmut
Re: [OpenAFS] Best practice: inode or namei fileserver?
Jason Edgecombe wrote:

Hi all, We are currently running inode-based fileservers on Solaris 9. I stumbled across the fact that Solaris 9 -9/05HW makes logging the default on UFS. I know that the AFS inode-based fileserver cannot work with a logging filesystem. Does the namei filesystem play nice with logging filesystems?

Yes.

Going forward, which format is recommended, inode or namei?

Namei has another advantage: if you salvage a single volume, it's not necessary to read all inodes, but only those pseudo-inodes (file names) under the subdirectory belonging to the volume group. This is much faster. There certainly is some overhead in traversing the AFSIDat tree to open a file, but I suppose it is negligible compared to the advantages.

-Hartmut

I'm wondering if I should slowly migrate to namei. Thanks, Jason
Re: [OpenAFS] VL_RegisterAddrs rpc failed (code=5376, err=22)
[EMAIL PROTECTED] wrote: cannot get around this problem - see error messages from FileLog: Tue Sep 25 16:15:21 2007 File server starting Tue Sep 25 16:15:21 2007 afs_krb_get_lrealm failed, using freakout.de. Tue Sep 25 16:15:21 2007 VL_RegisterAddrs rpc failed; will retry periodically (code=5376, err=22) Tue Sep 25 16:15:21 2007 Set thread id 14 for FSYNC_sync Tue Sep 25 16:15:21 2007 VInitVolumePackage: beginning single-threaded fileserver startup Tue Sep 25 16:15:21 2007 VInitVolumePackage: using 1 thread to attach volumes on 1 partition(s) Tue Sep 25 16:15:21 2007 Partition /vicepa: attaching volumes Tue Sep 25 16:15:22 2007 Partition /vicepa: attached 4 volumes; 0 volumes not attached Tue Sep 25 16:15:22 2007 Set thread id 15 for 'FiveMinuteCheckLWP' Tue Sep 25 16:15:22 2007 Set thread id 16 for 'HostCheckLWP' Tue Sep 25 16:15:22 2007 Set thread id 17 for 'FsyncCheckLWP' Tue Sep 25 16:15:22 2007 Getting FileServer name... Tue Sep 25 16:15:22 2007 FileServer host name is 'bongo' Tue Sep 25 16:15:22 2007 Getting FileServer address... Tue Sep 25 16:15:22 2007 FileServer bongo has address 192.168.7.68 (0x4407a8c0 or 0xc0a80744 in host byte order) Tue Sep 25 16:15:22 2007 File Server started Tue Sep 25 16:15:22 2007 i tried to follow the tips from different articles using NetInfo and NetRestrict - none worked. i tried to check code=5376, err=22 from the source-code but i cannot get enough information to find the source of the problem. use translate_et 5376 : ~: translate_et 5376 5376 (u).0 = no quorum elected ~: So your database servers haven't head a quorum when the fileserver started. Registration of a fileserver requires a write operation on the vlserver which is only possible on the sync site. Without a quorum the sync site can't be elcted... Do a udebug database-server 7003 to all your database servers to find out what happens! Hartmut Reuter Please help i'm lost on this one. 
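As a side note on the FileLog line that prints the fileserver address in two hexadecimal forms: the two values are the same four bytes read in opposite byte orders. A minimal Python sketch (the helper name address_words is made up for illustration) shows the relationship for the address from the log:

```python
import socket
import struct


def address_words(dotted: str) -> tuple[int, int]:
    """Return the IPv4 address as (network-order, byte-swapped) 32-bit integers."""
    packed = socket.inet_aton(dotted)          # 4 bytes in network (big-endian) order
    network = struct.unpack("!I", packed)[0]   # bytes read as big-endian
    swapped = struct.unpack("<I", packed)[0]   # same bytes read as little-endian
    return network, swapped


network, swapped = address_words("192.168.7.68")
print(hex(network), hex(swapped))
```

Neither value says anything about the quorum problem; the line is purely informational.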
Re: [OpenAFS] Tuning openafs write speed
Kai Moritz wrote:
> Hi folks!
>
> I would like to try tuning the speed of my openafs installation, but the only information I could google is this rather old thread (http://www.openafs.org/pipermail/openafs-info/2003-June/009753.html) and the hint to use a big cache partition.
>
> For comparison I've created files with random data and different sizes (1MB, 2MB, 4MB, 8MB, 16MB, 32MB, 64MB and 128MB) on my local disk. I copied them into AFS and then I copied them to the same disk on the same host via scp (without compression). I've done that 10 times and computed the average. For the 1MB file AFS is slightly faster than scp (factor 0.89). For the 2 and the 4MB file AFS needs about 1.4 times as long as scp. For the 8, 16 and 32MB files the factor is about 2.7, and for the 64 and the 128MB files it is about 3.3.
>
> I've already tried bigger cache partitions, but it does not make a difference.
>
> Are there tuning parameters which tell the system a threshold for the size of files, beyond which data won't be written to the cache?
>
> Greetings Kai Moritz

What are your data rates in MB/s?

If you are on a fast network (Gbit Ethernet, InfiniBand ...) a disk cache may be remarkably slower than the network. In this case memory cache can help.

Another point is chunk size. The default (64 KB) is bad for reading, where each chunk is fetched in a separate RPC. With disk cache, bigger chunks (1 MB) can be recommended anyway. For a memory cache of, say, 64 MB, 1 MB chunks would limit the number of chunks to only 64, which is certainly too low. Here ramdisks can help, because many of the chunks are filled with short contents, such as directories and symbolic links. The additional overhead of going through the filesystem layer may be less than what you gain from bigger chunks. With a ramdisk, 1 MB chunks aren't too bad.
Hartmut
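The chunk arithmetic above can be sketched in a few lines of Python. This is only an illustration and assumes that afsd's -chunksize option takes the base-2 logarithm of the chunk size (so 16 means 64 KB and 20 means 1 MB); the function names are made up for the example.

```python
MIB = 1024 * 1024


def chunk_bytes(chunksize_log2: int) -> int:
    """Chunk size in bytes for a given -chunksize value (assumed to be log2)."""
    return 1 << chunksize_log2


def chunks_in_cache(cache_bytes: int, chunksize_log2: int) -> int:
    """How many whole chunks fit into a cache of the given size."""
    return cache_bytes // chunk_bytes(chunksize_log2)


# A 64 MB memory cache with 64 KB chunks versus 1 MB chunks.
for log2 in (16, 20):
    print(log2, chunk_bytes(log2), chunks_in_cache(64 * MIB, log2))
```

With 1 MB chunks a 64 MB cache holds only 64 chunks, which is the limit mentioned above; a short directory or symbolic link still occupies a whole chunk slot in a memory cache.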
Re: [OpenAFS] Tuning openafs write speed
Kai Moritz wrote:
> > What are your data rates in MB/s?
>
> scp says: 4.6MB/s

That isn't great either. So maybe you have some other problems in your network? When I scp a 100 MB file to my laptop I get ~ 8 MB/s, while a remote rsync is running in parallel with about another .7 MB/s in both directions (rsyncd and AFS). So I normally get the full 100 Mbit/s bandwidth when I write into AFS or read from AFS: roughly 10 MB/s.

> > If you are on a fast network (Gbit Ethernet, InfiniBand ...) a disk cache may be remarkably slower than the network. In this case memory cache can help.
>
> I haven't tried that yet, because in the file /etc/openafs/afs.conf of my Debian Etch installation there is a comment that says:
>
> # Using the memory cache is not recommended. It's less stable than the disk
> # cache and doesn't improve performance as much as it might sound.

We are using memcache here on our Linux clusters and on the high-performance AIX Power4/5 machines without problems. That is my special OpenAFS-1.4.4 with OSD support, which is expected to arrive soon in the OpenAFS CVS. But I suppose the normal OpenAFS-1.4.4 should also work without problems with memcache.

Hartmut

> > Another point is chunk size. The default (64 KB) is bad for reading, where each chunk is fetched in a separate RPC. With disk cache, bigger chunks (1 MB) can be recommended anyway. For a memory cache of, say, 64 MB you would limit the number of chunks to only 64, which is certainly too low.
>
> With automatically chosen values, writing a 128 MB file in AFS takes about 44-45 seconds. On that machine I have a 3 GB cache.
> With the following options, which I have taken from an example in a Debian config file, writing the 128 MB file takes about 48 seconds :(
>
> -chunksize 20 -files 8 -dcache 1 -stat 15000 -daemons 6 -volumes 500 -rmtsys
>
> Greetings kai
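To put the timings in this exchange into MB/s, a small worked calculation helps. The sketch below is plain arithmetic on the numbers quoted in the thread (128 MB written in about 44-45 s or 48 s, a 100 Mbit/s link); the function names are illustrative only.

```python
def rate_mb_per_s(size_mb: float, seconds: float) -> float:
    """Average transfer rate in MB/s."""
    return size_mb / seconds


def link_limit_mb_per_s(mbit_per_s: float) -> float:
    """Raw upper bound of a link in MB/s, ignoring protocol overhead."""
    return mbit_per_s / 8


print(round(rate_mb_per_s(128, 44.5), 2))   # automatically chosen cache values
print(round(rate_mb_per_s(128, 48), 2))     # with the Debian example options
print(link_limit_mb_per_s(100))             # 100 Mbit/s Ethernet
```

Both AFS write rates come out below 3 MB/s, well under the 4.6 MB/s reported for scp and far below what a 100 Mbit/s link can carry, which is consistent with the suspicion that something besides the cache settings is limiting throughput.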
Re: [OpenAFS] Full disk woes
Steve Devine wrote:
> I committed the cardinal sin of letting a server partition fill up. I have tried vos remove and vos zap .. I can't get rid of any vols. Volume management fails on this machine. It's the old style (non namei) fileserver. It doesn't seem like I can just rm the V#.vol, can I? Any help?

Removing the small V#.vol files doesn't help; they are really only 76 bytes long.

What happens if you do a vos remove or a vos zap? Do the volumes go away while the free space stays as low as before? This can happen if you only removed readonly and backup volumes, which typically free only the space used by their metadata, while the space used by their files and directories is shared between them and the RW volume. But, of course, you don't want to remove your RW volumes.

Maybe, once you have removed all RO and BK volumes, you have enough free space for the temporary volume that is created when you try to move your smallest RW volume to another partition/server. There is also a -live option for the vos move command which should do the move without creating a clone. I suppose it was written for such cases.

Good luck,
Hartmut
Re: [OpenAFS] Full disk woes
Steve Devine wrote:
> Hartmut Reuter wrote:
> > What happens if you do a vos remove or a vos zap?
>
> both commands fail. Even when I use force.

What does the VolserLog say?

Hartmut
Re: [OpenAFS] Full disk woes
I tried a

/afs/ipp/backups: vos listvldb 1938590434 -cell msu.edu
vsu_ClientInit: Could not get afs tokens, running unauthenticated.

svc.ml.mdsolids.31
    RWrite: 1938590433    ROnly: 1938590434    RClone: 1938590434
    number of sites -> 3
       server afsfs7.cl.msu.edu partition /vicepa RW Site
       server afsfs9.cl.msu.edu partition /vicepa RO Site  -- Old release
       server afsfs7.cl.msu.edu partition /vicepa RO Site  -- New release
/afs/ipp/backups:

and found out that it is your machine afsfs9.cl.msu.edu which causes the trouble. Then I did a vos status against this machine, which did not respond.

rxdebug afsfs9.cl.msu.edu 7005 shows a lot of connections in state precall with source ports != 7005. That means you have a lot of vos commands running somewhere. You should stop those first! Then perhaps restart your fileserver to get rid of the old transactions, and then hopefully everything is OK again.

Hartmut

Steve Devine wrote:
> Hartmut Reuter wrote:
> > What does the VolserLog say?
>
> Lot of lines like this ..
>
> Fri Jul 6 10:05:18 2007 trans 3811071 on volume 1938590434 is older than 29730 seconds
> Fri Jul 6 10:05:48 2007 trans 3811072 on volume 1937192577 is older than 28530 seconds
> Fri Jul 6 10:05:48 2007 trans 3811071 on volume 1938590434 is older than 29760 seconds
> Fri Jul 6 10:06:18 2007 trans 3811072 on volume 1937192577 is older than 28560 seconds
> Fri Jul 6 10:06:18 2007 trans 3811071 on volume 1938590434 is older than 29790
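The VolserLog lines quoted in this message can be summarized mechanically to see which volumes are held by long-running transactions. The following Python sketch assumes only the line format visible in the excerpt ("trans N on volume V is older than S"); the function name is made up for illustration.

```python
import re

# Matches the VolserLog messages quoted in the thread; the trailing
# "seconds" is optional because the last quoted line is cut off.
TRANS_RE = re.compile(r"trans (\d+) on volume (\d+) is older than (\d+)")


def oldest_transactions(log_text: str) -> dict[tuple[int, int], int]:
    """Map (transaction id, volume id) to the largest age in seconds seen."""
    oldest: dict[tuple[int, int], int] = {}
    for match in TRANS_RE.finditer(log_text):
        trans, volume, age = (int(group) for group in match.groups())
        key = (trans, volume)
        oldest[key] = max(age, oldest.get(key, 0))
    return oldest


SAMPLE = """\
Fri Jul 6 10:05:18 2007 trans 3811071 on volume 1938590434 is older than 29730 seconds
Fri Jul 6 10:05:48 2007 trans 3811072 on volume 1937192577 is older than 28530 seconds
Fri Jul 6 10:05:48 2007 trans 3811071 on volume 1938590434 is older than 29760 seconds
"""

for (trans, volume), age in sorted(oldest_transactions(SAMPLE).items()):
    print(f"trans {trans} volume {volume}: {age / 3600:.1f} h")
```

In the excerpt both transactions are more than seven hours old, which fits the picture of vos commands hanging somewhere and blocking further volume operations.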
Re: [OpenAFS] vos listvol and vos exam
Steve Devine wrote:
> We have switched some of our servers over to binaries compiled with fast restart. This morning we had to bring a server down for maintenance, and when we brought it back up several of the vols were in need of salvage. I ran vos listvol thinking I would get a list of vols that were offline. Instead it showed all vols as On-line, yet I had to salvage over a dozen that I have found so far. I get "waiting for busy volume" errors. Any advice on how I can locate vols on a fileserver that need salvaging, short of just biting the bullet and salvaging the whole partition?
> /sd

There should be messages in the FileLog such as

VAttachVolume: Error attaching volume ; volume needs salvage; error=XXX

Hartmut
Re: [OpenAFS] vos listvol and vos exam
Steve Devine wrote:
> Well, from what I can see I am not sure I can count on the FileLog. Is there any way to make the fileserver keep more than FileLog and FileLog.old?
> /sd

If you start the fileserver with -mrafslogs, it renames old FileLogs to FileLog.date-time, for instance FileLog.20070627235901

Hartmut
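If the rotated logs are to be sorted or pruned by a script, the date-time suffix can be parsed directly. A minimal Python sketch, assuming the suffix is always 14 digits in year-month-day-hour-minute-second order as in the example name above (the helper name rotation_time is hypothetical):

```python
from datetime import datetime
from typing import Optional


def rotation_time(name: str) -> Optional[datetime]:
    """Parse the timestamp of a rotated log such as FileLog.20070627235901.

    Returns None for names without a 14-digit suffix (FileLog, FileLog.old).
    """
    prefix, _, stamp = name.partition(".")
    if prefix != "FileLog" or len(stamp) != 14 or not stamp.isdigit():
        return None
    return datetime.strptime(stamp, "%Y%m%d%H%M%S")


for name in ("FileLog", "FileLog.old", "FileLog.20070627235901"):
    print(name, rotation_time(name))
```

Because the suffix is zero-padded, a plain lexical sort of the file names gives the same order as sorting by the parsed timestamps.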
Re: [OpenAFS] My salvager was cored by my volume.
Harald Barth wrote:
> Yesterday I had a server crash after a HW-RAID box decided to go out for lunch without even trying to have a reason. Afterwards I restarted with fast-restart and then salvaged everything. First pass with orphans ignore:
>
> + /usr/openafs/bin/bos salvage -server ruffe -partition a -volume pdc.vol.module -showlog -orphans ignore -localauth
> Starting salvage.
> bos: salvage completed
> SalvageLog:
> @(#) OpenAFS 1.4.4 built 2007-04-25
> 06/27/2007 20:07:27 STARTING AFS SALVAGER 2.4 (/usr/openafs/libexec/openafs/salvager /vicepa 537045984 -orphans ignore)
> 06/27/2007 20:07:28 2 nVolumesInInodeFile 64
> 06/27/2007 20:07:28 CHECKING CLONED VOLUME 537045986.
> 06/27/2007 20:07:28 pdc.vol.module.backup (537045986) updated 06/01/2005 14:10
> 06/27/2007 20:07:28 SALVAGING VOLUME 537045984.
> 06/27/2007 20:07:28 pdc.vol.module (537045984) updated 06/01/2005 14:10
> 06/27/2007 20:07:28 totalInodes 3019
> 06/27/2007 20:07:29 dir vnode 451: ??/.. (vnode 449): unique changed from 6629 to 11697 -- deleted
> 06/27/2007 20:07:29 dir vnode 455: ??/.. (vnode 453): unique changed from 6631 to 7491 -- deleted
> 06/27/2007 20:07:29 Vnode 449: link count incorrect (was 2, now 1)
> 06/27/2007 20:07:29 Vnode 453: link count incorrect (was 9, now 8)
> 06/27/2007 20:07:29 Found 2 orphaned files and directories (approx. 4 KB)
> 06/27/2007 20:07:29 Salvaged pdc.vol.module (537045984): 3012 files, 25862 block
>
> Second pass with orphans attach:
>
> + /usr/openafs/bin/bos salvage -server ruffe -partition a -volume pdc.vol.module -showlog -orphans attach -localauth
> Starting salvage.
> bos: salvage completed
> SalvageLog:
> @(#) OpenAFS 1.4.4 built 2007-04-25
> 06/28/2007 15:57:26 STARTING AFS SALVAGER 2.4 (/usr/openafs/libexec/openafs/salvager /vicepa 537045984 -orphans attach)
> 06/28/2007 15:57:27 2 nVolumesInInodeFile 64
> 06/28/2007 15:57:27 CHECKING CLONED VOLUME 537045986.
> 06/28/2007 15:57:27 pdc.vol.module.backup (537045986) updated 06/01/2005 14:10
> 06/28/2007 15:57:27 SALVAGING VOLUME 537045984.
> 06/28/2007 15:57:27 pdc.vol.module (537045984) updated 06/01/2005 14:10
> 06/28/2007 15:57:27 totalInodes 3019
> 06/28/2007 15:57:28 The dir header alloc map for page 0 is bad.
> 06/28/2007 15:57:28 Directory bad, vnode 451; salvaging...
> 06/28/2007 15:57:28 Salvaging directory 451...
> 06/28/2007 15:57:28 Checking the results of the directory salvage...
> 06/28/2007 15:57:28 The dir header alloc map for page 0 is bad.
> 06/28/2007 15:57:28 Directory bad, vnode 455; salvaging...
> 06/28/2007 15:57:28 Salvaging directory 455...
> 06/28/2007 15:57:28 Checking the results of the directory salvage...
> 06/28/2007 15:57:28 Salvage volume group core dumped!
>
> How unhappy is my volume or my salvager, and where is that core? Yes, I can access the volume and no, it is not written very often.
>
> [EMAIL PROTECTED] /afs/pdc.kth.se/pdc/vol/module/3.1.6 $ ls
> amd64_fc3  i386_fc3  ia64_deb30  man          rs_aix43
> bin        i386_rh9  init        modulefiles  src
> [EMAIL PROTECTED] /afs/pdc.kth.se/pdc/vol/module/3.1.6 $ fs lq .
> Volume Name     Quota   Used  %Used  Partition
> pdc.vol.module  5       25862   52%        69%
>
> # vos exa pdc.vol.module -local
> pdc.vol.module  537045984 RW  25862 K  On-line
>     ruffe.pdc.kth.se /vicepa
>     RWrite 537045984 ROnly 0 Backup 537045986
>     MaxQuota 5 K
>     Creation    Fri May 16 10:20:22 2003
>     Copy        Wed May 2 21:42:08 2007
>     Backup      Thu Jun 28 02:18:52 2007
>     Last Update Wed Jun 1 14:10:44 2005
>     4874 accesses in the past day (i.e., vnode references)
>
>     RWrite: 537045984    Backup: 537045986
>     number of sites -> 1
>        server ruffe.pdc.kth.se partition /vicepa RW Site
>
> Tips and tricks how to proceed?

The best would certainly be to find out why and where it core-dumped. Compile the salvager with -g and without -O and run it under gdb with -debug (so that it does not fork), or run gdb on the core file.

Hartmut

> Harald.
Re: [OpenAFS] Salvaging an RO-Volume
You need to specify the RW volume ID for the salvage, even if there is no RW volume in the partition!

Hartmut

Frank Burkhardt wrote:
> Hi,
>
> a broken RO volume resides on one of my fileservers:
>
> $ vos listvol [fileserver] a
> [...]
> Could not attach volume 536877628
> Total volumes onLine 352 ; Total volumes offLine 1 ; Total busy 0
>
> I don't need it, so I want to remove it:
>
> # vos remove [heilbutt] a 536877628
> Transaction on volume 536877628 failed
> Volume needs to be salvaged
> Volume needs to be salvaged
> Error in vos remove command.
> Volume needs to be salvaged
>
> Ok - let's salvage it:
>
> # bos salvage [fileserver] a 536877628 -showlog
> Starting salvage.
> bos: salvage completed
> SalvageLog:
> @(#) OpenAFS 1.4.4 built 2007-04-23
> 06/13/2007 09:10:23 STARTING AFS SALVAGER 2.4 (/usr/lib/openafs/salvager /vicepa 536877628)
> 06/13/2007 09:10:23 536877628 is a read-only volume; not salvaged
>
> That doesn't work :-( . What is the best way to handle this?
>
> Regards,
> Frank
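Finding the RW volume ID for a given RO or backup ID is usually simple arithmetic, because the three IDs of a volume are normally allocated consecutively (RW, then RO = RW + 1, then backup = RW + 2). That is a convention rather than a guarantee, so the result should be checked against the VLDB entry (vos listvldb) before it is used. A small Python sketch with a hypothetical helper name:

```python
# Usual offsets of the read-only and backup IDs relative to the RW ID.
# Assumption: the IDs were allocated together when the volume was created.
OFFSETS = {"RW": 0, "RO": 1, "BK": 2}


def likely_rw_id(volume_id: int, kind: str) -> int:
    """Guess the RW volume ID from an RW, RO, or BK volume ID."""
    return volume_id - OFFSETS[kind]


# The broken read-only volume from the message above.
print(likely_rw_id(536877628, "RO"))
```

The other volume listings on this list follow the same pattern: ROnly 1938590434 belongs to RWrite 1938590433, and Backup 537045986 belongs to RWrite 537045984.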
Re: [OpenAFS] Illegal instruction
Karen L Eldredge wrote:
> We are trying to configure our initial AFS server on AIX 5.3, and we are running bos addkey server -kvno 0 -cell cellname -noauth, which asks for the password for afs twice. After we enter the password the second time, it core dumps and gives the "Illegal instruction" message. Could you help us figure out what we are doing wrong? We have tried several different things without success.

Sometimes these errors disappear if you switch off optimization and switch on -g during your build. Do a make clean and edit src/config/Makefile.config to remove all -O and to add -g to XCFLAGS.

Hartmut

> Karen Eldredge
> PSD (AIX/Linux System Support)
> AIX Certified Advanced Technical Expert
> Certified System Expert - Enterprise Technical Support
> Linux Professional Institute Certified
> External: 303-924-5767 Tie line: 263-5767 Internal: 5-5767
> [EMAIL PROTECTED]
Re: [OpenAFS] How to replicate files on different machines
[EMAIL PROTECTED] wrote:
> Hi.
>
> I'm using OpenAFS 1.4.2 on Fedora 5. I want to replicate file(s) on 2 machines (both Fedora 5). How could this be achieved? Do I need to install OpenAFS server on both the machines, and if this is the requirement, how could the servers be synchronized?

Yes, you need a fileserver on the second machine, and you need to define the replication site for each volume by using vos addsite. The synchronisation is achieved by vos release for the volume. This doesn't happen automatically, but you can start it from a script via cron.

> Right now I'm facing one other issue. I have installed server on 1st machine and client on 2nd machine (both Fedora 5). I have given the cell information for the server on 2nd machine in /usr/vice/etc/CellServDB, CellServDB.dist and ThisCell. However, when I start the client, the cell under /afs/ is not displayed as a directory.
>
> # ls -l /afs/
> total 0
> ?- 0 root root 0 Jan 1 1970 ps2750.pspl.co.in
> #
>
> Hence I could not do any further file operations. Am I missing something?

Did you start afsd on the client with the option -dynroot? Do you have a volume root.cell in your cell? If both are true and your CellServDB and ThisCell information is correct, you should see your cell.

If you don't start afsd with -dynroot, you need a volume root.afs and inside it a mountpoint for your root.cell under the name of your cell.

> Thanks and Regards,
> Shailesh Joshi
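The replication steps in the reply (define the site once with vos addsite, then run vos release whenever the read-only copies should be updated) could be scripted roughly as below. This is a sketch only: the server name fs2.example.org, the partition, and the volume name are placeholders, and the script merely builds and prints the command lines rather than running them.

```python
import shlex


def addsite_cmd(server: str, partition: str, volume: str) -> list[str]:
    """Command that defines a read-only replication site for a volume."""
    return ["vos", "addsite", "-server", server, "-partition", partition, "-id", volume]


def release_cmd(volume: str) -> list[str]:
    """Command that pushes the current RW contents to all read-only sites."""
    return ["vos", "release", "-id", volume]


# Placeholder names; substitute your own fileserver, partition and volume.
commands = [
    addsite_cmd("fs2.example.org", "a", "root.cell"),
    release_cmd("root.cell"),
]
for argv in commands:
    print(shlex.join(argv))
```

A cron job would only need the release step; when run by cron the command also needs administrative credentials (for example -localauth on a server machine), which the sketch leaves out.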
Re: [OpenAFS] Volume numbers
Jakub Witkowski wrote:
> Hello!
>
> From my observation of AFS volume ID numbers, it appears that they are always large, pseudo-random numbers unique to a given cell. I'd like to ask if there are any lower/upper bounds to that value? Can volume number zero exist? Or is that perhaps a special case?

The volume numbers in a new cell start with what the vldb database initialization has put in as the starting number. This is 0x20000000, or decimal 536870912.

Hartmut

> Thank you,
> Jakub Witkowski.
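The decimal starting number quoted above, 536870912, is 2 to the power 29, that is 0x20000000 in hexadecimal. A short Python check, using volume IDs that appear elsewhere on this list as examples (the helper name is made up):

```python
FIRST_VOLUME_ID = 0x20000000  # starting number quoted in the reply


def offset_from_base(volume_id: int) -> int:
    """Distance of a volume ID from the initial VLDB starting number."""
    return volume_id - FIRST_VOLUME_ID


print(FIRST_VOLUME_ID, FIRST_VOLUME_ID == 2 ** 29)
for volume_id in (536871896, 536877628, 537045984):
    print(volume_id, hex(volume_id), offset_from_base(volume_id))
```

The IDs look large and random in decimal, but seen as offsets from the base they are small numbers that grow as a cell allocates more volumes.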
Re: [OpenAFS] disaster recovery
Dimitris Zilaskos wrote:
> Hi,
>
> One 1.3.87 linux fileserver died today. After a reboot, the filesystem check on vicepa spat out numerous errors, it fixed them, filling lost+found with data, and then after salvage I ended up with half the volumes missing or corrupted. I had one backup a few days old which I used to restore the volumes. I also have a copy of the /vicepa contents from yesterday, when the server started to behave strangely. Is there a way to use the /vicepa contents in order to access certain files/directories? Unfortunately I do not have a copy of the db files.

The db files do not matter. If you have a copy of your /vicepa with correct mode bits, ownership, and group settings for the files, you may use this instead of your old /vicepa. It is possible to tar/untar vicep partitions and to use them again afterwards. If you do that on another fileserver, you should stop the corrupted one, start the new one, and do a

vos syncvldb newserver

in order to update the volume location database. This will overwrite the location of each volume found on the new server. If this doesn't work, try

vos syncserv newserver

(I never understood which one of those does what, but one of them does the job). You then probably need a fs checkvol on the client so that it picks up the new location.

You should also think about having RO volumes of your RW volumes on other servers in the future. Then you can easily do a vos convertROtoRW ... to get a working cell again.

Good luck!
Hartmut

> Cheers,
> --
> Dimitris Zilaskos
> Department of Physics @ Aristotle University of Thessaloniki , Greece
> PGP key : http://tassadar.physics.auth.gr/~dzila/pgp_public_key.asc
>           http://egnatia.ee.auth.gr/~dzila/pgp_public_key.asc
> MD5sum  : de2bd8f73d545f0e4caf3096894ad83f pgp_public_key.asc
Re: [OpenAFS] Re: help, salvaged volume won't come back online, is it corrupt? [trimmed log]
Juha Jäykkä wrote:
> I was wondering, somewhat off the topic, about the threat posed by this issue. Suppose I lost the root vnode of a replicated volume. What happens to the replicas? Are they still fine (except perhaps for those on the same site as the rw volume), or does this corruption destroy all replicas as well? This is quite important to know for us...
>
> -Juha

In my understanding you can easily propagate this error to your readonly replicas by running vos release on the corrupt volume. The volserver on the receiving side would remove any data not mentioned in the dump stream.

In this case you had better do a vos convertROtoRW on the RO site as soon as possible to regain a valid RW volume.

Hartmut
Re: [OpenAFS] Re: help, salvaged volume won't come back online, is it corrupt? [trimmed log]
Juha Jäykkä wrote:
> > In my understanding you can easily propagate this error to your readonly replicas by running vos release on the corrupt volume. The volserver on the receiving side would remove any data not mentioned in the dump stream.
>
> This is frightening. Can I actually vos release the corrupt volume? From the posts on the list, I'd gather it cannot even be attached - how could it be released, then?

If it can't be attached anymore, probably not. But I don't know whether it really won't come online after the salvager has thrown away the root directory!

> > In this case you had better do a vos convertROtoRW on the RO site as soon as possible to regain a valid RW volume.
>
> Except that I'm unlikely to notice the corruption before it's released, which happens automatically. Sounds like we need to change our backup policy...

The best way to prevent the salvager from corrupting volumes is not to run it automatically. If you configure your OpenAFS with --enable-fast-restart, the fileserver will not salvage automatically after a crash. So if, after a crash, you find volumes which couldn't be attached, you salvage them with bos salvage server partition volume and examine the SalvageLog. I suppose that in the case where it throws the root directory away you will see something in the log.

Hartmut

> -Juha
Re: [OpenAFS] Building OpenAFS 1.4.1 on SLES 9.0
David Werner wrote:
> The configure script still issues a warning:
>
> Cannot determine sys_call_table status. assuming it isn't exported
>
> I hope I don't need to patch the kernel. Does anyone know something about it? In March and April there were some messages on this list regarding sys_call_table issues, but I don't really know whether they apply in my case.

We have been running a client derived from OpenAFS 1.4.1 on SLES 9 without problems for a long time. In /var/log/messages a start looks like:

Jul 10 09:11:10 mpp-fs10 syslogd 1.4.1: restart.
Jul 10 09:11:12 mpp-fs10 kernel: libafs: module not supported by Novell, setting U taint flag.
Jul 10 09:11:12 mpp-fs10 kernel: libafs: module license 'http://www.openafs.org/dl/license10.html' taints kernel.
Jul 10 09:11:12 mpp-fs10 kernel: libafs: no version for sys_close found: kernel tainted.
Jul 10 09:11:12 mpp-fs10 kernel: Found system call table at 0xc03a7a00 (scan: close+wait4)
Jul 10 09:11:12 mpp-fs10 kernel: Starting AFS cache scan...found 0 non-empty cache files (0%%).

Hartmut
Re: [OpenAFS] Re: About partition size limitations again
Adam Megacz wrote:
> Serge Torres [EMAIL PROTECTED] writes:
> > A 2 Tb partition worked without a glitch.
>
> Out of curiosity, what is the largest file you've tried creating?

Largest files here are about 80 GB.

Hartmut

> One of the labs here at Berkeley asked if AFS could handle creation/access of their 200gb simulation data files and I couldn't find any anecdotal evidence. Aside from vos_move/vos_dump not being very useful with 200gb volumes I can't see a problem with this
>
> - a
[OpenAFS] rxkad patch
Here once again is my patch for rxkad to allocate only as much space as necessary for the security object and not always 12K. This patch is based on the 1.4.1 version. For some unknown reason my first patch didn't make it into CVS and the stable releases.

Hartmut

-----------------------------------------------------------------
Hartmut Reuter                   e-mail [EMAIL PROTECTED]
                                 phone  +49-89-3299-1328
RZG (Rechenzentrum Garching)     fax    +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-----------------------------------------------------------------

--- private_data.h.orig 2003-07-16 01:16:42.345588002 +0200
+++ private_data.h 2005-12-16 11:52:19.527178509 +0100
@@ -48,15 +48,17 @@
     afs_int32 ipAddr;          /* or an approximation to it */
 };
 
+#define PDATA_SIZE(l) (sizeof(struct rxkad_cprivate) - MAXKTCTICKETLEN + (l))
+
 /* private data in client-side security object */
 struct rxkad_cprivate {
     afs_int32 kvno;            /* key version of ticket */
-    afs_int32 ticketLen;       /* length of ticket */
+    afs_int16 ticketLen;       /* length of ticket */
+    rxkad_type type;           /* always client */
+    rxkad_level level;         /* minimum security level of client */
     fc_KeySchedule keysched;   /* the session key */
     fc_InitializationVector ivec;      /* initialization vector for cbc */
     char ticket[MAXKTCTICKETLEN];      /* the ticket for the server */
-    rxkad_type type;           /* always client */
-    rxkad_level level;         /* minimum security level of client */
 };
 
 /* Per connection client-side info */
--- rxkad_client.c.orig 2006-02-28 01:19:20.107241106 +0100
+++ rxkad_client.c 2006-04-25 09:41:37.955757683 +0200
@@ -181,7 +181,7 @@
     struct rx_securityClass *tsc;
     struct rxkad_cprivate *tcp;
     int code;
-    int size;
+    int size, psize;
 
     size = sizeof(struct rx_securityClass);
     tsc = (struct rx_securityClass *)rxi_Alloc(size);
@@ -189,15 +189,15 @@
     tsc->refCount = 1;         /* caller gets one for free */
     tsc->ops = &rxkad_client_ops;
 
-    size = sizeof(struct rxkad_cprivate);
-    tcp = (struct rxkad_cprivate *)rxi_Alloc(size);
-    memset((void *)tcp, 0, size);
+    psize = PDATA_SIZE(ticketLen);
+    tcp = (struct rxkad_cprivate *)rxi_Alloc(psize);
+    memset((void *)tcp, 0, psize);
     tsc->privateData = (char *)tcp;
     tcp->type |= rxkad_client;
     tcp->level = level;
     code = fc_keysched(sessionkey, tcp->keysched);
     if (code) {
-        rxi_Free(tcp, sizeof(struct rxkad_cprivate));
+        rxi_Free(tcp, psize);
         rxi_Free(tsc, sizeof(struct rx_securityClass));
         return 0;              /* bad key */
     }
@@ -205,7 +205,7 @@
     tcp->kvno = kvno;          /* key version number */
     tcp->ticketLen = ticketLen;        /* length of ticket */
     if (tcp->ticketLen > MAXKTCTICKETLEN) {
-        rxi_Free(tcp, sizeof(struct rxkad_cprivate));
+        rxi_Free(tcp, psize);
         rxi_Free(tsc, sizeof(struct rx_securityClass));
         return 0;              /* bad key */
     }
--- rxkad_common.c.orig 2006-02-28 01:19:20.361083608 +0100
+++ rxkad_common.c 2006-04-25 09:43:04.572665345 +0200
@@ -68,7 +68,7 @@
 #include <strings.h>
 #endif
 #endif
-
+#include <afs/afsutil.h>
 #endif /* KERNEL */
 
 #include <des/stats.h>
@@ -311,7 +311,8 @@
     tcp = (struct rxkad_cprivate *)aobj->privateData;
     rxi_Free(aobj, sizeof(struct rx_securityClass));
     if (tcp->type & rxkad_client) {
-        rxi_Free(tcp, sizeof(struct rxkad_cprivate));
+        afs_int32 psize = PDATA_SIZE(tcp->ticketLen);
+        rxi_Free(tcp, psize);
     } else if (tcp->type & rxkad_server) {
         rxi_Free(tcp, sizeof(struct rxkad_sprivate));
     } else {
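The sizing trick in the patch can be tried outside of OpenAFS. The following is only a minimal sketch under stated assumptions: struct demo_cprivate, DEMO_MAXTICKET, DEMO_PDATA_SIZE, demo_new and demo_alloc_size are hypothetical stand-ins invented for illustration, not OpenAFS names, and the value 12000 is assumed. It shows why the ticket buffer has to be the last member of the structure and why the free path must recompute the size from the stored ticket length.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAXTICKET 12000    /* stand-in for MAXKTCTICKETLEN; value assumed */

/* Hypothetical, simplified stand-in for struct rxkad_cprivate.
 * The variable-length ticket buffer must be the last member. */
struct demo_cprivate {
    int kvno;                       /* key version of ticket */
    short ticketLen;                /* length of ticket */
    char ticket[DEMO_MAXTICKET];    /* only the first ticketLen bytes get allocated */
};

/* Same arithmetic as PDATA_SIZE(l): full struct size minus the unused tail. */
#define DEMO_PDATA_SIZE(l) (sizeof(struct demo_cprivate) - DEMO_MAXTICKET + (l))

/* Allocate a private-data block just large enough for this ticket. */
struct demo_cprivate *demo_new(int kvno, const char *ticket, size_t len)
{
    struct demo_cprivate *p;

    if (len > DEMO_MAXTICKET)
        return NULL;                /* reject oversized tickets before allocating */
    p = calloc(1, DEMO_PDATA_SIZE(len));
    if (p == NULL)
        return NULL;
    p->kvno = kvno;
    p->ticketLen = (short)len;
    memcpy(p->ticket, ticket, len);
    return p;
}

/* Recompute the allocated size from the stored length, which is what a
 * size-tracking allocator needs to be told when the block is released. */
size_t demo_alloc_size(const struct demo_cprivate *p)
{
    assert(p != NULL);
    return DEMO_PDATA_SIZE((size_t)p->ticketLen);
}
```

For a 6-byte ticket, demo_new() allocates only a few bytes of header plus the ticket instead of about 12K, and demo_alloc_size() returns that same small figure later. In new code a C99 flexible array member would express the same idea more directly; the sketch keeps the fixed-size array only to mirror the shape of the patch.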
Re: [OpenAFS] Problem with vos move
Wheeler, JF (Jonathan) wrote:
> I have attempted to move a large (about 80 GB) AFS volume from one partition to another using the command:
>
>   vos move VOLUME MACHINE i MACHINE l
>
> I left this running overnight and found the following error messages when I checked this morning:
>
>   Failed to move data for volume 536871896
>   rxk: sealed data inconsistent
>   vos move: operation interrupted, cleanup in progress...
>   clear transaction contents
>   FATAL: VLDB access error: abort cleanup
>   cleanup complete - user verify desired result
>
> The current situation is:
>
> 1. The volume is on-line on the source partition, but the volume (VLDB entry) is locked. Here is the result of the command vos listvol MACHINE i:
>
>   Total number of volumes on server wallace partition /vicepi: 1
>   bfactory.vol2 536871896 RW 83886080 K On-line
>   Total volumes onLine 1 ; Total volumes offLine 0 ; Total busy 0
>
> and the result of the command vos listvldb bfactory.vol2:
>
>   bfactory.vol2
>   RWrite: 536871896
>   number of sites - 1
>   server wallace.cc.rl.ac.uk partition /vicepi RW Site
>   Volume is currently LOCKED
>
> 2. The volume is off-line on the destination partition (same size and same volume number). Here is the result of the command vos listvol MACHINE l:
>
>   Total number of volumes on server wallace partition /vicepl: 1
>   bfactory.vol2 536871896 RW 83886080 K Off-line
>   Total volumes onLine 0 ; Total volumes offLine 1 ; Total busy 0
>
> Please note that the server is running IBM/Transarc version 3.6 (though it may not matter in this case).
>
> My questions are:
>
> a) What went wrong?
> b) What chance is there that the volumes are identical? In other words, is it possible that I can complete the move manually?
> c) Is there any way to compare the 2 volumes?
>
> Any help would be very much appreciated.

I suppose your token expired overnight. The vos move command first creates a clone of the volume, then dumps this clone over to the new partition. This is probably what took so long, and your token expired during this phase.

Then the still off-line volume on the sink side should have been updated by an incremental dump of the RW-volume on the source side. Either this could no longer happen because of the expired token, or it happened on behalf of the source side volserver without any new rpc from your vos command. But then the VLDB must be updated by a ubik call from your vos command, and that one failed. The next rpc would have brought the volume on the sink side on-line, and further rpcs would have removed the volume, the clone and the backup.

If the RW-volume has not been modified since the beginning of the vos move, the off-line version on the sink side should be complete.

Hartmut Reuter

> Jonathan Wheeler
> e-Science Centre
> Rutherford Appleton Laboratory
> (cell rl.ac.uk)
Re: [OpenAFS] Inode file server on Linux?
Frederic Gilbert wrote:
> Hi,
>
> I have seen in the 1.4.0 announcement:
>
>   Cache chunk locking has been refined, and native operating system vnodes on Linux and MacOS X are now supported.

This has nothing to do with the fileserver, only with the client. Unfortunately the term vnode is used in different senses on the client and on the server: on the client a vnode is a structure in the kernel, while on the server a vnode is a line in either the small or the large vnode file of the volume (and of course also a structure in the fileserver's memory).

Hartmut

> When I read this, I understood that the inode file server was now supported on Linux. But I don't see any other reference to this, so maybe I am wrong?
>
> So, is it possible to use the inode file server with Linux/ext2, and if so how can I activate it?
>
> Thanks,
> Frederic Gilbert.
Re: [OpenAFS] VOS commands
Juha Jäykkä wrote:
> I was wondering, what do the commands vos changeloc and vos convertROtoRW actually do? The vos help on these is rather scarce and openafs.org docs have nothing at all. Obviously new additions. Are they documented somewhere? Don't tell me in the defunct wiki. =) Luckily it's on its way back up...
>
> -Juha

vos convertROtoRW can be used if you have lost the partition where the RW-volume was but still have a RO-volume somewhere else. In this case you can convert the RO-volume into the new RW-volume. This is much faster than a vos dump | vos restore because it only changes some fields in the volinfo file and renames some files in /vicepx/AFSIDat//special

This is part of a backup strategy: the mount points for our home directories are always made with the -rw option. We then have one RO-volume in the same partition and another one on a different server. The first RO-volume keeps the time needed for the reclone during vos release short and doesn't really consume much disk space, because only changed files and some metadata files are duplicated. The remote RO-volumes are our real backup system. Even if you have lost a TB partition you can be back in production after half an hour.

Hartmut
Re: [OpenAFS] AFS clients and IP address 192.168.67.1
Wheeler, JF (Jonathan) wrote:
> Whilst investigating a network problem, one of our network gurus noticed that our AFS client systems are sending packets to IP address 192.168.67.1 (I confirmed this by using tcpdump). Issuing the command:
>
>   fs getserverprefs -numeric | grep 192
>
> gave the reply:
>
>   192.168.67.1 40006
>
> Please would someone explain what is happening here; is it related to dynroot? I expect that it is a FAQ.

No. Probably one of the fileservers contacted by this client also has a secondary private interface which is not masked by a NetInfo or NetRestrict file. The server preference of 40006 indicates that it is a fileserver not belonging to your own network, so it takes some work to find out where it belongs. If you do a vos listaddr -cell XXX for all cells in your CellServDB you should be able to identify this server.

Hartmut

> Jonathan Wheeler
> e-Science Centre
> Rutherford Appleton Laboratory
> (AFS cell rl.ac.uk)
Re: [OpenAFS] TSM and AFS
Tony Derry wrote:
> Is anyone using Tivoli Storage Manager with AFS filesystems? This is a terribly slow solution and we haven't been able to find an acceptable alternative. We are using a P650 as a server and LTO2 tape drives. We recently did a restore of 20 GB that took 27 hours. Non-AFS data can be restored at about 50 GB per hour. Any ideas would really be appreciated. Thanks. t.

We are also using TSM for file-based backup. This, however, is used only to restore old versions of single files, not in case of broken filesystems. With today's huge /vicep partitions we use RO-volumes on separate servers as our backup solution. These RO-volumes can be converted to RW-volumes within a few minutes.

Hartmut

> Anthony Derry, Manager
> University of Maryland, OIT
> Enterprise Storage Services
> Building 224, Rm. 1317
> College Park, Md. 20742
> [EMAIL PROTECTED]
> 301-405-3059
Re: [OpenAFS] Unusable empty partition
Derek Atkins wrote:
> Last I checked, Vice Partitions were only allowed a one character name, meaning /vicepcy is an invalid Vice Partition. Unless it's been changed recently, you can only have /vicepa through /vicepz

That's not true: for a long time now you have been able to have up to 255 partitions, that is from /vicepa to /vicepju or /vicepjv.

Hartmut

> -derek
>
> Hans-Gunther Borrmann [EMAIL PROTECTED] writes:
>> Hello,
>>
>> I have one partition on a server, which is empty:
>>
>>   Total number of volumes on server localhost partition /vicepcy: 0
>>   Total volumes onLine 0 ; Total volumes offLine 0 ; Total busy 0
>>
>> But I cannot move any volumes to this partition:
>>
>>   [EMAIL PROTECTED]:~ vos move www.natscan.kirsche7 servera by serverb cy -verbose
>>   Starting transaction on source volume 537085072 ... done
>>   Cloning source volume 537085072 ... done
>>   Ending the transaction on the source volume 537085072 ... done
>>   Starting transaction on the cloned volume 537091168 ... done
>>   Creating the destination volume 537085072 ... done
>>   Dumping from clone 537091168 on source to volume 537085072 on destination ...Failed to move data for the volume 537085072
>>   VOLSER: Problems encountered in doing the dump !
>>   vos move: operation interrupted, cleanup in progress...
>>   clear transaction contexts
>>   access VLDB
>>   move incomplete - attempt cleanup of target partition - no guarantee
>>   cleanup complete - user verify desired result
>>
>> The VolserLog shows:
>>
>>   Wed Jun 1 10:18:03 2005 VAttachVolume: Failed to open /vicepcy/V0537085072.vl (errno 2)
>>   Wed Jun 1 10:18:03 2005 1 Volser: CreateVolume: volume 537085072 (www.natscan.kirsche7) created
>>   unable to allocate inode: File exists
>>   Wed Jun 1 10:18:03 2005 1 Volser: ReadVnodes: Restore aborted
>>   Wed Jun 1 10:18:03 2005 1 Volser: Delete: volume 537085072 deleted
>>
>> and df -k gives:
>>
>>   [EMAIL PROTECTED]:logs]# df -k /vicepcy
>>   Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
>>   /dev/vicepcy    262144000 202566836   23%     2191     1% /vicepcy
>>
>> which means that about 60 GB are occupied. What to do? The server is a namei server. My idea is therefore to simply remove all files and directories except
>>
>>   ./lost+found
>>   ./Lock
>>   ./Lock/vicepcy
>>   ./AFSIDat
>>   ./AFSIDat/README
>>
>> Will this be safe?
Re: [OpenAFS] can not change a backup or readonly volume
You are certainly in your read-only tree. What does fs lq show? To get into the RW path do a

  cd /afs/.dma_lab.ecs.syr.edu/usr/bob

then it will work. Later do a vos release for the volume where you created the mount point.

Hartmut

[EMAIL PROTECTED] wrote:
> hi, one step forward,
>
>> At least the dot in front of the cell name could be a showstopper. Do you need the -cell option at least? Mine vos create works without.
>
> vos create works;
>
>   [EMAIL PROTECTED]/afs/computer_lab.edu/usr/bob% klog admin
>   Password:
>   [EMAIL PROTECTED]/afs/computer_lab.edu/usr/bob% vos create -server addedserver.edu -partition /vicepe -name addedserver-afs
>   Volume 536871114 created on partition /vicepe of addedserver.edu
>   [EMAIL PROTECTED]/afs/computer_lab.edu/usr/bob% vos listvol -server addedserver.edu
>   Total number of volumes on server addedserver.edu partition /vicepe: 1
>   addedserver-afs 536871114 RW 2 K On-line
>   Total volumes onLine 1 ; Total volumes offLine 0 ; Total busy 0
>
> but this command doesn't work ...
>
>   [EMAIL PROTECTED]/afs/dma_lab.ecs.syr.edu/usr/bob% fs mkm -dir /afs/computer_lab.edu/added-afs -vol 536871114
>   fs: You can not change a backup or readonly volume
Re: [OpenAFS] Please help! Error in server synchronisation
[EMAIL PROTECTED] wrote:
> Hello everybody,
>
> I've got two servers running AFS. Both of them are running the fileserver and database processes. But there is a problem whenever I want to synchronize the VLDB with the vos syncvldb command. The error it shows is always the "no quorum elected" error. I've read a lot about "no quorum elected" errors in the archives of this mailing list. Now I've already set up one server as ntp server and the other as client, so I think it isn't the synchronisation problem of Ubik.
>
> Is there someone who has experience with this error, or someone who can help me?
>
> Big THX
> Greetz

Have you restarted the database servers on the older machine after adding the new host with bos addhost? If not, the old node doesn't know about the new node.

What do udebug node1 7003 and udebug node2 7003 say?

Hartmut
Re: [OpenAFS] Please help! Error in server synchronisation
[EMAIL PROTECTED] wrote:
>> What say rxdebug node1 7003 -version and rxdebug node2 7003 -version?

Sorry, should be udebug instead of rxdebug!!

> rxdebug SCM 7003 gives:
>
>   Trying 10.1.202.20 (port 7003):
>   Free packets: 634, packet reclaims: 4, calls: 1094, used FDs: 6
>   not waiting for packets.
>   0 calls waiting for a thread
>   16 threads are idle
>   Done.
>
> rxdebug secondserver 7003 gives:
>
>   Trying 10.1.202.6 (port 7003):
>   Free packets: 634, packet reclaims: 0, calls: 385, used FDs: 6
>   not waiting for packets.
>   0 calls waiting for a thread
>   16 threads are idle
>   Done.
>
> Hope someone can help me, because this problem is taking too much time :-(
>
> THX for your response!
> Greetz
Re: [OpenAFS] afs partition on nfs ?
HM wrote:
> Hi all,
>
> I'm just starting the afs adventure, checked the archives but couldn't find anything about this. I would like to mount_nfs the vicepx partition on a NAS. I was wondering if this is possible at all? Do I need to use special builds etc.? (like using --enable-namei-fileserver)

With --enable-namei-fileserver it should be no problem to use an NFS-mounted partition as a vicep partition. I would recommend using the namei server anyway.

Hartmut

> Thanks,
> Hans
Re: [OpenAFS] Unable to move volumes
Hans-Gunther Borrmann wrote:
> On Wednesday 09 March 2005 17:21, Derrick J Brashear wrote:
>> On Wed, 9 Mar 2005, Hans-Gunther Borrmann wrote:
>>> On the destination the VolserLog contains:
>>>
>>>   Wed Mar 9 16:52:11 2005 VAttachVolume: Failed to open /vicepcp/V0537085440.vl (errno 2)
>>>   Wed Mar 9 16:52:11 2005 1 Volser: CreateVolume: volume 537085440 (usr.md0) created
>>>   unable to allocate inode: File exists
>>
>> Namei or inode? I'm unsure why the file would exist, but if it's namei, I suppose you could use a syscall tracer and see what's getting EEXIST, I'd be curious to hear what you have that's in the way.
>
> Sorry. I forgot the VolserLog:
>
>   Fri Mar 11 14:38:57 2005 VAttachVolume: Failed to open /vicepcp/V0537085534.vl (errno 2)
>   Fri Mar 11 14:38:57 2005 1 Volser: CreateVolume: volume 537085534 (usr.sperling) created
>   unable to allocate inode: File exists
>   Fri Mar 11 14:38:57 2005 1 Volser: ReadVnodes: Restore aborted
>   Fri Mar 11 14:38:57 2005 1 Volser: Delete: volume 537085534 deleted
>
> Gunther

If you look into /vicepcp/AFSIDat/S=/SNo+U/special is there anything? If so remove the subtree /vicepcp/AFSIDat/S=/SNo+U and - if existent - /vicepcp/V0537085534.vl and try again.

The message "unable to allocate inode: File exists" looks like there is some volume special file from an earlier try around.

Hartmut
Re: [OpenAFS] Second AFS-server problems!
You forgot to tell us what problems you have! How can we guess?

Hartmut

[EMAIL PROTECTED] wrote:
> Hello Everybody,
>
> First maybe a little introduction because this is my first mail to this mailing list. I'm a 23 year old student from Belgium and since this year an official Linux user. For this last year all the students have to do a big project; my project is to do research about clustering filesystems.
>
> Now I am trying to get an OpenAFS system running. I have succeeded in getting one SCM up and running, but I'm having trouble with the second server. I hope someone can help me with this problem because I'm running out of time. The purpose is to get an exact copy of the first server running. To set up the first, I used the Gentoo Manual for AFS. I read almost the whole manual on OpenAFS while trying to get the second server up and running, but still I didn't succeed.
>
> Hope someone can help me get a bit closer to my objective and give me the right commands to get the second server up and running and synchronizing with the first.
>
> Thx
> Loretto
Re: [OpenAFS] AFS+Large file support
The current unstable release 1.3.79 has large file support on the client and the server side. At least if you compile from source you can configure with --enable-largefile-support, and that compiles the fileserver with large file support. I am not sure whether this is possible only with the so-called namei fileserver (which stores the data visibly in the /vicep partition) or also with the classical interface which hides the files in the /vicep partition.

Hartmut

Manel Euro wrote:
> Hello,
>
> I am studying the implementation of OpenAFS in the company I work for. I have been reading about OpenAFS' large file support, and the fact that AFS has a 2 GB file size limit is a problem for my implementation. We have several processes that need to use files larger than 7 GB and produce large log files.
>
> I have read that support for large files and volumes is projected but it is not yet in progress. Is there any way I can add support for large files with the latest stable OpenAFS version for Linux? Or do I need to wait for the next versions with large file support?
>
> Regards,
> FM
Re: [OpenAFS] Re: fs process doesn't exit until I send a signal 9
Mike Polek wrote:
> On Thu, 24 Feb 2005, Derrick J Brashear wrote:
>> On Thu, 24 Feb 2005, Gabe Castillo wrote:
>>> I had to shut down one of my AFS servers to replace a disk. When I issue a bos shutdown command, all the processes seem to shutdown, except for the fs process. When I vos status the server, it says that the fileserver has been disabled, and the sub-status is in the process of shutting down. Is there
>>
>> Wait while it breaks callbacks. you can watch the status in /usr/afs/logs/FileLog
>
> For what it's worth, I have servers that have thousands of volumes on each partition. (Ok... maybe a poor design choice, but I didn't know the single threaded volume server would be an issue when I did the design...) After 30 minutes, the bosserver assumes that the fileserver isn't going to stop, and does a kill -9 to stop it. I'm pretty sure it's just because of the sheer number of volumes to unmount.
>
> 1) Is there an easy way to change the timeout value? I'm not sure yet if it's faster to do the kill -9 one minute into the shutdown and just let the salvager do it's thing, or if it's better to let the shutdown take an hour. I can say that it would be helpful to have an emergency procedure that won't corrupt volumes for when the shutdown is triggered by a power failure. :-)

I think it's not sane if the shutdown takes that long. There must be a problem with your clients, perhaps switched-off PCs, so that the callback breaking has to wait for timeouts. Writing the volume information to disk should never take that long, even if you have 1 volumes on a server.

If you have compiled with --enable-fast-restart you can kill your fileserver after a while (after all active RPCs have finished), and the only disadvantage at restart may be that the uniquifier is too low.

> 2) I noticed that in the 1.3 branch, the volume server is multi-threaded. (THANK YOU!!!) Does anybody know how this affects shutdown/startup time? Should I still be looking for a way to reduce the number of volumes on a server?

The volserver has nothing to do with the time needed by the fileserver to shut down. The volserver only does volume operations such as move, backup or release.

> 3) I've seen references to a NoSalvage option. Is that also new in 1.3? or is it some sort of patch? Anybody have a really good way of dealing with lots of volumes on a server? We currently have almost 60T of storage, and it's growing. I like the idea of having things well organized into finite volumes... it works for our setup.

Is your NoSalvage option the same as --enable-fast-restart? If so, I introduced this to avoid hours of salvaging after a crash. My experience was that the log almost never contained a real error message. I think it's better to let the fileserver automatically take a volume off-line when it detects an inconsistency than to have to wait hours for a restart.

Hartmut

> Any assistance is appreciated.
>
> Thanks,
> Mike
Re: [OpenAFS] AIX client panics machine if ThisCell is invalid
Have you started afsd with -afsdb? This at least crashed my client under AIX 5.2 regularly.

Hartmut

Ben Staffin wrote:
> So ... I discovered the hard way that (on AIX at least) if you put a non-existent cellname in /usr/vice/etc/ThisCell and start the AFS client, the system will panic as soon as afsd is started.
>
> Weird, huh? Pretty obscure error condition, but still ... kernel panic? Yow.
Re: [OpenAFS] releasing volumes automatically
Marco Spatz wrote:
> Hi,
>
> I've set up OpenAFS on a four-machine cluster and have made replicas of the users' home volumes. I know that generally this is a bad idea, but I want to use this as a backup the user can access without my help (I've mounted the read-only volumes at another mount point). Now I want the AFS system to release these volumes at night, but I don't know how to do this. I thought about writing a cronjob, but I don't know how to gain access as admin to get the rights to execute 'vos release'. Is there any possibility to tell OpenAFS to release certain (or all) changed volumes at a certain time? Would be a great help.

On our fileserver machines we run cronjobs as root which use

  vos release xxx -localauth

Hartmut

> Thanks for your help,
> Marco
Re: [OpenAFS] crash on AIX 5.2
I am in the process of tracking down all differences between my good version and 1.3.77. I am now not very distant from 1.3.77, and at least one problem seems to be the new code in afs_pioctl.c for getting and setting tokens, along with the huge ticket size introduced for compatibility with Active Directory.

Keeping the old ticket size and the old code for tokens in afs_pioctl.c results in a fairly stable client. At least I can get a token, do a make clean in the openafs tree and a make dest without crashing the system. This is certainly not enough testing for putting it into production, but it is a hint where the problem may be hidden.

Hartmut

Michael Niksch wrote:
>> I have compiled 1.3.77 under AIX 5.1 and see the same problem. In my case the machine crashes after getting a token. It seems to work before.
>
> I am seeing that same problem with 1.3.77 on all AIX 4.3.3, AIX 5.1, and AIX 5.2, both 32 and 64 bit kernel. In contrast to previous versions of 1.3, I can at least load the kernel extensions now. Obtaining a token with 1.3.77 'klog' from a kaserver causes a core dump, and trying to use a token obtained with 1.2.10 'klog' results in a system crash.
>
> Yes, I understand that 'klog' and 'kaserver' are considered more or less deprecated, and we are in fact planning to change to a Kerberos 5 setup, but this migration will take us quite some time to complete. Also, we will need continued interoperability with legacy IBM Transarc AFS cells for quite some time to come. So it would be great if 'kaserver' support wasn't silently dropped out of OpenAFS already at this point.
>
> So far, I haven't been able to compile ANY version of OpenAFS client to work on AIX 5. The server code might be less problematic as it doesn't involve kernel extensions, but I am also still running 1.2.11 server binaries compiled on AIX 4.3.3 on my AIX 5.2 servers.
Re: [OpenAFS] crash on AIX 5.2
Jeffrey Altman wrote:
> Hartmut Reuter wrote:
>> I am in the process of tracking down all differences between my good version and 1.3.77. I am now not very distant from 1.3.77, and at least one problem seems to be the new code in afs_pioctl.c for get and set tokens along with the huge ticket size introduced for compatibilty with active directory.
>>
>> Keeping the old ticket size and the old code for tokens in afs_pioctl.c results in a fairly stable client. At least I can get a token, make clean in the openafs-tree and make dest without crashing the system. This is certainly not enough testing for putting it into production, but a hint where the problem may be hidden.
>>
>> Hartmut
>
> We know the problem is in the set/get token code on AIX. More than likely the stack is too small to support a 12000 byte object and it is getting blown away on AIX. The question is:
>
> * where is this object that is located on the stack?
>
> If you can find that, then you will have solved the bug.

It does not look like a stack overflow. The crash always happens in xmalloc1:

  (0) f pvthread+00A500
  STACK:
  [006021F0]xmalloc1+0007AC (0200, F1E00C22E000, , F1E00C22E000, 0400, F1E03B964269, 0002, 003E4338 [??])
  [00606B70]xmalloc+000208 (??, ??, ??)
  [08E41978]afs_osi_Alloc+5C (??)
  [08EBC6DC]afs_HandlePioctl+0003D4 (, 800C5608800C5608, F0002FF3A400, , F0002FF3A438)
  [08EC74F8]afs_syscall_pioctl+000294 (, 800C5608800C5608, 2FF21FC0, )
  [08E46000]syscall+0001A0 (00140014, , 800C5608800C5608, 2FF21FC02FF21FC0, , 2E6D70672E6D7067, 00800080)
  [08E45DB8]lpioctl+50 (, 800C5608800C5608, 2FF21FC0, )
  [379C]sc_msr_2_point+28 ()
  Not a valid dump data area @ 2FF21CF0
  (0)

So probably storage on the kernel heap was overwritten.

Hartmut

> Jeffrey Altman
Re: [OpenAFS] SuSE 9.2: anyone?
Derek Atkins wrote:
> Sergio Gelato [EMAIL PROTECTED] writes:
>> * Sensei [2004-12-23 16:54:08 +0100]:
>>> Has anyone got AFS working on SuSE 9.2 using their AFS client? I had to fix a script (it searched for the kernel module libafs; the one shipped with SuSE is actually called kafs), but anyway, afsd isn't starting:
>>
>> Is kafs the OpenAFS implementation or something else?
>
> kafs is the (broken) Linux kernel AFS implementation. It is not OpenAFS, does not work with OpenAFS' afsd, and if you build kafs you will lose badly.
>
> -derek

SuSE 9.2 has the OpenAFS client in its distribution. After running YOU (YaST Online Update) you should also get a working version. You should run it with -memcache to avoid the bug which fills up the cache partition; I think it is not fixed there. I replaced the following in /etc/sysconfig/afs-client:

XXLARGE="-stat 4000 -daemons 6 -volumes 256 -fakestat -blocks 262144"
XLARGE="-stat 3600 -daemons 5 -volumes 196 -fakestat -blocks 133072"
LARGE="-stat 2800 -daemons 5 -volumes 128 -fakestat -blocks 65536"
MEDIUM="-stat 2000 -daemons 3 -volumes 70 -fakestat -blocks 32768"
SMALL="-stat 300 -daemons 2 -volumes 50 -fakestat -blocks 16384"
MEMCACHE="yes"

You may also add -chunksize 20 if you have enough memory, because it makes the client much faster (otherwise it does an RPC for each 8 K on read).

Hartmut
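To make the option handling above concrete, here is a minimal sketch that assembles an afsd option string for a memory cache. The function names afsd_memcache_opts and chunk_bytes are invented for this example and are not part of the SuSE init script. The -chunksize value is the base-2 logarithm of the chunk size, so 20 means 1 MiB chunks and 13 corresponds to the 8 K mentioned above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of SuSE's afs-client script).

# Build an afsd option string for a memory cache from a base option set.
afsd_memcache_opts() {
    base_opts=$1      # e.g. the MEDIUM value from /etc/sysconfig/afs-client
    chunk_exp=$2      # log2 of the chunk size; empty means "keep the default"
    opts="$base_opts -memcache"
    if [ -n "$chunk_exp" ]; then
        opts="$opts -chunksize $chunk_exp"
    fi
    printf '%s\n' "$opts"
}

# Chunk size in bytes for a given -chunksize exponent.
chunk_bytes() {
    echo $(( 1 << $1 ))
}

afsd_memcache_opts "-stat 2000 -daemons 3 -volumes 70 -fakestat -blocks 32768" 20
chunk_bytes 20
```

The resulting string is what would be passed to afsd at startup; how the init script consumes the sysconfig variables is distribution specific and not shown here.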
Re: [OpenAFS] memcache or diskcache on ramdisk
EC wrote:
> Hi,
>
> Is using the memory cache for afsd 'better' (faster, more stable, etc.) than a disk cache on a RAMDISK or TMPFS?
>
> EC.

My experience with 1.3.74 is that memcache is really fast. We reach 70 MB/s for write and 48 MB/s for read of an 8 GB file, which is about 20 MB/s faster than with a ramdisk. This was on SuSE SLES-9 with kernel 2.6.5-7. With the 2.6 kernel there is still a problem that the disk cache fills up, which makes the use of a ramdisk nearly impossible.

Of course a disk cache still makes sense if your network is slower than your local disk. But in production environments such as blade centres the network is typically much faster than the internal disk.

Hartmut
Re: [OpenAFS] Question about RO/RW Volumes...
Use the -rw option of fs mkm to force use of the RW volume. We do the same: all user volumes are mounted with -rw and have 2 RO copies, one in the same partition to make the reclone fast and another one on another fileserver as a backup in case the first partition gets lost. We also have another tree where the RO volumes are mounted, to allow users to get back their files from yesterday. The automatic release of volumes that have changed is done in a cron job on each fileserver machine during the night.

Hartmut

Lars Schimmer wrote:
> Hi!
>
> Just a question about RO/RW copies. We have set up 3 volumes for every user (home, work, ftp) and a few others with CVS, svn, data, ... For easy backup we've made RO copies of nearly all volumes. But now, with 2 database servers and all the RO copies, we have run into a problem we had not thought about before. With the 2nd database server in the CellServDB, most machines use the RO copies of the volumes. With some volumes (archive, cdonline) that's OK for working (but hey, this data isn't really small to hold an RO copy of), but with CVS, svn or home dirs, an RO mount isn't really nice. How can we be sure to have RW access to these volumes?
>
> It would be nice if OpenAFS would load-balance reads across all the RW and RO volumes, but write only to the RW volume and then automatically release this volume...
>
> The only dirty solution I found is to mount the root.cell volume RW as /afs/.url.to.domain to have guaranteed RW access to the volumes.
>
> Cya
> Lars
> Technische Universität Braunschweig, Institut für Computergraphik
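The setup described in the reply (RW mount point, one RO copy next to the RW volume, a second RO copy elsewhere, nightly release) could be sketched as follows. This is a dry run that only prints the commands; the volume, path, server, and partition names are placeholders, and the function name user_volume_setup is made up for this example.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them.
# All names passed in are placeholders, not taken from a real cell.
user_volume_setup() {
    vol=$1; dir=$2; rw_server=$3; rw_part=$4; bk_server=$5; bk_part=$6
    # Mount point that always resolves to the RW volume.
    echo "fs mkmount -dir $dir -vol $vol -rw"
    # RO copy in the same partition as the RW volume (fast reclone).
    echo "vos addsite -server $rw_server -partition $rw_part -id $vol"
    # Second RO copy on another fileserver as a backup.
    echo "vos addsite -server $bk_server -partition $bk_part -id $vol"
    # What a nightly cron job would run to update the RO copies.
    echo "vos release -id $vol"
}

user_volume_setup user.jdoe /afs/.example.org/user/jdoe fs1 a fs2 b
```

Deciding which volumes have changed since the last release is left out of the sketch; the cron job mentioned above presumably handles that, but its logic is not described in the thread.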
Re: [OpenAFS] can't mount afs on /afs
[EMAIL PROTECTED] wrote:
> hi,
>
> I have been trying to work this out for a couple of weeks with no success. I'm using Red Hat 9 and OpenAFS 1.2.11. Everything goes OK until I install the client. At bootup, and after starting the AFS daemon, I always get:
>
> Can't mount AFS on /afs(22)
> Lost contact with file server at 127.0.0.1 in cell ...

What does vos ex root.afs say? I guess that the VLDB contains the wrong address and your client therefore tries to contact the fileserver at 127.0.0.1, which is the client itself. But there is no fileserver running there. If the VLDB for some reason got 127.0.0.1 associated with this volume, you may try

    vos changeaddr 127.0.0.1 <correct IP address>

to solve the problem.

Hartmut

> I read about a problem with hostname and FQDN but I have no idea what to do. My hostname is set to fileserver, that's what hostname says, and my /etc/hosts resolves fileserver to the local IP (but not to 127.0.0.1). I read about it somewhere on the list but it did not help me.
>
> Any clues?
>
> thx a lot
> claudio
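The check suggested above can be written down as a small dry-run sketch: given the address that vos examine reports for root.afs, print the vos changeaddr command only if that address is a loopback address. The function names and the address 192.0.2.10 are placeholders for this example.

```shell
#!/bin/sh
# Dry-run sketch: prints the repair command, does not contact the VLDB.
is_loopback() {
    case $1 in
        127.*) return 0 ;;
        *)     return 1 ;;
    esac
}

fix_vldb_addr_cmd() {
    registered=$1   # address shown by "vos examine root.afs"
    real=$2         # the fileserver's real address (placeholder below)
    if is_loopback "$registered"; then
        echo "vos changeaddr -oldaddr $registered -newaddr $real"
    else
        echo "# $registered is not a loopback address, nothing to change"
    fi
}

fix_vldb_addr_cmd 127.0.0.1 192.0.2.10
```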
Re: [OpenAFS] ACLs not working on afs volumes! Help!
This is intended behaviour. It may be debatable whether it is really a good idea, but the code in src/viced/afsfileprocs.c in the routine Check_PermissionRights (line 835 ff.) shows

    if (CallingRoutine == CHK_STOREACL) {
        if (!(rights & PRSFS_ADMINISTER) && !VolumeOwner(client, targetptr))
            return (EACCES);
    } else {

That means that if the client user is the owner of the volume (the owner of the volume's root directory), he does not get EACCES.

-Hartmut

matt cocker wrote:
> Hi
>
> We are having a weird problem with some AFS volumes: if a user has had admin access to a volume and we remove admin access from the ACL for that user (or remove the user from the ACL completely), the user can just add themselves back. Is this intended behaviour?
>
> All our user volumes are prefixed with "user.", i.e. user.username. We have tested other volumes, but it only seems to affect volumes the user has had full access to.
>
> The problem (same for Linux and Windows):
>
> $ fs listacl /afs/ec.auckland.ac.nz/users/t/ctcoc006
> Access list for tcoc006 is
> $ fs listacl /afs/.ec.auckland.ac.nz/users/t/c/tcoc006
> Access list for /afs/.ec.auckland.ac.nz/users/t/c/tcoc006 is
> $ ls /afs/ec.auckland.ac.nz/users/t/ctcoc006
> ls: tcoc006: Permission denied
> $ fs setacl -dir /afs/ec.auckland.ac.nz/users/t/c/tcoc006 -acl tcoc006 all
> $ fs listacl /afs/.ec.auckland.ac.nz/users/t/c/tcoc006
> Access list for /afs/.ec.auckland.ac.nz/users/t/c/tcoc006 is
> Normal rights:
>   tcoc006 rlidwka
> $ fs listacl /afs/ec.auckland.ac.nz/users/t/c/tcoc006
> Access list for tcoc006 is
> Normal rights:
>   tcoc006 rlidwka
>
> We are looking into other affected volumes, but at the moment I just want to know whether we have misunderstood how ACLs work. Users can't even view the ACLs of volume mount points that they don't have ACL entries for, i.e.
>
> fs: You don't have the required access rights on 'tcle012'
> Access list for tcoc006 is
>
> Confused
>
> Cheers
> Matt
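The decision made by the quoted CHK_STOREACL branch can be modelled in a few lines. This is only a simplified illustration of that one condition, not the fileserver code: the function name may_store_acl is invented, the volume-owner test is reduced to a flag, and 64 is the bit value of the "a" (administer) right.

```shell
#!/bin/sh
# Simplified model of the CHK_STOREACL branch quoted above.
PRSFS_ADMINISTER=64   # the "a" bit of an AFS ACL

# may_store_acl RIGHTS IS_VOLUME_OWNER -> exit status 0 if allowed
may_store_acl() {
    rights=$1
    is_owner=$2      # 1 if the caller owns the volume's root directory
    if [ $(( rights & PRSFS_ADMINISTER )) -eq 0 ] && [ "$is_owner" -ne 1 ]; then
        return 1     # the fileserver would return EACCES here
    fi
    return 0
}

if may_store_acl 0 1; then echo "owner without 'a' right: allowed"; fi
if ! may_store_acl 0 0; then echo "non-owner without 'a' right: EACCES"; fi
```

The first case is the one reported in the thread: a user with no ACL entry at all can still set the ACL because he owns the volume's root directory.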
Re: [OpenAFS] Redundant cell
You don't need another cell, just another database server and, maybe, a fileserver on your other floor. If the connection is cut off, the sync-site database server (the one with the lower IP address) will work as before, and the other one should still be sufficient to reply to read-only requests such as volume location or pts membership requests. Most cells are built with this redundancy because it is one of the main features of AFS.

-Hartmut

Sensei wrote:
> Hi. I'm back and I have a question, maybe not so common. :)
>
> I've built an OpenAFS cell on Debian stable. It authenticates over Kerberos 5 and gains a token from openafs_session, so there is no kaserver and there are no passwords anywhere other than in the Kerberos DB. Good, it works.
>
> Now, my question about it is: how do I make it redundant? We have a quite unreliable network. The server is on one floor and I'm thinking about having a second server on the second floor. I need these two cells to work cooperatively but "independent" of each other. In other words, if the link between the two servers goes down, each floor keeps authenticating and working. Login can work fine even without the home directory, which can reside on the other server.
>
> How can I do this? Do not bother about krb5. I'm dealing now with all the AFS issues.
Re: [OpenAFS] Volumes lost on fileserver / vldb post fs crash
Is this a namei fileserver? If so, you could cd as root into the /vicep partitions and look at what remained. If the fileserver doesn't report any volumes, the volume header files are probably gone, but not necessarily all the data in AFSIDat. You may do a du to see what remained.

What does the SalvageLog say? It should report any volumes the salvager saw and why it deleted them.

Forrest D Whitcher wrote:
> I had a fileserver crash yesterday with apparently bad consequences. The volumes are no longer listed in the VLDB (I have a listing of IDs, names etc., but I'm not sure how much that helps).
>
> The command vos listvol fileserver gives:
>
> Total number of volumes on server thing partition /vicepa: 0
> Total volumes onLine 0 ; Total volumes offLine 0 ; Total busy 0
> Total number of volumes on server thing partition /vicepb: 0
> Total volumes onLine 0 ; Total volumes offLine 0 ; Total busy 0
> Total number of volumes on server thing partition /vicepd: 0
> Total volumes onLine 0 ; Total volumes offLine 0 ; Total busy 0
>
> After the crash I stopped, then restarted all services on the fileserver, though I have not yet done any restart on the database server. After restarting services I ran a salvage, which ran fairly quickly (3-4 min. on two 4 GB and one 8 GB partitions on a K7/600 system). I fear the speed with which this finished may well indicate that the fileserver's view of which volumes it houses is well and truly lost.
>
> The VLDB showed the correct entries for a few hours after this fileserver crashed and restarted. I've tried to do the following to restore volumes that had been on the fileserver:
>
> vos syncv fileserver /vicepd 536870970
>
> So the question is, do I have any reasonable chance of recovering the currently invisible volumes on this fileserver? If so, how should I be going about it?
>
> thanks for any ideas
> forrest
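The "do a du" suggestion from the reply might look like the following sketch, to be run as root on the fileserver. The function name afsidat_usage is made up, and the partition list is taken from the vos listvol output quoted above.

```shell
#!/bin/sh
# Sketch: report how much data is left under each partition's AFSIDat tree.
afsidat_usage() {
    for part in "$@"; do
        if [ -d "$part/AFSIDat" ]; then
            # -s: one summary line per tree, -k: sizes in kilobytes
            du -sk "$part/AFSIDat"
        else
            echo "no AFSIDat under $part"
        fi
    done
}

afsidat_usage /vicepa /vicepb /vicepd
```

A size close to what the volumes used before the crash would suggest that mainly the volume headers were lost; a near-empty tree would suggest the data itself is gone.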
Re: [OpenAFS] File Size Limitations
With the stable releases (which are not compiled with large file support) the maximum file size is (2 GB - chunksize). With the unstable 1.3.52 you have large file support on the client and (as far as I know) on the server side. However, on the client side it is available only for AIX (4.2 - 5.2), Linux 2.4 and Solaris >= 8 with a 64-bit kernel, unless someone else has done the work.

Hartmut

Penney Jr., Christopher (C.) wrote:
> How exactly does the file size limitation show up in OpenAFS (i.e. 2 GB per file maximum)? I've been told that the maximum file size depends on a couple of factors and I'm trying to figure out what they are. Right now I have a test environment with a couple of Solaris 9 boxes (one the file server and one a client) and I'm seeing a 2 GB per file size limitation.
>
> Chris
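As a worked example of the "(2 GB - chunksize)" limit, the sketch below computes the value for a given afsd -chunksize exponent. It assumes 2 GB means 2^31 bytes and that the chunk size is 2 to the power of the -chunksize value; the function name max_file_size is invented for this illustration.

```shell
#!/bin/sh
# Largest file size without large file support: 2 GiB minus one chunk.
# chunk_exp is the afsd -chunksize value (log2 of the chunk size in bytes).
max_file_size() {
    chunk_exp=$1
    echo $(( (1 << 31) - (1 << chunk_exp) ))
}

max_file_size 16   # 64 KiB chunks: prints 2147418112
max_file_size 20   # 1 MiB chunks:  prints 2146435072
```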
Re: [OpenAFS] Questions, vol. 2.
Stephen Bosch wrote:
> More questions!
>
> -Volumes and volume sizes -- what do you use as a typical volume size/quota? The default is 5 MB, which is ridiculously small (and points toward an assumption that AFS will be used largely for user home directories). What is too big? For example, I have just created a volume with a 4 GB quota, as that will comfortably fit on a DVD-R.

We have many home-directory volumes with ~5 GB, but the larger a volume is, the longer it takes to move it to another server or partition.

> -Volume granularity -- at a minimum, a volume must correspond to one directory, correct? In other words, I can't concatenate volumes invisibly.

Correct.

> -Another partition question -- on a /vicepxx partition, where does the data actually reside?

If you have a namei fileserver (under Linux always), the data are under /vicepxx/AFSIDat. There is a tree of subdirectories in which the data belonging to a volume are in a common subtree.

> -Unix/AFS user account synchronization: We have two existing workstations that are heavily used. These workstations will also use AFS, but we don't want to move their local home directories to the AFS cell. Do we have to? All the docs seem geared to that, but all we want is an AFS cell where we can save critical data and then replicate it or back it up.

You don't have to synchronize UIDs and AFS IDs. It is only nicer to see the file ownership correctly, because it is translated via /etc/passwd.

> The docs leave me with the understanding that a client workstation will treat the mounted AFS filespace the same as a mounted local disk. That is, a file owned by user ID 501 in AFS will appear the same as a file on a local disk owned by user ID 501. If I want to create a new user in the cell, does this mean that I have to first create a user in AFS and then create a user on the user's workstation with the same UID/GID as the new AFS user?

If you use uss to create the user, that may be true. But if you create the ptserver entry by hand, you can give the AFS user his Unix UID by specifying

    pts createuser <name> -id <uid>

> -Group IDs -- AFS uses negative group ID numbers. The Linux machines have no idea what to do with that -- they just read the group IDs as 0.
>
> -afs-modified login, etc. The documentation recommends using the AFS-modified login. In our case, that essentially means using PAM for AFS authentication, but as one poster has just pointed out, some applications like OpenSSH don't always function properly with the AFS PAM module. What do you use in your installations? Is it better to just put klog in the login script?

We use PAM and also a special slogin which transfers tokens from one machine to another.

-Hartmut

> Thanks,
> -Stephen-
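The pts createuser hint above can be combined with the local account database. The following dry-run sketch looks up the numeric UID of an existing local account and prints the matching pts command; the function name pts_createuser_cmd is made up for this example, and nothing is sent to the ptserver.

```shell
#!/bin/sh
# Dry-run sketch: print the pts command that gives an AFS user the same
# numeric ID as an existing local account.
pts_createuser_cmd() {
    name=$1
    # "id -u" fails for unknown accounts; propagate that failure.
    uid=$(id -u "$name") || return 1
    echo "pts createuser -name $name -id $uid"
}

pts_createuser_cmd root
```

In practice one would of course run this for ordinary user accounts rather than root; root is used here only because it exists on every system.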
Re: [OpenAFS] blank view of an slavaged volume(offline-online) through its mount point
If this is a namei fileserver you can have a look into the vicep partition. If you give me the number of the RW volume I can tell you the path where the files belonging to this volume can be found. Then you can do a du there to find out how much of the data still exists. If you are lucky, it is only the root directory of the volume which is gone.

You should also run volinfo or MR-AFS's traverser to find out which files and directories are expected from the metadata view of your volume.

Hartmut

Hongliang Gai wrote:
> Hi Hartmut,
>
> The dump + restore is done. I made a mount point in my home directory for this new volume. But it still has nothing in it, even after I restarted both the client and the server machine. All the other partitions on the server are fine and never had this problem. I examined the volume, and it appears OK. Any further hint?
>
> Thanks,
> -Hongliang
>
> On Fri, 16 Jan 2004, Hartmut Reuter wrote:
>> What does the SalvageLog say?
>>
>> Try to dump and restore the volume:
>>
>>     vos dump <volume> 0 | vos restore <server> <partition> <new volume>
>>
>> If it's not an AFS bug it could easily be a bad disk!
>>
>> Hartmut
>>
>> Hongliang Gai wrote:
>>> Hi All:
>>>
>>> I have experienced a problem with AFS 1.2.2a on Red Hat 7.0. One of my users suddenly could not access his AFS home directory. After I examined the AFS server, I found that his volume appeared offline, though its backup volume is online. I followed the OpenAFS docs to salvage the offline volume, and then it was back online. However, after the user cd's to his home directory, the directory is empty. I tried remounting the volume on the directory. It does say that the dir is the mount point for the volume. The size of the volume shows as before, 3 GB. It looks like the volume is OK, but I cannot let the user see it. How do I fix it?
>>>
>>> Thanks in advance,
>>> -Hongliang
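The dump-and-restore suggestion quoted above, spelled out with explicit option names, might look like this dry-run sketch. The function name copy_volume_cmd and all argument values are placeholders; the pipeline is printed, not executed.

```shell
#!/bin/sh
# Dry-run sketch: prints the pipeline for copying a volume via a full dump.
copy_volume_cmd() {
    src=$1; server=$2; part=$3; new=$4
    # -time 0 requests a full dump. Without -file, vos dump writes to
    # stdout and vos restore reads from stdin, so the two can be piped.
    echo "vos dump -id $src -time 0 | vos restore -server $server -partition $part -name $new"
}

copy_volume_cmd user.jdoe fs1 a user.jdoe.copy
```

Restoring under a new name leaves the suspect volume untouched, so it can still be inspected on disk afterwards.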
Re: [OpenAFS] vos changeaddr -remove isnt removing an unreachableserver
vos remsite afsb2 a tools.java.sun131 will do it. It only changes the database, without contacting the server afsb2.

Hartmut

Tim Prendergast wrote:
> It looks like all of our volumes have all of our AFS servers listed. How do I remove only the one from all of them so our releases work properly again? I am trying to remove afsb2 from all of our volumes because our releases all error out and do not complete to all the clients, citing the fact that they cannot reach that server.
>
> Example (we have 211 volumes):
>
> tools.java.sun131
>     RWrite: 536871140 ROnly: 536871443 Backup: 536871718
>     number of sites - 4
>        server afsp partition /vicepa RW Site
>        server afsp partition /vicepa RO Site
>        server afsb partition /vicepa RO Site
>        server afsb2 partition /vicepa RO Site
>
> Regards,
> Tim Prendergast
>
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Hartmut Reuter
> Sent: Monday, March 24, 2003 11:41 PM
> To: Tim Prendergast
> Cc: [EMAIL PROTECTED]
> Subject: Re: [OpenAFS] vos changeaddr -remove isnt removing an unreachable server
>
>> There is probably still a volume entry in the VLDB which points to this server. Try a vos listvldb -server id to find out.
>>
>> Hartmut
>>
>> Tim Prendergast wrote:
>>> Hi,
>>>
>>> I am trying to remove a server from our AFS cell, but upon issuing the following:
>>>
>>> vos changeaddr x.x.x.x -remove
>>>
>>> I get the following response:
>>>
>>> Could not remove server x.x.x.x from the VLDB
>>> VLDB: volume Id exists in the vldb
>>>
>>> This command is listed in the admin's guide as the way to remove obsolete servers. Since this system is no longer reachable and gone forever, how can I remove it effectively? Thanks in advance.
>>>
>>> Regards,
>>> Tim Prendergast
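With 211 volumes, the vos remsite call would have to be repeated per volume. A minimal dry-run sketch: read volume names on standard input and print one command per volume. How the list of names is obtained (for example from vos listvldb output) is not shown; the function name remsite_cmds and the second volume name are made up for illustration.

```shell
#!/bin/sh
# Dry-run sketch: read RW volume names on stdin, print one vos remsite
# command per volume. Review the output before piping it to sh.
remsite_cmds() {
    server=$1; part=$2
    while read -r vol; do
        [ -n "$vol" ] || continue     # skip empty lines
        echo "vos remsite -server $server -partition $part -id $vol"
    done
}

printf '%s\n' tools.java.sun131 tools.java.sun141 | remsite_cmds afsb2 a
```

Once every RO site on afsb2 has been removed from the VLDB, the vos changeaddr x.x.x.x -remove from the original question should no longer find volume entries pointing at the dead server.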
Re: [OpenAFS] Upgrading Transarc servers
It depends on the architectures your fileservers are running on. For Linux and NT, Transarc had implemented the NAMEI interface, where you don't need a special fsck. Here everything should be OK.

On the other architectures I suggest also using the NAMEI interface, for several reasons:

1) You get rid of the special fsck (which may be a problem with software RAIDs etc.).
2) Salvaging single volumes is much faster because all files of a volume group are under the same directory.
3) You can dump or tar and restore partitions, and you can see the files.

If you switch between the traditional mechanism and NAMEI you have to move the volumes with vos move, because the NAMEI fileserver does not understand the traditional partition and vice versa. If you keep using the traditional mechanism you should be able to just start the new binaries with the old partitions.

-Hartmut

Kevin Coffman wrote:
> I think this is the case, but wanted to verify. When upgrading from Transarc fileserver binaries to OpenAFS, there are no disk format changes? Just swap out the binaries and go. Correct?
>
> Also, there is no need to change from the Transarc fsck program. Correct?
>
> Thanks!
> Kevin
Re: [OpenAFS] Question about Large Files
Sven Oehme wrote:
> Has anybody ever created a file > 2 GB in an AFS volume? IBM AFS is not able to do so. Is this possible with a namei fileserver on a filesystem with large file support?

Yes, we have files up to 20 GB. Presently this is possible only with the combination of MR-AFS servers and clients built from OpenAFS CVS. I know that rpi.edu is working on an OpenAFS fileserver with large file support. For details about the status ask R. Lindsay Todd [EMAIL PROTECTED].

Hartmut Reuter

> I have seen a lot of discussions already, but no clear answer.
>
> Sven
Re: [OpenAFS] OpenAFS volumes filesystem
For the fileserver (with the NAMEI interface, which is obligatory for Linux) you may take whatever you want. We are using reiserfs, other people ext3. ext2 has the disadvantage of a slow fsck if your system should crash for some reason.

Hartmut

yam wrote:
> Hello,
>
> I'm starting up an OpenAFS installation, and I've arrived at my first dilemma... What filesystem should I use for OpenAFS volumes? ext2? ext3? ReiserFS? XFS? Any hint? Should I not use any of the above? Is there better performance with any of those? Is ext2 the only way to go?
>
> Thanks in advance.
>
> PS: I haven't found information about this anywhere.
>
> /Yam
Re: [OpenAFS] AIX 5.1
Not yet, but I am working on it and I am nearly through.

Hartmut

Mark Campbell wrote:
> Is AIX 5.1 currently supported?
>
> Thanks
> Mark
Re: [OpenAFS] 'vos dump VOLUME.backup' locks VOLUME, not VOLUME.backup!
Turbo Fredriksson wrote:
> I wrote a script to either back up a specific volume, or all volumes. It creates the backup volume 'VOLUME.backup' and mounts that on 'MOUNTPOINT/OldFiles'.
>
> But it locks VOLUME, not the expected VOLUME.backup... And I can't seem to unlock it. 'vos unlock VOLUME' doesn't seem to work, and neither does 'vos unlockvldb'...
>
> The idea is to back up the user's volume while still allowing RW access to the volume (in case a user is logged in). This is something I got from the 'manual', but it doesn't work as I had expected. What am I missing?
>
> It just released the lock; it seems to be timed, and took about 10 minutes or so...

In the first step, namely cloning or recloning the backup volume, the RW volume is busy and cannot be accessed. The second step, namely dumping the backup volume, leaves the RW volume online, so access to the RW volume should be possible.

The lock on the volume during both operations (clone/reclone and dump) doesn't stop access to the RW volume. It is only necessary to prevent concurrent volserver activities on this volume group.

Hartmut
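The two steps described in the reply can be sketched as a dry run: first reclone the backup volume, then dump the .backup clone rather than the RW volume. The function name backup_and_dump_cmds, the volume name, and the dump file path are placeholders.

```shell
#!/bin/sh
# Dry-run sketch of the two steps: reclone the backup volume, then dump
# the .backup clone (not the RW volume) to a file.
backup_and_dump_cmds() {
    vol=$1; dumpfile=$2
    # Step 1: the RW volume is busy only while this clone is made.
    echo "vos backup -id $vol"
    # Step 2: a full dump of the clone; the RW volume stays online.
    echo "vos dump -id $vol.backup -time 0 -file $dumpfile"
}

backup_and_dump_cmds user.turbo /var/tmp/user.turbo.dump
```

If the second command were given the RW volume name instead of the .backup name, the RW volume itself would be busy for the whole duration of the dump, which would match the symptom described in the question.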
Re: [OpenAFS] 'vos dump VOLUME.backup' locks VOLUME, not VOLUME.backup!
Turbo Fredriksson wrote:
>> In the first step, namely cloning or recloning the backup volume, the RW volume is busy and cannot be accessed.
>
> It's a small volume (163 MB), but it locks for about 10 minutes... Shouldn't the lock be ONLY for the number of seconds it takes for 'vos backup' to finish?

The volume stops being busy and locked when the vos backup command has finished.

>> The second step, namely dumping the backup volume, leaves the RW volume online, so access to the RW volume should be possible.
>
> That doesn't seem to happen on my system... OR, the lock taken while cloning the volume isn't released when the volume has finished cloning...

Are you sure you are really dumping the backup volume and not the RW volume?

Hartmut
Re: [OpenAFS] CopyOnWrite failed - orphaned files
orphaned files and directories (approx. 587713 KB)
02/28/2002 16:02:30 Salvaged cs.usr0.naveen (536877621): 4785 files, 587714 blocks

SalvageLog:

@(#) OpenAFS 1.2.3 built 2002-02-01
02/28/2002 16:05:06 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager /vicepc 536877621 -orphans attach)
02/28/2002 16:05:07 SALVAGING VOLUME 536877621.
02/28/2002 16:05:07 cs.usr0.naveen (536877621) updated 02/28/2002 15:46
02/28/2002 16:05:07 Attaching orphaned directory to volume's root dir as __ORPHANDIR__.3.207280
02/28/2002 16:05:07 Attaching orphaned directory to volume's root dir as __ORPHANDIR__.11.46709
02/28/2002 16:05:07 Attaching orphaned directory to volume's root dir as __ORPHANDIR__.13.7
[ similar lines deleted ]
02/28/2002 16:05:07 Attaching orphaned file to volume's root dir as __ORPHANFILE__.7846.198967
02/28/2002 16:05:07 Attaching orphaned file to volume's root dir as __ORPHANFILE__.9320.219854
02/28/2002 16:05:07 Vnode 1: link count incorrect (was 2, now 45)
02/28/2002 16:05:07 Vnode 3: link count incorrect (was 2, now 3)
02/28/2002 16:05:07 Vnode 11: link count incorrect (was 1, now 2)
02/28/2002 16:05:07 Vnode 13: link count incorrect (was 7, now 8)
02/28/2002 16:05:07 Vnode 15: link count incorrect (was 2, now 3)
[ similar lines deleted ]
02/28/2002 16:05:07 Vnode 9320: link count incorrect (was 0, now 1)
02/28/2002 16:05:07 Salvaged cs.usr0.naveen (536877621): 4785 files, 587715 blocks
02/28/2002 16:05:07 The volume header file V0536877622.vol is not associated with any actual data (deleted)

The FileLog says errno 4, which is Interrupted System Call. Could this be a clue? We have a lot of older machines of many architectures here running older client software. Could that cause this problem? Is there anything I can do to instrument the servers to help find the root cause of the problem?

Another problem that may or may not be related: on these same machines, the fileserver processes can get into a state such that a bos restart, shutdown or stop will not kill them, and neither will a simple kill pid. You must do a kill -9 on each of the fileserver processes to make them go away. In the case of a bos restart, new fileserver processes are started without the old ones having been killed; the new processes fail, causing the salvager to start. When the salvager finishes, the fileserver processes are started again and fail again, leading to an endless fileserver-salvager-fileserver-salvager cycle. This has led to us disabling the 4:00 AM Sunday restarts.

Thanks in advance for your help.

---Bob.
--
Bob Hoffman, N3CVL, University of Pittsburgh, Department of Computer Science
Tel: +1 412 624 8404, Fax: +1 412 624 8854, [EMAIL PROTECTED]
Re: [OpenAFS] Duplicate special inodes in volume header
Martin Schulz wrote:

Hartmut Reuter [EMAIL PROTECTED] writes:

Is there any way to copy the RO over the RW to get a working RW volume again?

You can copy the RO to a RW volume by

vos dump 536870916 0 | vos restore otherserver partition name -id 536870915 -overwrite full

Nice idea, but does not work:

$ vos restore iwrsun1 /vicepa -name root.afs -id 536870915 -file /tmp/working_root_afs_RO
The volume root.afs 536870915 already exists in the VLDB
Do you want to do a full/incremental restore or abort? [fia](a): f
Volume exists; Will delete and perform full restore
Restoring volume root.afs Id 536870915 on server iwrsun1.mathematik.uni-karlsruhe.de partition /vicepa ..Failed to start transaction on 536870915
Volume needs to be salvaged
Error in vos restore command.
Volume needs to be salvaged

The salvager should be able to rebuild the volume-header in the case he finds all the volume special files. I have no idea how it was possible to get duplicate special inodes in the header,

Nor do I;

but - after you have dumped the RO-volume - you could try to remove the volume-header and then rerun the salvager.

Hmm. I got a little closer to the problem. In the early days of this installation, I had some problems with the first server and switched over to another platform. This could be the reason for the following: In fact, the output from vos listvol and the vos listvldb were not consistent. vldb showed me:

root.afs
    RWrite: 536870915    ROnly: 536870916    Backup: 536870917

but listvol tells:

root.afs 536870912 RW 4 K On-line
root.afs.backup 536870917 BK 4 K On-line
root.afs.readonly 536870916 RO 5 K On-line
Could not attach volume 536870915

Did you do the restore to another server or to the one with the bad volume?

I don't know whether the volserver and/or the salvager are clever enough to compare the names of the volumes they find. If so, the 536870912 could have caused that 536870915 with the same name could not be attached.
In such a case you should use

vos zap server partition volume-id

to get rid of the wrong volume. Removing a volume by name is dangerous, because vos asks the VLDB to translate the volume name to the volume-id and then proceeds with that id (possibly removing the wrong volume).

One more reason to use the namei interface, because you can easily see what is really there and what is missing! But if you want to migrate to namei you will have to move the volumes to another server, of course.

Hartmut Reuter

By now, I was able to remove the 536870912 volume and run vos syncvldb without errors. listvol no longer mentions it, but it does not show any other root.afs either (root.afs.backup and root.afs.readonly are shown and working, however).

An attempt to restore as above yields:

Volume exists; Will delete and perform full restore
Restoring volume root.afs Id 536870915 on server iwrsun1.mathematik.uni-karlsruhe.de partition /vicepa ..Failed to start transaction on 536870915
Volume needs to be salvaged
Error in vos restore command.

I cannot remove that volume:

$ vos remove iwrsun1 /vicepa 536870915
Transaction on volume 536870915 failed
Volume needs to be salvaged
Error in vos remove command.

And trying to salvage yields the same messages as above.

Any hints greatly appreciated,

Yours,
--
Martin Schulz [EMAIL PROTECTED]
Uni Karlsruhe, Institut f. wissenschaftliches Rechnen u. math. Modellbildung
Engesser Str. 6, 76128 Karlsruhe

-
Hartmut Reuter                   e-mail [EMAIL PROTECTED]
                                 phone  +49-89-3299-1328
RZG (Rechenzentrum Garching)     fax    +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-
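The mismatch discussed in this thread (the VLDB records RW volume 536870915 for root.afs, while vos listvol shows root.afs on disk as 536870912) can be checked mechanically. The following is a minimal Python sketch that assumes the volume IDs have already been copied by hand into dictionaries; it does not run or parse vos itself, and the function name find_mismatches is hypothetical.

```python
def find_mismatches(vldb, on_disk):
    """Compare volume IDs recorded in the VLDB with those found on a partition.

    vldb:    {volume_id: name}, transcribed from `vos listvldb`
    on_disk: {volume_id: name}, transcribed from `vos listvol`
    Returns (missing_on_disk, unknown_to_vldb), each a sorted list of IDs.
    """
    missing_on_disk = sorted(set(vldb) - set(on_disk))
    unknown_to_vldb = sorted(set(on_disk) - set(vldb))
    return missing_on_disk, unknown_to_vldb


# IDs copied by hand from the output quoted in this thread.
VLDB = {
    536870915: "root.afs",
    536870916: "root.afs.readonly",
    536870917: "root.afs.backup",
}
ON_DISK = {
    536870912: "root.afs",
    536870916: "root.afs.readonly",
    536870917: "root.afs.backup",
}

missing, unknown = find_mismatches(VLDB, ON_DISK)
print("In VLDB but not attached on disk:", missing)
print("On disk but unknown to VLDB:", unknown)
```

With the IDs from this thread, 536870915 is reported as missing on disk and 536870912 as unknown to the VLDB. An ID in the second list is the kind of volume that should be addressed by ID (vos zap) rather than by name, for the reason given above.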