Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?
On Sunday 22 March 2009 21:46, Todd DeSantis wrote:
> Hi Rainer - Hi Hartmut ::
>
>> Yes, of course, but what error changed the MaxVolumeId in the vlserver is still completely unclear. BTW we also had a giant jump in the volume ids some years ago, but fortunately it was not big enough to reach the sign bit.
>
> The MaxVolumeId can be changed in several ways: via a vos restore and, I believe, a vos syncvldb or syncserv. Most likely, the initial jump was via the vos restore command.
>
> [src] vos restore -h
> Usage: vos restore -server <machine name> -partition <partition name> -name <name of volume to be restored> [-file <dump file>] [-id <volume ID>] [-overwrite <abort | full | incremental>] [-cell <cell name>] [-noauth] [-localauth] [-verbose] [-timeout <timeout in seconds>] [-help]
>
> If you use the [-id <volume ID>] option and have a typo in the volume ID, the volume ID for the volume will be out of the normal sequence, and this will set the MaxVolumeID to this large number.
>
> Also, I believe that a vos syncvldb or syncserv will check the volume IDs it is working with against the MaxVolumeID and raise MaxVolumeID if necessary.
>
> I think when we saw this happen to an AFS cell, we gave the customer a tool to reset the MaxVolumeID to a more manageable number, and they restored the volumes and gave them lower IDs.

Hello Todd,

that was not the case when the problem arose in our cell, but nowadays that detail is not so important. I think the important thing is that the problem is solved for the future thanks to the patches that Jeffrey has announced in another posting in this thread!

Giovanni

--
Giovanni Bracco
ENEA FIM (Servizio Informatica e Reti)
Via E. Fermi 45
I-00044 Frascati (Roma) Italy
phone 00-39-06-9400-5597
FAX 00-39-06-9400-5735
E-mail bra...@frascati.enea.it
WWW http://www.afs.enea.it/bracco
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?
On Sunday 22 March 2009 20:39, Jeffrey Altman wrote:
> Giovanni Bracco wrote:
>> As I wrote in my posting, at that time (2002) my institution was using the Transarc version of AFS and the reaction from the Transarc team was ...to provide us with a patched version of AFS, not to correct the issue. That version of course was not compatible with OpenAFS due to the large value of the VolIDs existing at that point in our cell.
>
> The patched version of AFS fixed the issue. The issue is that in some locations in the source a Volume Id is an unsigned 32-bit value and in others (most notably clone ids) the value is a signed 32-bit value. If a signed value is increased beyond 2^31-1 it will wrap and become a negative value. There is no condition under which a negative value will be greater than Max Volume Id.
>
> I'm sure that the fix that IBM implemented for you in 2002 was to change all of the Volume Id fields so that they are unsigned 32-bit values. IBM does not provide their internal bug reports and patches to OpenAFS so we never knew about the issue.
>
>> To perform the migration to OpenAFS 3 years later we had to go through a volume renumbering campaign (more than 1000 volumes) plus an ad-hoc modification of the vl database to reset the MaxVolID to a value supported by OpenAFS. At that point do you think we should have submitted a bug report about a mysterious event that had happened three years before on the Transarc AFS version?
>
> You had to do this because OpenAFS did not have the patch that IBM created and we didn't know that we needed to implement it ourselves.
>
>> From the follow-up of the thread (postings by Hartmut Reuter and Rainer Toebbicke) I see that the strange big jump in the VolID still happens and surely the issue should be solved.
>
> There are several locations where unsigned and signed 32-bit variables containing volume ids are mixed either for comparison or computation. The computation of the new maxvolid value is one such place.
>
> It is quite likely that the mixture of signed and unsigned values resulted in signed 32-bit overflow which in turn resulted in an incorrect comparison and then assignment. This in turn would result in the big jump.
>
> I have a patch attached to ticket 124510 which will (I hope) make all references to volume ids unsigned (except in the cache manager) and avoid the problems with unsigned overflow conditions. I suspect this patch is similar to what IBM applied to their source tree in 2002.
>
> Jeffrey Altman

OK, it is nice to know that hopefully the problem will be solved in the next OpenAFS release!

Giovanni
Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?
On Friday 20 March 2009 13:56, Jeffrey Altman wrote:
> Giovanni Bracco wrote:
>> I want to point out that in the past the issue of volumes with too large an ID also emerged in our cell (enea.it). At that time (2002) we still had Transarc AFS, and support provided us with a patched AFS version able to operate with volumes having too large IDs. Before migrating to OpenAFS we had to recover the normal AFS behaviour, and the procedure we followed at that time (2005) was described at the AFS Kerberos Best Practices Workshop 2005 in Pittsburgh:
>> http://workshop.openafs.org/afsbpw05/talks/VirtualAFScell_Bracco_Pittsburgh2005.pdf
>>
>> At that time the reason for the initial problem was not clear. Do I have to assume that it has now been identified?
>>
>> Giovanni
>
> This just goes to show that giving a talk at a workshop is not equivalent to submitting a bug report to openafs-b...@openafs.org. If this issue had been submitted to openafs-bugs, it would have been addressed a long time ago.
>
> The problem is quite obvious. Some of the volume id variables are signed and others are unsigned. A volume id is a volume id and the type used to represent it must be consistent.

As I wrote in my posting, at that time (2002) my institution was using the Transarc version of AFS and the reaction from the Transarc team was ...to provide us with a patched version of AFS, not to correct the issue. That version of course was not compatible with OpenAFS due to the large value of the VolIDs existing at that point in our cell.

To perform the migration to OpenAFS 3 years later we had to go through a volume renumbering campaign (more than 1000 volumes) plus an ad-hoc modification of the vl database to reset the MaxVolID to a value supported by OpenAFS. At that point do you think we should have submitted a bug report about a mysterious event that had happened three years before on the Transarc AFS version?

From the follow-up of the thread (postings by Hartmut Reuter and Rainer Toebbicke) I see that the strange big jump in the VolID still happens and surely the issue should be solved.

Giovanni
Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?
Giovanni Bracco wrote:
> As I wrote in my posting, at that time (2002) my institution was using the Transarc version of AFS and the reaction from the Transarc team was ...to provide us with a patched version of AFS, not to correct the issue. That version of course was not compatible with OpenAFS due to the large value of the VolIDs existing at that point in our cell.

The patched version of AFS fixed the issue. The issue is that in some locations in the source a Volume Id is an unsigned 32-bit value and in others (most notably clone ids) the value is a signed 32-bit value. If a signed value is increased beyond 2^31-1 it will wrap and become a negative value. There is no condition under which a negative value will be greater than Max Volume Id.

I'm sure that the fix that IBM implemented for you in 2002 was to change all of the Volume Id fields so that they are unsigned 32-bit values. IBM does not provide their internal bug reports and patches to OpenAFS so we never knew about the issue.

> To perform the migration to OpenAFS 3 years later we had to go through a volume renumbering campaign (more than 1000 volumes) plus an ad-hoc modification of the vl database to reset the MaxVolID to a value supported by OpenAFS. At that point do you think we should have submitted a bug report about a mysterious event that had happened three years before on the Transarc AFS version?

You had to do this because OpenAFS did not have the patch that IBM created and we didn't know that we needed to implement it ourselves.

> From the follow-up of the thread (postings by Hartmut Reuter and Rainer Toebbicke) I see that the strange big jump in the VolID still happens and surely the issue should be solved.

There are several locations where unsigned and signed 32-bit variables containing volume ids are mixed either for comparison or computation. The computation of the new maxvolid value is one such place.

It is quite likely that the mixture of signed and unsigned values resulted in signed 32-bit overflow which in turn resulted in an incorrect comparison and then assignment. This in turn would result in the big jump.

I have a patch attached to ticket 124510 which will (I hope) make all references to volume ids unsigned (except in the cache manager) and avoid the problems with unsigned overflow conditions. I suspect this patch is similar to what IBM applied to their source tree in 2002.

Jeffrey Altman
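To make the wrap-around and the comparison problem concrete, here is a minimal Python sketch that models 32-bit signed and unsigned storage with plain integer arithmetic. The helpers `as_int32` and `as_uint32` and the value chosen for `max_volume_id` are illustrative assumptions, not OpenAFS code or data from a real VLDB.

```python
# Model C's 32-bit integer types with plain Python integers.
# as_int32 / as_uint32 are hypothetical helpers, not OpenAFS functions.

def as_uint32(value: int) -> int:
    """Keep only the low 32 bits, as an unsigned 32-bit field would."""
    return value & 0xFFFFFFFF


def as_int32(value: int) -> int:
    """Reinterpret the low 32 bits as a two's-complement signed value."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


INT32_MAX = 2**31 - 1  # largest value a signed 32-bit variable can hold

# One step past INT32_MAX wraps to the most negative signed value.
wrapped = as_int32(INT32_MAX + 1)

# Assumed, illustrative maximum volume id (not taken from a real VLDB).
max_volume_id = 536874903

# Held in a signed variable, a high id looks negative, so it can never
# compare greater than the maximum.
clone_id_signed = as_int32(2681864210)
signed_is_greater = clone_id_signed > max_volume_id

# The same bits read as unsigned do compare greater, so the answer to
# "is this id above the maximum?" depends on the type that was used.
clone_id_unsigned = as_uint32(clone_id_signed)
unsigned_is_greater = clone_id_unsigned > max_volume_id

if __name__ == "__main__":
    print(wrapped)              # -2147483648
    print(signed_is_greater)    # False
    print(unsigned_is_greater)  # True
```

The two comparisons disagree on the same bit pattern, which is the kind of inconsistency that a uniformly unsigned volume id type would remove.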
Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?
Hi Rainer - Hi Hartmut ::

> Yes, of course, but what error changed the MaxVolumeId in the vlserver is still completely unclear. BTW we also had a giant jump in the volume ids some years ago, but fortunately it was not big enough to reach the sign bit.

The MaxVolumeId can be changed in several ways: via a vos restore and, I believe, a vos syncvldb or syncserv. Most likely, the initial jump was via the vos restore command.

[src] vos restore -h
Usage: vos restore -server <machine name> -partition <partition name> -name <name of volume to be restored> [-file <dump file>] [-id <volume ID>] [-overwrite <abort | full | incremental>] [-cell <cell name>] [-noauth] [-localauth] [-verbose] [-timeout <timeout in seconds>] [-help]

If you use the [-id <volume ID>] option and have a typo in the volume ID, the volume ID for the volume will be out of the normal sequence, and this will set the MaxVolumeID to this large number.

Also, I believe that a vos syncvldb or syncserv will check the volume IDs it is working with against the MaxVolumeID and raise MaxVolumeID if necessary.

I think when we saw this happen to an AFS cell, we gave the customer a tool to reset the MaxVolumeID to a more manageable number, and they restored the volumes and gave them lower IDs.

Thanks

Todd DeSantis
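The behaviour Todd describes can be pictured with a small sketch. This is not the vlserver code: `bump_max_volume_id`, the starting counter, and the out-of-sequence ID are all hypothetical, and the assumption that new IDs are handed out just above the maximum is a simplification. The only point is that a "raise the maximum if a larger ID is seen" rule lets one stray ID move the counter for good.

```python
# Hypothetical model of "raise MaxVolumeID if a larger ID is seen".
# It illustrates the rule described in the message, not OpenAFS code.

def bump_max_volume_id(current_max: int, seen_volume_id: int) -> int:
    """Return the new maximum after a volume with seen_volume_id appears."""
    return max(current_max, seen_volume_id)


# Assumed starting point in the usual range for a cell like this one.
max_id = 536874902

# A restore with an ID inside the normal sequence leaves the counter alone.
max_id = bump_max_volume_id(max_id, 536874800)
after_normal_restore = max_id

# A restore with an out-of-sequence ID (illustrative value) raises it.
max_id = bump_max_volume_id(max_id, 1900000000)
after_stray_restore = max_id

# Assumption: later allocations continue just above the raised maximum.
next_allocated = max_id + 1

if __name__ == "__main__":
    print(after_normal_restore)  # 536874902
    print(after_stray_restore)   # 1900000000
    print(next_allocated)        # 1900000001
```

Because the rule only ever raises the value, getting back to the old range needs an explicit reset of the kind Todd mentions.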
Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?
Todd DeSantis wrote:
> Hi Rainer - Hi Hartmut ::
>
>> Yes, of course, but what error changed the MaxVolumeId in the vlserver is still completely unclear. BTW we also had a giant jump in the volume ids some years ago, but fortunately it was not big enough to reach the sign bit.
>
> The MaxVolumeId can be changed in several ways: via a vos restore and, I believe, a vos syncvldb or syncserv. Most likely, the initial jump was via the vos restore command.
>
> [src] vos restore -h
> Usage: vos restore -server <machine name> -partition <partition name> -name <name of volume to be restored> [-file <dump file>] [-id <volume ID>] [-overwrite <abort | full | incremental>] [-cell <cell name>] [-noauth] [-localauth] [-verbose] [-timeout <timeout in seconds>] [-help]
>
> If you use the [-id <volume ID>] option and have a typo in the volume ID, the volume ID for the volume will be out of the normal sequence, and this will set the MaxVolumeID to this large number.
>
> Also, I believe that a vos syncvldb or syncserv will check the volume IDs it is working with against the MaxVolumeID and raise MaxVolumeID if necessary.
>
> I think when we saw this happen to an AFS cell, we gave the customer a tool to reset the MaxVolumeID to a more manageable number, and they restored the volumes and gave them lower IDs.
>
> Thanks
>
> Todd DeSantis

Thank you Todd,

when this happened the first time I hex-edited a copy of the vldb and reset the maxVolumeId. Then, having seen that the database version was still the same, I just copied my modified database over the actual one.

But the second time it happened I already had so many volumes with high numbers that I gave up. Since then we live with these numbers...

I had always suspected ubik of having produced the jump, but what you describe (vos restore or vos sync) looks much more probable.

Hartmut

--
Hartmut Reuter
e-mail reu...@rzg.mpg.de
phone +49-89-3299-1328
fax +49-89-3299-1301
RZG (Rechenzentrum Garching)
web http://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the Institut fuer Plasmaphysik (IPP)
Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?
On Thursday 19 March 2009 14:48, Jeffrey Altman wrote:
> McKee, Shawn wrote:
>> Thanks Jeffrey,
>>
>> Yes, I agree...you have found the issue. I am surprised no one else has hit this before (maybe they have but I didn't find this problem in my searching).
>>
>> I guess this will take a patch to the code for existing versions to get this resolved.
>>
>> Thanks again,
>>
>> Shawn
>
> I have opened ticket 124510 for this issue.

I want to point out that in the past the issue of volumes with too large an ID also emerged in our cell (enea.it). At that time (2002) we still had Transarc AFS, and support provided us with a patched AFS version able to operate with volumes having too large IDs. Before migrating to OpenAFS we had to recover the normal AFS behaviour, and the procedure we followed at that time (2005) was described at the AFS Kerberos Best Practices Workshop 2005 in Pittsburgh:
http://workshop.openafs.org/afsbpw05/talks/VirtualAFScell_Bracco_Pittsburgh2005.pdf

At that time the reason for the initial problem was not clear. Do I have to assume that it has now been identified?

Giovanni
Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?
Giovanni Bracco wrote:
> I want to point out that in the past the issue of volumes with too large an ID also emerged in our cell (enea.it). At that time (2002) we still had Transarc AFS, and support provided us with a patched AFS version able to operate with volumes having too large IDs. Before migrating to OpenAFS we had to recover the normal AFS behaviour, and the procedure we followed at that time (2005) was described at the AFS Kerberos Best Practices Workshop 2005 in Pittsburgh:
> http://workshop.openafs.org/afsbpw05/talks/VirtualAFScell_Bracco_Pittsburgh2005.pdf
>
> At that time the reason for the initial problem was not clear. Do I have to assume that it has now been identified?
>
> Giovanni

This just goes to show that giving a talk at a workshop is not equivalent to submitting a bug report to openafs-b...@openafs.org. If this issue had been submitted to openafs-bugs, it would have been addressed a long time ago.

The problem is quite obvious. Some of the volume id variables are signed and others are unsigned. A volume id is a volume id and the type used to represent it must be consistent.

Jeffrey Altman
Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?
Jeffrey Altman wrote:
> Giovanni Bracco wrote:
>> I want to point out that in the past the issue of volumes with too large an ID also emerged in our cell (enea.it). At that time (2002) we still had Transarc AFS, and support provided us with a patched AFS version able to operate with volumes having too large IDs. Before migrating to OpenAFS we had to recover the normal AFS behaviour, and the procedure we followed at that time (2005) was described at the AFS Kerberos Best Practices Workshop 2005 in Pittsburgh:
>> http://workshop.openafs.org/afsbpw05/talks/VirtualAFScell_Bracco_Pittsburgh2005.pdf
>>
>> At that time the reason for the initial problem was not clear. Do I have to assume that it has now been identified?
>>
>> Giovanni
>
> This just goes to show that giving a talk at a workshop is not equivalent to submitting a bug report to openafs-b...@openafs.org. If this issue had been submitted to openafs-bugs, it would have been addressed a long time ago.
>
> The problem is quite obvious. Some of the volume id variables are signed and others are unsigned. A volume id is a volume id and the type used to represent it must be consistent.

Yes, of course, but what error changed the MaxVolumeId in the vlserver is still completely unclear. BTW we also had a giant jump in the volume ids some years ago, but fortunately it was not big enough to reach the sign bit.

Hartmut
Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?
Hartmut Reuter wrote:
> Yes, of course, but what error changed the MaxVolumeId in the vlserver is still completely unclear. BTW we also had a giant jump in the volume ids some years ago, but fortunately it was not big enough to reach the sign bit.
>
> Hartmut

Me too! Here, here!!

Last year we started a gigantic backup volrestore (not diskrestore) after a double disk failure in a RAID 5. Very early in the process the maximum volume ID went from the usual 530-odd million to 1.9 billion. No obvious bit pattern in sight, and no obvious place where this could have gone wrong, and of course backup volrestore has been seen getting that right already. I even knew *when* it had happened with a precision of about 1 minute, but that didn't help.

Being a bit nostalgic I thought about zapping the VLDB to something in the traditional range again. Apart from this bug, the possibility that we ever overrun the numbers of the new family is pretty remote...

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rainer Toebbicke
European Laboratory for Particle Physics (CERN) - Geneva, Switzerland
Phone: +41 22 767 8985
Fax: +41 22 767 7155
Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?
What does the VolserLog on the source server say?

-Hartmut

McKee, Shawn wrote:
> Hi Everyone,
>
> I am having a problem trying to 'vos move' volumes after losing/restoring an AFS file server. The server that was lost has been restored on new hardware. The old RW volumes were moved to other servers (convertROtoRW) and now I want to use the 'vos move' command to move them back.
>
> Here is what happens (I have tokens as 'admin'. Linat07 is the current RW home for OSGWN and Linat08 is the new server):
>
> vos move OSGWN linat07 /vicepf linat08 /vicepg -verbose
> Starting transaction on source volume 536874901 ... done
> Allocating new volume id for clone of volume 536874901 ... done
> Cloning source volume 536874901 ... done
> Ending the transaction on the source volume 536874901 ... done
> Starting transaction on the cloned volume 2681864210 ...
> Failed to start a transaction on the cloned volume2681864210
> Volume not attached, does not exist, or not on line
> vos move: operation interrupted, cleanup in progress...
> clear transaction contexts
> Recovery: Releasing VLDB lock on volume 536874901 ... done
> Recovery: Accessing VLDB.
> move incomplete - attempt cleanup of target partition - no guarantee
> Recovery: Creating transaction for destination volume 536874901 ...
> Recovery: Unable to start transaction on destination volume 536874901.
> Recovery: Creating transaction on source volume 536874901 ... done
> Recovery: Setting flags on source volume 536874901 ... done
> Recovery: Ending transaction on source volume 536874901 ... done
> Recovery: Creating transaction on clone volume 2681864210 ...
> Recovery: Unable to start transaction on source volume 536874901.
> Recovery: Releasing lock on VLDB entry for volume 536874901 ... done
> cleanup complete - user verify desired result
>
> [linat08:local]# vos examine 2681864210
> Could not fetch the entry for volume number 18446744072096448530 from VLDB
>
> I am assuming the large cloned volume ID is causing the problem as opposed to an inability to create a cloned volume. I can make replicas on linat08 for existing volumes without a problem.
>
> NOTE: The hex representations of the cloned volume from the move attempt above and the 'vos examine':
> [linat08:local]# 2681864210 = 0x 9FDA0012
> [linat08:local]# 18446744072096448530 = 0x 9FDA0012
>
> Any suggestions? This seems like a 64 vs 32 bit issue.
>
> Here is the information on servers and versions:
>
> We have 3 AFS DB servers:
> Linat02 - RHEL5/x86_64 - OpenAFS 1.4.7
> Linat03 - RHEL4/i686 - OpenAFS 1.4.6
> Linat04 - RHEL5/x86_64 - OpenAFS 1.4.7
>
> We have 3 AFS file servers:
> Linat06 - RHEL4/x86_64 - OpenAFS 1.4.6
> Linat07 - RHEL4/x86_64 - OpenAFS 1.4.6
> Linat08 - RHEL5/x86_64 - OpenAFS 1.4.8
>
> Info on OSGWN volume:
>
> [linat08:~]# vos examine OSGWN
> OSGWN 536874901 RW 505153 K On-line
> linat07.grid.umich.edu /vicepf
> RWrite 536874901 ROnly 18446744072096448530 Backup 0
> MaxQuota 200 K
> Creation Tue Mar 3 03:43:06 2009
> Copy Mon Dec 3 16:39:21 2007
> Backup Never
> Last Update Sat Feb 21 15:18:05 2009
> 0 accesses in the past day (i.e., vnode references)
>
> RWrite: 536874901 ROnly: 536874902
> number of sites - 2
> server linat07.grid.umich.edu partition /vicepf RW Site
> server linat06.grid.umich.edu partition /vicepe RO Site
>
> Let me know if there is other info required to help resolve this.
>
> Thanks,
>
> Shawn McKee
> University of Michigan/ATLAS Group
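Shawn's note about the hex representations can be checked directly. The sketch below is plain Python with no AFS libraries, and the variable names are only illustrative. It prints both numbers in hex and compares their low 32 bits.

```python
# Compare the two IDs from the 'vos move' and 'vos examine' output.
clone_id = 2681864210              # ID printed by 'vos move'
examine_id = 18446744072096448530  # ID printed by 'vos examine'

LOW32_MASK = 0xFFFFFFFF

clone_hex = format(clone_id, "X")
examine_hex = format(examine_id, "X")

# The low 32 bits of both values are identical.
same_low_bits = (clone_id & LOW32_MASK) == (examine_id & LOW32_MASK)

# The upper 32 bits of the larger value are all ones.
upper_bits = examine_id >> 32

if __name__ == "__main__":
    print(clone_hex)        # 9FDA0012
    print(examine_hex)      # FFFFFFFF9FDA0012
    print(same_low_bits)    # True
    print(hex(upper_bits))  # 0xffffffff
```

An upper half made entirely of one bits is what a negative 32-bit number looks like after it has been widened to 64 bits, so the two values are the same 32-bit pattern seen through different types.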
Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?
McKee, Shawn wrote:
> Could not fetch the entry for volume number 18446744072096448530 from VLDB
>
> I am assuming the large cloned volume ID is causing the problem as opposed to an inability to create a cloned volume. I can make replicas on linat08 for existing volumes without a problem.
>
> NOTE: The hex representations of the cloned volume from the move attempt above and the 'vos examine':
> [linat08:local]# 2681864210 = 0x 9FDA0012
> [linat08:local]# 18446744072096448530 = 0x 9FDA0012
>
> Any suggestions? This seems like a 64 vs 32 bit issue.

It's a signed vs unsigned int problem.

afsint.h and afscbint.h include

typedef afs_uint32 VolumeId;

which is used throughout the src/vol package for volume ids. In volser/vos.c and volser/vsprocs.c the volume id variables are defined as signed 32-bit (afs_int32). There are also some signed vs unsigned issues in some of the protocol structures. I believe the type VolumeId should be used consistently to define types of volume id variables.

In vldbint.xg: struct [nu]vldbentry uses afs_uint32 for the volumeId array but afs_int32 for the cloneId. Same for VldbUpdateEntry. VldbListByAttributes uses afs_int32 for the volumeid.

This is not a full examination but I believe it shows where the problem lies.

Jeffrey Altman
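Here is a short sketch of how the type mix described above could produce the number in the error message. It uses Python's standard `ctypes` module to mimic C integer types. The chain of conversions (unsigned 32-bit, then signed 32-bit, then 64-bit, then printed as unsigned) is an assumption about what happens on the x86_64 servers, not a trace of the actual vos code.

```python
import ctypes

# The clone ID as allocated: it fits in an unsigned 32-bit VolumeId.
volume_id = ctypes.c_uint32(2681864210)

# Held in a signed 32-bit variable (as an afs_int32 would be), the same
# bit pattern becomes a negative number.
as_signed32 = ctypes.c_int32(volume_id.value)

# Widened to 64 bits, the sign is extended into the upper 32 bits.
as_signed64 = ctypes.c_int64(as_signed32.value)

# Read back as an unsigned 64-bit value, the result is the ID that
# 'vos examine' reported.
as_unsigned64 = ctypes.c_uint64(as_signed64.value)

if __name__ == "__main__":
    print(as_signed32.value)    # -1613103086
    print(as_unsigned64.value)  # 18446744072096448530
```

If the variable had stayed unsigned through every step, the widened value would still be 2681864210, which fits with the suggestion to use one unsigned type for every volume id.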
RE: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?
Thanks for the quick reply Hartmut. Here are the relevant lines from that attempt:

Wed Mar 18 10:50:14 2009 1 Volser: Clone: Cloning volume 536874901 to new volume 2681864210
Wed Mar 18 10:50:15 2009 VAttachVolume: Failed to open /vicepf/V184467440720964485 (errno 2)
Wed Mar 18 10:50:15 2009 VAttachVolume: Failed to open /vicepf/V184467440720964485 (errno 2)

Shawn

-----Original Message-----
From: Hartmut Reuter [mailto:reu...@rzg.mpg.de]
Sent: Thursday, March 19, 2009 9:01 AM
To: McKee, Shawn
Cc: openafs-info@openafs.org
Subject: Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?

What does the VolserLog on the source server say?

-Hartmut
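One detail in the VolserLog lines is worth noticing: the digits in the path that VAttachVolume failed to open are a prefix of the 64-bit number reported by 'vos examine'. The check below is plain string handling with illustrative variable names. It does not reproduce how the volserver builds the file name, and the thread does not establish why the name stops after 18 digits.

```python
# Path fragment from the VolserLog and the ID from 'vos examine'.
logged_name = "V184467440720964485"
examine_id = 18446744072096448530

logged_digits = logged_name[1:]  # drop the leading 'V'
examine_digits = str(examine_id)

# The logged digits are the leading digits of the 64-bit value.
is_prefix = examine_digits.startswith(logged_digits)
digits_logged = len(logged_digits)
digits_full = len(examine_digits)

if __name__ == "__main__":
    print(is_prefix)      # True
    print(digits_logged)  # 18
    print(digits_full)    # 20
```

So the file server was looking for a header named after the sign-extended value rather than after 2681864210, which fits the "errno 2" (no such file) in the log.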
RE: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?
Thanks Jeffrey,

Yes, I agree...you have found the issue. I am surprised no one else has hit this before (maybe they have but I didn't find this problem in my searching).

I guess this will take a patch to the code for existing versions to get this resolved.

Thanks again,

Shawn

-----Original Message-----
From: Jeffrey Altman [mailto:jalt...@secure-endpoints.com]
Sent: Thursday, March 19, 2009 9:19 AM
To: McKee, Shawn
Cc: openafs-info@openafs.org
Subject: Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?

McKee, Shawn wrote:
> Could not fetch the entry for volume number 18446744072096448530 from VLDB
>
> I am assuming the large cloned volume ID is causing the problem as opposed to an inability to create a cloned volume. I can make replicas on linat08 for existing volumes without a problem.
>
> NOTE: The hex representations of the cloned volume from the move attempt above and the 'vos examine':
> [linat08:local]# 2681864210 = 0x 9FDA0012
> [linat08:local]# 18446744072096448530 = 0x 9FDA0012
>
> Any suggestions? This seems like a 64 vs 32 bit issue.

It's a signed vs unsigned int problem.

afsint.h and afscbint.h include

typedef afs_uint32 VolumeId;

which is used throughout the src/vol package for volume ids. In volser/vos.c and volser/vsprocs.c the volume id variables are defined as signed 32-bit (afs_int32). There are also some signed vs unsigned issues in some of the protocol structures. I believe the type VolumeId should be used consistently to define types of volume id variables.

In vldbint.xg: struct [nu]vldbentry uses afs_uint32 for the volumeId array but afs_int32 for the cloneId. Same for VldbUpdateEntry. VldbListByAttributes uses afs_int32 for the volumeid.

This is not a full examination but I believe it shows where the problem lies.

Jeffrey Altman
Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?
McKee, Shawn wrote:
> Thanks Jeffrey,
>
> Yes, I agree...you have found the issue. I am surprised no one else has hit this before (maybe they have but I didn't find this problem in my searching).
>
> I guess this will take a patch to the code for existing versions to get this resolved.
>
> Thanks again,
>
> Shawn

I have opened ticket 124510 for this issue.