Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?

2009-03-23 Thread Giovanni Bracco
On Sunday 22 March 2009 21:46, Todd DeSantis wrote:
 Hi Rainer - Hi Hartmut ::
  Yes, of course, but what error changed the MaxVolumeId in the vleserver
  is still completely unclear. BTW also we had a giant jump in the volume
  ids some years ago, but fortunately it was not big enough to reach the
  sign bit.

 The MaxVolumeId can be changed several ways, via a vos restore and I
 believe a vos syncvldb or syncserv.

 Most likely, the initial jump was via the vos restore command.

 [src] vos restore -h
 Usage: vos restore -server <machine name> -partition <partition name>
        -name <name of volume to be restored> [-file <dump file>]
        [-id <volume ID>] [-overwrite <abort | full | incremental>]
        [-cell <cell name>] [-noauth] [-localauth] [-verbose]
        [-timeout <timeout in seconds >] [-help]

 If you use the [-id volume ID] and have a typo in the volume ID, the
 volumeID for the volume will be out of normal sequence and this will set
 the MaxVolumeID to this large number.

 Also, I believe that a vos syncvldb or syncserv will check the volumeIDs
 it is playing with and will check it against the MaxVolumeID and raise
 MaxVolumeID if necessary.

 I think when we saw this happen to an AFS cell, we gave the customer a
 tool to reset the MaxVolumeID to a more manageable number and they
 restored the volumes and gave them lower IDs.

Hello Todd,
that was not the case when the problem arose in our cell, but nowadays that 
detail is not so important. I think the important thing is that the problem 
is solved for the future thanks to the patches that Jeffrey has announced in 
another posting in this thread!

Giovanni

-- 
Giovanni Bracco
ENEA FIM
(Servizio Informatica e Reti)
Via E. Fermi 45
I-00044 Frascati (Roma) Italy
phone 00-39-06-9400-5597
FAX   00-39-06-9400-5735
E-mail  bra...@frascati.enea.it
WWW http://www.afs.enea.it/bracco
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?

2009-03-23 Thread Giovanni Bracco
On Sunday 22 March 2009 20:39, Jeffrey Altman wrote:
 Giovanni Bracco wrote:
  As I wrote in my posting, at that time (2002) my institution was using
  the Transarc version of AFS and the reaction from  Transarc team was
  ...to provide us with a patched version of AFS, not to correct the issue.
  That version of course was not compatible with OpenAFS due to the large
  value of the VolIDs existing at that point in our cell.

 The patched version of AFS fixed the issue.  The issue is that in some
 locations in the source a Volume Id is an unsigned 32-bit value and in
 others (most notably clone ids) the value is a signed 32-bit value.
 If a signed value is increased beyond 2^31-1 it will wrap and become a
 negative value.  There is no condition under which a negative value will
 be greater than Max Volume Id.

 I'm sure that the fix that IBM implemented for you in 2002 was to change
 all of the Volume Id fields so that they are unsigned 32-bit values.
 IBM does not provide their internal bug reports and patches to OpenAFS
 so we never knew about the issue.

  To perform the migration to OpenAFS 3 years later we had to go through a
  volume renumbering campaign (more than 1000 volumes) plus an ad-hoc
  modification of the vl database to reset the MaxVolID to a value
  supported by OpenAFS. At that point do you think we should have submitted
  a bug on a mysterious event that had happened three years before on the
  Transarc AFS version?

 You had to do this because OpenAFS did not have the patch that IBM
 created and we didn't know that we needed to implement it ourselves.

  From the  follow-up of the thread (postings by Hartmut Reuter and  Rainer
  Toebbicke )  I see that the strange big jump in the VolID still happens
  and surely the issue should be solved.

 There are several locations where unsigned and signed 32-bit variables
 containing volume ids are mixed either for comparison or computation.
 The computation of the new maxvolid value is one such place where this
 takes place.  It is quite likely that the mixture of signed and unsigned
 values resulted in signed 32-bit overflow which in turn resulted in an
 incorrect comparison and then assignment.  This in turn would result in
 the big jump.

 I have a patch attached to ticket 124510 which will (I hope) make all
 references to volume ids unsigned (except in the cache manager) and
 avoid the problems with unsigned overflow conditions.

 I suspect this patch is similar to what IBM applied to their source
 tree in 2002.

 Jeffrey Altman

OK, it is nice to know that hopefully the problem will be solved in the next 
OpenAFS release!

Giovanni




Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?

2009-03-22 Thread Giovanni Bracco
On Friday 20 March 2009 13:56, Jeffrey Altman wrote:
 Giovanni Bracco wrote:
  I want to point out that in the past the issue of volumes with too
  large ID emerged also in our cell (enea.it). At that time (2002) we
  still had AFS Transarc and the support provided us with a patched  AFS
  version, able to operate with volumes having  too large IDs.
 
  Before migrating to OpenAFS we had to recover the normal AFS behaviour,
  and the procedure we followed at that time (2005) was described at the
  AFS & Kerberos Best Practices Workshop 2005 in Pittsburgh:
  http://workshop.openafs.org/afsbpw05/talks/VirtualAFScell_Bracco_Pittsburgh2005.pdf
 
  At that time the reason for the initial problem was not clear. Do I have
  to assume that it has now been identified?
 
  Giovanni

 This just goes to show that giving a talk at a workshop is not
 equivalent to submitting a bug report to openafs-b...@openafs.org.
 If this issue had been submitted to openafs-bugs, it would have been
 addressed a long time ago.   The problem is quite obvious.  Some of the
 volume id variables are signed and others are unsigned.  A volume id is
 a volume id and the type used to represent it must be consistent.

As I wrote in my posting, at that time (2002) my institution was using the 
Transarc version of AFS and the reaction from  Transarc team was ...to 
provide us with a patched version of AFS, not to correct the issue. That 
version of course was not compatible with OpenAFS due to the large value of 
the VolIDs existing at that point in our cell. 

To perform the migration to OpenAFS 3 years later we had to go through a 
volume renumbering campaign (more than 1000 volumes) plus an ad-hoc 
modification of the vl database to reset the MaxVolID to a value supported by 
OpenAFS. At that point do you think we should have submitted a bug on a 
mysterious event that had happened three years before on the Transarc AFS version?

From the  follow-up of the thread (postings by Hartmut Reuter and  Rainer 
Toebbicke )  I see that the strange big jump in the VolID still happens and 
surely the issue should be solved.

Giovanni



 Jeffrey Altman





Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?

2009-03-22 Thread Jeffrey Altman
Giovanni Bracco wrote:
 As I wrote in my posting, at that time (2002) my institution was using the 
 Transarc version of AFS and the reaction from  Transarc team was ...to 
 provide us with a patched version of AFS, not to correct the issue. That 
 version of course was not compatible with OpenAFS due to the large value of 
 the VolIDs existing at that point in our cell. 

The patched version of AFS fixed the issue.  The issue is that in some
locations in the source a Volume Id is an unsigned 32-bit value and in
others (most notably clone ids) the value is a signed 32-bit value.
If a signed value is increased beyond 2^31-1 it will wrap and become a
negative value.  There is no condition under which a negative value will
be greater than Max Volume Id.
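
To make the wrap concrete, here is a minimal Python sketch. It is not OpenAFS code: as_int32 is a hypothetical helper that mimics how a 32-bit two's-complement machine reads the same bits as a signed value, and the number used is the clone ID from the 'vos move' output reported in this thread.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def as_int32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


clone_id = 2681864210            # clone ID reported by 'vos move' in this thread
signed_view = as_int32(clone_id)

print(signed_view)               # -1613103086: bit 31 is set, so the value reads as negative
print(signed_view > INT32_MAX)   # False: a negative ID never compares above a positive maximum
```

IDs below 2^31, such as 536874901, read the same either way, which would explain why the mismatch stays invisible until the counter gets large.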

I'm sure that the fix that IBM implemented for you in 2002 was to change
all of the Volume Id fields so that they are unsigned 32-bit values.
IBM does not provide their internal bug reports and patches to OpenAFS
so we never knew about the issue.

 To perform the migration to OpenAFS 3 years later we had to go through a 
 volume renumbering campaign (more than 1000 volumes) plus an ad-hoc 
 modification of the vl database to reset the MaxVolID to a value supported by 
 OpenAFS. At that point do you think we should have submitted a bug on a 
 mysterious event that had happened three years before on the Transarc AFS version?

You had to do this because OpenAFS did not have the patch that IBM
created and we didn't know that we needed to implement it ourselves.

 From the  follow-up of the thread (postings by Hartmut Reuter and  Rainer 
 Toebbicke )  I see that the strange big jump in the VolID still happens and 
 surely the issue should be solved.

There are several locations where unsigned and signed 32-bit variables
containing volume ids are mixed either for comparison or computation.
The computation of the new maxvolid value is one such place where this
takes place.  It is quite likely that the mixture of signed and unsigned
values resulted in signed 32-bit overflow which in turn resulted in an
incorrect comparison and then assignment.  This in turn would result in
the big jump.
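
One way such a mixed comparison can go wrong is sketched below in Python. This is an illustration only, not the vlserver code: raise_max_if_needed and its arguments are hypothetical names. It relies on a documented C rule: when a signed 32-bit int is compared with an unsigned 32-bit int, the signed operand is converted to unsigned first, so a negative value turns into a very large one.

```python
def as_uint32(value: int) -> int:
    """Reinterpret value as an unsigned 32-bit integer (what C does to the signed operand)."""
    return value & 0xFFFFFFFF


def raise_max_if_needed(max_volid: int, candidate_signed: int) -> int:
    """Hypothetical max update: unsigned maximum compared against a signed candidate."""
    candidate = as_uint32(candidate_signed)  # implicit conversion in a mixed C comparison
    return candidate if candidate > max_volid else max_volid


# A candidate that became negative in a signed variable wins the comparison
# and becomes the new maximum.
print(raise_max_if_needed(536874902, -1613103086))  # 2681864210
# A normal in-sequence ID leaves the maximum alone.
print(raise_max_if_needed(536874902, 536874901))    # 536874902
```

Whether this is the exact path taken in any of the cells mentioned here is not established in the thread; the sketch only shows that one stray negative value is enough to produce a jump of this size.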

I have a patch attached to ticket 124510 which will (I hope) make all
references to volume ids unsigned (except in the cache manager) and
avoid the problems with unsigned overflow conditions.

I suspect this patch is similar to what IBM applied to their source
tree in 2002.

Jeffrey Altman




Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?

2009-03-22 Thread Todd DeSantis
Hi Rainer - Hi Hartmut ::


 Yes, of course, but what error changed the MaxVolumeId in the vleserver
 is still completely unclear. BTW also we had a giant jump in the volume
 ids some years ago, but fortunately it was not big enough to reach the
 sign bit.


The MaxVolumeId can be changed several ways, via a vos restore and I
believe a vos syncvldb or syncserv.

Most likely, the initial jump was via the vos restore command.

[src] vos restore -h
Usage: vos restore -server <machine name> -partition <partition name>
       -name <name of volume to be restored> [-file <dump file>]
       [-id <volume ID>] [-overwrite <abort | full | incremental>]
       [-cell <cell name>] [-noauth] [-localauth] [-verbose]
       [-timeout <timeout in seconds >] [-help]

If you use the [-id volume ID] and have a typo in the volume ID, the
volumeID for the volume will be out of normal sequence and this will set
the MaxVolumeID to this large number.

Also, I believe that a vos syncvldb or syncserv will check the volumeIDs
it is playing with and will check it against the MaxVolumeID and raise
MaxVolumeID if necessary.
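
The effect described above can be pictured with a toy model in Python. Everything here is an assumption made for illustration: VolumeIdCounter, allocate and register_existing are hypothetical names, the mistyped ID is invented, and the real vlserver logic may differ in detail.

```python
class VolumeIdCounter:
    """Toy model of a 'max volume ID' counter (hypothetical, not the vlserver's code)."""

    def __init__(self, max_id: int) -> None:
        self.max_id = max_id

    def allocate(self) -> int:
        """Hand out the next ID in sequence."""
        self.max_id += 1
        return self.max_id

    def register_existing(self, volume_id: int) -> None:
        """Record an ID chosen by the operator; raise the maximum if it is higher."""
        if volume_id > self.max_id:
            self.max_id = volume_id


counter = VolumeIdCounter(536874902)
print(counter.allocate())              # 536874903: normal sequence

counter.register_existing(1536874901)  # invented example of a mistyped -id value
print(counter.allocate())              # 1536874902: every later ID continues from the typo
```

In this model a single out-of-sequence ID is enough: nothing ever lowers the counter again, which matches the need for a separate tool to reset it.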

I think when we saw this happen to an AFS cell, we gave the customer a
tool to reset the MaxVolumeID to a more manageable number and they
restored the volumes and gave them lower IDs.

Thanks

Todd DeSantis


Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?

2009-03-22 Thread Hartmut Reuter
Todd DeSantis wrote:
 Hi Rainer - Hi Hartmut ::
 

 Yes, of course, but what error changed the MaxVolumeId in the vleserver
 is still completely unclear. BTW also we had a giant jump in the volume
 ids some years ago, but fortunately it was not big enough to reach the
 sign bit.

 
 The MaxVolumeId can be changed several ways, via a vos restore and I
 believe a vos syncvldb or syncserv.
 
 Most likely, the initial jump was via the vos restore command.
 
 [src] vos restore -h
 Usage: vos restore -server <machine name> -partition <partition name>
        -name <name of volume to be restored> [-file <dump file>]
        [-id <volume ID>] [-overwrite <abort | full | incremental>]
        [-cell <cell name>] [-noauth] [-localauth] [-verbose]
        [-timeout <timeout in seconds >] [-help]
 
 If you use the [-id volume ID] and have a typo in the volume ID, the
 volumeID for the volume will be out of normal sequence and this will set
 the MaxVolumeID to this large number.
 
 Also, I believe that a vos syncvldb or syncserv will check the volumeIDs
 it is playing with and will check it against the MaxVolumeID and raise
 MaxVolumeID if necessary.
 
 I think when we saw this happen to an AFS cell, we gave the customer a
 tool to reset the MaxVolumeID to a more manageable number and they
 restored the volumes and gave them lower IDs.
 
 Thanks
 
 Todd DeSantis
 
Thank you Todd,

when this happened the first time I hex-edited a copy of the VLDB and reset
the maxVolumeId. Then, having seen that the database version was still
the same, I just copied my modified database over the actual one. But the
second time it happened I already had so many volumes with high numbers
that I gave up. Since then we have lived with these numbers...

I had always suspected ubik of having produced the jump, but what you say
about vos restore or vos sync looks much more probable.

Hartmut

-- 
-
Hartmut Reuter                  e-mail  reu...@rzg.mpg.de
                                phone   +49-89-3299-1328
                                fax     +49-89-3299-1301
RZG (Rechenzentrum Garching)    web     http://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-


Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?

2009-03-20 Thread Giovanni Bracco
On Thursday 19 March 2009 14:48, Jeffrey Altman wrote:
 McKee, Shawn wrote:
  Thanks Jeffrey,
 
  Yes, I agree...you have found the issue.  I  am surprised no one else has
  hit this before (maybe they have but I didn't find this problem in my
  searching).
 
  I guess this will take a patch to the code for existing versions to get
  this resolved.
 
  Thanks again,
 
  Shawn

 I have opened ticket 124510 for this issue.

I want to point out that in the past the issue of volumes with too large ID
emerged also in our cell (enea.it). At that time (2002) we still had AFS 
Transarc and the support provided us with a patched  AFS version, able to 
operate with volumes having  too large IDs.

Before migrating to OpenAFS we had to recover the normal AFS behaviour, and the 
procedure we followed at that time (2005) was described at the 
AFS & Kerberos Best Practices Workshop 2005 in Pittsburgh:
http://workshop.openafs.org/afsbpw05/talks/VirtualAFScell_Bracco_Pittsburgh2005.pdf

At that time the reason for the initial problem was not clear. Do I have to 
assume that it has now been identified?

Giovanni






Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?

2009-03-20 Thread Jeffrey Altman
Giovanni Bracco wrote:
 I want to point out that in the past the issue of volumes with too large ID
 emerged also in our cell (enea.it). At that time (2002) we still had AFS 
 Transarc and the support provided us with a patched  AFS version, able to 
 operate with volumes having  too large IDs.
 
 Before migrating to OpenAFS we had to recover the normal AFS behaviour,
 and the procedure we followed at that time (2005) was described at the
 AFS & Kerberos Best Practices Workshop 2005 in Pittsburgh:
 http://workshop.openafs.org/afsbpw05/talks/VirtualAFScell_Bracco_Pittsburgh2005.pdf
 
 At that time the reason for the initial problem was not clear. Do I have
 to assume that it has now been identified?
 
 Giovanni

This just goes to show that giving a talk at a workshop is not
equivalent to submitting a bug report to openafs-b...@openafs.org.
If this issue had been submitted to openafs-bugs, it would have been
addressed a long time ago.   The problem is quite obvious.  Some of the
volume id variables are signed and others are unsigned.  A volume id is
a volume id and the type used to represent it must be consistent.

Jeffrey Altman




Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?

2009-03-20 Thread Hartmut Reuter
Jeffrey Altman wrote:
 Giovanni Bracco wrote:
 I want to point out that in the past the issue of volumes with too large ID
 emerged also in our cell (enea.it). At that time (2002) we still had AFS 
 Transarc and the support provided us with a patched  AFS version, able to 
 operate with volumes having  too large IDs.

 Before migrating to OpenAFS we had to recover the normal AFS behaviour,
 and the procedure we followed at that time (2005) was described at the
 AFS & Kerberos Best Practices Workshop 2005 in Pittsburgh:
 http://workshop.openafs.org/afsbpw05/talks/VirtualAFScell_Bracco_Pittsburgh2005.pdf

 At that time the reason for the initial problem was not clear. Do I have
 to assume that it has now been identified?

 Giovanni
 
 This just goes to show that giving a talk at a workshop is not
 equivalent to submitting a bug report to openafs-b...@openafs.org.
 If this issue had been submitted to openafs-bugs, it would have been
 addressed a long time ago.   The problem is quite obvious.  Some of the
 volume id variables are signed and others are unsigned.  A volume id is
 a volume id and the type used to represent it must be consistent.

Yes, of course, but what error changed the MaxVolumeId in the vleserver
is still completely unclear. BTW also we had a giant jump in the volume
ids some years ago, but fortunately it was not big enough to reach the
sign bit.

Hartmut

 
 Jeffrey Altman
 
 




Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?

2009-03-20 Thread Rainer Toebbicke

Hartmut Reuter wrote:



Yes, of course, but what error changed the MaxVolumeId in the vleserver
is still completely unclear. BTW also we had a giant jump in the volume
ids some years ago, but fortunately it was not big enough to reach the
sign bit.

Hartmut




Me too! Here, here!!

Last year we started a gigantic backup volrestore (not diskrestore) after a 
double disk failure in a RAID 5.


Very early in the process the maximum volume ID went from the usual 530-odd 
million to 1.9 billion. No obvious bit pattern in sight, and no obvious place 
where this could have gone wrong, and of course backup volrestore has been 
seen getting that right already. I even knew *when* it had happened with a 
precision of about 1 minute, but that didn't help.


Being a bit nostalgic, I thought about zapping the VLDB back to something in 
the traditional range. Apart from this bug, the possibility that we ever 
overrun the numbers of the new family is pretty remote...


--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rainer Toebbicke
European Laboratory for Particle Physics(CERN) - Geneva, Switzerland
Phone: +41 22 767 8985   Fax: +41 22 767 7155


Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?

2009-03-19 Thread Hartmut Reuter

What does the VolserLog on the source server say?

-Hartmut
McKee, Shawn wrote:
 Hi Everyone,
 
 I am having a problem trying to 'vos move' volumes after
 losing/restoring an AFS file server.   The server that was lost has
 been restored on new hardware.  The old RW volumes were moved to
 other servers (convertROtoRW) and now I want to use the 'vos move'
 command to move them back.
 
 Here is what happens (I have tokens as 'admin'.  Linat07 is the
 current RW home for OSGWN and Linat08 is the new server):
 
 vos move OSGWN linat07 /vicepf linat08 /vicepg -verbose
 Starting transaction on source volume 536874901 ... done
 Allocating new volume id for clone of volume 536874901 ... done
 Cloning source volume 536874901 ... done
 Ending the transaction on the source volume 536874901 ... done
 Starting transaction on the cloned volume 2681864210 ...
 Failed to start a transaction on the cloned volume2681864210
 Volume not attached, does not exist, or not on line
 vos move: operation interrupted, cleanup in progress...
 clear transaction contexts
 Recovery: Releasing VLDB lock on volume 536874901 ... done
 Recovery: Accessing VLDB.
 move incomplete - attempt cleanup of target partition - no guarantee
 Recovery: Creating transaction for destination volume 536874901 ...
 Recovery: Unable to start transaction on destination volume 536874901.
 Recovery: Creating transaction on source volume 536874901 ... done
 Recovery: Setting flags on source volume 536874901 ... done
 Recovery: Ending transaction on source volume 536874901 ... done
 Recovery: Creating transaction on clone volume 2681864210 ...
 Recovery: Unable to start transaction on source volume 536874901.
 Recovery: Releasing lock on VLDB entry for volume 536874901 ... done
 cleanup complete - user verify desired result

 [linat08:local]# vos examine  2681864210
 Could not fetch the entry for volume number 18446744072096448530 from VLDB
 
 I am assuming the large cloned volume ID is causing the problem as
 opposed to an inability to create a cloned volume.  I can make
 replicas on linat08 for existing volumes without a problem.
 
 NOTE: The hex representations of the cloned volume from the move
 attempt above and the 'vos examine':
 
 [linat08:local]# 2681864210 = 0x 9FDA0012
 [linat08:local]# 18446744072096448530 = 0x 9FDA0012
 
 Any suggestions?   This seems like a 64 vs 32 bit issue.
 
 Here is the information on servers and versions:
 
 We have 3 AFS DB servers:
 Linat02 - RHEL5/x86_64  -  OpenAFS 1.4.7
 Linat03 - RHEL4/i686    -  OpenAFS 1.4.6
 Linat04 - RHEL5/x86_64  -  OpenAFS 1.4.7
 
 We have 3 AFS file servers:
 Linat06 - RHEL4/x86_64  -  OpenAFS 1.4.6
 Linat07 - RHEL4/x86_64  -  OpenAFS 1.4.6
 Linat08 - RHEL5/x86_64  -  OpenAFS 1.4.8
 
 Info on OSGWN volume:
 
 [linat08:~]# vos examine OSGWN
 OSGWN    536874901 RW    505153 K  On-line
     linat07.grid.umich.edu /vicepf
     RWrite  536874901 ROnly 18446744072096448530 Backup  0
     MaxQuota    200 K
     Creation    Tue Mar  3 03:43:06 2009
     Copy        Mon Dec  3 16:39:21 2007
     Backup      Never
     Last Update Sat Feb 21 15:18:05 2009
     0 accesses in the past day (i.e., vnode references)
 
     RWrite: 536874901 ROnly: 536874902
     number of sites - 2
        server linat07.grid.umich.edu partition /vicepf RW Site
        server linat06.grid.umich.edu partition /vicepe RO Site
 
 Let me know if there is other info required to help resolve this.
 
 Thanks,
 
 Shawn McKee
 University of Michigan/ATLAS Group




Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?

2009-03-19 Thread Jeffrey Altman
McKee, Shawn wrote:
 Could not fetch the entry for volume number 18446744072096448530 from VLDB
 
 I am assuming the large cloned volume ID is causing the problem as opposed 
 to an inability to create a cloned volume.  I can make replicas on linat08 
 for existing volumes without a problem.
 
 NOTE: The hex representations of the cloned volume from the move attempt 
 above and the 'vos examine':
 
 [linat08:local]# 2681864210 = 0x 9FDA0012
 [linat08:local]# 18446744072096448530 = 0x 9FDA0012
 
 Any suggestions?   This seems like a 64 vs 32 bit issue.

It's a signed vs unsigned int problem.

afsint.h and afscbint.h include

 typedef afs_uint32 VolumeId;

which is used throughout the src/vol package for volume ids.
In volser/vos.c and volser/vsprocs.c the volume id variables are
defined as signed 32-bit (afs_int32).

There are also some signed vs unsigned issues in some of the
protocol structures.  I believe the type VolumeId should be
used consistently to define types of volume id variables.

In vldbint.xg:

struct [nu]vldbentry uses afs_uint32 for the volumeId array but
afs_int32 for the cloneId.  Same for VldbUpdateEntry.

VldbListByAttributes uses afs_int32 for the volumeid.

This is not a full examination but I believe it shows where the
problem lies.
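
The two numbers in the quoted report fit this diagnosis. The Python sketch below is not OpenAFS code, and sign_extend_32_to_64 is a hypothetical helper. It shows that widening the clone ID to 64 bits as a signed value, and then printing the result as unsigned, reproduces exactly the number shown by 'vos examine'.

```python
def sign_extend_32_to_64(value: int) -> int:
    """Widen a 32-bit pattern to 64 bits as a signed value, then read it as unsigned."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:              # sign bit set: the upper 32 bits fill with ones
        value |= 0xFFFFFFFF00000000
    return value


clone_id = 2681864210                   # ID printed by 'vos move'
widened = sign_extend_32_to_64(clone_id)

print(widened)                          # 18446744072096448530, as printed by 'vos examine'
print(hex(clone_id), hex(widened))      # 0x9fda0012 0xffffffff9fda0012
```

The low 32 bits are identical in both values, which is why both were reported as 0x 9FDA0012.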

Jeffrey Altman











RE: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?

2009-03-19 Thread McKee, Shawn
Thanks for the quick reply Hartmut.

Here are the relevant lines from that attempt:

Wed Mar 18 10:50:14 2009 1 Volser: Clone: Cloning volume 536874901 to new volume 2681864210
Wed Mar 18 10:50:15 2009 VAttachVolume: Failed to open /vicepf/V184467440720964485 (errno 2)
Wed Mar 18 10:50:15 2009 VAttachVolume: Failed to open /vicepf/V184467440720964485 (errno 2)
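
A small Python check, offered only as an observation on the log lines above: the digits in the file name that VAttachVolume tried to open are the leading digits of the 64-bit number printed by 'vos examine', not of the 32-bit clone ID. The helper name is hypothetical, and the thread does not say why the name stops after 18 digits.

```python
def name_is_prefix_of_id(file_name: str, volume_id: int) -> bool:
    """True if the digits after the leading 'V' are the leading digits of volume_id."""
    return str(volume_id).startswith(file_name[1:])


name_in_log = "V184467440720964485"    # file name from the VolserLog lines

print(name_is_prefix_of_id(name_in_log, 18446744072096448530))  # True
print(name_is_prefix_of_id(name_in_log, 2681864210))            # False
```

This is consistent with the clone having been created under one form of the ID and looked up under the other, hence errno 2 (no such file).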

Shawn



RE: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?

2009-03-19 Thread McKee, Shawn
Thanks Jeffrey,

Yes, I agree...you have found the issue.  I  am surprised no one else has hit 
this before (maybe they have but I didn't find this problem in my searching).

I guess this will take a patch to the code for existing versions to get this 
resolved.

Thanks again,

Shawn



Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?

2009-03-19 Thread Jeffrey Altman
McKee, Shawn wrote:
 Thanks Jeffrey,
 
 Yes, I agree...you have found the issue.  I  am surprised no one else has hit 
 this before (maybe they have but I didn't find this problem in my searching).
 
 I guess this will take a patch to the code for existing versions to get this 
 resolved.
 
 Thanks again,
 
 Shawn

I have opened ticket 124510 for this issue.
