[OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?

2009-03-19 Thread McKee, Shawn
Hi Everyone,

I am having a problem trying to 'vos move' volumes after losing/restoring an 
AFS file server.   The server that was lost has been restored on new hardware.  
The old RW volumes were moved to other servers (convertROtoRW) and now I want 
to use the 'vos move' command to move them back.

Here is what happens (I have tokens as 'admin'.  Linat07 is the current RW home 
for OSGWN and Linat08 is the new server):

vos move OSGWN linat07 /vicepf linat08 /vicepg -verbose
Starting transaction on source volume 536874901 ... done
Allocating new volume id for clone of volume 536874901 ... done
Cloning source volume 536874901 ... done
Ending the transaction on the source volume 536874901 ... done
Starting transaction on the cloned volume 2681864210 ...
Failed to start a transaction on the cloned volume2681864210
   Volume not attached, does not exist, or not on line
vos move: operation interrupted, cleanup in progress...
clear transaction contexts
Recovery: Releasing VLDB lock on volume 536874901 ... done
Recovery: Accessing VLDB.
move incomplete - attempt cleanup of target partition - no guarantee
Recovery: Creating transaction for destination volume 536874901 ...
Recovery: Unable to start transaction on destination volume 536874901.
Recovery: Creating transaction on source volume 536874901 ... done
Recovery: Setting flags on source volume 536874901 ... done
Recovery: Ending transaction on source volume 536874901 ... done
Recovery: Creating transaction on clone volume 2681864210 ...
Recovery: Unable to start transaction on source volume 536874901.
Recovery: Releasing lock on VLDB entry for volume 536874901 ... done
cleanup complete - user verify desired result
[linat08:local]# vos examine  2681864210
Could not fetch the entry for volume number 18446744072096448530 from VLDB

I am assuming the large cloned volume ID is causing the problem as opposed to 
an inability to create a cloned volume.  I can make replicas on linat08 for 
existing volumes without a problem.

NOTE: The hex representations of the cloned volume from the move attempt 
above and the 'vos examine':

[linat08:local]# 2681864210           = 0x00000000 9FDA0012
[linat08:local]# 18446744072096448530 = 0xFFFFFFFF 9FDA0012

Any suggestions?   This seems like a 64 vs 32 bit issue.
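
The relationship between the two numbers above can be checked with a few lines of arithmetic. The following Python sketch is only an illustration, not OpenAFS code, and every name in it is made up for the example. It suggests that both decimal values share the same low 32 bits (0x9FDA0012) and that the larger value is what results when that 32-bit pattern is treated as signed and widened to 64 bits.

```python
# Illustrative check only; none of these names come from OpenAFS.
CLONE_ID = 2681864210               # ID printed by 'vos move'
REPORTED_ID = 18446744072096448530  # ID printed by 'vos examine'

MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def sign_extend_32_to_64(value):
    """Treat the low 32 bits of value as signed, then widen to unsigned 64 bits."""
    low = value & MASK32
    if low & 0x80000000:   # top bit set: negative when read as a signed 32-bit int
        low -= 1 << 32
    return low & MASK64


if __name__ == "__main__":
    print(hex(CLONE_ID))
    print(hex(REPORTED_ID))
    print("same low 32 bits:", REPORTED_ID & MASK32 == CLONE_ID)
    print("sign extension matches:", sign_extend_32_to_64(CLONE_ID) == REPORTED_ID)
```

The source volume ID 536874901 is below 2**31, so the same function returns it unchanged; only IDs with the top bit set are affected.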

Here is the information on servers and versions:

We have 3 AFS DB servers:
  Linat02 - RHEL5/x86_64  -  OpenAFS 1.4.7
  Linat03 - RHEL4/i686    -  OpenAFS 1.4.6
  Linat04 - RHEL5/x86_64  -  OpenAFS 1.4.7

We have 3 AFS file servers:
  Linat06 - RHEL4/x86_64  -  OpenAFS 1.4.6
  Linat07 - RHEL4/x86_64  -  OpenAFS 1.4.6
  Linat08 - RHEL5/x86_64  -  OpenAFS 1.4.8

Info on OSGWN volume:

[linat08:~]# vos examine OSGWN
OSGWN                             536874901 RW     505153 K  On-line
    linat07.grid.umich.edu /vicepf
    RWrite  536874901 ROnly 18446744072096448530 Backup          0
    MaxQuota        200 K
    Creation    Tue Mar  3 03:43:06 2009
    Copy        Mon Dec  3 16:39:21 2007
    Backup      Never
    Last Update Sat Feb 21 15:18:05 2009
    0 accesses in the past day (i.e., vnode references)

    RWrite: 536874901     ROnly: 536874902
    number of sites -> 2
       server linat07.grid.umich.edu partition /vicepf RW Site
       server linat06.grid.umich.edu partition /vicepe RO Site

Let me know if there is other info required to help resolve this.

Thanks,

Shawn McKee
University of Michigan/ATLAS Group
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?

2009-03-19 Thread Hartmut Reuter

What does the VolserLog on the source server say?

-Hartmut

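-- 
-----------------------------------------------------------------
Hartmut Reuter                  e-mail  reu...@rzg.mpg.de
                                phone   +49-89-3299-1328
                                fax     +49-89-3299-1301
RZG (Rechenzentrum Garching)    web     http://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-----------------------------------------------------------------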


Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?

2009-03-19 Thread Jeffrey Altman
McKee, Shawn wrote:
> Could not fetch the entry for volume number 18446744072096448530 from VLDB
>
> I am assuming the large cloned volume ID is causing the problem as opposed
> to an inability to create a cloned volume.  I can make replicas on linat08
> for existing volumes without a problem.
>
> NOTE: The hex representations of the cloned volume from the move attempt
> above and the 'vos examine':
>
> [linat08:local]# 2681864210           = 0x00000000 9FDA0012
> [linat08:local]# 18446744072096448530 = 0xFFFFFFFF 9FDA0012
>
> Any suggestions?   This seems like a 64 vs 32 bit issue.

It's a signed vs. unsigned int problem.

afsint.h and afscbint.h include

 typedef afs_uint32 VolumeId;

which is used throughout the src/vol package for volume ids.
In volser/vos.c and volser/vsprocs.c the volume id variables are
defined as signed 32-bit (afs_int32).

There are also some signed vs. unsigned issues in some of the
protocol structures.  I believe the type VolumeId should be
used consistently to define types of volume id variables.

In vldbint.xg:

struct [nu]vldbentry uses afs_uint32 for the volumeId array but
afs_int32 for the cloneId.  Same for VldbUpdateEntry.

VldbListByAttributes uses afs_int32 for the volumeid.

This is not a full examination but I believe it shows where the
problem lies.
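
To make the signed vs. unsigned mismatch concrete, here is a small Python model of the conversion described above. It is a sketch under stated assumptions, not code taken from OpenAFS: the function names are hypothetical, and ctypes is used only to imitate what happens when a 32-bit unsigned volume ID is stored in a signed 32-bit variable and later widened to an unsigned 64-bit value (for example when it is printed on a 64-bit platform).

```python
import ctypes


def as_afs_int32(volume_id):
    """Hypothetical helper: value seen when the ID is held in a signed 32-bit int."""
    return ctypes.c_int32(volume_id).value


def widen_to_unsigned_64(signed_value):
    """Hypothetical helper: value seen when that int is widened to unsigned 64 bits."""
    return ctypes.c_uint64(signed_value).value


def survives_signed_storage(volume_id):
    """IDs below 2**31 keep their value; IDs at or above 2**31 turn negative."""
    return as_afs_int32(volume_id) >= 0


if __name__ == "__main__":
    for vid in (536874901, 2681864210):
        signed = as_afs_int32(vid)
        print(vid, "->", signed, "->", widen_to_unsigned_64(signed))
```

Under this model the RW volume ID 536874901 passes through unchanged, while the clone ID 2681864210 becomes negative and then reappears as 18446744072096448530, which matches the number 'vos examine' reported.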

Jeffrey Altman


RE: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?

2009-03-19 Thread McKee, Shawn
Thanks for the quick reply, Hartmut.

Here are the relevant lines from the attempt below:

Wed Mar 18 10:50:14 2009 1 Volser: Clone: Cloning volume 536874901 to new volume 2681864210
Wed Mar 18 10:50:15 2009 VAttachVolume: Failed to open /vicepf/V184467440720964485 (errno 2)
Wed Mar 18 10:50:15 2009 VAttachVolume: Failed to open /vicepf/V184467440720964485 (errno 2)
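
The log lines above suggest the clone was created under one ID but looked up under another. The Python sketch below illustrates that reading. It assumes volume header files are named after the volume ID in a "V<zero-padded id>.vol" style; HEADER_FORMAT is a stand-in chosen for this example and should be checked against the files actually present on the partition. The name in the log looks like a cut-off form of the sign-extended ID, and the sketch does not try to model why it is cut off.

```python
import errno

# Assumption for illustration: header files follow a "V<10-digit id>.vol" pattern.
HEADER_FORMAT = "V{:010d}.vol"


def header_name(volume_id):
    """Build the header file name this sketch assumes for a volume ID."""
    return HEADER_FORMAT.format(volume_id)


created = header_name(2681864210)              # name derived from the 32-bit clone ID
looked_up = header_name(18446744072096448530)  # name derived from the sign-extended ID
logged = "V184467440720964485"                 # name as it appears in the VolserLog

if __name__ == "__main__":
    print("created:  ", created)
    print("looked up:", looked_up)
    print("log entry is a prefix of the looked-up name:", looked_up.startswith(logged))
    print("errno 2 is", errno.errorcode[2])
```

If the assumption holds, the file the server tried to open can never exist, which would be consistent with errno 2 (ENOENT, no such file or directory) in the log.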

Shawn



RE: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?

2009-03-19 Thread McKee, Shawn
Thanks Jeffrey,

Yes, I agree; you have found the issue.  I am surprised no one else has hit 
this before (maybe they have, but I didn't find this problem in my searching).

I guess this will take a patch to the code for existing versions to get this 
resolved.

Thanks again,

Shawn



Re: [OpenAFS] Unable to 'move' volume....volume ID too large / cloned volume not findable?

2009-03-19 Thread Jeffrey Altman
McKee, Shawn wrote:
> Thanks Jeffrey,
>
> Yes, I agree; you have found the issue.  I am surprised no one else has hit
> this before (maybe they have, but I didn't find this problem in my searching).
>
> I guess this will take a patch to the code for existing versions to get this
> resolved.
>
> Thanks again,
>
> Shawn

I have opened ticket 124510 for this issue.



Re: [OpenAFS] AFS lag

2009-03-19 Thread Ken Hornstein
>> There is a lot of misinformation about Ubik out there; the voting
>> protocol is actually not complicated, it's just not documented well.
>
> It's actually well-documented, if you find Kazar's paper on Quorum Completion.

You know, we should try to find a copy of that and put it somewhere useful.
From what I remember (I think I saw a copy once), the paper gets you about
80% of the way there; the source code gets you the rest of the way.

Actually, I now realize that I _do_ have a copy of it.  Can we put it on
the OpenAFS web site?  I just have the PostScript; it's easy enough to
convert that to PDF.

>> If your database servers are accessible via the Internet, we could take
>> a look at them via udebug.  Really, there are only a few things that can
>> go wrong; of all of the pieces of AFS, I think Ubik is one of the most
>> bulletproof.
>
> There are a couple of (unlikely) open issues; see RT.

Didn't know about those.  Still, I think we need more information to diagnose
the original problem.

--Ken


RE: [OpenAFS] Openafs 1.4.8 on OSX very slow

2009-03-19 Thread ENEM | Hans Melgers

This problem was solved after my ISP fixed some very slow DNS servers, which 
were even slower in conjunction with a DNS relay on my router. It looks like the 
Finder was waiting for them every time. It's running as smoothly now as it used to.

Thanks everyone!

-Original Message-
From: openafs-info-ad...@openafs.org [mailto:openafs-info-ad...@openafs.org] On 
Behalf Of Hans Melgers
Sent: Monday, 9 March 2009 22:13
To: jalt...@secure-endpoints.com
Cc: Felix Frank; Joel; OpenAFS-info
Subject: Re: [OpenAFS] Openafs 1.4.8 on OSX very slow



If the Finder stats all of them then yes, this behaviour seems normal.
There are a couple of gigs in that tree.
However, I did use the client before, and as I remember it, it wasn't as
slow as it is now.
I tried to make a symlink from my local Documents folder directly to
my AFS homedir, but Finder doesn't like that either: again very slow,
now even when opening the Documents folder.

On 9 Mar 2009, at 21:33, Jeffrey Altman wrote:

> The next question then becomes how many files are in all of those
> subdirectories?  Finder is going to stat them all.  Since the MacOS X
> OpenAFS client does not have a working bulkstat mechanism, one
> FetchStatus RPC will be issued per directory object.  More if those
> objects are symlinks.
>
> You can use tcpdump and wireshark to capture and examine the RPCs
> that are being sent from your machine.  The more that are sent, the
> longer it will take.
>
> Jeffrey Altman
>
> Hans Melgers wrote:
>
>> I log in as admin, so I have all permissions.
>>
>> On 9 Mar 2009, at 20:55, Jeffrey Altman wrote:
>>
>>> Hans Melgers wrote:
>>>
>>>> I did, no change. I just tried it again: double-clicking on the user
>>>> dir, which has 9 subdirs (all 9 different volumes), took 3 minutes
>>>> before they showed up.
>>>> Maybe a clue: a subfolder with no subdirs opens immediately
>>>> (all files shown); opening one with subfolders takes forever.
>>>
>>> Do you have 'l'ist but not 'r'ead permission on those subfolders?


Hans Melgers
ENEM B.V.

www.enem.nl
h...@enem.nl



