Re: [Lustre-discuss] files/directories are temporarily unavailable on patchless clients

2008-03-04 Thread Harald van Pee
Hi,

I have updated all clients to patched version 1.6.1; the servers are still at 
1.6.0.1. No Lustre-related error message has occurred since (2 weeks).
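
One way to back up a statement like this is to scan the client syslogs for the warning quoted further down in this thread. The following is only a minimal sketch: it assumes each warning appears on a single syslog line (the quotes in this thread are wrapped by the mail client), and the helper name `count_alias_warnings` is made up for illustration.

```python
import re
from collections import Counter

# Assumed single-line syslog form of the warning quoted in this thread, e.g.
# "Jan 21 18:12:51 node0010 kernel: Lustre: 5717:0:(namei.c:235:ll_mdc_blocking_ast()) More than 1 alias dir 134120476 alias 2"
ALIAS_WARNING = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\skernel: Lustre: "
    r".*ll_mdc_blocking_ast\(\)\) More than 1 alias dir (?P<dir_id>\d+)"
)


def count_alias_warnings(lines):
    """Count 'More than 1 alias dir' warnings per host (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = ALIAS_WARNING.search(line)
        if match:
            counts[match.group("host")] += 1
    return counts
```

The function accepts any iterable of lines, so an open log file can be passed in directly; where the log lives depends on the syslog configuration of the client. "Skipped N previous similar messages" lines are deliberately not counted.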

I think it is reasonable (necessary?) to e2fsck all OSTs and the MDT?
The MDT resides on a DRBD device configured for failover.

I now have the following questions.
1. Is there a recommended order for the file system checks? MDT first and 
then the OSTs, or vice versa?

2. If I umount the MDT, should I use -f? I assume no file system access will 
be possible until the MDT is back again. Would it be better to umount all 
servers and clients and then the MDT?

3. I think each OST can be checked while the others are running, but I am 
unsure whether I should use -f to umount or not.

4. Should I unmount all clients? If this is recommended anyway, it is maybe 
better to stop file system access for a couple of hours (2TB, 70% used), but 
do the file system checks in parallel.

Thanks in advance
Harald



On Monday 21 January 2008 11:55 pm, Andreas Dilger wrote:
 On Jan 21, 2008  18:55 +0100, Harald van Pee wrote:
  The directory is just not there! Directory or file not found.
 
  In my opinion there is no error message on the clients that is directly
  related to the problem. On our node0010 I have seen this problem several
  times today. Mostly the directory is not seen! Probably all of the other
  directories can be accessed at the same time.
 
  And here are all Lustre-related messages from the last few days (the others
  are mostly timestamps!)
 
 
 
  Jan 17 07:41:16 node0010 kernel: Lustre: 5723:0:
  (namei.c:235:ll_mdc_blocking_ast()) More than 1 alias dir 133798800 alias

 A quick search in bugzilla for this error message shows bug 12123,
 which is fixed in the 1.6.1 release, and also has a patch.

 Cheers, Andreas
 --
 Andreas Dilger
 Sr. Staff Engineer, Lustre Group
 Sun Microsystems of Canada, Inc.

-- 
Harald van Pee

Helmholtz-Institut fuer Strahlen- und Kernphysik der Universitaet Bonn
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] files/directories are temporarily unavailable on patchless clients

2008-03-04 Thread Andreas Dilger
On Mar 04, 2008  19:52 +0100, Harald van Pee wrote:
 I have updated all clients to patched version 1.6.1; the servers are still at 
 1.6.0.1. No Lustre-related error message has occurred since (2 weeks).
 
 I think it is reasonable (necessary?) to e2fsck all OSTs and the MDT?
 The MDT resides on a DRBD device configured for failover.
 
 I now have the following questions.
 1. Is there a recommended order for the file system checks? MDT first and 
 then the OSTs, or vice versa?
 
 2. If I umount the MDT, should I use -f? I assume no file system access will 
 be possible until the MDT is back again. Would it be better to umount all 
 servers and clients and then the MDT?
 
 3. I think each OST can be checked while the others are running, but I am 
 unsure whether I should use -f to umount or not.
 
 4. Should I unmount all clients? If this is recommended anyway, it is maybe 
 better to stop file system access for a couple of hours (2TB, 70% used), but 
 do the file system checks in parallel.

If you are expecting to fix the filesystem, it is best to just unmount
everything and run e2fsck in parallel.  Alternately, you can just force
unmount the MDT+OST filesystems and let the clients hang until the MDT+OSTs
are restarted, but this can be more troublesome in some cases.
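
As a rough illustration of "run e2fsck in parallel", here is a minimal Python sketch to be run once everything is unmounted. It is only a sketch under stated assumptions: the device paths in the usage note are hypothetical, the `runner` hook exists only so the logic can be tried without touching a disk, and the default is a read-only check (`-n`) so nothing gets modified by accident.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def e2fsck_command(device, read_only=True):
    """Build the e2fsck argument list for one unmounted device."""
    # -f forces a check even if the file system is marked clean.
    # -n opens the file system read-only and answers "no" to all prompts.
    # -p ("preen") repairs only problems that are safe to fix unattended.
    return ["e2fsck", "-f", "-n" if read_only else "-p", device]


def run_checks(devices, read_only=True, runner=None):
    """Run one e2fsck per device concurrently; return {device: exit_code}."""
    if runner is None:
        # Default runner really executes e2fsck and returns its exit status.
        def runner(cmd):
            return subprocess.run(cmd).returncode
    with ThreadPoolExecutor(max_workers=max(1, len(devices))) as pool:
        codes = list(pool.map(lambda d: runner(e2fsck_command(d, read_only)), devices))
    return dict(zip(devices, codes))


def needs_attention(exit_code):
    """e2fsck exit codes of 4 or more mean uncorrected errors or a failed run."""
    return exit_code >= 4
```

For example, `run_checks(["/dev/drbd0", "/dev/sdb1"])` (both paths are placeholders) would start two read-only checks at once, assuming each device is checked on the server that owns it. Passing `read_only=False` switches to `-p`; a device whose result satisfies `needs_attention` still has problems and would need an interactive e2fsck run.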

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



Re: [Lustre-discuss] files/directories are temporarily unavailable on patchless clients

2008-01-25 Thread Harald van Pee
On Thursday 24 January 2008 08:13 pm, you wrote:
 Hello Harald,

  Jan 21 18:12:51 node0010 kernel: Lustre: 5717:0:
  (namei.c:235:ll_mdc_blocking_ast()) More than 1 alias dir 134120476 alias 2
  Jan 21 18:12:51 node0010 kernel: Lustre: 5717:0:
  (namei.c:235:ll_mdc_blocking_ast()) Skipped 6 previous similar messages

 This looks very much like a real bug (and I don't have time to look into
 it). I would also guess it is fixed by a more recent Lustre version. I think
 there have been many changes to the patchless client between 1.6.0.1 and
 1.6.1 or 1.6.2.
 Can you really not update your client systems right now?

Hm, not at the moment; we have urgent jobs running around the clock.
We now do all heavy writing to local disks, and none of the error 
messages have occurred since.

But it is worth thinking about. Updating the clients alone should be 
possible much earlier than updating all machines, and of course can be done 
machine by machine.

But I would assume that, to be sure that no serious file system corruption 
will happen, I should also run a file system check on all the OSTs and maybe 
also the MDT?

But you are right: 1.6.0.1 servers with 1.6.1 clients is a supported 
configuration, right? In that case updating the clients as soon as possible 
would be a good idea!
Any objections to that?

Harald

-- 
Harald van Pee

Helmholtz-Institut fuer Strahlen- und Kernphysik der Universitaet Bonn