Re: [Lustre-discuss] files/directories are temporarily unavailable on patchless clients
Hi,

I have updated all clients to the patched version 1.6.1; the servers are still at 1.6.0.1. No lustre-related error message has occurred since (2 weeks).

I think it's reasonable (necessary?) to e2fsck all osts and the mdt? The mdt resides on a drbd device configured as failover.

I now have the following questions:

1. Is there a recommended order for the file system checks? mdt first and then the osts, or vice versa?
2. If I umount the mdt, should I use -f? I assume no file system access will be possible until the mdt is back again. Would it be better to umount all servers and clients and then the mdt?
3. I think each ost can be checked while the others are working, but I am unsure whether I should use -f to umount or not?
4. Should I unmount all clients?

If this is recommended anyway, it's maybe better to stop file system access for a couple of hours (2TB, 70% used), but do the file system checks in parallel.

Thanks in advance
Harald

On Monday 21 January 2008 11:55 pm, Andreas Dilger wrote:
> On Jan 21, 2008 18:55 +0100, Harald van Pee wrote:
> > The directory is just not there! Directory or file not found.
> > In my opinion there is no error message on the clients which is directly related to the problem.
> > On our node0010 I have seen this problem several times today. Mostly the directory is not seen! Probably all of the other directories can be accessed at the same time.
> >
> > And here are all lustre-related messages from the last days (others are mostly timestamps!):
> >
> > Jan 17 07:41:16 node0010 kernel: Lustre: 5723:0: (namei.c:235:ll_mdc_blocking_ast()) More than 1 alias dir 133798800 alias
>
> A quick search in bugzilla for this error message shows bug 12123, which is fixed in the 1.6.1 release, and also has a patch.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
--
Harald van Pee
Helmholtz-Institut fuer Strahlen- und Kernphysik der Universitaet Bonn
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] files/directories are temporarily unavailable on patchless clients
On Mar 04, 2008 19:52 +0100, Harald van Pee wrote:
> I have updated all clients to the patched version 1.6.1; the servers are still at 1.6.0.1. No lustre-related error message has occurred since (2 weeks).
>
> I think it's reasonable (necessary?) to e2fsck all osts and the mdt? The mdt resides on a drbd device configured as failover.
>
> I now have the following questions:
>
> 1. Is there a recommended order for the file system checks? mdt first and then the osts, or vice versa?
> 2. If I umount the mdt, should I use -f? I assume no file system access will be possible until the mdt is back again. Would it be better to umount all servers and clients and then the mdt?
> 3. I think each ost can be checked while the others are working, but I am unsure whether I should use -f to umount or not?
> 4. Should I unmount all clients?
>
> If this is recommended anyway, it's maybe better to stop file system access for a couple of hours (2TB, 70% used), but do the file system checks in parallel.

If you are expecting to fix the filesystem, it is best to just unmount everything and run e2fsck in parallel. Alternatively, you can just force unmount the MDT+OST filesystems and let the clients hang until the MDT+OSTs are restarted, but this can be more troublesome in some cases.

> On Monday 21 January 2008 11:55 pm, Andreas Dilger wrote:
> > On Jan 21, 2008 18:55 +0100, Harald van Pee wrote:
> > > The directory is just not there! Directory or file not found.
> > > In my opinion there is no error message on the clients which is directly related to the problem.
> > > On our node0010 I have seen this problem several times today. Mostly the directory is not seen! Probably all of the other directories can be accessed at the same time.
> > >
> > > And here are all lustre-related messages from the last days (others are mostly timestamps!):
> > >
> > > Jan 17 07:41:16 node0010 kernel: Lustre: 5723:0: (namei.c:235:ll_mdc_blocking_ast()) More than 1 alias dir 133798800 alias
> >
> > A quick search in bugzilla for this error message shows bug 12123, which is fixed in the 1.6.1 release, and also has a patch.
> >
> > Cheers, Andreas
> > --
> > Andreas Dilger
> > Sr. Staff Engineer, Lustre Group
> > Sun Microsystems of Canada, Inc.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
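The "unmount everything and run e2fsck in parallel" advice above could be scripted roughly as follows. This is only a sketch under stated assumptions: the device paths in `DEVICES` are hypothetical placeholders, the targets are assumed to be already unmounted and local to the host running the script (in practice each server would run it for its own MDT or OST devices), and the helper names are invented for illustration. It defaults to a read-only pass (`-n`) so nothing is modified until the output has been reviewed.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical device paths; substitute the real, unmounted MDT/OST devices.
DEVICES = ["/dev/drbd0", "/dev/sdb1", "/dev/sdc1"]


def build_command(device, read_only=True):
    """Return the e2fsck argument list for one unmounted target.

    -f forces a check even if the filesystem is marked clean.
    -n opens it read-only and answers "no" to every question.
    -p automatically repairs only problems that are safe to fix.
    """
    mode = "-n" if read_only else "-p"
    return ["e2fsck", "-f", mode, device]


def check_device(device, read_only=True, runner=subprocess.run):
    """Run e2fsck on one device and return (device, exit status)."""
    result = runner(build_command(device, read_only))
    return device, result.returncode


def check_all(devices, read_only=True, runner=subprocess.run):
    """Check all devices concurrently, one worker thread per device."""
    workers = max(1, len(devices))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(
            lambda dev: check_device(dev, read_only, runner), devices
        )
        return dict(results)
```

Calling `check_all(DEVICES)` returns a mapping from device to e2fsck exit status, where 0 means no errors were found and a nonzero status means the output for that device needs a closer look. The `runner` parameter exists only so the logic can be exercised without touching real disks.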
Re: [Lustre-discuss] files/directories are temporarily unavailable on patchless clients
On Thursday 24 January 2008 08:13 pm, you wrote:
> Hello Harald,
>
> > Jan 21 18:12:51 node0010 kernel: Lustre: 5717:0: (namei.c:235:ll_mdc_blocking_ast()) More than 1 alias dir 134120476 alias 2
> > Jan 21 18:12:51 node0010 kernel: Lustre: 5717:0: (namei.c:235:ll_mdc_blocking_ast()) Skipped 6 previous similar messages
>
> this looks very much like a real bug (and I don't have time to look into it). I would also guess it is fixed by a more recent lustre version. I think there have been many changes to the patchless client between 1.6.0.1 and 1.6.1 or 1.6.2. You really can't update your client systems by now?

Hm, not at the moment; we have urgent jobs running around the clock. All heavy writing tasks now go to local disks, and none of the error messages has occurred since. But it's worth thinking about.

Updating the clients alone should be possible much earlier than updating all machines, and of course can be done machine by machine. But I would assume that, to be sure that no serious file system corruption will happen, I should also run a file system check on all the osts and maybe also the mdt?

But you are right, because 1.6.0.1 servers with 1.6.1 clients is a supported configuration, right? And therefore updating the clients asap would be a good idea! Any objections to that?

Harald
--
Harald van Pee
Helmholtz-Institut fuer Strahlen- und Kernphysik der Universitaet Bonn
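To confirm that the warning really stops appearing after a client upgrade, the syslog lines quoted in this thread could be scanned with something like the sketch below. The regular expression is an assumption derived only from the two log lines quoted here, the names `ALIAS_RE`, `dir_id` and `count_alias_warnings` are invented for illustration, and the number printed after "dir" is treated as an opaque identifier.

```python
import re
from collections import Counter

# Pattern for the client-side warning quoted in this thread. The trailing
# alias count is optional because one quoted line ends right after "alias".
ALIAS_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) (?P<host>\S+) kernel: Lustre: .*"
    r"More than 1 alias dir (?P<dir_id>\d+) alias(?: (?P<count>\d+))?"
)


def count_alias_warnings(lines):
    """Count 'More than 1 alias dir' warnings per (host, dir_id) pair."""
    hits = Counter()
    for line in lines:
        match = ALIAS_RE.search(line)
        if match:
            hits[(match["host"], match["dir_id"])] += 1
    return hits


# The two log lines quoted above; only the first is an alias warning.
sample = [
    "Jan 21 18:12:51 node0010 kernel: Lustre: 5717:0: "
    "(namei.c:235:ll_mdc_blocking_ast()) More than 1 alias dir 134120476 alias 2",
    "Jan 21 18:12:51 node0010 kernel: Lustre: 5717:0: "
    "(namei.c:235:ll_mdc_blocking_ast()) Skipped 6 previous similar messages",
]
print(count_alias_warnings(sample))
```

Note that "Skipped N previous similar messages" lines are not counted, so the totals are a lower bound on how often the warning fired.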