Re: [Lustre-discuss] Lustre, NFS and mds_getattr_lock operation
On 2010-05-06, at 11:57, Frederik Ferner wrote:
> On our Lustre system we are seeing the following error fairly regularly;
> so far we have not had complaints from users and have not noticed any
> negative effects, but it would still be nice to understand the errors
> better. The systems reporting these errors are NFS exporters for
> subtrees of the Lustre file system.
>
> On the Lustre client/NFS server:
>
> May 6 14:23:09 i16-storage1 kernel: LustreError: 11-0: an error
> occurred while communicating with 172.23.6...@tcp. The mds_getattr_lock
> operation failed with -13

-13 is -EACCES (per /usr/include/asm-generic/errno-base.h) or equivalent. That just means that someone tried to access a file they don't have permission to access.

Why this is being printed on the console is a bit of a mystery, since I haven't seen anything similar. I wonder if NFS is going down some obscure code path that is returning the error to the RPC handler instead of stashing this "normal" error code inside the reply. In any case it is harmless and expected (sigh). I'd hope it would have been removed in newer versions, but I don't know at all.

> Does anyone know if we should worry about those messages or if we can
> safely ignore them? Or should we assume that some of our users might
> have a problem accessing data that they have just not reported? Even
> though I find that unlikely.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] MDS high cpu usage
Hello,

On the Lustre system I have set up I use one MDS and 5 OSTs. The MDS shows almost 100% CPU usage when I check top, and sometimes the filesystem responds slowly. There are two processes running with over 35% CPU usage each: socknal_sd00 and socknal_sd01. Are these the main processes of the Lustre file system? Would it be better to move the MDS to a faster node (with more CPU power)? The MDS currently has a 2GHz dual-core AMD Athlon processor. There are around 30 clients accessing the filesystem, sometimes all at once.

Regards,
Frans Ruzius
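[Editorial aside: the socknal_sd* names belong to Lustre's socket LND scheduler threads, which do the TCP I/O work, so high CPU there tends to reflect network traffic rather than metadata processing. Per-thread CPU usage can be inspected with `ps`; a sketch, assuming GNU procps and the thread-name pattern above, which may vary by Lustre version:]

```shell
# List the busiest threads system-wide; on a Lustre node the socket LND
# schedulers show up as socknal_sd00, socknal_sd01, ...
ps -eLo tid,pcpu,comm --sort=-pcpu | head -20

# Narrow it down to the Lustre networking threads (name pattern assumed);
# on a non-Lustre machine this simply finds nothing
ps -eLo tid,pcpu,comm | grep socknal_sd || echo "no socknal threads found"
```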
[Lustre-discuss] Lustre, NFS and mds_getattr_lock operation
On our Lustre system we are seeing the following error fairly regularly; so far we have not had complaints from users and have not noticed any negative effects, but it would still be nice to understand the errors better. The systems reporting these errors are NFS exporters for subtrees of the Lustre file system.

On the Lustre client/NFS server:

May 6 14:23:09 i16-storage1 kernel: LustreError: 11-0: an error occurred while communicating with 172.23.6...@tcp. The mds_getattr_lock operation failed with -13
May 6 14:23:09 i16-storage1 kernel: LustreError: Skipped 10 previous similar messages
May 6 14:23:09 i16-storage1 kernel: LustreError: 3515:0:(llite_nfs.c:223:ll_get_parent()) failure -13 inode 108443563 get parent
May 6 14:23:09 i16-storage1 kernel: LustreError: 3515:0:(llite_nfs.c:223:ll_get_parent()) Skipped 10 previous similar messages

On the MDS:

May 6 14:23:08 cs04r-sc-mds01-01 kernel: LustreError: 3595:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ processing error (-13) r...@81042936a000 x4806957/t0 o34->33a488dc-5987-fee2-b810-00ff4304b...@net_0x2ac176821_uuid:0/0 lens 312/128 e 0 to 0 dl 1273152288 ref 1 fl Interpret:/0/0 rc -13/0
May 6 14:23:08 cs04r-sc-mds01-01 kernel: LustreError: 3595:0:(ldlm_lib.c:1643:target_send_reply_msg()) Skipped 14 previous similar messages

We've checked the inodes mentioned in the various messages and can't spot anything that would make them different from other directories where this does not seem to happen. Unfortunately we have so far not been able to reproduce it.

Does anyone know if we should worry about those messages or if we can safely ignore them? Or should we assume that some of our users might have a problem accessing data that they have just not reported? Even though I find that unlikely.

I've seen a thread mentioning similar messages [1] but could not find any conclusion. Our MDS, OSSes and the clients involved are all running Lustre 1.6.7.2.ddn3.5 on RHEL5.
If necessary I can probably find exactly which patches the ddn3.5 version has applied on top of 1.6.7.2.

Kind regards,
Frederik

[1] http://lists.lustre.org/pipermail/lustre-discuss/2008-January/006309.html

--
Frederik Ferner                         phone: +44 1235 77 8624
Computer Systems Administrator          mob:   +44 7917 08 5110
Diamond Light Source Ltd.
(Apologies in advance for the lines below. Some bits are a legal requirement and I have no control over them.)
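[Editorial aside: when chasing messages like these, the inode numbers can be pulled out of the ll_get_parent lines for counting or cross-checking. A sketch using one sample line from the post above; in practice you would feed it the syslog file, whose path varies by distribution:]

```shell
# Sample ll_get_parent error line from the client log above
line='May 6 14:23:09 i16-storage1 kernel: LustreError: 3515:0:(llite_nfs.c:223:ll_get_parent()) failure -13 inode 108443563 get parent'

# Extract the inode number. On a real system, replace `echo "$line"` with
# something like `grep ll_get_parent /var/log/messages` and append
# `| sort | uniq -c | sort -rn` to count repeat offenders.
echo "$line" | sed -n 's/.*failure -13 inode \([0-9][0-9]*\).*/\1/p'
```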
[Lustre-discuss] Lustre RAID1 SNS
Was looking through this document:

http://wiki.lustre.org/images/f/ff/OST_Migration_RAID1_SNS.pdf

Is this actually work in progress, or just a proposal? This is perhaps the feature of the decade for Lustre :-)