On 9/23/10 10:03 PM, Tina Friedrich wrote:
> Hi,
>
> thanks for the answer. I found it in the meantime; one of our ldap
> servers had a wrong size limit entry.
>
> The logs I had of course already looked at - they didn't yield much in
> terms of why, only what (as in, I could see it was permission errors,
> but they do of course not really tell you why you are getting them.
> There weren't any log entries that hinted at 'size limit exceeded' or
> anything.).
>
> Still - could someone point me to the bit in the documentation that best
> describes how the MDS queries that sort of information (group/passwd
> info, I mean)? Or how to best test that its mechanisms are working? For
> example, in this case, I always thought one would only hit the size
> limit if doing a bulk 'transfer' of data, not doing a lookup on one user
> - plus I could do these sorts of lookups fine on all machines involved
> (against all ldap servers).

The "User/Group Cache Upcall" topic may be helpful for you. For
lustre-1.8.x, it is chapter 28.1 of the manual; for lustre-2.0.x, it is
chapter 29.1.

Good Luck!
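For testing the lookup path itself: assuming the MDS resolves passwd/group data through its normal NSS configuration (nsswitch.conf, and so LDAP in your setup), you can run the same kind of lookup on the MDS and repeat it to catch intermittent failures. This is only a sketch and not the upcall program Lustre itself runs; it calls getpwuid() and getgrouplist() through Python's standard pwd and os modules, and the helper names lookup_identity and watch_identity are made up for illustration.

```python
import os
import pwd
import time


def lookup_identity(uid):
    """Resolve uid -> name, primary gid and supplementary gids via NSS.

    Hypothetical helper: it mimics the kind of passwd/group lookup the MDS
    needs, using the C library's getpwuid()/getgrouplist(), which follow
    nsswitch.conf (and therefore LDAP, if configured there).
    Raises KeyError if the uid cannot be resolved.
    """
    entry = pwd.getpwuid(uid)
    # getgrouplist() returns the primary gid plus every group listing the user.
    groups = os.getgrouplist(entry.pw_name, entry.pw_gid)
    return {
        "uid": uid,
        "name": entry.pw_name,
        "gid": entry.pw_gid,
        "groups": sorted(set(groups)),
    }


def watch_identity(uid, rounds=10, interval=1.0):
    """Repeat lookup_identity() and collect rounds that fail or disagree.

    Hypothetical helper. Returns (baseline, problems), where baseline is the
    first successful result (or None) and problems is a list of
    (round_number, message) tuples.
    """
    baseline = None
    problems = []
    for i in range(rounds):
        try:
            result = lookup_identity(uid)
        except (KeyError, OSError) as exc:
            problems.append((i, "lookup failed: %r" % (exc,)))
        else:
            if baseline is None:
                baseline = result
            elif result != baseline:
                problems.append(
                    (i, "groups %s differ from baseline %s"
                        % (result["groups"], baseline["groups"])))
        # Pause between rounds so the check spans the intermittent window.
        if i + 1 < rounds:
            time.sleep(interval)
    return baseline, problems
```

Running something like watch_identity(uid, rounds=60, interval=30.0) on the MDS for one of the affected users would show whether the lookup fails, or the group list changes, over the same half-hour window in which the I/O errors come and go; either result would point at the directory service rather than at Lustre.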
Cheers,
Nasf

> Tina
>
> On 23/09/10 11:20, Ashley Pittman wrote:
>> On 23 Sep 2010, at 10:46, Tina Friedrich wrote:
>>
>>> Hello List,
>>>
>>> I'm after debugging hints...
>>>
>>> I have a couple of users that intermittently get I/O errors when trying
>>> to ls a directory (as in, within half an hour, works -> doesn't work ->
>>> works...).
>>>
>>> Users/groups are kept in ldap; as far as I can see/check, the ldap
>>> information is consistent everywhere (i.e. no replication failure or
>>> anything).
>>>
>>> I am trying to figure out what is going on here/where this is going
>>> wrong. Can someone give me a hint on how to debug this? Specifically,
>>> how does the MDS look up this sort of information, could there be a
>>> 'list too long' type of error involved, something like that?
>>
>> Could you give an indication as to the number of files in the directory
>> concerned? What is the full ls command issued (allowing for shell aliases)
>> and in the case where it works is there a large variation in the time it
>> takes when it does work?
>>
>> In terms of debugging it I'd say the log files for the client in question
>> and the MDS would be the most likely place to start.
>>
>> Ashley,
>>
>
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss