On 9/23/10 10:03 PM, Tina Friedrich wrote:
> Hi,
> thanks for the answer. I found it in the meantime; one of our ldap
> servers had a wrong size limit entry.
> I had of course already looked at the logs - they didn't yield much in
> terms of why, only what (as in, I could see it was permission errors,
> but of course they don't really tell you why you are getting them.
> There weren't any log entries that hinted at 'size limit exceeded' or
> anything).
> Still - could someone point me to the bit in the documentation that best
> describes how the MDS queries that sort of information (group/passwd
> info, I mean)? Or how to best test that its mechanisms are working? For
> example, in this case, I always thought one would only hit the size
> limit if doing a bulk 'transfer' of data, not doing a lookup on one user
> - plus I could do these sorts of lookups fine on all machines involved
> (against all ldap servers).
The topic about "User/Group Cache Upcall" may be helpful for you.
For lustre-1.8.x, it is chapter 28.1; for lustre-2.0.x, it is chapter
29.1.
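
As for testing the lookups: a single-user query such as "id" or "getent passwd" may succeed even when an enumeration of the whole group database is cut short by an LDAP size limit. Below is a rough sketch, in Python, that compares the two paths on a given machine. It is only an illustration: the function names are made up for this example, and whether the upcall on your MDS actually enumerates all groups is an assumption that should be checked against the chapter above.

```python
import grp
import os
import pwd
import sys


def supplementary_groups_by_enumeration(username, primary_gid):
    """Collect a user's group IDs by walking the entire group database.

    This is the bulk path: a directory-server size limit could silently
    truncate the list that grp.getgrall() returns.
    """
    gids = {primary_gid}
    for entry in grp.getgrall():
        if username in entry.gr_mem:
            gids.add(entry.gr_gid)
    return gids


def check_user(username):
    """Compare enumerated group membership with a direct per-user lookup."""
    pw = pwd.getpwnam(username)
    enumerated = supplementary_groups_by_enumeration(pw.pw_name, pw.pw_gid)
    # Direct path: asks the name service for this one user's groups.
    direct = set(os.getgrouplist(pw.pw_name, pw.pw_gid))
    return {
        "uid": pw.pw_uid,
        "gid": pw.pw_gid,
        "enumerated": sorted(enumerated),
        "direct": sorted(direct),
        # Groups the direct lookup sees but the enumeration missed.
        "missing_from_enumeration": sorted(direct - enumerated),
    }


if __name__ == "__main__":
    # Hypothetical usage: pass a user name, or default to the current user.
    name = sys.argv[1] if len(sys.argv) > 1 else pwd.getpwuid(os.getuid()).pw_name
    print(check_user(name))
```

Run it on the MDS as well as on a client. A non-empty "missing_from_enumeration" list on one machine but not another would suggest that the enumeration is being truncated there, which is worth comparing against the size limit on the LDAP server that machine talks to.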
Good Luck!

> Tina
> On 23/09/10 11:20, Ashley Pittman wrote:
>> On 23 Sep 2010, at 10:46, Tina Friedrich wrote:
>>> Hello List,
>>> I'm after debugging hints...
>>> I have a couple of users that intermittently get I/O errors when trying
>>> to ls a directory (as in, within half an hour, works -> doesn't work ->
>>> works...).
>>> Users/groups are kept in ldap; as far as I can see/check, the ldap
>>> information is consistent everywhere (i.e. no replication failure or
>>> anything).
>>> I am trying to figure out what is going on here/where this is going
>>> wrong. Can someone give me a hint on how to debug this? Specifically,
>>> how does the MDS look up this sort of information, could there be a
>>> 'list too long' type of error involved, something like that?
>> Could you give an indication as to the number of files in the directory 
>> concerned?  What is the full ls command issued (allowing for shell aliases) 
>> and in the case where it works is there a large variation in the time it 
>> takes when it does work?
>> In terms of debugging it I'd say the log files for the client in question 
>> and the MDS would be the most likely place to start.
>> Ashley,

Lustre-discuss mailing list
