Thanks Christof. Would this patch have made it into CES/GPFS 4.2.1-2? From 
what you say, probably not.

This whole incident was caused by a scheduled and extremely rare shutdown of 
our main datacentre for electrical testing. It's not something that's likely to 
happen again, if at all, so reproducing it will be nigh on impossible.

Food for thought though!

Richard

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Christof Schmitt
Sent: 11 January 2017 22:33
To: gpfsug main discussion list <[email protected]>
Subject: Re: [gpfsug-discuss] CES log files

A winbindd process taking up 100% could be caused by the problem documented in 
https://bugzilla.samba.org/show_bug.cgi?id=12105

Capturing a brief strace of the affected process and reporting that through a 
PMR would be helpful to debug this problem and provide a fix.
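
As an illustration of the strace capture suggested above, here is a minimal Python sketch. It assumes the text being parsed comes from `ps -C winbindd -o pid=,pcpu=`; the 90% threshold, the 10-second limit and the /tmp output path are arbitrary choices for the example, and the helper names are hypothetical. Note that ps reports CPU averaged over the process lifetime, so the figure may differ from what top shows. Attaching strace normally needs root.

```python
import shlex


def busy_pids(ps_output, threshold=90.0):
    """Return PIDs whose %CPU is at or above the threshold.

    Expects two whitespace-separated columns (pid, pcpu) and no header,
    as produced by: ps -C winbindd -o pid=,pcpu=
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        pid, pcpu = int(fields[0]), float(fields[1])
        if pcpu >= threshold:
            pids.append(pid)
    return pids


def strace_command(pid, seconds=10, out_dir="/tmp"):
    """Build a time-limited strace invocation for one PID.

    timeout stops strace after `seconds`; -f follows child processes,
    -tt adds timestamps, -o writes the trace to a file.
    """
    out_file = f"{out_dir}/winbindd.{pid}.strace"
    return ["timeout", str(seconds), "strace", "-f", "-tt",
            "-p", str(pid), "-o", out_file]


# Illustrative input only; on a real node, feed in the ps output instead.
sample = "  1234 99.5\n  5678  0.3\n"
for pid in busy_pids(sample):
    print(shlex.join(strace_command(pid)))
```

The sketch only prints the command so it can be reviewed before being run on a production CES node.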

To answer the wider question: Log files are kept in /var/adm/ras/. In case more 
detailed traces are required, use the mmprotocoltrace command.
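
To see which of the files under /var/adm/ras/ were written around the time of an incident, something like the following Python sketch could help. The 24-hour window and the function names are assumptions made for the example; only the directory comes from the message above.

```python
import os
import time


def recent_first(entries, now, max_age_hours=24):
    """Keep (name, mtime) pairs newer than the cutoff, newest first."""
    cutoff = now - max_age_hours * 3600
    fresh = [(name, mtime) for name, mtime in entries if mtime >= cutoff]
    return sorted(fresh, key=lambda entry: entry[1], reverse=True)


def scan(directory="/var/adm/ras"):
    """Return (name, mtime) for each regular file in the directory."""
    with os.scandir(directory) as it:
        return [(e.name, e.stat().st_mtime) for e in it if e.is_file()]


def report(directory="/var/adm/ras", max_age_hours=24):
    """Print recently modified log files with their modification times."""
    for name, mtime in recent_first(scan(directory), time.time(), max_age_hours):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime))
        print(f"{stamp}  {name}")
```

Calling `report()` on a CES node would list the files touched in the last day, which narrows down where to start reading.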

Regards,

Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ
[email protected]  ||  +1-520-799-2469    (T/L: 321-2469)



From:   "Sobey, Richard A" <[email protected]>
To:     gpfsug main discussion list <[email protected]>
Date:   01/11/2017 07:00 AM
Subject:        Re: [gpfsug-discuss] CES log files
Sent by:        [email protected]



Thanks. Some of the nodes would just say “failed” or “degraded” with the DCs 
offline. Those that thought they were happy to host a CES IP address did not 
respond, and the winbindd process would take up 100% CPU, as seen through top, 
with no users on it.
 
Interesting that even though all CES nodes had the same configuration, three of 
them never had a problem at all.
 
JF – I’ll look at the protocol tracing next time this happens. It’s a rare 
thing that three DCs go offline at once but even so there should have been 
enough resiliency to cope.
 
Thanks
Richard
 
From: [email protected] [mailto:[email protected]] On Behalf Of Andrew Beattie
Sent: 11 January 2017 09:55
To: [email protected]
Cc: [email protected]
Subject: Re: [gpfsug-discuss] CES log files
 
mmhealth might be a good place to start
 
CES should probably throw a message along the lines of the following:
 
mmhealth shows something is wrong with AD server:
...
CES                      DEGRADED                 ads_down 
...
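
A rough Python sketch of how output like the excerpt above could be checked from a script. It assumes each component line has the component name first and an upper-case status second, as in the excerpt; the full mmhealth output has more columns and header lines, so the function name and the parsing rules here are illustrative only.

```python
def unhealthy_components(text, ok_states=("HEALTHY",)):
    """Return (component, status, details) for lines not in an OK state.

    Lines without an upper-case status in the second column (headers,
    separators, "...") are skipped.
    """
    results = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[1].isupper():
            continue
        component, status = fields[0], fields[1]
        if status in ok_states:
            continue
        # Anything after the status is kept as free-form detail columns.
        results.append((component, status, fields[2:]))
    return results


# Sample text modelled on the excerpt; the GPFS line is made up.
sample = """...
CES                      DEGRADED                 ads_down
GPFS                     HEALTHY                  -
...
"""
for component, status, details in unhealthy_components(sample):
    print(component, status, " ".join(details))
```

Run against the sample, this reports only the CES line, which is the kind of signal that was missing from mmfs.log.latest.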
Andrew Beattie
Software Defined Storage  - IT Specialist
Phone: 614-2133-7927
E-mail: [email protected]
 
 
----- Original message -----
From: "Sobey, Richard A" <[email protected]>
Sent by: [email protected]
To: "'[email protected]'" <[email protected]>
Cc:
Subject: [gpfsug-discuss] CES log files
Date: Wed, Jan 11, 2017 7:27 PM
 
Which files do I need to look in to determine what’s happening with CES… 
supposing for example a load of domain controllers were shut down and CES had 
no clue how to handle this and stopped working until the DCs were switched back 
on again.
 
mmfs.log.latest said everything was fine, btw.
 
Thanks
Richard
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
 