We're in the same boat... gpfs.snap hangs when the cluster or node is unresponsive, but they don't know how to give us a root cause without a snap. Very frustrating.
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of [email protected]
Sent: 07 March 2017 21:37
To: gpfsug main discussion list <[email protected]>
Subject: Re: [gpfsug-discuss] Potential problems - leaving trace enabled in over-write mode?

On Tue, 07 Mar 2017 21:17:35 +0000, Bryan Banister said:

> Just depends on how your problem is detected... is it in a log? Is it
> found by running a command (e.g. mm*)? Is it discovered in `ps`
> output? Is your scheduler failing jobs?

I think the problem here is that if you have a sudden cataclysmic event, you want to have been in flight-recorder mode and be able to look at the last 5 or 10 seconds of trace *before* you became aware that your filesystem just went walkies.

Sure, you can start tracing when the filesystem dies - but at that point you just get a whole mess of failed I/O requests in the trace, and no hint of where things went south...
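
For anyone not familiar with the flight-recorder idea described above, here is a minimal Python sketch of the general technique: a fixed-size ring buffer that always holds the most recent events, so the moments leading up to a failure are still there when you go looking. This is only an illustration of the concept, not how GPFS tracing is implemented; the names `FlightRecorder`, `record` and `dump` are made up for the example.

```python
from collections import deque
import time


class FlightRecorder:
    """Keep only the most recent events in a fixed-size ring buffer.

    Hypothetical illustration of "over-write mode" tracing; not GPFS code.
    """

    def __init__(self, capacity):
        # A deque with maxlen drops its oldest entry when a new one is
        # appended to a full buffer, so old events are overwritten.
        self._events = deque(maxlen=capacity)

    def record(self, message):
        # Store a timestamp with each event so the dump can be ordered
        # against other logs.
        self._events.append((time.time(), message))

    def dump(self):
        # Return the surviving events, oldest first.
        return list(self._events)


# Record 10 events into a buffer that only holds 3.
recorder = FlightRecorder(capacity=3)
for i in range(10):
    recorder.record(f"event {i}")

# Only the last 3 events before the "crash" are still available.
last_messages = [msg for _, msg in recorder.dump()]
print(last_messages)
```

The trade-off is the one raised in this thread: the recorder has to be running all the time, before anything goes wrong, and the buffer size decides how far back you can look.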
