We're in the same boat... gpfs.snap hangs when the cluster / node is 
unresponsive, but they don't know how to give us a root cause without one. Very 
frustrating.

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of 
[email protected]
Sent: 07 March 2017 21:37
To: gpfsug main discussion list <[email protected]>
Subject: Re: [gpfsug-discuss] Potential problems - leaving trace enabled in 
over-write mode?

On Tue, 07 Mar 2017 21:17:35 +0000, Bryan Banister said:

> Just depends on how your problem is detected... is it in a log?  Is it 
> found by running a command (e.g. mm*)?  Is it discovered in `ps` 
> output?  Is your scheduler failing jobs?

I think the problem here is that if you have a sudden cataclysmic event, you 
want to have been in flight-recorder mode and be able to look at the last 5 or
10 seconds of trace *before* you became aware that your filesystem just went 
walkies.  Sure, you can start tracing when the filesystem dies - but at that 
point you just get a whole mess of failed I/O requests in the trace, and no 
hint of where things went south...
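
To make the flight-recorder idea concrete, here is a rough Python sketch of an
overwrite-mode buffer. It is only an illustration of the concept, not how GPFS
tracing is implemented; the FlightRecorder class, its capacity, and the
integer timestamps are all made up for the example.

```python
from collections import deque


class FlightRecorder:
    """Hypothetical overwrite-mode trace buffer: keeps only the newest events."""

    def __init__(self, capacity):
        # deque(maxlen=...) silently drops the oldest entry once it is full,
        # which is the "over-write mode" behaviour being discussed.
        self._events = deque(maxlen=capacity)

    def record(self, timestamp, message):
        self._events.append((timestamp, message))

    def dump(self, failure_time, window):
        # Return the events from the `window` seconds leading up to the failure.
        start = failure_time - window
        return [(t, m) for t, m in self._events if start <= t <= failure_time]


# Trace continuously; memory use stays bounded by `capacity`.
recorder = FlightRecorder(capacity=1000)
for t in range(5000):
    recorder.record(t, f"event {t}")

# The failure is noticed at t=4999; the last 10 seconds are still available.
last = recorder.dump(failure_time=4999, window=10)
```

The point is that the buffer has to be recording *before* the failure; starting
it afterwards only captures the aftermath.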
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
