down is the state when the node is reachable, has a valid config, and mmfsd is not started
Not sure what steps you did / where you are now... but once the node has a valid config (by removing/re-adding, or by restoring) and the RPMs installed (don't forget to build the portability layer),
you're good to go...
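A rough sketch of that recovery sequence, assuming cl004 is the node that lost its config (from this thread) and cl001 is a hypothetical healthy node holding a valid /var/mmfs/gen; it is a dry run that only prints the commands:

```shell
#!/bin/sh
# Sketch: bring a reinstalled node back into the cluster.
# run() only records/prints the commands (dry run); execute them for real use.
CMDS=""
run() { CMDS="$CMDS $*"; echo "would run: $*"; }

NODE=cl004        # node that lost its config (name taken from this thread)
GOODNODE=cl001    # hypothetical healthy node with a valid /var/mmfs/gen

# 1. Pull the cluster configuration back into /var/mmfs/gen on the bad node
run mmsdrrestore -p "$GOODNODE"

# 2. Rebuild the GPFS portability layer against the running kernel
run mmbuildgpl

# 3. Start the daemon; the node should go from down to active
run mmstartup -N "$NODE"
```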
From: "J. Eric Wonderley" <[email protected]>
To: gpfsug main discussion list <[email protected]>
Date: 02/03/2017 02:47 PM
Subject: Re: [gpfsug-discuss] proper gpfs shutdown when node disappears
Sent by: [email protected]
Well, we got it into the down state by using mmsdrrestore -p to recover the configuration into /var/mmfs/gen on cl004.
Anyhow, we ended up with unknown for cl004 when it powered off. Short of removing the node, unknown is the state you get.
Unknown seems stable for a hopefully short outage of cl004.
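For reference, the states being discussed (down, unknown, active) can be read with mmgetstate; a dry-run sketch, using the node name from this thread:

```shell
#!/bin/sh
# Sketch: check a node's GPFS daemon state (active / down / unknown ...).
CMDS=""
run() { CMDS="$CMDS $*"; echo "would run: $*"; }

run mmgetstate -N cl004   # single node, name taken from the thread
run mmgetstate -a         # or report every node in the cluster
```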
Thanks
On Thu, Feb 2, 2017 at 4:28 PM, Olaf Weiser <[email protected]> wrote:
Many ways lead to Rome... and I agree, mmexpelnode is a nice command.
Another approach:
power it off (not reachable by ping) ... mmdelnode ... power on/boot ... mmaddnode ...
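Sketched out, that delete/re-add approach might look like the following; the node name and BMC address/credentials are hypothetical, and it is a dry run that only prints the commands:

```shell
#!/bin/sh
# Sketch: drop a dead node from the cluster, then re-add it after it comes back.
CMDS=""
run() { CMDS="$CMDS $*"; echo "would run: $*"; }

NODE=cl004        # hypothetical node name
BMC=cl004-bmc     # hypothetical BMC/IPMI address

run ipmitool -H "$BMC" -U admin -P PASSWORD chassis power off   # make sure it is unreachable
run mmdelnode -N "$NODE"                                        # drop it from the cluster
run ipmitool -H "$BMC" -U admin -P PASSWORD chassis power on
run mmaddnode -N "$NODE"                                        # re-add once it has booted
```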
From: Aaron Knister <[email protected]>
To: <[email protected]>
Date: 02/02/2017 08:37 PM
Subject: Re: [gpfsug-discuss] proper gpfs shutdown when node disappears
Sent by: [email protected]
You could forcibly expel the node (one of my favorite GPFS commands):
mmexpelnode -N $nodename
and then power it off after the expulsion is complete and then do
mmexpelnode -r -N $nodename
which will allow it to join the cluster the next time you try to start up
GPFS on it. You'll still likely have to go through recovery, but you'll
skip the part where GPFS wonders where the node went prior to
expelling it.
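Put together as one sequence (dry run; $nodename as in the commands above, BMC address and credentials hypothetical):

```shell
#!/bin/sh
# Sketch: expel a node, power it off, then clear the expulsion so it may rejoin.
CMDS=""
run() { CMDS="$CMDS $*"; echo "would run: $*"; }

nodename=cl004    # hypothetical; substitute the affected node

run mmexpelnode -N "$nodename"                    # force the cluster to expel it
run ipmitool -H "$nodename-bmc" -U admin -P PASSWORD chassis power off
run mmexpelnode -r -N "$nodename"                 # un-expel so it can rejoin on next startup
```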
-Aaron
On 2/2/17 2:28 PM, [email protected] wrote:
> On Thu, 02 Feb 2017 18:28:22 +0100, "Olaf Weiser" said:
>
>> but the /var/mmfs DIR is obviously damaged/empty... whatever... that's why you
>> see a message like this..
>> have you reinstalled that node / done any backup/restore thing?
>
> The internal RAID controller died a horrid death and basically took
> all the OS partitions with it. So the node was just sort of limping along,
> where the mmfsd process was still coping because it wasn't doing any
> I/O to the OS partitions - but 'ssh bad-node mmshutdown' wouldn't work
> because that requires accessing stuff in /var.
>
> At that point, it starts getting tempting to just use ipmitool from
> another node to power the comatose one down - but that often causes
> a cascade of other issues while things are stuck waiting for timeouts.
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776
