Hi, many thanks for all of the suggestions on how to deal with this issue. For the record, I tried
mmchnode --noquorum -N <broken-nodes> --force

on the node that was reinstalled, which reinstated some of the communications between the cluster nodes. But when I restarted the cluster, communications began to fail again, complaining about not enough CCR nodes for quorum. I ended up reinstalling the cluster, since at that point the nodes couldn't mount the remote data and I thought it would be faster.

Thanks again for all of the responses,

Renata Dart
SLAC National Accelerator Lab

On Wed, 27 Jun 2018, IBM Spectrum Scale wrote:

> Hi Renata,
>
> You may want to reduce the set of quorum nodes. If your version supports
> the --force option, you can run
>
> mmchnode --noquorum -N <broken-nodes> --force
>
> It is a good idea to configure tiebreaker disks in a cluster that has only
> 2 quorum nodes.
>
> Regards, The Spectrum Scale (GPFS) team
>
> ------------------------------------------------------------------------------------------------------------------
>
> If you feel that your question can benefit other users of Spectrum Scale
> (GPFS), then please post it to the public IBM developerWorks Forum at
> https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479.
>
> If your query concerns a potential software error in Spectrum Scale (GPFS)
> and you have an IBM software maintenance contract, please contact
> 1-800-237-5511 in the United States or your local IBM Service Center in
> other countries.
>
> The forum is informally monitored as time permits and should not be used
> for priority messages to the Spectrum Scale (GPFS) team.
>
> From: Renata Maria Dart <ren...@slac.stanford.edu>
> To: gpfsug-discuss@spectrumscale.org
> Date: 06/27/2018 02:21 PM
> Subject: [gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues
> Sent by: gpfsug-discuss-boun...@spectrumscale.org
>
> Hi, we have a client cluster of 4 nodes with 3 quorum nodes.
> One of the quorum nodes is no longer in service and the other was reinstalled
> with a newer OS, both without informing the gpfs admins. Gpfs is still
> "working" on the two remaining nodes; that is, they continue to have access
> to the gpfs data on the remote clusters. But I can no longer get any gpfs
> commands to work. On one of the 2 nodes that are still serving data:
>
> [root@ocio-gpu01 ~]# mmlscluster
> get file failed: Not enough CCR quorum nodes available (err 809)
> gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158
> mmlscluster: Command failed. Examine previous error messages to determine cause.
>
> On the reinstalled node, this fails in the same way:
>
> [root@ocio-gpu02 ccr]# mmstartup
> get file failed: Not enough CCR quorum nodes available (err 809)
> gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158
> mmstartup: Command failed. Examine previous error messages to determine cause.
>
> I have looked through the users group interchanges but didn't find anything
> that seems to fit this scenario.
>
> Is there a way to salvage this cluster? Can it be done without
> shutting gpfs down on the 2 nodes that continue to work?
>
> Thanks for any advice,
>
> Renata Dart
> SLAC National Accelerator Lab
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
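For readers landing on this thread later: the advice above (demote the broken quorum nodes, then protect a small quorum set with tiebreaker disks) can be sketched roughly as below. This is a sketch only, not a tested recovery procedure; the node and NSD names are hypothetical, these commands require a working CCR quorum and a maintenance window, and the exact requirements (e.g. whether GPFS must be down to change tiebreakerDisks) vary by Spectrum Scale release, so check the command references for your version first.

```shell
# Hypothetical node and NSD names throughout -- adjust for your cluster.

# Demote the unreachable/reinstalled nodes from quorum duty
# (the --force variant is needed when those nodes cannot be contacted):
mmchnode --noquorum -N broken-node1,broken-node2 --force

# On older releases, stop GPFS cluster-wide before changing tiebreaker disks:
mmshutdown -a

# Designate up to three NSDs as tiebreaker disks, so a cluster with only
# one or two quorum nodes can still maintain quorum:
mmchconfig tiebreakerDisks="nsd1;nsd2;nsd3"

# Bring GPFS back up everywhere and verify:
mmstartup -a
mmlsconfig tiebreakerDisks
mmgetstate -a
```

Note that none of this helps once CCR itself has lost quorum, as happened here; at that point the choice is between CCR-level recovery (restoring the CCR state from a surviving quorum node) and rebuilding the cluster, which is what the original poster ultimately did.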