I agree that this is a special case and should be handled specially. I also think that attaching via the administrative endpoints should be allowed even when the locator is ring-fenced, so that we can issue a stop or other command to it.
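
To make the proposed special case concrete, here is a rough sketch of the kind of check being discussed. It is illustrative only: the class name, the method, the 51% threshold constant, and the weight values are all assumptions made for this example and are not taken from Geode's membership code.

```java
// Hypothetical sketch: none of these names come from Geode's code base.
final class FencingSketch {
    // Assumed quorum rule: fence once at least this fraction of total weight is lost.
    static final double LOSS_THRESHOLD = 0.51;

    /**
     * Decides whether the locator should fence itself (forceDisconnect).
     *
     * @param locatorWeight      weight of this locator, which is still running
     * @param otherWeightsBefore weights of all other members in the previous view
     * @param lostWeights        weights of the members that just disappeared
     */
    static boolean shouldFence(int locatorWeight, int[] otherWeightsBefore, int[] lostWeights) {
        // Special case: with at most one other member connected, losing it
        // says nothing about a network partition, so the locator stays up.
        if (otherWeightsBefore.length <= 1) {
            return false;
        }
        int total = locatorWeight + sum(otherWeightsBefore);
        int lost = sum(lostWeights);
        return lost >= LOSS_THRESHOLD * total;
    }

    private static int sum(int[] values) {
        int s = 0;
        for (int v : values) {
            s += v;
        }
        return s;
    }
}
```

With the illustrative weights of 3 for the locator and 15 for a lone server, `FencingSketch.shouldFence(3, new int[] {15}, new int[] {15})` returns false, so the locator would stay reachable for status and stop commands after its only server is killed.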

--
Mike Stolz
Principal Engineer, GemFire Product Manager
Mobile: +1-631-835-4771

On Wed, Feb 15, 2017 at 10:42 PM, Swapnil Bawaskar (JIRA) <j...@apache.org> wrote:

> [ https://issues.apache.org/jira/browse/GEODE-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868740#comment-15868740 ]
>
> Swapnil Bawaskar commented on GEODE-2125:
> -----------------------------------------
>
> [~bschuchardt] Even though the logs clearly show that auto-reconnect went
> into effect, I would still suggest that we should special case the network
> partition detection logic so that, when there is only one member connected
> to the locator, the locator does not fence itself.
>
> > GFSH cannot communicate with Locators that go into reconnect mode
> > -----------------------------------------------------------------
> >
> >                 Key: GEODE-2125
> >                 URL: https://issues.apache.org/jira/browse/GEODE-2125
> >             Project: Geode
> >          Issue Type: Bug
> >          Components: management
> >    Affects Versions: 1.0.0-incubating
> >            Reporter: Kirk Lund
> >            Assignee: Kirk Lund
> >         Attachments: locator_failure-logs.txt, thread_dump.txt
> >
> >
> > If the Locator is started from GFSH and the cluster's only server is
> > killed, network partition detection will initiate forceDisconnect in the
> > Locator and leave it in reconnect mode. To the User it will appear that the
> > Locator crashed and GFSH lost connection:
> > {noformat}
> > gfsh>
> > No longer connected to 192.168.1.72[1099].
> > {noformat}
> > During the time in which the Locator is in reconnect mode, the User
> > cannot connect via GFSH, nor can they issue status or stop commands against
> > it:
> > {noformat}
> > $ cd locator1
> > $ cat vf.gf.locator.pid
> > 33959
> > $ ps 33959
> >   PID   TT  STAT      TIME COMMAND
> > 33959 s001  S      0:19.97 /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Co
> > {noformat}
> > In GFSH:
> > {noformat}
> > gfsh>connect --locator=localhost[10334]
> > Connecting to Locator at [host=localhost, port=10334] ..
> > Connection refused
> > gfsh>status locator --pid=33959
> > null
> > gfsh>status locator --dir=locator1
> > null
> > gfsh>stop locator --dir=locator1
> > Locator in /Users/klund/dev/geode/locator1 on null is currently not responding.
> > gfsh>stop locator --pid=33959
> > Locator in /Users/klund/dev/geode on null is currently not responding.
> > {noformat}
> > If a Locator has GFSH connected then it should notify GFSH that it is
> > going to forceDisconnect and go into reconnect mode. Then GFSH can notify
> > the User so the User is not surprised.
> > In addition, GFSH status and stop commands should be modified to be able
> > to talk to a Locator in reconnect mode. GFSH start could also be modified
> > to report that the Locator is running in reconnect mode instead of
> > reporting a hung process in the Locator's directory.
> > Attachments:
> > * The Locator log file is attached as locator_failure-logs.txt
> > * The Locator thread dump (via jstack) AFTER it has shut down due to
> > forceDisconnect is attached as thread_dump.txt
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.15#6346)