Re: [jira] [Commented] (GEODE-2125) GFSH cannot communicate with Locators that go into reconnect mode

Michael Stolz Wed, 15 Feb 2017 14:59:43 -0800

I agree that is a special case and should be handled specially.
I also think that attaching via the administrative endpoints should be
allowed even if you are ringfenced so that we can issue a stop or other
command to the locator.
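
For illustration, the special case could be sketched roughly as below. This
is a hypothetical policy, not Geode's actual membership code: the class name,
the method shouldFence, its parameters, and the weights and threshold in the
usage note are all assumptions made up for the example.

```java
public class FenceDecisionSketch {

    /**
     * Hypothetical fencing decision for a locator.
     *
     * @param serversInPreviousView number of non-locator members in the last good view
     * @param lostWeight            total weight of the members that disappeared
     * @param totalWeight           total weight of all members in the last good view
     * @param lossThreshold         fraction of lost weight that counts as a partition
     */
    static boolean shouldFence(int serversInPreviousView, int lostWeight,
                               int totalWeight, double lossThreshold) {
        // Special case from this thread: with at most one member connected to
        // the locator, losing that member is not treated as a network partition.
        if (serversInPreviousView <= 1) {
            return false;
        }
        // Guard against an empty view so the division below is well defined.
        if (totalWeight <= 0) {
            return false;
        }
        // Otherwise fence only when the lost share reaches the threshold.
        return (double) lostWeight / totalWeight >= lossThreshold;
    }
}
```

With illustrative weights of 3 for the locator and 10 per server, and a
threshold of 0.51, shouldFence(1, 10, 13, 0.51) returns false, so the locator
stays reachable for status and stop commands, while shouldFence(3, 20, 33, 0.51)
still returns true.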


--
Mike Stolz
Principal Engineer, GemFire Product Manager
Mobile: +1-631-835-4771

On Wed, Feb 15, 2017 at 10:42 PM, Swapnil Bawaskar (JIRA) <j...@apache.org>
wrote:

>
>     [ https://issues.apache.org/jira/browse/GEODE-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868740#comment-15868740 ]
>
> Swapnil Bawaskar commented on GEODE-2125:
> -----------------------------------------
>
> [~bschuchardt] Even though the logs clearly show that auto-reconnect went
> into effect, I would still suggest that we should special case the network
> partition detection logic so that, when there is only one member connected
> to the locator, the locator does not fence itself.
>
> > GFSH cannot communicate with Locators that go into reconnect mode
> > -----------------------------------------------------------------
> >
> >                 Key: GEODE-2125
> >                 URL: https://issues.apache.org/jira/browse/GEODE-2125
> >             Project: Geode
> >          Issue Type: Bug
> >          Components: management
> >    Affects Versions: 1.0.0-incubating
> >            Reporter: Kirk Lund
> >            Assignee: Kirk Lund
> >         Attachments: locator_failure-logs.txt, thread_dump.txt
> >
> >
> > If the Locator is started from GFSH and the cluster's only server is
> > killed, network partition detection will initiate forceDisconnect in the
> > Locator and leave it in reconnect mode. To the User it will appear that the
> > Locator crashed and GFSH lost connection:
> > {noformat}
> > gfsh>
> > No longer connected to 192.168.1.72[1099].
> > {noformat}
> > During the time in which the Locator is in reconnect mode, the User
> > cannot connect via GFSH, nor can they issue status or stop commands against
> > it:
> > {noformat}
> > $ cd locator1
> > $ cat vf.gf.locator.pid
> > 33959
> > $ ps 33959
> >   PID   TT  STAT      TIME COMMAND
> > 33959 s001  S      0:19.97 /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Co
> > {noformat}
> > In GFSH:
> > {noformat}
> > gfsh>connect --locator=localhost[10334]
> > Connecting to Locator at [host=localhost, port=10334] ..
> > Connection refused
> > gfsh>status locator --pid=33959
> > null
> > gfsh>status locator --dir=locator1
> > null
> > gfsh>stop locator --dir=locator1
> > Locator in /Users/klund/dev/geode/locator1 on null is currently not responding.
> > gfsh>stop locator --pid=33959
> > Locator in /Users/klund/dev/geode on null is currently not responding.
> > {noformat}
> > If a Locator has GFSH connected then it should notify GFSH that it is
> > going to forceDisconnect and go into reconnect mode. Then GFSH can notify
> > the User so the User is not surprised.
> > In addition, GFSH status and stop commands should be modified to be able
> > to talk to a Locator in reconnect mode. GFSH start could also be modified
> > to report that the Locator is running in reconnect mode instead of
> > reporting a hung process in the Locator's directory.
> > Attachments:
> > * The Locator log file is attached as locator_failure-logs.txt
> > * The Locator thread dump (via jstack) AFTER it has shut down due to
> > forceDisconnect is attached as thread_dump.txt
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.15#6346)
>
