[ 
https://issues.apache.org/jira/browse/GEODE-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870922#comment-15870922
 ] 

Swapnil Bawaskar commented on GEODE-2125:
-----------------------------------------

[~bschuchardt] filed GEODE-2500. 

> GFSH cannot communicate with Locators that go into reconnect mode
> -----------------------------------------------------------------
>
>                 Key: GEODE-2125
>                 URL: https://issues.apache.org/jira/browse/GEODE-2125
>             Project: Geode
>          Issue Type: Bug
>          Components: management
>    Affects Versions: 1.0.0-incubating
>            Reporter: Kirk Lund
>            Assignee: Kirk Lund
>         Attachments: locator_failure-logs.txt, thread_dump.txt
>
>
> If the Locator is started from GFSH and the cluster's only server is killed, 
> network partition detection will initiate forceDisconnect in the Locator and 
> leave it in reconnect mode. To the User it will appear that the Locator 
> crashed and GFSH lost connection:
> {noformat}
> gfsh>
> No longer connected to 192.168.1.72[1099].
> {noformat}
> During the time in which the Locator is in reconnect mode, the User cannot 
> connect via GFSH, nor can they issue status or stop commands against it:
> {noformat}
> $ cd locator1
> $ cat vf.gf.locator.pid 
> 33959
> $ ps 33959
>   PID   TT  STAT      TIME COMMAND
> 33959 s001  S      0:19.97 /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Co
> {noformat}
> In GFSH:
> {noformat}
> gfsh>connect --locator=localhost[10334]
> Connecting to Locator at [host=localhost, port=10334] ..
> Connection refused
> gfsh>status locator --pid=33959
> null
> gfsh>status locator --dir=locator1
> null
> gfsh>stop locator --dir=locator1
> Locator in /Users/klund/dev/geode/locator1 on null is currently not responding.
> gfsh>stop locator --pid=33959
> Locator in /Users/klund/dev/geode on null is currently not responding.
> {noformat}
> If a Locator has GFSH connected then it should notify GFSH that it is going 
> to forceDisconnect and go into reconnect mode. Then GFSH can notify the User 
> so the User is not surprised.
> In addition, GFSH status and stop commands should be modified to be able to 
> talk to a Locator in reconnect mode. GFSH start could also be modified to 
> report that the Locator is running in reconnect mode instead of reporting a 
> hung process in the Locator's directory.
> Attachments:
> * The Locator log file is attached as locator_failure-logs.txt
> * The Locator thread dump (via jstack) AFTER it has shut down due to 
> forceDisconnect is attached as thread_dump.txt



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
