[ https://issues.apache.org/jira/browse/GEODE-4802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eugene Nedzvetsky updated GEODE-4802:
-------------------------------------
    Summary: Geode cluster hung after network problems  (was: Geode cluster hanged after network problems)

> Geode cluster hung after network problems
> -----------------------------------------
>
>                 Key: GEODE-4802
>                 URL: https://issues.apache.org/jira/browse/GEODE-4802
>             Project: Geode
>          Issue Type: Bug
>            Reporter: Eugene Nedzvetsky
>            Priority: Major
>         Attachments: clumsy2.jpg, threaddump.log
>
>
> Test preparation:
> # create file bin/server1/gemfire.properties with property membership-port-range=2025-2030
> # create file bin/server2/gemfire.properties with property membership-port-range=2035-2040
> # Download network problems emulator [https://jagt.github.io/clumsy]
> # Fill field 'filtering' in Clumsy: tcp and (tcp.DstPort == 2025 or
> tcp.DstPort == 2026 or tcp.DstPort == 2027 or tcp.DstPort == 2028 or
> tcp.DstPort == 2029 or tcp.DstPort == 2030). Select function 'Drop' and set
> Chance=100%. See clumsy2.jpg
> Steps to reproduce
> # Start gfsh
> # start locator --name=locator1
> # start server --name=server1 --server-port=40411
> # start server --name=server2 --server-port=40412
> # create region --name=regionA --type=REPLICATE
> # put --region=regionA --key="1" --value="one"
> # Click on 'start' button in Clumsy
> # put --region=regionA --key="1" --value="onev2"
> # Wait *15s* and click on 'stop' in Clumsy
> Gfsh console has hung.
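The Clumsy filter in the test preparation lists every port of the membership-port-range by hand. A minimal sketch of generating that filter string for an arbitrary range is shown here; the function name `clumsy_drop_filter` is hypothetical and is not part of Clumsy or Geode, and the output format simply mirrors the filter quoted in the report.

```python
def clumsy_drop_filter(first_port: int, last_port: int) -> str:
    """Build a filter matching TCP packets whose destination port
    lies in the inclusive range [first_port, last_port].

    The format copies the filter text from the report; it is an
    illustration, not an official Clumsy API.
    """
    if first_port > last_port:
        raise ValueError("first_port must not exceed last_port")
    # One equality clause per port, joined with 'or' as in the report.
    clauses = [f"tcp.DstPort == {port}" for port in range(first_port, last_port + 1)]
    return "tcp and (" + " or ".join(clauses) + ")"


if __name__ == "__main__":
    # Range taken from the membership-port-range of server1 in the report.
    print(clumsy_drop_filter(2025, 2030))
```

For the range 2025-2030 this reproduces the filter text given in the test preparation, which can then be pasted into the 'filtering' field.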
> bin\server1\server1.log:
> [warning 2018/03/07 18:02:50.360 PST server1 <Function Execution Processor1>
> tid=0x4b] 15 seconds have elapsed while waiting for replies:
> <DistributedCacheOperation$CacheOperationReplyProcessor 22 waiting for 1
> replies from [192.168.100.109(server2:12804)<v2>:2035]> on
> 192.168.100.109(server1:14416)<v1>:2045 whose current membership list is:
> [[192.168.100.109(server2:12804)<v2>:2035,
> 192.168.100.109(locator1:15628:locator)<ec><v0>:1024,
> 192.168.100.109(server1:14416)<v1>:2045]]
> Pulse showed 'normal' status for both servers.
> Gfsh responds again once the server1 process is killed.
> I have also reproduced another issue with the same scenario in my test
> environment (see [^threaddump.log]).
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)