[ 
https://issues.apache.org/jira/browse/GEODE-4802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Nedzvetsky updated GEODE-4802:
-------------------------------------
    Summary: Geode cluster hung after network problems  (was: Geode cluster 
hanged after network problems)

> Geode cluster hung after network problems
> -----------------------------------------
>
>                 Key: GEODE-4802
>                 URL: https://issues.apache.org/jira/browse/GEODE-4802
>             Project: Geode
>          Issue Type: Bug
>            Reporter: Eugene Nedzvetsky
>            Priority: Major
>         Attachments: clumsy2.jpg, threaddump.log
>
>
> Test preparation:
>  # Create file bin/server1/gemfire.properties with property 
> membership-port-range=2025-2030
>  # Create file bin/server2/gemfire.properties with property 
> membership-port-range=2035-2040
>  # Download the Clumsy network problem emulator [https://jagt.github.io/clumsy]
>  # Set the 'Filtering' field in Clumsy to: tcp and (tcp.DstPort == 2025 or 
> tcp.DstPort == 2026 or tcp.DstPort == 2027 or tcp.DstPort == 2028 or 
> tcp.DstPort == 2029 or tcp.DstPort == 2030). Select the 'Drop' function and 
> set Chance=100%. See clumsy2.jpg
> Steps to reproduce:
>  # Start gfsh
>  # start locator --name=locator1
>  # start server --name=server1 --server-port=40411
>  # start server --name=server2 --server-port=40412
>  # create region --name=regionA --type=REPLICATE
>  # put --region=regionA --key="1" --value="one"
>  # Click the 'Start' button in Clumsy
>  # put --region=regionA --key="1" --value="onev2"
>  # Wait *15s* and click 'Stop' in Clumsy
> Result: the gfsh console hangs.
> bin\server1\server1.log:
> [warning 2018/03/07 18:02:50.360 PST server1 <Function Execution Processor1> 
> tid=0x4b] 15 seconds have elapsed while waiting for replies: 
> <DistributedCacheOperation$CacheOperationReplyProcessor 22 waiting for 1 
> replies from [192.168.100.109(server2:12804)<v2>:2035]> on 
> 192.168.100.109(server1:14416)<v1>:2045 whose current membership list is: 
> [[192.168.100.109(server2:12804)<v2>:2035, 
> 192.168.100.109(locator1:15628:locator)<ec><v0>:1024, 
> 192.168.100.109(server1:14416)<v1>:2045]]
> Pulse showed 'normal' status for both servers.
> Gfsh works again once the server1 process is killed.
> I also reproduced another issue with the same scenario in my test 
> environment (see [^threaddump.log]).
>  
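The test preparation above ties a Clumsy filter to a membership-port-range by hand. As a minimal sketch (not part of the original report), the following Python helpers derive both strings from one port range, so the filter and the property cannot drift apart. The function names clumsy_filter and membership_property are hypothetical and exist only for illustration; the output format simply mirrors the filter text and property line quoted in the description.

```python
def clumsy_filter(first_port: int, last_port: int) -> str:
    """Build a filter string matching TCP traffic to every port in the range.

    Hypothetical helper: it reproduces the expression quoted in the report,
    one 'tcp.DstPort == N' clause per port, joined with 'or'.
    """
    if first_port > last_port:
        raise ValueError("first_port must not exceed last_port")
    clauses = " or ".join(
        f"tcp.DstPort == {port}" for port in range(first_port, last_port + 1)
    )
    return f"tcp and ({clauses})"


def membership_property(first_port: int, last_port: int) -> str:
    """Return the gemfire.properties line for the same port range."""
    if first_port > last_port:
        raise ValueError("first_port must not exceed last_port")
    return f"membership-port-range={first_port}-{last_port}"


if __name__ == "__main__":
    # Values taken from the report: server1 uses ports 2025-2030.
    print(membership_property(2025, 2030))
    print(clumsy_filter(2025, 2030))
```

The printed filter can be pasted into the Clumsy 'Filtering' field, and the property line into bin/server1/gemfire.properties.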



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
