[ 
https://issues.apache.org/jira/browse/GEODE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hale Bales resolved GEODE-9906.
-------------------------------
    Resolution: Won't Fix

Relates to version 1.0.0-incubating

> Unable to reconnect a node after SO patching "15 seconds have elapsed while 
> waiting for replies"
> ------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-9906
>                 URL: https://issues.apache.org/jira/browse/GEODE-9906
>             Project: Geode
>          Issue Type: Bug
>            Reporter: Marco Baldessari
>            Priority: Major
>
> I have a cluster of 4 nodes in total (3 servers and 1 management node) that 
> was working properly.
> At the beginning of the month we planned to patch the OS and we started from 
> the first server node with this procedure:
> - Stop service
> - OS patching
> - Server restart
> - Start service
> The service on the first patched node, "serverA", fails to restart with 
> this error:
> Log entries during cluster join:
> serverA:
> | INFO  | region-dm-12                 | ache.geode.internal.tcp.Connection | 
> --> Connection: shared=true ordered=false failed to connect to peer 
> 10.237.110.195( Server serverB:9993)<ec><v127>:1024 because: 
> java.net.ConnectException: Connection timed out (Connection timed out)
> | WARN  | region-dm-12               | ache.geode.internal.tcp.Connection | 
> --> Connection: Attempting reconnect to peer  10.237.110.195( Server 
> serverB:9993)<ec><v127>:1024
>  
> ServerMgmt:
> | WARN  | pool-3-thread-1              | tributed.internal.ReplyProcessor21   
>   | --> 15 seconds have elapsed while waiting for replies: 
> <CreateRegionProcessor$CreateRegionReplyProcessor 44180 waiting for 1 replies 
> from [10.237.110.194( Server serverA:632)<ec><v174>:1024]> on 10.237.110.225( 
> Management:6033)<ec><v111>:1024 whose current membership list is: 
> [[10.237.110.196( Server serverC:16805)<ec><v136>:1024, 10.237.110.225( 
> Management:6033)<ec><v111>:1024, 10.237.110.195( Server 
> serverB:9993)<ec><v127>:1024, 10.237.110.194( Server 
> serverA:632)<ec><v174>:1024]]
>  
> Connectivity between the systems was verified with tcpdump; UDP traffic on 
> port 1024 is flowing fine.
>  
> We have tried redeploying the service and made numerous attempts, but we 
> always get the same error during startup.
> Any ideas? Thank you.
> Marco.
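
The serverA log entry above reports a TCP-level failure (java.net.ConnectException: Connection timed out, logged by the tcp.Connection class), while the check described in the report covered UDP. As a rough sketch only, and assuming Python 3 is available on the hosts, TCP reachability between members could be probed as shown here. The function name tcp_port_reachable is made up for illustration, and PEER_TCP_PORT is a placeholder: the TCP port the peers actually use does not appear in the quoted logs and would have to be taken from the cluster configuration.

```python
import socket


def tcp_port_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection performs the full TCP handshake, which is the
        # step that timed out in the serverA log entry.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable hosts.
        return False


if __name__ == "__main__":
    # Addresses are the ones that appear in the quoted log entries.
    peers = {
        "serverB": "10.237.110.195",
        "serverC": "10.237.110.196",
        "Management": "10.237.110.225",
    }
    # ASSUMPTION: placeholder value, not taken from the report.
    PEER_TCP_PORT = 40404
    for name, address in peers.items():
        ok = tcp_port_reachable(address, PEER_TCP_PORT)
        print(f"{name} ({address}:{PEER_TCP_PORT}): {'reachable' if ok else 'NOT reachable'}")
```

Run from serverA towards each peer, and from each peer towards serverA, a check like this would show whether TCP traffic is being dropped in one direction (for example by a host firewall re-enabled during patching), which a UDP-only capture would not reveal. Whether that is the cause here is not established by the report.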



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
