[ 
https://issues.apache.org/jira/browse/SOLR-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Endika Posadas updated SOLR-5945:
---------------------------------
    Attachment:     (was: retryConnectingToZookeeper.patch)

> Add retry for zookeeper reconnect failure
> -----------------------------------------
>
>                 Key: SOLR-5945
>                 URL: https://issues.apache.org/jira/browse/SOLR-5945
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>    Affects Versions: 4.7
>            Reporter: Jessica Cheng Mallet
>            Priority: Major
>              Labels: solrcloud, zookeeper
>
> We had a network issue in which we temporarily lost connectivity and DNS. The 
> ZooKeeper client properly triggered the watcher. However, when it tried to 
> reconnect, the following exception was thrown:
> 2014-03-27 17:24:46,882 ERROR [main-EventThread] SolrException.java (line 121) :java.net.UnknownHostException: <host name (scrubbed)>: Name or service not known
>         at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
>         at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:866)
>         at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1258)
>         at java.net.InetAddress.getAllByName0(InetAddress.java:1211)
>         at java.net.InetAddress.getAllByName(InetAddress.java:1127)
>         at java.net.InetAddress.getAllByName(InetAddress.java:1063)
>         at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:60)
>         at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
>         at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
>         at org.apache.solr.common.cloud.SolrZooKeeper.<init>(SolrZooKeeper.java:41)
>         at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:53)
>         at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:147)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> I looked at the code, and it seems that there are no further attempts to 
> reconnect to ZooKeeper, so the node is left in a bad state and 
> will not recover on its own. (Please correct me if I'm reading this wrong.) 
> This is probably fair, since normally you wouldn't expect 
> retries to fix an "unknown host" issue (even though in our case they would 
> have), but I'm wondering what we should do to handle this situation if it 
> happens again in the future.
> Any advice is appreciated.
> From Mark Miller:
> We don’t currently retry, but I don’t think it would hurt much if we did - at 
> least briefly.
> If you want to file a JIRA issue, that would be the best way to get it in a 
> future release.
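
For illustration only, the brief retry suggested above could look roughly like the sketch below. This is not the attached patch and not existing Solr code: the class name ReconnectRetry, the method withRetry, and the attempt/delay parameters are all hypothetical. It retries only on UnknownHostException, doubles the wait between attempts, and gives up after a bounded number of tries so that a genuinely wrong host name still surfaces as an error.

```java
import java.net.UnknownHostException;
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration; not part of Solr or ZooKeeper. */
final class ReconnectRetry {

    private ReconnectRetry() {}

    /**
     * Calls connect up to maxAttempts times. Only UnknownHostException is
     * retried; any other exception propagates immediately. The delay between
     * attempts starts at initialDelayMs and doubles after each failure.
     */
    static <T> T withRetry(Callable<T> connect, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return connect.call();
            } catch (UnknownHostException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: let the caller see the failure
                }
                Thread.sleep(delayMs); // wait for DNS to come back
                delayMs *= 2;          // back off before the next attempt
            }
        }
    }
}
```

In the reconnect path from the stack trace, the Callable would presumably wrap the construction of the new ZooKeeper client, since that constructor is where the host lookup fails. Where exactly such a wrapper belongs in DefaultConnectionStrategy, and what limits are sensible, is left open here.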



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

