[ https://issues.apache.org/jira/browse/SOLR-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Endika Posadas updated SOLR-5945:
---------------------------------
    Attachment:     (was: retryConnectingToZookeeper.patch)

> Add retry for zookeeper reconnect failure
> -----------------------------------------
>
>                 Key: SOLR-5945
>                 URL: https://issues.apache.org/jira/browse/SOLR-5945
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>   Affects Versions: 4.7
>            Reporter: Jessica Cheng Mallet
>            Priority: Major
>              Labels: solrcloud, zookeeper
>
> We had some network issue where we temporarily lost connection and DNS. The
> zookeeper client properly triggered the watcher. However, when trying to
> reconnect, this following Exception is thrown:
>
> 2014-03-27 17:24:46,882 ERROR [main-EventThread] SolrException.java (line 121) :java.net.UnknownHostException: <host name (scrubbed)>: Name or service not known
>         at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
>         at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:866)
>         at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1258)
>         at java.net.InetAddress.getAllByName0(InetAddress.java:1211)
>         at java.net.InetAddress.getAllByName(InetAddress.java:1127)
>         at java.net.InetAddress.getAllByName(InetAddress.java:1063)
>         at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:60)
>         at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
>         at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
>         at org.apache.solr.common.cloud.SolrZooKeeper.<init>(SolrZooKeeper.java:41)
>         at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:53)
>         at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:147)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>
> I tried to look at the code and it seems that there'd be no further retries
> to connect to Zookeeper, and the node is basically left in a bad state and
> will not recover on its own. (Please correct me if I'm reading this wrong.)
> Thinking about it, this is probably fair, since normally you wouldn't expect
> retries to fix an "unknown host" issue (even though in our case it would
> have) but I'm wondering what we should do to handle this situation if it
> happens again in the future.
> Any advice is appreciated.
>
> From Mark Miller:
> We don’t currently retry, but I don’t think it would hurt much if we did - at
> least briefly.
> If you want to file a JIRA issue, that would be the best way to get it in a
> future release.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
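
For illustration only, here is a minimal sketch of the kind of brief, bounded retry discussed in the issue above. It is not the attached patch and not Solr's actual code. The class name ReconnectRetrySketch, the Connector interface, and the attempt and delay parameters are all hypothetical. The only facts it relies on are that the failure in the trace is a java.net.UnknownHostException, which is a subclass of java.io.IOException, and that a transient DNS outage can clear up between attempts.

```java
import java.io.IOException;

public class ReconnectRetrySketch {

    /** Hypothetical stand-in for whatever code constructs a new client connection. */
    @FunctionalInterface
    public interface Connector<T> {
        T connect() throws IOException;
    }

    /**
     * Calls connector.connect() up to maxAttempts times, sleeping between
     * failed attempts with a delay that doubles each time, capped at maxDelayMs.
     * Rethrows the last IOException if every attempt fails.
     */
    public static <T> T connectWithRetry(Connector<T> connector,
                                         int maxAttempts,
                                         long initialDelayMs,
                                         long maxDelayMs)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return connector.connect();
            } catch (IOException e) {
                // UnknownHostException lands here, since it extends IOException.
                last = e;
                if (attempt == maxAttempts) {
                    break; // no point sleeping after the final attempt
                }
                Thread.sleep(delayMs);
                delayMs = Math.min(delayMs * 2, maxDelayMs);
            }
        }
        throw last;
    }
}
```

In the scenario from the issue, the connector would wrap whatever constructs the client (the SolrZooKeeper constructor in the trace above, arguments omitted here). Capping both the attempt count and the delay keeps the retry brief, so a genuinely wrong host name still surfaces as an error instead of retrying indefinitely.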