[ https://issues.apache.org/jira/browse/SOLR-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jessica Cheng updated SOLR-5945:
--------------------------------
Description:
We had a network issue where we temporarily lost connection and DNS. The
ZooKeeper client properly triggered the watcher. However, when trying to
reconnect, the following exception was thrown:
2014-03-27 17:24:46,882 ERROR [main-EventThread] SolrException.java (line 121) :java.net.UnknownHostException: <host name (scrubbed)>: Name or service not known
        at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
        at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:866)
        at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1258)
        at java.net.InetAddress.getAllByName0(InetAddress.java:1211)
        at java.net.InetAddress.getAllByName(InetAddress.java:1127)
        at java.net.InetAddress.getAllByName(InetAddress.java:1063)
        at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:60)
        at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
        at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
        at org.apache.solr.common.cloud.SolrZooKeeper.<init>(SolrZooKeeper.java:41)
        at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:53)
        at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:147)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
I looked at the code and it seems that there are no further retries to
connect to ZooKeeper, so the node is left in a bad state and will not
recover on its own. (Please correct me if I'm reading this wrong.) Thinking
about it, this is probably fair, since normally you wouldn't expect retries to
fix an "unknown host" issue (even though in our case it would have), but I'm
wondering what we should do to handle this situation if it happens again in the
future.
Any advice is appreciated.
From Mark Miller:
We don’t currently retry, but I don’t think it would hurt much if we did - at
least briefly.
If you want to file a JIRA issue, that would be the best way to get it in a
future release.
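
To make the "retry briefly" suggestion concrete, here is a minimal sketch of a bounded retry with a doubling delay. It is an illustration only, not Solr's code and not necessarily how the issue would be fixed: the class ReconnectRetry, the method retryOnUnknownHost, and the parameters maxAttempts and initialDelayMs are hypothetical names introduced for this example. It assumes the reconnect step can be expressed as a Callable that constructs and returns the client.

```java
import java.net.UnknownHostException;
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration; not part of Solr or ZooKeeper. */
final class ReconnectRetry {

    private ReconnectRetry() {}

    /**
     * Calls connect until it succeeds or maxAttempts attempts have failed.
     * Only UnknownHostException triggers a retry; any other exception
     * propagates immediately. The delay doubles after each failed attempt.
     */
    static <T> T retryOnUnknownHost(Callable<T> connect, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return connect.call();
            } catch (UnknownHostException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up: DNS did not recover in time
                }
                Thread.sleep(delayMs);
                delayMs *= 2;
            }
        }
    }
}
```

In the situation described above, the Callable would wrap whatever call constructs the ZooKeeper client during reconnect, since the stack trace shows the UnknownHostException escaping from that constructor. Keeping the attempt count small matches the "at least briefly" suggestion: a short DNS outage is ridden out, while a genuinely wrong host name still fails quickly.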
> Add retry for zookeeper reconnect failure
> -----------------------------------------
>
> Key: SOLR-5945
> URL: https://issues.apache.org/jira/browse/SOLR-5945
> Project: Solr
> Issue Type: Improvement
> Components: SolrCloud
> Affects Versions: 4.7
> Reporter: Jessica Cheng
> Labels: solrcloud, zookeeper