[jira] [Commented] (HADOOP-12125) Retrying UnknownHostException on a proxy does not actually retry hostname resolution
[ https://issues.apache.org/jira/browse/HADOOP-12125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16684427#comment-16684427 ]

John Zhuge commented on HADOOP-12125:
-------------------------------------

Working on it [~medb].

> Retrying UnknownHostException on a proxy does not actually retry hostname resolution
> -------------------------------------------------------------------------------------
>
> Key: HADOOP-12125
> URL: https://issues.apache.org/jira/browse/HADOOP-12125
> Project: Hadoop Common
> Issue Type: Bug
> Components: ipc
> Reporter: Jason Lowe
> Assignee: John Zhuge
> Priority: Major
>
> When RetryInvocationHandler attempts to retry an UnknownHostException the
> hostname fails to be resolved again. The InetSocketAddress in the
> ConnectionId has cached the fact that the hostname is unresolvable, and when
> the proxy tries to set up a new Connection object with that ConnectionId it
> checks if the (cached) resolution result is unresolved and immediately throws.
> The end result is we sleep and retry for no benefit. The hostname resolution
> is never attempted again.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
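The caching described in the issue can be reproduced with plain java.net classes. The following is a minimal sketch, not Hadoop code: the class name UnresolvedAddressDemo and the helper reResolve are hypothetical, and the host and port are placeholders. It relies only on the documented behaviour that an InetSocketAddress is immutable, so an instance that is unresolved stays unresolved, while constructing a new instance from the same host and port attempts a new lookup.

```java
import java.net.InetSocketAddress;

// Hypothetical demo class; it does not use or mirror any Hadoop API.
final class UnresolvedAddressDemo {

    // Builds a new InetSocketAddress from the same host string and port.
    // The constructor attempts resolution again; the input is left untouched.
    static InetSocketAddress reResolve(InetSocketAddress addr) {
        return new InetSocketAddress(addr.getHostString(), addr.getPort());
    }

    public static void main(String[] args) {
        // Stand-in for an address whose lookup failed when it was created.
        InetSocketAddress cached = InetSocketAddress.createUnresolved("localhost", 8020);

        // Checking the same object again never triggers a new lookup.
        System.out.println("cached unresolved: " + cached.isUnresolved());

        // Only a newly constructed address attempts resolution again.
        InetSocketAddress fresh = reResolve(cached);
        System.out.println("fresh unresolved:  " + fresh.isUnresolved());
        System.out.println("cached unresolved: " + cached.isUnresolved());
    }
}
```

A retry loop that keeps inspecting the cached object therefore sees the same answer on every attempt, which matches the "sleep and retry for no benefit" behaviour in the description.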
[jira] [Commented] (HADOOP-12125) Retrying UnknownHostException on a proxy does not actually retry hostname resolution
[ https://issues.apache.org/jira/browse/HADOOP-12125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16682187#comment-16682187 ]

Igor Dvorzhak commented on HADOOP-12125:
----------------------------------------

[~jzhuge] Could you share a patch with the fix that your colleague has?
[jira] [Commented] (HADOOP-12125) Retrying UnknownHostException on a proxy does not actually retry hostname resolution
[ https://issues.apache.org/jira/browse/HADOOP-12125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16610935#comment-16610935 ]

John Zhuge commented on HADOOP-12125:
-------------------------------------

My colleague has a fix for this issue.
[jira] [Commented] (HADOOP-12125) Retrying UnknownHostException on a proxy does not actually retry hostname resolution
[ https://issues.apache.org/jira/browse/HADOOP-12125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16404983#comment-16404983 ]

Rushabh S Shah commented on HADOOP-12125:
-----------------------------------------

[~jzhuge]: I don't have enough cycles to work on this jira. Please go ahead and re-assign if you plan to work on this.

> Retrying UnknownHostException on a proxy does not actually retry hostname resolution
> -------------------------------------------------------------------------------------
>
> Key: HADOOP-12125
> URL: https://issues.apache.org/jira/browse/HADOOP-12125
> Project: Hadoop Common
> Issue Type: Bug
> Components: ipc
> Reporter: Jason Lowe
> Assignee: Rushabh S Shah
> Priority: Major
[jira] [Commented] (HADOOP-12125) Retrying UnknownHostException on a proxy does not actually retry hostname resolution
[ https://issues.apache.org/jira/browse/HADOOP-12125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16399707#comment-16399707 ]

John Zhuge commented on HADOOP-12125:
-------------------------------------

[~shahrs87] and [~jlowe], any progress?

We hit the same issue when the non-HA NN went down and AWS spun up another NN instance with a different IP address. Both Job History Server and Spark History Server were stuck because NameNodeProxy held on to the old IP address.
[jira] [Commented] (HADOOP-12125) Retrying UnknownHostException on a proxy does not actually retry hostname resolution
[ https://issues.apache.org/jira/browse/HADOOP-12125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608579#comment-14608579 ]

Kihwal Lee commented on HADOOP-12125:
-------------------------------------

What happens when the address is really bad due to misconfiguration, a DNS update, etc.? Instead of retrying forever, we want it to fail after some time. As long as there is a way to properly propagate the failure up, I am fine with this approach. The retry proxy and the app should realize that the RPC proxy is unusable. It should not use the same proxy to retry.

> Retrying UnknownHostException on a proxy does not actually retry hostname resolution
> -------------------------------------------------------------------------------------
>
> Key: HADOOP-12125
> URL: https://issues.apache.org/jira/browse/HADOOP-12125
> Project: Hadoop Common
> Issue Type: Bug
> Components: ipc
> Reporter: Jason Lowe
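The "fail after some time" requirement can be sketched as a bounded loop that re-resolves on every attempt and propagates an UnknownHostException once the attempts run out. This is an illustration only and not the Hadoop retry policy: BoundedResolveRetry, resolveWithLimit, and the maxAttempts and sleepMillis parameters are hypothetical names, and the resolver is passed in as a Supplier so the caller decides how a new address is built.

```java
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.function.Supplier;

// Hypothetical helper; names and parameters are assumptions, not Hadoop APIs.
final class BoundedResolveRetry {

    // Calls the resolver up to maxAttempts times, sleeping between attempts.
    // Returns the first resolved address, or throws once the limit is reached
    // so the failure reaches the caller instead of being retried forever.
    static InetSocketAddress resolveWithLimit(Supplier<InetSocketAddress> resolver,
                                              int maxAttempts,
                                              long sleepMillis)
            throws UnknownHostException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        InetSocketAddress last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Each call must build a new address so a new lookup happens.
            last = resolver.get();
            if (!last.isUnresolved()) {
                return last;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(sleepMillis);
            }
        }
        throw new UnknownHostException("Could not resolve " + last.getHostString()
                + " after " + maxAttempts + " attempts");
    }
}
```

A caller might use it as `resolveWithLimit(() -> new InetSocketAddress(host, port), 5, 1000L)`, where host and port come from its own configuration. A time budget could replace the attempt count without changing the shape of the loop.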
[jira] [Commented] (HADOOP-12125) Retrying UnknownHostException on a proxy does not actually retry hostname resolution
[ https://issues.apache.org/jira/browse/HADOOP-12125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603668#comment-14603668 ]

Jason Lowe commented on HADOOP-12125:
-------------------------------------

This is very similar to HDFS-8068, but that tries to work around the issue at the application layer when it tries to set up the proxy. Ideally this should be handled as much as possible in the IPC layer itself so we can treat UnknownHostException like other retriable exceptions.

One possible approach is to have the Connection constructor try to call the updateAddress method if the ConnectionId socket address is unresolved. Then we would actually try to re-resolve the address. One downside to this approach is we could end up with multiple clients for the same server, since the ConnectionId socket address is used as part of the hashcode. However this seems better than either retrying forever for no benefit or requiring app-level code to retry this code on their own when setting up the proxy.
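The proposal and its downside can both be illustrated with plain java.net classes. The sketch below is not the Hadoop Connection or ConnectionId code: ReResolveOnConnect and refreshIfUnresolved are hypothetical names, and a HashMap keyed by InetSocketAddress stands in for any cache that uses the socket address in its hashcode. It relies on the fact that an unresolved InetSocketAddress never equals a resolved one, even for the same host and port.

```java
import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; it does not use or mirror the Hadoop IPC classes.
final class ReResolveOnConnect {

    // If the address is unresolved, build a new one so resolution is tried
    // again; an already resolved address is returned unchanged.
    static InetSocketAddress refreshIfUnresolved(InetSocketAddress server) {
        if (!server.isUnresolved()) {
            return server;
        }
        return new InetSocketAddress(server.getHostString(), server.getPort());
    }

    public static void main(String[] args) {
        InetSocketAddress stale = InetSocketAddress.createUnresolved("localhost", 8020);
        InetSocketAddress fresh = refreshIfUnresolved(stale);

        // Stand-in for a cache keyed by socket address.
        Map<InetSocketAddress, String> cache = new HashMap<>();
        cache.put(stale, "entry created before re-resolution");
        cache.put(fresh, "entry created after re-resolution");

        // If re-resolution succeeded, the two keys are not equal and the
        // cache holds two entries for what is logically one server.
        System.out.println("keys equal:    " + stale.equals(fresh));
        System.out.println("cache entries: " + cache.size());
    }
}
```

This is the "multiple clients for the same server" effect mentioned above: the stale key and the refreshed key can coexist unless the cache is keyed on something stable, such as the host string and port.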