[ https://issues.apache.org/jira/browse/HADOOP-7472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071964#comment-13071964 ]
Kihwal Lee commented on HADOOP-7472:
------------------------------------

We are working on a new approach, which will address both 1 and 2.

bq. Upper layers pass the InetSocketAddress down. They do not hold on to it.

They don't. The problem is that they create an InetSocketAddress, and the lower layers have no way of knowing what it was originally instantiated with. This is a headache when dealing with tokens.

bq. One of the things I was thinking was to replace InetSocketAddress to the underlying layers with a wrapper, which allows updating the address with new resolved address.

We thought about this. Darren is working on the token renewal problem, and we found out we can have a common solution. One way was to do what you mentioned. But we decided to keep InetSocketAddress as is and use createUnresolved() to create it, so that we know what was used to instantiate it. If the user slapped in an IP address to begin with, we won't handle it. (I think the two cases were indistinguishable before.) The token will have whatever the user used (IP or name) in the beginning, and when a name is used, the key to the token cache won't change even if the address changes. So the delegation token should continue to work.

As for HADOOP-7380, that is for the HA case, where the identities of the two name nodes are known beforehand. The failover proxy is for switching between the two pre-configured name nodes. Since this is the HA strategy for 0.23, I don't think this patch will be applicable to the trunk.
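To illustrate the createUnresolved() idea described above, here is a minimal sketch using only java.net.InetSocketAddress. It is not the patch attached to this issue. The class name, the helper names (tokenKey, resolveNow), and the host nn.example.com are hypothetical and exist only for illustration. getHostString() requires Java 7 or later.

```java
import java.net.InetSocketAddress;

public class UnresolvedAddressSketch {

    // Hypothetical helper: builds a cache key from the string the address
    // was originally created with (name or IP literal) plus the port.
    // getHostString() never triggers a reverse DNS lookup.
    static String tokenKey(InetSocketAddress addr) {
        return addr.getHostString() + ":" + addr.getPort();
    }

    // Hypothetical helper: performs a fresh lookup of the original host
    // string. Calling this again on reconnect would pick up a changed
    // DNS mapping. If the lookup fails, the result stays unresolved.
    static InetSocketAddress resolveNow(InetSocketAddress original) {
        return new InetSocketAddress(original.getHostString(), original.getPort());
    }

    public static void main(String[] args) {
        // createUnresolved() stores the host string without any lookup.
        InetSocketAddress byName =
            InetSocketAddress.createUnresolved("nn.example.com", 8020);
        System.out.println(byName.isUnresolved()); // true
        System.out.println(tokenKey(byName));      // nn.example.com:8020

        // If the user supplied an IP literal, that literal is all we have,
        // so an address change cannot be followed.
        InetSocketAddress byIp =
            InetSocketAddress.createUnresolved("127.0.0.1", 8020);
        System.out.println(tokenKey(byIp));        // 127.0.0.1:8020

        // An IP literal is parsed directly, so no DNS query is needed here.
        InetSocketAddress resolved = resolveNow(byIp);
        System.out.println(resolved.isUnresolved()); // false
    }
}
```

The point of the sketch is that the key derived from the original host string stays the same no matter what address the name resolves to later, which is why a token keyed this way would keep working after an address change.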

> RPC client should deal with the IP address changes
> --------------------------------------------------
>
>                 Key: HADOOP-7472
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7472
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ipc
>    Affects Versions: 0.20.205.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Minor
>             Fix For: 0.20.205.0
>
>         Attachments: addr_change_dfs-1.patch.txt, addr_change_dfs.patch.txt
>
>
> The current RPC client implementation and the client-side callers assume that
> the hostname-address mappings of servers never change. The resolved address
> is stored in an immutable InetSocketAddress object above/outside RPC, and the
> reconnect logic in the RPC Connection implementation also trusts the resolved
> address that was passed down.
> If the NN suffers a failure that requires migration, it may be started on a
> different node with a different IP address. In this case, even if the
> name-address mapping is updated in DNS, the cluster is stuck trying the old
> address until the whole cluster is restarted.
> The RPC client side should detect this situation and exit or try to recover.
> Updating ConnectionId within the Client implementation may get the system
> working for the moment, but there is always a risk of the cached address:port
> becoming connectable again unintentionally. The real solution will be notifying
> the upper layers of the address change so that they can re-resolve and retry, or
> re-architecting the system as discussed in HDFS-34.
> For the 0.20 lines, some type of compromise may be acceptable. For example, raise
> a custom exception for some well-defined high-impact upper layer to do
> re-resolve/retry, while others will have to restart. For TRUNK, the HA work
> will most likely determine what needs to be done. So this Jira won't cover
> the solutions for TRUNK.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira