[jira] [Commented] (HDFS-3150) Add option for clients to contact DNs via hostname in branch-1
    [ https://issues.apache.org/jira/browse/HDFS-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415606#comment-13415606 ]

Eli Collins commented on HDFS-3150:
-----------------------------------

Making this a top-level issue since unbreaking multihoming is really orthogonal to HDFS-3140.

> Add option for clients to contact DNs via hostname in branch-1
> --------------------------------------------------------------
>
>                 Key: HDFS-3150
>                 URL: https://issues.apache.org/jira/browse/HDFS-3150
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: data-node, hdfs client
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>             Fix For: 1.1.0
>
>         Attachments: hdfs-3150-b1.txt, hdfs-3150-b1.txt
>
>
> Per the document attached to HADOOP-8198, this is just for branch-1, and
> unbreaks DN multihoming. The datanode can be configured to listen on a bond,
> or all interfaces by specifying the wildcard in the dfs.datanode.*.address
> configuration options; however, per HADOOP-6867 only the source address of the
> registration is exposed to clients. HADOOP-985 made clients access datanodes
> by IP primarily to avoid the latency of a DNS lookup, which had the side
> effect of breaking DN multihoming. In order to fix it let's add back the
> option for Datanodes to be accessed by hostname. This can be done by:
> # Modifying the primary field of the Datanode descriptor to be the hostname,
> or
> # Modifying Client/Datanode <-> Datanode access to use the hostname field
> instead of the IP
> I'd like to go with approach #2 as it does not require making an incompatible
> change to the client protocol, and is much less invasive. It minimizes the
> scope of modification to just places where clients and Datanodes connect, vs
> changing all uses of Datanode identifiers.
> New client and Datanode configuration options are introduced:
> - {{dfs.client.use.datanode.hostname}} indicates all client to datanode
> connections should use the datanode hostname (as clients outside the cluster
> may not be able to route the IP)
> - {{dfs.datanode.use.datanode.hostname}} indicates whether Datanodes should
> use hostnames when connecting to other Datanodes for data transfer
> If the configuration options are not used, there is no change in the current
> behavior.
> I'm doing something similar to #1 btw in trunk in HDFS-3144 - refactoring the
> use of DatanodeID to use the right field (IP, IP:xferPort, hostname, etc)
> based on the context the ID is being used in, vs always using the IP:xferPort
> as the Datanode's name, and using the name everywhere.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
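For readers following the description of approach #2, here is a minimal Java sketch of the idea: the descriptor keeps both the IP and the hostname, and the caller picks one at connect time based on a boolean flag that defaults to false. Only the two configuration key names come from the issue. The class names, the map-based flag lookup, and the example address, hostname, and port are assumptions made for illustration; this is not the code in the attached patch and not Hadoop's DatanodeID.

```java
import java.util.Map;

/** Illustrative stand-in only; not Hadoop's DatanodeID or the attached patch. */
final class DatanodeAddressSketch {
    // Key names as given in the issue description.
    static final String CLIENT_USE_DN_HOSTNAME = "dfs.client.use.datanode.hostname";
    static final String DN_USE_DN_HOSTNAME = "dfs.datanode.use.datanode.hostname";

    /** Hypothetical minimal descriptor that carries both the IP and the hostname. */
    static final class DatanodeInfoSketch {
        final String ipAddr;
        final String hostName;
        final int xferPort;

        DatanodeInfoSketch(String ipAddr, String hostName, int xferPort) {
            this.ipAddr = ipAddr;
            this.hostName = hostName;
            this.xferPort = xferPort;
        }

        /** Returns host:port, choosing the hostname only when asked to. */
        String getName(boolean useHostname) {
            return (useHostname ? hostName : ipAddr) + ":" + xferPort;
        }
    }

    /** Reads a boolean flag; an unset key means false, so behavior is unchanged. */
    static boolean flag(Map<String, String> conf, String key) {
        return Boolean.parseBoolean(conf.getOrDefault(key, "false"));
    }

    public static void main(String[] args) {
        // Example values are made up for the sketch.
        DatanodeInfoSketch dn = new DatanodeInfoSketch("10.0.0.5", "dn1.example.com", 50010);

        Map<String, String> defaults = Map.of();
        Map<String, String> external = Map.of(CLIENT_USE_DN_HOSTNAME, "true");

        System.out.println(dn.getName(flag(defaults, CLIENT_USE_DN_HOSTNAME)));
        System.out.println(dn.getName(flag(external, CLIENT_USE_DN_HOSTNAME)));
    }
}
```

With no options set the sketch falls back to the IP-based name, which mirrors the statement that unset options leave current behavior unchanged. The same selection would apply on the Datanode side with the second key.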
[jira] [Commented] (HDFS-3150) Add option for clients to contact DNs via hostname in branch-1
    [ https://issues.apache.org/jira/browse/HDFS-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248578#comment-13248578 ]

Tsz Wo (Nicholas), SZE commented on HDFS-3150:
----------------------------------------------

Eli, I can understand that it is easy to make mistakes when getting busy. Simply relax and maybe slow down a little bit. I might have given you a hard time, although it was not my intention.
[jira] [Commented] (HDFS-3150) Add option for clients to contact DNs via hostname in branch-1
    [ https://issues.apache.org/jira/browse/HDFS-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248014#comment-13248014 ]

Eli Collins commented on HDFS-3150:
-----------------------------------

Nicholas, this was a simple misunderstanding. Todd was +1 modulo the variable name and log message; I thought he had actually +1'd on the jira but was mistaken (I've had a lot of patches in flight recently). We obviously intend to honor the bylaws.
[jira] [Commented] (HDFS-3150) Add option for clients to contact DNs via hostname in branch-1
    [ https://issues.apache.org/jira/browse/HDFS-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247955#comment-13247955 ]

Tsz Wo (Nicholas), SZE commented on HDFS-3150:
----------------------------------------------

Eli, this is not a question on the quality of the patch but whether we should honor the bylaws.
[jira] [Commented] (HDFS-3150) Add option for clients to contact DNs via hostname in branch-1
    [ https://issues.apache.org/jira/browse/HDFS-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247362#comment-13247362 ]

Eli Collins commented on HDFS-3150:
-----------------------------------

@Suresh, @Nicholas - if you have a specific suggestion for something that needs to be addressed in this patch let me know.
[jira] [Commented] (HDFS-3150) Add option for clients to contact DNs via hostname in branch-1
    [ https://issues.apache.org/jira/browse/HDFS-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247326#comment-13247326 ]

Tsz Wo (Nicholas), SZE commented on HDFS-3150:
----------------------------------------------

I did not know that the +1 could come after commit.
[jira] [Commented] (HDFS-3150) Add option for clients to contact DNs via hostname in branch-1
    [ https://issues.apache.org/jira/browse/HDFS-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247071#comment-13247071 ]

Todd Lipcon commented on HDFS-3150:
-----------------------------------

Sorry, I should have said "+1 assuming these changes are addressed" in my above comment. Since Eli addressed my comments, here's my official +1 for the patch.
[jira] [Commented] (HDFS-3150) Add option for clients to contact DNs via hostname in branch-1
    [ https://issues.apache.org/jira/browse/HDFS-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247020#comment-13247020 ]

Suresh Srinivas commented on HDFS-3150:
---------------------------------------

Given there are some discussions happening around +1s from committers, it is probably a good idea to wait for a +1. Should we also keep the release manager posted about this change? I generally post an email to hdfs/common dev about this kind of change.
[jira] [Commented] (HDFS-3150) Add option for clients to contact DNs via hostname in branch-1
    [ https://issues.apache.org/jira/browse/HDFS-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243614#comment-13243614 ]

Todd Lipcon commented on HDFS-3150:
-----------------------------------

Mostly looks good, just some nits:

{code}
+    LOG.info("Opened streaming server at " + tmpPort);
{code}
This isn't the terminology used elsewhere. "Data transfer server" or "data transceiver server" is better

{code}
       // Connect to backup machine
+      final String dnName = targets[0].getName(connectToDnViaHostname);
{code}
I think better to call this {{mirrorName}} or {{mirrorAddrString}}

{code}
+      final String dnName = proxySource.getName(connectToDnViaHostname);
+      InetSocketAddress proxyAddr = NetUtils.createSocketAddr(dnName);
{code}
Similar here -- {{proxyDnName}} or {{proxyAddrString}}
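As a rough illustration of the naming nits in this review, the sketch below uses the suggested names {{mirrorAddrString}} and {{proxyAddrString}} for the host:port strings and turns each into a socket address. The helper toSocketAddr is a hypothetical stand-in for the NetUtils.createSocketAddr call quoted in the review, written against java.net only; the class name and the example addresses are assumptions, and the final patch may differ.

```java
import java.net.InetSocketAddress;

/** Illustrative only; shows the suggested variable names, not the committed patch. */
final class MirrorAddrSketch {

    /**
     * Hypothetical stand-in for the NetUtils.createSocketAddr call in the review.
     * Parses "host:port" without performing a DNS lookup.
     */
    static InetSocketAddress toSocketAddr(String addrString) {
        int colon = addrString.lastIndexOf(':');
        if (colon <= 0 || colon == addrString.length() - 1) {
            throw new IllegalArgumentException("expected host:port, got " + addrString);
        }
        String host = addrString.substring(0, colon);
        int port = Integer.parseInt(addrString.substring(colon + 1));
        return InetSocketAddress.createUnresolved(host, port);
    }

    public static void main(String[] args) {
        // The names say which peer the string addresses, rather than a generic dnName.
        final String mirrorAddrString = "dn2.example.com:50010"; // made-up example value
        final String proxyAddrString = "10.0.0.7:50010";         // made-up example value

        InetSocketAddress mirrorAddr = toSocketAddr(mirrorAddrString);
        InetSocketAddress proxyAddr = toSocketAddr(proxyAddrString);

        System.out.println(mirrorAddr.getHostString() + " " + mirrorAddr.getPort());
        System.out.println(proxyAddr.getHostString() + " " + proxyAddr.getPort());
    }
}
```

Naming the string after its role (mirror or proxy source) keeps the two call sites distinguishable when the string may hold either a hostname or an IP.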