[jira] [Updated] (HDFS-3150) Add option for clients to contact DNs via hostname
[ https://issues.apache.org/jira/browse/HDFS-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-3150:
------------------------------
    Resolution: Fixed
    Fix Version/s: 2.2.0-alpha
    Target Version/s: (was: 2.2.0-alpha)
    Status: Resolved (was: Patch Available)

I've committed this and merged to branch-2.

Add option for clients to contact DNs via hostname
--------------------------------------------------
    Key: HDFS-3150
    URL: https://issues.apache.org/jira/browse/HDFS-3150
    Project: Hadoop HDFS
    Issue Type: New Feature
    Components: data-node, hdfs client
    Affects Versions: 1.0.0, 2.0.0-alpha
    Reporter: Eli Collins
    Assignee: Eli Collins
    Fix For: 1.1.0, 2.2.0-alpha
    Attachments: hdfs-3150-b1.txt, hdfs-3150-b1.txt, hdfs-3150.txt, hdfs-3150.txt, hdfs-3150.txt, hdfs-3150.txt

The DN listens on multiple IP addresses (the default {{dfs.datanode.address}} is the wildcard); however, per HADOOP-6867 only the source address (IP) of the registration is given to clients. HADOOP-985 made clients access datanodes by IP, primarily to avoid the latency of a DNS lookup. This had the side effect of breaking DN multihoming (the client cannot route the IP exposed by the NN if the DN registers with an interface that has a cluster-private IP). To fix this, let's add back the option for Datanodes to be accessed by hostname.

This can be done by:
# Modifying the primary field of the Datanode descriptor to be the hostname, or
# Modifying Client/Datanode to Datanode access to use the hostname field instead of the IP

Approach #2 does not require an incompatible client protocol change, and is much less invasive. It minimizes the scope of modification to just the places where clients and Datanodes connect, vs changing all uses of Datanode identifiers.
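Approach #2 amounts to choosing, at each place where a client or Datanode opens a connection, which field of the Datanode's identity to dial. A minimal Java sketch of that choice follows. The class `DatanodeAddressSketch`, its fields, and the method `connectTarget` are hypothetical names used only for illustration; they are not taken from the patch or from the Hadoop API, and the addresses and port are made-up example values.

```java
/** Hypothetical stand-in for a Datanode's identity; NOT the real Hadoop DatanodeID. */
final class DatanodeAddressSketch {
    private final String ipAddr;    // address the DN registered with (may be cluster-private)
    private final String hostName;  // hostname the DN reported
    private final int xferPort;     // data transfer port

    DatanodeAddressSketch(String ipAddr, String hostName, int xferPort) {
        this.ipAddr = ipAddr;
        this.hostName = hostName;
        this.xferPort = xferPort;
    }

    /**
     * Returns the "host:port" string to connect to. When useHostname is false
     * the IP is used, which mirrors the behavior without the new option.
     */
    String connectTarget(boolean useHostname) {
        return (useHostname ? hostName : ipAddr) + ":" + xferPort;
    }

    public static void main(String[] args) {
        // Example values only.
        DatanodeAddressSketch dn =
            new DatanodeAddressSketch("10.0.0.12", "dn1.example.com", 50010);
        System.out.println(dn.connectTarget(false));
        System.out.println(dn.connectTarget(true));
    }
}
```

Because only the connect sites consult the flag, every other use of the Datanode identifier is left alone, which is the reason the description calls this approach less invasive.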
New client and Datanode configuration options are introduced:
- {{dfs.client.use.datanode.hostname}} indicates all client to datanode connections should use the datanode hostname (as clients outside cluster may not be able to route the IP)
- {{dfs.datanode.use.datanode.hostname}} indicates whether Datanodes should use hostnames when connecting to other Datanodes for data transfer

If the configuration options are not used, there is no change in the current behavior.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
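The two options and their off-by-default behavior could be read along the following lines. This is only a sketch: `java.util.Properties` stands in for Hadoop's configuration object, and the class `HostnameOptionsSketch` and its method `isEnabled` are hypothetical. Only the two key names come from the issue description.

```java
import java.util.Properties;

/** Hypothetical helper; Properties is a stand-in for the real configuration class. */
final class HostnameOptionsSketch {
    // Key names as given in the issue description.
    static final String CLIENT_KEY = "dfs.client.use.datanode.hostname";
    static final String DATANODE_KEY = "dfs.datanode.use.datanode.hostname";

    /** An unset option is treated as false, so existing behavior is unchanged. */
    static boolean isEnabled(Properties conf, String key) {
        return Boolean.parseBoolean(conf.getProperty(key, "false"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Only the client-side option is switched on in this example.
        conf.setProperty(CLIENT_KEY, "true");
        System.out.println(CLIENT_KEY + " = " + isEnabled(conf, CLIENT_KEY));
        System.out.println(DATANODE_KEY + " = " + isEnabled(conf, DATANODE_KEY));
    }
}
```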
[jira] [Updated] (HDFS-3150) Add option for clients to contact DNs via hostname
[ https://issues.apache.org/jira/browse/HDFS-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-3150:
------------------------------
    Attachment: hdfs-3150.txt

Patch attached again since it looks like jira lost the old version.

Add option for clients to contact DNs via hostname
--------------------------------------------------
    Key: HDFS-3150
    URL: https://issues.apache.org/jira/browse/HDFS-3150
    Project: Hadoop HDFS
    Issue Type: New Feature
    Components: data-node, hdfs client
    Affects Versions: 1.0.0, 2.0.0-alpha
    Reporter: Eli Collins
    Assignee: Eli Collins
    Fix For: 1.1.0
    Attachments: hdfs-3150-b1.txt, hdfs-3150-b1.txt, hdfs-3150.txt, hdfs-3150.txt, hdfs-3150.txt
[jira] [Updated] (HDFS-3150) Add option for clients to contact DNs via hostname
[ https://issues.apache.org/jira/browse/HDFS-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-3150:
------------------------------
    Attachment: hdfs-3150.txt

Updated patch attached.

Add option for clients to contact DNs via hostname
--------------------------------------------------
    Key: HDFS-3150
    URL: https://issues.apache.org/jira/browse/HDFS-3150
    Project: Hadoop HDFS
    Issue Type: New Feature
    Components: data-node, hdfs client
    Affects Versions: 1.0.0, 2.0.0-alpha
    Reporter: Eli Collins
    Assignee: Eli Collins
    Fix For: 1.1.0
    Attachments: hdfs-3150-b1.txt, hdfs-3150-b1.txt, hdfs-3150.txt, hdfs-3150.txt, hdfs-3150.txt, hdfs-3150.txt
[jira] [Updated] (HDFS-3150) Add option for clients to contact DNs via hostname
[ https://issues.apache.org/jira/browse/HDFS-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-3150:
------------------------------
    Attachment: hdfs-3150.txt

Trunk patch attached. Also makes DFS_DATANODE_HOST_NAME_KEY work when used from mini clusters. This was working and was broken by HDFS-2284 (which unconditionally clobbers). I've tested this on a local tarball install, I'm also going to test it on a cluster with the relevant configuration (public and private interfaces).

Add option for clients to contact DNs via hostname
--------------------------------------------------
    Key: HDFS-3150
    URL: https://issues.apache.org/jira/browse/HDFS-3150
    Project: Hadoop HDFS
    Issue Type: New Feature
    Components: data-node, hdfs client
    Affects Versions: 1.0.0, 2.0.0-alpha
    Reporter: Eli Collins
    Assignee: Eli Collins
    Fix For: 1.1.0
    Attachments: hdfs-3150-b1.txt, hdfs-3150-b1.txt, hdfs-3150.txt
[jira] [Updated] (HDFS-3150) Add option for clients to contact DNs via hostname
[ https://issues.apache.org/jira/browse/HDFS-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-3150:
------------------------------
    Status: Patch Available (was: Reopened)

Add option for clients to contact DNs via hostname
--------------------------------------------------
    Key: HDFS-3150
    URL: https://issues.apache.org/jira/browse/HDFS-3150
    Project: Hadoop HDFS
    Issue Type: New Feature
    Components: data-node, hdfs client
    Affects Versions: 2.0.0-alpha, 1.0.0
    Reporter: Eli Collins
    Assignee: Eli Collins
    Fix For: 1.1.0
    Attachments: hdfs-3150-b1.txt, hdfs-3150-b1.txt, hdfs-3150.txt
[jira] [Updated] (HDFS-3150) Add option for clients to contact DNs via hostname in branch-1
[ https://issues.apache.org/jira/browse/HDFS-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-3150:
------------------------------
    Issue Type: New Feature (was: Sub-task)
    Parent: (was: HDFS-3140)

Add option for clients to contact DNs via hostname in branch-1
--------------------------------------------------------------
    Key: HDFS-3150
    URL: https://issues.apache.org/jira/browse/HDFS-3150
    Project: Hadoop HDFS
    Issue Type: New Feature
    Components: data-node, hdfs client
    Reporter: Eli Collins
    Assignee: Eli Collins
    Fix For: 1.1.0
    Attachments: hdfs-3150-b1.txt, hdfs-3150-b1.txt

Per the document attached to HADOOP-8198, this is just for branch-1, and unbreaks DN multihoming. The datanode can be configured to listen on a bond, or all interfaces by specifying the wildcard in the dfs.datanode.*.address configuration options, however per HADOOP-6867 only the source address of the registration is exposed to clients. HADOOP-985 made clients access datanodes by IP primarily to avoid the latency of a DNS lookup, this had the side effect of breaking DN multihoming. In order to fix it let's add back the option for Datanodes to be accessed by hostname.

This can be done by:
# Modifying the primary field of the Datanode descriptor to be the hostname, or
# Modifying Client/Datanode - Datanode access use the hostname field instead of the IP

I'd like to go with approach #2 as it does not require making an incompatible change to the client protocol, and is much less invasive. It minimizes the scope of modification to just places where clients and Datanodes connect, vs changing all uses of Datanode identifiers.

New client and Datanode configuration options are introduced:
- {{dfs.client.use.datanode.hostname}} indicates all client to datanode connections should use the datanode hostname (as clients outside cluster may not be able to route the IP)
- {{dfs.datanode.use.datanode.hostname}} indicates whether Datanodes should use hostnames when connecting to other Datanodes for data transfer

If the configuration options are not used, there is no change in the current behavior.

I'm doing something similar to #1 btw in trunk in HDFS-3144 - refactoring the use of DatanodeID to use the right field (IP, IP:xferPort, hostname, etc) based on the context the ID is being used in, vs always using the IP:xferPort as the Datanode's name, and using the name everywhere.
[jira] [Updated] (HDFS-3150) Add option for clients to contact DNs via hostname
[ https://issues.apache.org/jira/browse/HDFS-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-3150:
------------------------------
    Description:
The DN listens on multiple IP addresses (the default {{dfs.datanode.address}} is the wildcard) however per HADOOP-6867 only the source address (IP) of the registration is given to clients. HADOOP-985 made clients access datanodes by IP primarily to avoid the latency of a DNS lookup, this had the side effect of breaking DN multihoming (the client can not route the IP exposed by the NN if the DN registers with an interface that has a cluster-private IP). To fix this let's add back the option for Datanodes to be accessed by hostname.

This can be done by:
# Modifying the primary field of the Datanode descriptor to be the hostname, or
# Modifying Client/Datanode - Datanode access use the hostname field instead of the IP

Approach #2 does not require an incompatible client protocol change, and is much less invasive. It minimizes the scope of modification to just places where clients and Datanodes connect, vs changing all uses of Datanode identifiers.

New client and Datanode configuration options are introduced:
- {{dfs.client.use.datanode.hostname}} indicates all client to datanode connections should use the datanode hostname (as clients outside cluster may not be able to route the IP)
- {{dfs.datanode.use.datanode.hostname}} indicates whether Datanodes should use hostnames when connecting to other Datanodes for data transfer

If the configuration options are not used, there is no change in the current behavior.

    was:
Per the document attached to HADOOP-8198, this is just for branch-1, and unbreaks DN multihoming. The datanode can be configured to listen on a bond, or all interfaces by specifying the wildcard in the dfs.datanode.*.address configuration options, however per HADOOP-6867 only the source address of the registration is exposed to clients. HADOOP-985 made clients access datanodes by IP primarily to avoid the latency of a DNS lookup, this had the side effect of breaking DN multihoming. In order to fix it let's add back the option for Datanodes to be accessed by hostname.

This can be done by:
# Modifying the primary field of the Datanode descriptor to be the hostname, or
# Modifying Client/Datanode - Datanode access use the hostname field instead of the IP

I'd like to go with approach #2 as it does not require making an incompatible change to the client protocol, and is much less invasive. It minimizes the scope of modification to just places where clients and Datanodes connect, vs changing all uses of Datanode identifiers.

New client and Datanode configuration options are introduced:
- {{dfs.client.use.datanode.hostname}} indicates all client to datanode connections should use the datanode hostname (as clients outside cluster may not be able to route the IP)
- {{dfs.datanode.use.datanode.hostname}} indicates whether Datanodes should use hostnames when connecting to other Datanodes for data transfer

If the configuration options are not used, there is no change in the current behavior.

I'm doing something similar to #1 btw in trunk in HDFS-3144 - refactoring the use of DatanodeID to use the right field (IP, IP:xferPort, hostname, etc) based on the context the ID is being used in, vs always using the IP:xferPort as the Datanode's name, and using the name everywhere.

    Target Version/s: 2.2.0-alpha
    Affects Version/s: 1.0.0, 2.0.0-alpha
    Summary: Add option for clients to contact DNs via hostname (was: Add option for clients to contact DNs via hostname in branch-1)

Add option for clients to contact DNs via hostname
--------------------------------------------------
    Key: HDFS-3150
    URL: https://issues.apache.org/jira/browse/HDFS-3150
    Project: Hadoop HDFS
    Issue Type: New Feature
    Components: data-node, hdfs client
    Affects Versions: 1.0.0, 2.0.0-alpha
    Reporter: Eli Collins
    Assignee: Eli Collins
    Fix For: 1.1.0
    Attachments: hdfs-3150-b1.txt, hdfs-3150-b1.txt
[jira] [Updated] (HDFS-3150) Add option for clients to contact DNs via hostname in branch-1
[ https://issues.apache.org/jira/browse/HDFS-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-3150:
------------------------------
    Attachment: hdfs-3150-b1.txt

Thanks Todd. Updated patch with your suggestions, since they're just name / log message changes, and this change is independent of the other changes in the umbrella jira I'll go ahead and commit this.

Add option for clients to contact DNs via hostname in branch-1
--------------------------------------------------------------
    Key: HDFS-3150
    URL: https://issues.apache.org/jira/browse/HDFS-3150
    Project: Hadoop HDFS
    Issue Type: Sub-task
    Components: data-node, hdfs client
    Reporter: Eli Collins
    Assignee: Eli Collins
    Attachments: hdfs-3150-b1.txt, hdfs-3150-b1.txt
[jira] [Updated] (HDFS-3150) Add option for clients to contact DNs via hostname in branch-1
[ https://issues.apache.org/jira/browse/HDFS-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-3150:
------------------------------
    Attachment: hdfs-3150-b1.txt

Patch attached. Aside from the new tests I confirmed via debug logging that we're now connecting to DNs by hostname.

Add option for clients to contact DNs via hostname in branch-1
--------------------------------------------------------------
    Key: HDFS-3150
    URL: https://issues.apache.org/jira/browse/HDFS-3150
    Project: Hadoop HDFS
    Issue Type: Sub-task
    Components: data-node, hdfs client
    Reporter: Eli Collins
    Assignee: Eli Collins
    Attachments: hdfs-3150-b1.txt