[ https://issues.apache.org/jira/browse/HDFS-17877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

YUBI LEE updated HDFS-17877:
----------------------------
    Description: 
DatanodeID.updateRegInfo() updates hostName but misses hostNameBytes.

Since PBHelperClient.convert(DatanodeID) uses getHostNameBytes() for protobuf
serialization, clients end up receiving the stale hostname from before the
re-registration.
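
For context, the conversion path looks roughly like this (a simplified sketch of PBHelperClient.convert(DatanodeID), not the verbatim Hadoop source; only the relevant fields are shown):

{code}
// Simplified sketch: the proto builder serializes the cached ByteString
// fields, so a hostNameBytes that was not refreshed in updateRegInfo() is
// sent to clients even though the String field hostName is up to date.
public static DatanodeIDProto convert(DatanodeID dn) {
  return DatanodeIDProto.newBuilder()
      .setIpAddrBytes(dn.getIpAddrBytes())      // kept in sync with ipAddr
      .setHostNameBytes(dn.getHostNameBytes())  // stale after re-registration
      .setXferPort(dn.getXferPort())
      .setInfoPort(dn.getInfoPort())
      .setIpcPort(dn.getIpcPort())
      .build();
}
{code}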

This becomes a real problem when a DataNode first registers with a partially
qualified domain name (PQDN) and later re-registers with a fully qualified
domain name (FQDN). With dfs.client.use.datanode.hostname=true, the client
tries to connect using the old PQDN and fails with UnknownHostException.

The fix is to add hostNameBytes = nodeReg.getHostNameBytes() in updateRegInfo(),
the same way setIpAndXferPort() already keeps ipAddr/ipAddrBytes in sync.
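
A sketch of the intended change (not a verbatim diff; it assumes updateRegInfo()
delegates the ip fields to setIpAndXferPort() as described above):

{code}
public void updateRegInfo(DatanodeID nodeReg) {
  // ipAddr and ipAddrBytes are already updated together here.
  setIpAndXferPort(nodeReg.getIpAddr(), nodeReg.getIpAddrBytes(),
      nodeReg.getXferPort());
  hostName = nodeReg.getHostName();
  hostNameBytes = nodeReg.getHostNameBytes(); // <-- the missing assignment
  peerHostName = nodeReg.getPeerHostName();
  infoPort = nodeReg.getInfoPort();
  infoSecurePort = nodeReg.getInfoSecurePort();
  ipcPort = nodeReg.getIpcPort();
}
{code}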

In my environment, I use the following configuration:

{code}
dfs.client.use.datanode.hostname=true
hadoop.security.token.service.use_ip=false
{code}
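
The first property makes the client dial the DataNode's reported hostname
instead of its IP; in DatanodeID terms, the address selection is roughly
(simplified sketch, not verbatim source):

{code}
// Simplified sketch: with dfs.client.use.datanode.hostname=true the client
// connects to hostName:xferPort, so a stale PQDN in hostName (or, per this
// issue, in hostNameBytes on the wire) fails name resolution.
public String getXferAddr(boolean useHostname) {
  return useHostname
      ? hostName + ":" + xferPort  // dfs.client.use.datanode.hostname=true
      : ipAddr + ":" + xferPort;   // default: connect by IP
}
{code}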


I got an UnknownHostException while reproducing this issue.
(Hostnames, IP addresses, and usernames are anonymized for privacy.)

- {{datanode001}} is a PQDN; it should be the FQDN {{datanode001.example.com}}.
- I stopped the DataNode for approximately 10 minutes and 30 seconds
(2 × dfs.namenode.heartbeat.recheck-interval + 10 × dfs.heartbeat.interval;
see the computation after this list), so that the NameNode would recognize it
as dead. After that, I restarted the DataNode, and also restarted both the
active and standby NameNodes. After these steps, the issue was resolved.
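
For reference, with the default values (dfs.namenode.heartbeat.recheck-interval
= 300000 ms, dfs.heartbeat.interval = 3 s), the NameNode's dead-node timeout
works out to exactly that wait:

{code}
2 * dfs.namenode.heartbeat.recheck-interval + 10 * dfs.heartbeat.interval
  = 2 * 300 s + 10 * 3 s
  = 600 s + 30 s
  = 630 s (10 minutes 30 seconds)
{code}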

{code}
$ HADOOP_ROOT_LOGGER=DEBUG,console yarn logs -applicationId application_1763013060073_51480 > application_1763013060073_51480.txt

26/01/29 18:42:09 DEBUG ipc.Client: IPC Client (307400933) connection to namenode001.example.com/10.1.1.2:9020 from [email protected] sending #138 org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo
26/01/29 18:42:09 DEBUG ipc.Client: IPC Client (307400933) connection to namenode001.example.com/10.1.1.2:9020 from [email protected] got value #138
26/01/29 18:42:09 DEBUG ipc.ProtobufRpcEngine2: Call: getFileInfo took 1ms
26/01/29 18:42:09 DEBUG hdfs.DFSClient: Connecting to datanode datanode001:9011
26/01/29 18:42:09 DEBUG impl.BlockReaderFactory: Block read failed. Getting remote block reader using TCP
java.io.IOException: Unresolved host: datanode001:9011
        at org.apache.hadoop.hdfs.DFSUtilClient.isLocalAddress(DFSUtilClient.java:640)
        at org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory.getPathInfo(DomainSocketFactory.java:152)
        at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:472)
        at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:360)
        at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:755)
        at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:685)
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:884)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:957)
        at java.io.DataInputStream.readFully(DataInputStream.java:195)
        at java.io.DataInputStream.readLong(DataInputStream.java:416)
        at org.apache.hadoop.io.file.tfile.BCFile$Reader.<init>(BCFile.java:626)
        at org.apache.hadoop.io.file.tfile.TFile$Reader.<init>(TFile.java:804)
        at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.<init>(AggregatedLogFormat.java:581)
        at org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.LogAggregationTFileController.readAggregatedLogs(LogAggregationTFileController.java:196)
        at org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.dumpAllContainersLogs(LogCLIHelpers.java:244)
        at org.apache.hadoop.yarn.client.cli.LogsCLI.fetchApplicationLogs(LogsCLI.java:1185)
        at org.apache.hadoop.yarn.client.cli.LogsCLI.runCommand(LogsCLI.java:374)
        at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:139)
        at org.apache.hadoop.yarn.client.cli.LogsCLI.main(LogsCLI.java:403)
26/01/29 18:42:09 WARN impl.BlockReaderFactory: I/O error constructing remote block reader.
java.net.UnknownHostException: datanode001:9011
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:591)
        at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3033)
        at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:829)
        at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:754)
        at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:381)
        at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:755)
        at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:685)
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:884)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:957)
        at java.io.DataInputStream.readFully(DataInputStream.java:195)
        at java.io.DataInputStream.readLong(DataInputStream.java:416)
        at org.apache.hadoop.io.file.tfile.BCFile$Reader.<init>(BCFile.java:626)
        at org.apache.hadoop.io.file.tfile.TFile$Reader.<init>(TFile.java:804)
        at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.<init>(AggregatedLogFormat.java:581)
        at org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.LogAggregationTFileController.readAggregatedLogs(LogAggregationTFileController.java:196)
        at org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.dumpAllContainersLogs(LogCLIHelpers.java:244)
        at org.apache.hadoop.yarn.client.cli.LogsCLI.fetchApplicationLogs(LogsCLI.java:1185)
        at org.apache.hadoop.yarn.client.cli.LogsCLI.runCommand(LogsCLI.java:374)
        at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:139)
        at org.apache.hadoop.yarn.client.cli.LogsCLI.main(LogsCLI.java:403)
{code}



> DatanodeID.updateRegInfo() does not update hostNameBytes causing stale hostname on client
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-17877
>                 URL: https://issues.apache.org/jira/browse/HDFS-17877
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: YUBI LEE
>            Priority: Major
>              Labels: pull-request-available
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
