[ https://issues.apache.org/jira/browse/HDFS-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063995#comment-17063995 ]
Steven Rand commented on HDFS-15191:
------------------------------------

[~vagarychen] I looked at this some more, and found that one difference after HDFS-14611 is that we call this from {{SaslDataTransferClient#doSaslHandshake}} in 3.2.1, but not in 3.2.0:

{code}
BlockTokenIdentifier blockTokenIdentifier = accessToken.decodeIdentifier();
{code}

Maybe trying to call {{BlockTokenIdentifier.readFieldsLegacy}} with the legacy block token would also have failed in 3.2.0, but we don't get there when we try to read a block.

Also, I used the debugger to look at the block token, and to check what position we're at in the underlying {{DataInputStream}} during each call in {{BlockTokenIdentifier.readFieldsLegacy}}. All the calls before {{length = WritableUtils.readVInt(in);}} seem fine, but we're out of bytes by the time we get there:

{code}
// The DataInputStream has 74 bytes in it.
expiryDate = WritableUtils.readVLong(in);        // pos = 0
keyId = WritableUtils.readVInt(in);              // pos = 7
userId = WritableUtils.readString(in);           // pos = 12
blockPoolId = WritableUtils.readString(in);      // pos = 21
blockId = WritableUtils.readVLong(in);           // pos = 63
int length = WritableUtils.readVIntInRange(in, 0,
    AccessMode.class.getEnumConstants().length); // pos = 68
for (int i = 0; i < length; i++) {
  modes.add(WritableUtils.readEnum(in, AccessMode.class));
}                                                // pos = 69
length = WritableUtils.readVInt(in);             // pos = 74, equal to the byte count, so we're at the end of the stream
... more code, but we don't get to it ...
{code}

> EOF when reading legacy buffer in BlockTokenIdentifier
> ------------------------------------------------------
>
>                 Key: HDFS-15191
>                 URL: https://issues.apache.org/jira/browse/HDFS-15191
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.2.1
>            Reporter: Steven Rand
>            Priority: Major
>
> We have an HDFS client application which recently upgraded from 3.2.0 to
> 3.2.1.
> After this upgrade (but not before), we sometimes see these errors
> when this application is used with clusters still running Hadoop 2.x (more
> specifically CDH 5.12.1):
>
> {code}
> WARN [2020-02-24T00:54:32.856Z]
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory: I/O error constructing
> remote block reader. (_sampled: true)
> java.io.EOFException:
>         at java.io.DataInputStream.readByte(DataInputStream.java:272)
>         at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
>         at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
>         at org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFieldsLegacy(BlockTokenIdentifier.java:240)
>         at org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFields(BlockTokenIdentifier.java:221)
>         at org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:200)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:530)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:342)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:276)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:245)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:227)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:170)
>         at org.apache.hadoop.hdfs.DFSUtilClient.peerFromSocketAndKey(DFSUtilClient.java:730)
>         at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2942)
>         at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:822)
>         at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:747)
>         at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:380)
>         at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:644)
>         at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:575)
>         at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:757)
>         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:829)
>         at java.io.DataInputStream.read(DataInputStream.java:100)
>         at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2314)
>         at org.apache.commons.io.IOUtils.copy(IOUtils.java:2270)
>         at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2291)
>         at org.apache.commons.io.IOUtils.copy(IOUtils.java:2246)
>         at org.apache.commons.io.IOUtils.toByteArray(IOUtils.java:765)
> {code}
>
> We get this warning for all DataNodes with a copy of the block, so the read
> fails.
>
> I haven't been able to figure out what changed between 3.2.0 and 3.2.1 to
> cause this, but HDFS-13617 and HDFS-14611 seem related, so tagging
> [~vagarychen] in case you have any ideas.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
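The failure mode in the position walkthrough above can be reproduced in miniature without Hadoop: reading a variable-length integer from a {{DataInputStream}} that is already exhausted throws {{EOFException}} from {{readByte}}, exactly as in the stack trace. This is a self-contained sketch, not Hadoop's actual {{WritableUtils}}, though the varint decoding below mirrors its encoding rules:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class LegacyTokenEofDemo {
    // Stand-in for WritableUtils.readVLong: Hadoop's variable-length
    // encoding, where a first byte in [-112, 127] is the value itself.
    static long readVLong(DataInputStream in) throws IOException {
        byte firstByte = in.readByte(); // throws EOFException at end of stream
        int len = decodeVIntSize(firstByte);
        if (len == 1) {
            return firstByte;
        }
        long v = 0;
        for (int i = 0; i < len - 1; i++) {
            v = (v << 8) | (in.readByte() & 0xff);
        }
        return isNegativeVInt(firstByte) ? ~v : v;
    }

    static int decodeVIntSize(byte b) {
        if (b >= -112) return 1;            // single-byte value
        if (b >= -120) return -111 - b;     // positive multi-byte value
        return -119 - b;                    // negative multi-byte value
    }

    static boolean isNegativeVInt(byte b) {
        return b < -120 || (b >= -112 && b < 0);
    }

    public static void main(String[] args) throws IOException {
        // A 1-byte stream: one single-byte varint, then nothing else.
        byte[] buf = { 42 };
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf));
        System.out.println(readVLong(in)); // prints 42, consuming the whole stream
        try {
            // One more field than the stream contains: the same failure
            // mode as readFieldsLegacy hitting pos = 74 of 74 bytes.
            readVLong(in);
        } catch (EOFException e) {
            System.out.println("EOFException on extra field");
        }
    }
}
```

The point of the sketch is that the EOF is not a corrupt stream but a field-count mismatch: the reader expects one more field than the writer serialized, which is consistent with a 2.x-format token being parsed with the 3.x legacy reader.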
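Since the stack trace shows {{readFields}} dispatching into {{readFieldsLegacy}}, the general pattern at issue is a reader that must handle two wire formats on the same stream. A common way to do that is to mark the stream, attempt one decoder, and reset before falling back to the other. This is a hedged sketch of that pattern only; {{readFormatA}} and {{readFormatB}} are hypothetical stand-ins, not the real {{readFieldsLegacy}}/protobuf paths in {{BlockTokenIdentifier}}:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class FormatFallbackDemo {
    // Try decoder A; on failure, rewind to the marked position so
    // decoder B sees the stream from the beginning.
    static String readEither(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        in.mark(bytes.length);
        try {
            return readFormatA(in);
        } catch (IOException e) {
            in.reset(); // without this, decoder B starts mid-stream and misparses
            return readFormatB(in);
        }
    }

    // Hypothetical decoder A: expects a leading 'A' tag byte.
    static String readFormatA(DataInputStream in) throws IOException {
        if (in.readByte() != 'A') {
            throw new IOException("not format A");
        }
        return "A:" + in.readByte();
    }

    // Hypothetical decoder B: reads the first byte as the payload.
    static String readFormatB(DataInputStream in) throws IOException {
        return "B:" + in.readByte();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readEither(new byte[]{'A', 7})); // prints A:7
        System.out.println(readEither(new byte[]{'X', 7})); // prints B:88
    }
}
```

If a fallback like this only runs when the first decoder throws, a legacy-format token that parses far enough before hitting EOF can still surface the {{EOFException}} to the caller, which would match the behavior reported here; whether that is what 3.2.1 actually does is exactly the open question in this ticket.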