[
https://issues.apache.org/jira/browse/HDFS-15732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18043295#comment-18043295
]
ASF GitHub Bot commented on HDFS-15732:
---------------------------------------
github-actions[bot] commented on PR #2558:
URL: https://github.com/apache/hadoop/pull/2558#issuecomment-3621411041
We're closing this stale PR because it has been open for 100 days with no
activity. This isn't a judgement on the merit of the PR in any way. It's just a
way of keeping the PR queue manageable.
If you feel like this was a mistake, or you would like to continue working
on it, please feel free to re-open it and ask for a committer to remove the
stale tag and review again.
Thanks all for your contribution.
> EC client will not retry get block token when block token expired in
> kerberized cluster
> ----------------------------------------------------------------------------------------
>
> Key: HDFS-15732
> URL: https://issues.apache.org/jira/browse/HDFS-15732
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: dfsclient, ec, erasure-coding
> Affects Versions: 3.1.1
> Environment: hadoop 3.1.1
> kerberos
> ec RS-3-2-1024k
> Reporter: gaozhan ding
> Assignee: gaozhan ding
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> When we enabled an EC policy for HBase tables, we ran into read failures. Once a
> block token expires on the datanode side, the client cannot recognize the
> InvalidToken error because it is masked by the SASL negotiation. As a result, the
> EC client does not retry by refetching the token when creating a block reader.
> The peer datanode is then added to DeadNodes, and every subsequent
> createBlockReader call targeting that datanode in the current
> DFSStripedInputStream treats it as dead and returns false. The end result is a
> read failure.
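> To make the missing step concrete, here is a minimal Java sketch of the retry
> pattern the contiguous (non-striped) read path applies and that the striped path
> skips: catch the expired-token error when creating the block reader, refetch the
> block locations (which also yields a fresh block token), and retry once before
> treating the datanode as dead. In this report the error actually reaches the
> client as a generic SASL IOException, so a real fix would also have to recognize
> that failure as a token problem; the sketch only illustrates the retry itself.
> InvalidBlockTokenException is the real Hadoop class, while tryCreateBlockReader()
> and refetchBlockAt() are hypothetical placeholders, not actual client APIs.
>
> import java.io.IOException;
> import org.apache.hadoop.hdfs.security.token.block.InvalidBlockTokenException;
>
> // Hedged sketch only: not the actual DFSStripedInputStream code.
> class TokenRetrySketch {
>   boolean tryCreateBlockReader(long offset) throws IOException {
>     // placeholder: would build the remote block reader over SASL
>     throw new InvalidBlockTokenException("block token is expired");
>   }
>
>   void refetchBlockAt(long offset) {
>     // placeholder: would re-fetch block locations from the NameNode,
>     // which also carries a fresh block token
>   }
>
>   boolean createBlockReaderWithRetry(long offset) throws IOException {
>     int refetches = 1;                     // allow one token refresh before giving up
>     while (true) {
>       try {
>         return tryCreateBlockReader(offset);
>       } catch (InvalidBlockTokenException e) {
>         if (refetches-- > 0) {
>           refetchBlockAt(offset);          // refresh the token, then retry
>           continue;                        // instead of adding the datanode to DeadNodes
>         }
>         throw e;                           // only now treat this as a real read failure
>       }
>     }
>   }
> }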
> Some logs:
> hbase regionserver:
> 2020-12-17 10:00:24,291 WARN
> [RpcServer.default.FPBQ.Fifo.handler=15,queue=0,port=16020] hdfs.DFSClient:
> Failed to connect to /10.65.19.41:9866 for
> block BP-1601568648-10.65.19.12-1550823043026:blk_-9223372036813273566_672859566
> java.io.IOException: DIGEST-MD5: IO error acquiring password
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiatedCipherOption(DataTransferSaslUtil.java:421)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:479)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getSaslStreams(SaslDataTransferClient.java:393)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:267)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:215)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:160)
> at
> org.apache.hadoop.hdfs.DFSUtilClient.peerFromSocketAndKey(DFSUtilClient.java:647)
> at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2936)
> at
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:821)
> at
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:746)
> at
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:379)
> at
> org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:647)
> at
> org.apache.hadoop.hdfs.DFSStripedInputStream.createBlockReader(DFSStripedInputStream.java:272)
> at org.apache.hadoop.hdfs.StripeReader.readChunk(StripeReader.java:333)
> at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:365)
> at
> org.apache.hadoop.hdfs.DFSStripedInputStream.fetchBlockByteRange(DFSStripedInputStream.java:514)
> at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1354)
> at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1318)
> at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock.positionalReadWithExtra(HFileBlock.java:808)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readAtOffset(HFileBlock.java:1568)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1772)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1597)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1496)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$CellBasedKeyBlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:340)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:856)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:806)
> at
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:327)
> at
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:228)
> at
> org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:395)
> at
> org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:250)
> at
> org.apache.hadoop.hbase.regionserver.HStore.createScanner(HStore.java:2031)
> at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2022)
> at
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.initializeScanners(HRegion.java:6408)
> at
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:6388)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2926)
>
>
> datanode log:
> 2020-12-17 10:00:24,290 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode:
> hadoop-btdn0005.eniot.io:9866:DataXceiver error processing unknown operation
> src: /10.65.19.42:53894 dst: /10.65.19.41:9866
> javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password
> [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: Block
> token with block_token_identifier (expiryDate=1608192045256,
> keyId=1449537289, userId=hbase,
> blockPoolId=BP-1601568648-10.65.19.12-1550823043026,
> blockId=-9223372036813273566, access modes=[READ], storageTypes= [DISK, DISK,
> DISK, DISK, DISK], storageIds= [DS-604a9eaf-94ba-4127-b70d-436361c49ecd,
> DS-b53bf503-84c1-4e9e-aede-a19f88e3fc9a,
> DS-a1ee0117-7430-4181-b279-3f1b16541567,
> DS-e8459904-e1e1-44a6-affa-9ce9fc56f877,
> DS-9f54936c-96cc-4c85-91e6-a6f8e00f246a]) is expired.]
> at
> com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:598)
> at
> com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslParticipant.evaluateChallengeOrResponse(SaslParticipant.java:115)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.doSaslHandshake(SaslDataTransferServer.java:376)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.getSaslStreams(SaslDataTransferServer.java:300)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.receive(SaslDataTransferServer.java:127)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: Block
> token with block_token_identifier (expiryDate=1608192045256,
> keyId=1449537289, userId=hbase,
> blockPoolId=BP-1601568648-10.65.19.12-1550823043026,
> blockId=-9223372036813273566, access modes=[READ], storageTypes= [DISK, DISK,
> DISK, DISK, DISK], storageIds= [DS-604a9eaf-94ba-4127-b70d-436361c49ecd,
> DS-b53bf503-84c1-4e9e-aede-a19f88e3fc9a,
> DS-a1ee0117-7430-4181-b279-3f1b16541567,
> DS-e8459904-e1e1-44a6-affa-9ce9fc56f877,
> DS-9f54936c-96cc-4c85-91e6-a6f8e00f246a]) is expired.
> at
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.retrievePassword(BlockTokenSecretManager.java:479)
> at
> org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.retrievePassword(BlockPoolTokenSecretManager.java:81)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.buildServerPassword(SaslDataTransferServer.java:318)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.access$100(SaslDataTransferServer.java:73)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$2.apply(SaslDataTransferServer.java:297)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$SaslServerCallbackHandler.handle(SaslDataTransferServer.java:241)
> at
> com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589)
> ... 7 more
>
> Final result:
>
>
> java.io.IOException: Could not iterate StoreFileScanner[HFileScanner for
> reader
> reader=hdfs://azbeta/hbase/data/o15427722038191/RAW_569dfc35-10dc-4fe4-bb8c-5c403368d6c1/ccf146fb5c852d751b6b2f8307f81fa0/d/386bdb6978d348f4b4e168e728762842,
> compression=snappy, cacheConf=blockCache=LruBlockCache{blockCount=7915,
> currentSize=489.13 MB, freeSize=2.71 GB, maxSize=3.19 GB, heapSize=489.13 MB,
> minSize=3.03 GB, minFactor=0.95, multiSize=1.52 GB, multiFactor=0.5,
> singleSize=775.81 MB, singleFactor=0.25}, cacheDataOnRead=true,
> cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false,
> cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false,
> firstKey=Optional[\x01\x00\x00\x13\xBD\x00\x00\x00T\x070\x97\xF7/d:arm_xingneng_test_model@AI1/1603445985858/Put/seqid=0],
>
> lastKey=Optional[\x01\x00\x00\x16\xF9\x00\x00\x00T\x073-8/d:arm_xingneng_test_model@AI99/1603446144699/Put/seqid=0],
> avgKeyLen=53, avgValueLen=4, entries=2016557, length=22957603,
> cur=\x01\x00\x00\x14#\x00\x00\x00T\x072\xCC\xF7/d:arm_xingneng_test_model@AI84/1603446131146/Put/vlen=4/seqid=17230816]
> at
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:217)
> at
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:120)
> at
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:653)
> at
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:388)
> at
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:327)
> at
> org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:65)
> at
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:126)
> at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1410)
> at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2187)
> at
> org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:596)
> at
> org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:638)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: 3 missing blocks, the stripe is:
> AlignedStripe(Offset=1048576, length=1048576, fetchedChunksNum=0,
> missingChunksNum=3); locatedBlocks is: LocatedBlocks{; fileLength=22957603;
> underConstruction=false;
> blocks=[LocatedStripedBlock{BP-1197414916-10.27.20.30-1535978156945:blk_-9223372036830849056_317784750;
> getBlockSize()=22957603; corrupt=false; offset=0;
> locs=[DatanodeInfoWithStorage[10.27.20.42:9866,DS-d0d5a7ce-8280-45fe-b910-1e5c7b579367,DISK],
>
> DatanodeInfoWithStorage[10.27.22.86:9866,DS-b018c729-66d5-4953-94f1-ec7664a46cb7,DISK],
>
> DatanodeInfoWithStorage[10.27.22.79:9866,DS-d79c402a-1845-4b3f-893e-f84d94085b2a,DISK],
>
> DatanodeInfoWithStorage[10.27.20.41:9866,DS-b07ba1e0-9a34-4caf-ad51-6d5f302c08ea,DISK],
>
> DatanodeInfoWithStorage[10.27.20.39:9866,DS-8a059d87-f122-4fed-a6a2-e45662692305,DISK]];
> indices=[0, 1, 2, 3, 4]}];
> lastLocatedBlock=LocatedStripedBlock{BP-1197414916-10.27.20.30-1535978156945:blk_-9223372036830849056_317784750;
> getBlockSize()=22957603; corrupt=false; offset=0;
> locs=[DatanodeInfoWithStorage[10.27.20.42:9866,DS-d0d5a7ce-8280-45fe-b910-1e5c7b579367,DISK],
>
> DatanodeInfoWithStorage[10.27.22.86:9866,DS-b018c729-66d5-4953-94f1-ec7664a46cb7,DISK],
>
> DatanodeInfoWithStorage[10.27.22.79:9866,DS-d79c402a-1845-4b3f-893e-f84d94085b2a,DISK],
>
> DatanodeInfoWithStorage[10.27.20.41:9866,DS-b07ba1e0-9a34-4caf-ad51-6d5f302c08ea,DISK],
>
> DatanodeInfoWithStorage[10.27.20.39:9866,DS-8a059d87-f122-4fed-a6a2-e45662692305,DISK]];
> indices=[0, 1, 2, 3, 4]}; isLastBlockComplete=true;
> ecPolicy=ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs,
> numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2]}
> at
> org.apache.hadoop.hdfs.StripeReader.checkMissingBlocks(StripeReader.java:177)
> at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:372)
> at
> org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:324)
> at
> org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:397)
> at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:820)
> at java.io.DataInputStream.read(DataInputStream.java:149)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:765)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readAtOffset(HFileBlock.java:1562)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1772)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1597)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1496)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.readNextDataBlock(HFileReaderImpl.java:931)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.isNextBlock(HFileReaderImpl.java:1064)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.positionForNextBlock(HFileReaderImpl.java:1058)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl._next(HFileReaderImpl.java:1076)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.next(HFileReaderImpl.java:1097)
> at
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:208)
> ... 13 more