[jira] [Resolved] (HDFS-17383) Datanode current block token should come from active NameNode in HA mode
[ https://issues.apache.org/jira/browse/HDFS-17383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shuyan Zhang resolved HDFS-17383.
---------------------------------
Fix Version/s: 3.5.0
Hadoop Flags: Reviewed
Target Version/s: 3.5.0
Assignee: lei w
Resolution: Fixed

> Datanode current block token should come from active NameNode in HA mode
> ------------------------------------------------------------------------
>
> Key: HDFS-17383
> URL: https://issues.apache.org/jira/browse/HDFS-17383
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: lei w
> Assignee: lei w
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.0
> Attachments: reproduce.diff
>
> We found that block transfers failed during a NameNode upgrade. The reported
> error was a block token verification failure. During a block transfer, the
> source DataNode uses a block token that it generates itself, and the key ID
> of that token may come from either the active NameNode (ANN) or the standby
> NameNode (SBN). Because the newly upgraded NameNode has only just started,
> the target DataNode may not yet hold the key ID used by the source DataNode,
> so the write fails.
> The attachment shows how to reproduce this situation.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-17408) Reduce the number of quota calculations in FSDirRenameOp
[ https://issues.apache.org/jira/browse/HDFS-17408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shuyan Zhang resolved HDFS-17408.
---------------------------------
Fix Version/s: 3.5.0
Hadoop Flags: Reviewed
Target Version/s: 3.5.0
Resolution: Fixed

> Reduce the number of quota calculations in FSDirRenameOp
> --------------------------------------------------------
>
> Key: HDFS-17408
> URL: https://issues.apache.org/jira/browse/HDFS-17408
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: lei w
> Assignee: lei w
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.0
>
> During a rename operation, verifyQuotaForRename first calculates the quota
> for the source INode and, at the same time, for the target INode.
> Subsequently, RenameOperation#removeSrc, RenameOperation#removeSrc4OldRename,
> and RenameOperation#addSourceToDestination calculate the quota for the source
> directory again. In exceptional cases, RenameOperation#restoreDst and
> RenameOperation#restoreSource also perform quota calculations for the source
> and target directories. Many of these quota calculations are redundant, so we
> should eliminate them.
[jira] [Updated] (HDFS-17401) EC: Excess internal block may not be able to be deleted correctly when it's stored in fallback storage
[ https://issues.apache.org/jira/browse/HDFS-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shuyan Zhang updated HDFS-17401:
--------------------------------
Summary: EC: Excess internal block may not be able to be deleted correctly when it's stored in fallback storage
    (was: Erasure Coding: Excess internal block may not be able to be deleted correctly when it's stored in fallback storage)

> EC: Excess internal block may not be able to be deleted correctly when it's stored in fallback storage
> ------------------------------------------------------------------------
>
> Key: HDFS-17401
> URL: https://issues.apache.org/jira/browse/HDFS-17401
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.3.6
> Reporter: Ruinan Gu
> Assignee: Ruinan Gu
> Priority: Major
> Labels: pull-request-available
>
> An excess internal block can't be deleted correctly when it is stored in
> fallback storage.
> Simple case:
> An EC-RS-6-3-1024k file is stored using the ALL_SSD storage policy (SSD is
> the default storage type and DISK is the fallback storage type), and the
> block group is as follows:
> [0(SSD), 0(SSD), 1(SSD), 2(SSD), 3(SSD), 4(SSD), 5(SSD), 6(SSD), 7(SSD), 8(DISK)]
> There are two internal blocks with index 0, and one of them should be chosen
> for deletion. However, the current implementation chooses the index 0
> internal blocks as candidates but DISK as the excess storage type. As a
> result, the excess storage type (DISK) does not match the storage type of the
> excess internal blocks (SSD), and the excess internal block cannot be deleted
> correctly.
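The mismatch described in HDFS-17401 can be illustrated with a small, self-contained model. This is only a sketch: InternalBlock, ExcessChooserSketch and their methods are hypothetical names invented for the illustration, and storage types are plain strings rather than the HDFS StorageType enum. It shows why a deletion keyed on DISK finds nothing among the index 0 candidates, which are all on SSD.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of one internal block of an EC block group. */
final class InternalBlock {
  final int index;          // internal block index within the block group
  final String storageType; // "SSD" or "DISK" in this illustration

  InternalBlock(int index, String storageType) {
    this.index = index;
    this.storageType = storageType;
  }
}

final class ExcessChooserSketch {
  /** Builds the block group from the issue: [0, 0, 1..7 on SSD, 8 on DISK]. */
  static List<InternalBlock> exampleGroup() {
    List<InternalBlock> group = new ArrayList<>();
    group.add(new InternalBlock(0, "SSD")); // the duplicated index 0 block
    for (int i = 0; i <= 7; i++) {
      group.add(new InternalBlock(i, "SSD"));
    }
    group.add(new InternalBlock(8, "DISK")); // stored on the fallback type
    return group;
  }

  /** Returns the internal blocks that share the duplicated index. */
  static List<InternalBlock> candidatesFor(List<InternalBlock> group, int duplicatedIndex) {
    List<InternalBlock> candidates = new ArrayList<>();
    for (InternalBlock block : group) {
      if (block.index == duplicatedIndex) {
        candidates.add(block);
      }
    }
    return candidates;
  }

  /** True if at least one candidate is stored on the given excess storage type. */
  static boolean canDelete(List<InternalBlock> candidates, String excessType) {
    for (InternalBlock block : candidates) {
      if (block.storageType.equals(excessType)) {
        return true;
      }
    }
    return false;
  }
}
```

With the example group, canDelete(candidates, "DISK") is false while canDelete(candidates, "SSD") is true, which suggests the excess storage type should be derived from the candidates themselves.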
[jira] [Resolved] (HDFS-17345) Add a metrics to record block report generating cost time
[ https://issues.apache.org/jira/browse/HDFS-17345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shuyan Zhang resolved HDFS-17345.
---------------------------------
Fix Version/s: 3.5.0
Hadoop Flags: Reviewed
Resolution: Fixed

> Add a metrics to record block report generating cost time
> ---------------------------------------------------------
>
> Key: HDFS-17345
> URL: https://issues.apache.org/jira/browse/HDFS-17345
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.5.0
> Reporter: farmmamba
> Assignee: farmmamba
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.5.0
>
> Currently, the blockReports metric records the time spent sending block
> reports. We should add another metric that records the time spent creating
> a block report:
> {code:java}
> long brCreateCost = brSendStartTime - brCreateStartTime;
> {code}
> This is useful for measuring the performance of block report creation.
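A minimal sketch of the measurement proposed in HDFS-17345, assuming the report is created and then sent in sequence. BlockReportTimingSketch, its fields and the injected clock are hypothetical names used only for this illustration; only brCreateStartTime and brSendStartTime come from the issue text.

```java
import java.util.function.Consumer;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

final class BlockReportTimingSketch {
  private final LongSupplier clock; // injected so the timing is testable
  long lastCreateCost = -1;         // hypothetical metric: time spent creating the report
  long lastSendCost = -1;           // hypothetical metric: time spent sending the report

  BlockReportTimingSketch(LongSupplier clock) {
    this.clock = clock;
  }

  /** Creates a report, sends it, and records both durations separately. */
  <T> void report(Supplier<T> createReport, Consumer<T> sendReport) {
    long brCreateStartTime = clock.getAsLong();
    T report = createReport.get();
    long brSendStartTime = clock.getAsLong();
    sendReport.accept(report);
    long brSendEndTime = clock.getAsLong();

    lastCreateCost = brSendStartTime - brCreateStartTime;
    lastSendCost = brSendEndTime - brSendStartTime;
  }
}
```

In real use the clock could be System::nanoTime or a monotonic millisecond clock; the fixed clock in a test makes the two durations easy to check.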
[jira] [Assigned] (HDFS-17354) Delay invoke clearStaleNamespacesInRouterStateIdContext during router start up
[ https://issues.apache.org/jira/browse/HDFS-17354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shuyan Zhang reassigned HDFS-17354:
-----------------------------------
Assignee: lei w

> Delay invoke clearStaleNamespacesInRouterStateIdContext during router start up
> ------------------------------------------------------------------------
>
> Key: HDFS-17354
> URL: https://issues.apache.org/jira/browse/HDFS-17354
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: lei w
> Assignee: lei w
> Priority: Major
> Labels: pull-request-available
>
> We should start the thread that clears expired namespaces in the RUNNING
> phase of RouterRpcServer, because StateStoreService is initialized in the
> initialization phase.
> Currently, the router throws an IOException on startup.
> {panel:title=Exception}
> 2024-01-09 16:27:06,939 WARN org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer: Could not fetch current list of namespaces.
> java.io.IOException: State Store does not have an interface for MembershipStore
> at org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getStoreInterface(MembershipNamenodeResolver.java:121)
> at org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getMembershipStore(MembershipNamenodeResolver.java:102)
> at org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getNamespaces(MembershipNamenodeResolver.java:388)
> at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.clearStaleNamespacesInRouterStateIdContext(RouterRpcServer.java:434)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {panel}
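The idea in HDFS-17354 of delaying the periodic task until the start phase can be sketched as follows. This is not the router's actual code: RouterStartupSketch, stateStoreReady and the counters are hypothetical stand-ins, and only the standard java.util.concurrent scheduling API is real. The periodic task is scheduled in serviceStart() rather than during initialization, so it never runs before its dependency is usable.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class RouterStartupSketch {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  // Hypothetical flag standing in for "the state store interfaces are available".
  private volatile boolean stateStoreReady;
  final AtomicInteger failures = new AtomicInteger();
  final AtomicInteger successes = new AtomicInteger();

  /** Initialization phase: dependencies are still being set up, so schedule nothing. */
  void serviceInit() {
  }

  /** Start phase: the dependency is usable, so the periodic task may begin. */
  void serviceStart() {
    stateStoreReady = true;
    scheduler.scheduleWithFixedDelay(
        this::clearStaleNamespaces, 0, 50, TimeUnit.MILLISECONDS);
  }

  /** Stand-in for the periodic cleanup; it fails if the dependency is missing. */
  void clearStaleNamespaces() {
    if (!stateStoreReady) {
      failures.incrementAndGet(); // corresponds to the IOException in the log above
      return;
    }
    successes.incrementAndGet();
  }

  void stop() {
    scheduler.shutdownNow();
  }
}
```

Calling clearStaleNamespaces() before serviceStart() counts as a failure, which mirrors the warning in the stack trace; after serviceStart() every run succeeds.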
[jira] [Resolved] (HDFS-17342) Fix DataNode may invalidates normal block causing missing block
[ https://issues.apache.org/jira/browse/HDFS-17342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shuyan Zhang resolved HDFS-17342.
---------------------------------
Hadoop Flags: Reviewed
Target Version/s: 3.5.0
Resolution: Fixed

> Fix DataNode may invalidates normal block causing missing block
> ---------------------------------------------------------------
>
> Key: HDFS-17342
> URL: https://issues.apache.org/jira/browse/HDFS-17342
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: Haiyang Hu
> Assignee: Haiyang Hu
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.0
>
> When users read a file that is being appended to, exceptions such as
> org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: xxx
> may occasionally occur.
> This can happen if one thread reads the block while the writer thread
> finalizes it at the same time.
> *Root cause:*
> # The reader thread obtains an RBW replica from VolumeMap, such as
> blk_xxx_xxx[RBW], whose data file should be in /XXX/rbw/blk_xxx.
> # Simultaneously, the writer thread finalizes this block, moving it from the
> RBW directory to the FINALIZE directory. The data file is moved from
> /XXX/rbw/blk_xxx to /XXX/finalize/blk_xxx.
> # The reader thread attempts to open the data input stream but encounters a
> FileNotFoundException because the data file /XXX/rbw/blk_xxx or the meta file
> /XXX/rbw/blk_xxx_xxx no longer exists at this moment.
> # The reader thread treats this block as corrupt and removes the replica from
> the volume map, and the DataNode reports the deleted block to the NameNode.
> # The NameNode removes this replica for the block.
> # If the file's replication factor is 1, the file has a missing block until
> this DataNode runs the DirectoryScanner again.
> As described above, the FileNotFoundException encountered by the reader
> thread is expected, because the file has been moved.
> So we need to add a double check to the invalidateMissingBlock logic that
> verifies whether the data file or meta file exists, to avoid such cases.
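The double check proposed in HDFS-17342 can be sketched with a toy model of the replica and the disk. ReplicaSketch and InvalidateCheckSketch are hypothetical names, and the set of existing paths stands in for real file-existence checks; the actual change to invalidateMissingBlock may differ. The sketch assumes the check is made against the replica's current location, and that the replica is invalidated only when neither its data file nor its meta file exists there.

```java
import java.util.Set;

/** Hypothetical model of a replica: its current directory and file names. */
final class ReplicaSketch {
  final String dir;       // directory that currently holds the replica
  final String blockName; // data file name, e.g. "blk_1"
  final String metaName;  // meta file name, e.g. "blk_1_1001.meta"

  ReplicaSketch(String dir, String blockName, String metaName) {
    this.dir = dir;
    this.blockName = blockName;
    this.metaName = metaName;
  }

  String blockPath() {
    return dir + "/" + blockName;
  }

  String metaPath() {
    return dir + "/" + metaName;
  }
}

final class InvalidateCheckSketch {
  /**
   * Returns true only if neither the data file nor the meta file of the
   * replica exists; existingPaths stands in for the files on disk.
   */
  static boolean shouldInvalidate(ReplicaSketch current, Set<String> existingPaths) {
    boolean dataExists = existingPaths.contains(current.blockPath());
    boolean metaExists = existingPaths.contains(current.metaPath());
    return !dataExists && !metaExists;
  }
}
```

After a finalize, the replica looked up again points at the finalized directory, where both files exist, so the block is kept even though the reader failed on the stale RBW path.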
[jira] [Resolved] (HDFS-17339) BPServiceActor should skip cacheReport when one blockPool does not have CacheBlock on this DataNode
[ https://issues.apache.org/jira/browse/HDFS-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shuyan Zhang resolved HDFS-17339.
---------------------------------
Fix Version/s: 3.5.0
Hadoop Flags: Reviewed
Target Version/s: 3.5.0
Resolution: Fixed

> BPServiceActor should skip cacheReport when one blockPool does not have CacheBlock on this DataNode
> ------------------------------------------------------------------------
>
> Key: HDFS-17339
> URL: https://issues.apache.org/jira/browse/HDFS-17339
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: lei w
> Assignee: lei w
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.0
>
> Currently, a DataNode sends a cacheReport to every NameNode when
> CacheCapacitySize is not zero. However, not every NameNode has a CacheBlock
> on this DataNode. BPServiceActor should therefore skip the cacheReport when
> a blockPool has no CacheBlock on this DataNode. This reduces unnecessary
> lock contention on the NameNode.
[jira] [Assigned] (HDFS-17339) BPServiceActor should skip cacheReport when one blockPool does not have CacheBlock on this DataNode
[ https://issues.apache.org/jira/browse/HDFS-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shuyan Zhang reassigned HDFS-17339:
-----------------------------------
Assignee: lei w

> BPServiceActor should skip cacheReport when one blockPool does not have CacheBlock on this DataNode
> ------------------------------------------------------------------------
>
> Key: HDFS-17339
> URL: https://issues.apache.org/jira/browse/HDFS-17339
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: lei w
> Assignee: lei w
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (HDFS-17346) Fix DirectoryScanner check mark the normal blocks as corrupt.
[ https://issues.apache.org/jira/browse/HDFS-17346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shuyan Zhang updated HDFS-17346:
--------------------------------
Issue Type: Bug
    (was: Improvement)

> Fix DirectoryScanner check mark the normal blocks as corrupt.
> -------------------------------------------------------------
>
> Key: HDFS-17346
> URL: https://issues.apache.org/jira/browse/HDFS-17346
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Haiyang Hu
> Assignee: Haiyang Hu
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.0
>
> The DirectoryScanner check may mark normal blocks as corrupt and report them
> to the NameNode, so blocks that are actually healthy can show up as corrupt.
> This can happen when an append and the DirectoryScanner run at the same
> time, and the probability is very high.
> *Root cause:*
> * Create a file with a block such as blk_xxx_1001, where diskFile is
> "file:/XXX/current/finalized/blk_xxx" and diskMetaFile is
> "file:/XXX/current/finalized/blk_xxx_1001.meta".
> * DirectoryScanner runs and first creates a BlockPoolReport.ScanInfo that
> records blockFile "file:/XXX/current/finalized/blk_xxx" and metaFile
> "file:/XXX/current/finalized/blk_xxx_1001.meta".
> * Simultaneously, another thread completes an append to blk_xxx. Now
> diskFile is "file:/XXX/current/finalized/blk_xxx", diskMetaFile is
> "file:/XXX/current/finalized/blk_xxx_1002.meta", memDataFile is
> "file:/XXX/current/finalized/blk_xxx", and memMetaFile is
> "file:/XXX/current/finalized/blk_xxx_1002.meta".
> * DirectoryScanner continues to run. Because the generation stamp of the
> metadata file in memory differs from that of the metadata file in the
> ScanInfo, the ScanInfo object is added to the list of differences.
> * FsDatasetImpl#checkAndUpdate then traverses the list of differences.
> Because the diskMetaFile "/XXX/current/finalized/blk_xxx_1001.meta" no
> longer exists, isRegular is false:
> {code:java}
> final boolean isRegular = FileUtil.isRegularFile(diskMetaFile, false) &&
>     FileUtil.isRegularFile(diskFile, false);
> {code}
> * The normal block is then marked as corrupt and reported to the NameNode:
> {code:java}
> } else if (!isRegular) {
>   corruptBlock = new Block(memBlockInfo);
>   LOG.warn("Block:{} is not a regular file.", corruptBlock.getBlockId());
> }
> {code}
[jira] [Resolved] (HDFS-17346) Fix DirectoryScanner check mark the normal blocks as corrupt.
[ https://issues.apache.org/jira/browse/HDFS-17346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shuyan Zhang resolved HDFS-17346.
---------------------------------
Fix Version/s: 3.5.0
Hadoop Flags: Reviewed
Target Version/s: 3.5.0
Resolution: Fixed

> Fix DirectoryScanner check mark the normal blocks as corrupt.
> -------------------------------------------------------------
>
> Key: HDFS-17346
> URL: https://issues.apache.org/jira/browse/HDFS-17346
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: Haiyang Hu
> Assignee: Haiyang Hu
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.0
[jira] [Resolved] (HDFS-17293) First packet data + checksum size will be set to 516 bytes when writing to a new block.
[ https://issues.apache.org/jira/browse/HDFS-17293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shuyan Zhang resolved HDFS-17293.
---------------------------------
Fix Version/s: 3.5.0
Hadoop Flags: Reviewed
Resolution: Fixed

> First packet data + checksum size will be set to 516 bytes when writing to a new block.
> ------------------------------------------------------------------------
>
> Key: HDFS-17293
> URL: https://issues.apache.org/jira/browse/HDFS-17293
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 3.3.6
> Reporter: farmmamba
> Assignee: farmmamba
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.0
>
> The first packet size is set to 516 bytes when writing to a new block.
> In the method computePacketChunkSize, the parameters psize and csize are
> (0, 512) when writing to a new block. It would be better to use
> writePacketSize.
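The 516-byte figure in HDFS-17293 follows from simple arithmetic, sketched here. This is a simplified model rather than the real computePacketChunkSize: the header length of 33 bytes and the 4-byte checksum per chunk are assumptions made for the illustration, as is the 64 KB value used for writePacketSize in the comparison.

```java
final class PacketSizeSketch {
  static final int HEADER_LEN = 33;   // assumed maximum packet header length
  static final int CHECKSUM_SIZE = 4; // assumed checksum bytes per chunk

  /**
   * Returns the data + checksum bytes carried by one packet, given the
   * requested packet size (psize) and the bytes per checksum chunk (csize).
   */
  static int packetSize(int psize, int csize) {
    int bodySize = psize - HEADER_LEN;      // negative when psize is 0
    int chunkSize = csize + CHECKSUM_SIZE;  // 512 + 4 = 516
    // At least one chunk is always sent, even if the body size is too small.
    int chunksPerPacket = Math.max(bodySize / chunkSize, 1);
    return chunkSize * chunksPerPacket;
  }
}
```

Under these assumptions, packetSize(0, 512) yields a single 516-byte chunk, while packetSize(65536, 512) fits 126 chunks for 65016 bytes, which is why passing the configured write packet size gives a much larger first packet.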
[jira] [Resolved] (HDFS-17331) Fix Blocks are always -1 and DataNode`s version are always UNKNOWN in federationhealth.html
[ https://issues.apache.org/jira/browse/HDFS-17331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shuyan Zhang resolved HDFS-17331.
---------------------------------
Fix Version/s: 3.5.0
Hadoop Flags: Reviewed
Target Version/s: 3.5.0
Assignee: lei w
Resolution: Fixed

> Fix Blocks are always -1 and DataNode`s version are always UNKNOWN in federationhealth.html
> ------------------------------------------------------------------------
>
> Key: HDFS-17331
> URL: https://issues.apache.org/jira/browse/HDFS-17331
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: lei w
> Assignee: lei w
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.0
> Attachments: After fix.png, Before fix.png
>
> In federationhealth.html, the Blocks value is always -1 and the DataNode
> version is always UNKNOWN.
[jira] [Updated] (HDFS-17283) Change the name of variable SECOND in HdfsClientConfigKeys
[ https://issues.apache.org/jira/browse/HDFS-17283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shuyan Zhang updated HDFS-17283:
--------------------------------
Target Version/s: 3.5.0
    (was: 3.4.0)

> Change the name of variable SECOND in HdfsClientConfigKeys
> ----------------------------------------------------------
>
> Key: HDFS-17283
> URL: https://issues.apache.org/jira/browse/HDFS-17283
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 3.3.6
> Reporter: farmmamba
> Assignee: farmmamba
> Priority: Trivial
> Labels: pull-request-available
> Fix For: 3.5.0
[jira] [Updated] (HDFS-17291) DataNode metric bytesWritten is not totally accurate in some situations.
[ https://issues.apache.org/jira/browse/HDFS-17291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shuyan Zhang updated HDFS-17291:
--------------------------------
Target Version/s: 3.5.0
    (was: 3.4.0)

> DataNode metric bytesWritten is not totally accurate in some situations.
> ------------------------------------------------------------------------
>
> Key: HDFS-17291
> URL: https://issues.apache.org/jira/browse/HDFS-17291
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.3.6
> Reporter: farmmamba
> Assignee: farmmamba
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.0
>
> As the title describes, the DataNode metric bytesWritten is not totally
> accurate in some situations, such as failure recovery and data re-sends.
> We should fix it.
[jira] [Updated] (HDFS-17291) DataNode metric bytesWritten is not totally accurate in some situations.
[ https://issues.apache.org/jira/browse/HDFS-17291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shuyan Zhang updated HDFS-17291:
--------------------------------
Fix Version/s: 3.5.0
    (was: 3.4.0)

> DataNode metric bytesWritten is not totally accurate in some situations.
> ------------------------------------------------------------------------
>
> Key: HDFS-17291
> URL: https://issues.apache.org/jira/browse/HDFS-17291
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.3.6
> Reporter: farmmamba
> Assignee: farmmamba
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.0
[jira] [Updated] (HDFS-17337) RPC RESPONSE time seems not exactly accurate when using FSEditLogAsync.
[ https://issues.apache.org/jira/browse/HDFS-17337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shuyan Zhang updated HDFS-17337:
--------------------------------
Fix Version/s: 3.5.0
    (was: 3.4.0)

> RPC RESPONSE time seems not exactly accurate when using FSEditLogAsync.
> -----------------------------------------------------------------------
>
> Key: HDFS-17337
> URL: https://issues.apache.org/jira/browse/HDFS-17337
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.3.6
> Reporter: farmmamba
> Assignee: farmmamba
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.0
>
> Currently, FSEditLogAsync is enabled by default.
> The method Server$RpcCall#run contains the following code:
> {code:java}
> if (!isResponseDeferred()) {
>   long deltaNanos = Time.monotonicNowNanos() - startNanos;
>   ProcessingDetails details = getProcessingDetails();
>   details.set(Timing.PROCESSING, deltaNanos, TimeUnit.NANOSECONDS);
>   deltaNanos -= details.get(Timing.LOCKWAIT, TimeUnit.NANOSECONDS);
>   deltaNanos -= details.get(Timing.LOCKSHARED, TimeUnit.NANOSECONDS);
>   deltaNanos -= details.get(Timing.LOCKEXCLUSIVE, TimeUnit.NANOSECONDS);
>   details.set(Timing.LOCKFREE, deltaNanos, TimeUnit.NANOSECONDS);
>   startNanos = Time.monotonicNowNanos();
>   setResponseFields(value, responseParams);
>   sendResponse();
>   deltaNanos = Time.monotonicNowNanos() - startNanos;
>   details.set(Timing.RESPONSE, deltaNanos, TimeUnit.NANOSECONDS);
> } else {
>   if (LOG.isDebugEnabled()) {
>     LOG.debug("Deferring response for callId: " + this.callId);
>   }
> }
> {code}
> It computes Timing.RESPONSE of an RpcCall as
> *Time.monotonicNowNanos() - startNanos*.
> However, with async edit logging the response is not sent here but in
> FSEditLogAsync.RpcEdit#logSyncNotify, so the Timing.RESPONSE of an RpcCall
> is not accurate.
> {code:java}
> @Override
> public void logSyncNotify(RuntimeException syncEx) {
>   try {
>     if (syncEx == null) {
>       call.sendResponse();
>     } else {
>       call.abortResponse(syncEx);
>     }
>   } catch (Exception e) {} // don't care if not sent.
> }
> {code}
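One way to read the problem in HDFS-17337 is that the response timing has to be taken where the deferred response is actually sent. The sketch below illustrates that idea only; DeferredResponseTimingSketch, its injected clock and the responseNanos field are hypothetical and are not part of the Hadoop RPC server, and the actual fix may measure a different interval.

```java
import java.util.function.LongSupplier;

final class DeferredResponseTimingSketch {
  private final LongSupplier clockNanos; // injected so the timing is testable
  long responseNanos = -1;               // hypothetical stand-in for Timing.RESPONSE

  DeferredResponseTimingSketch(LongSupplier clockNanos) {
    this.clockNanos = clockNanos;
  }

  /**
   * Called by the thread that finally sends the deferred response, for
   * example after the edit log sync completes. The duration is measured
   * around the send itself, mirroring the non-deferred branch.
   */
  void sendDeferredResponse(Runnable send) {
    long startNanos = clockNanos.getAsLong();
    send.run();
    responseNanos = clockNanos.getAsLong() - startNanos;
  }
}
```

With System::nanoTime as the clock, responseNanos reflects the real send; whether the wait for the edit log sync should also count toward the response time is a design choice that this sketch leaves out.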
[jira] [Updated] (HDFS-17283) Change the name of variable SECOND in HdfsClientConfigKeys
[ https://issues.apache.org/jira/browse/HDFS-17283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shuyan Zhang updated HDFS-17283:
--------------------------------
Fix Version/s: 3.5.0
    (was: 3.4.0)

> Change the name of variable SECOND in HdfsClientConfigKeys
> ----------------------------------------------------------
>
> Key: HDFS-17283
> URL: https://issues.apache.org/jira/browse/HDFS-17283
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 3.3.6
> Reporter: farmmamba
> Assignee: farmmamba
> Priority: Trivial
> Labels: pull-request-available
> Fix For: 3.5.0
[jira] [Updated] (HDFS-17289) Considering the size of non-lastBlocks equals to complete block size can cause append failure.
[ https://issues.apache.org/jira/browse/HDFS-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shuyan Zhang updated HDFS-17289:
--------------------------------
Target Version/s: 3.5.0
    (was: 3.4.0)

> Considering the size of non-lastBlocks equals to complete block size can cause append failure.
> ------------------------------------------------------------------------
>
> Key: HDFS-17289
> URL: https://issues.apache.org/jira/browse/HDFS-17289
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.3.6
> Reporter: farmmamba
> Assignee: farmmamba
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.0
[jira] [Updated] (HDFS-17289) Considering the size of non-lastBlocks equals to complete block size can cause append failure.
[ https://issues.apache.org/jira/browse/HDFS-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shuyan Zhang updated HDFS-17289:
--------------------------------
Fix Version/s: 3.5.0
    (was: 3.4.0)

> Considering the size of non-lastBlocks equals to complete block size can cause append failure.
> ------------------------------------------------------------------------
>
> Key: HDFS-17289
> URL: https://issues.apache.org/jira/browse/HDFS-17289
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.3.6
> Reporter: farmmamba
> Assignee: farmmamba
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.0
[jira] [Updated] (HDFS-17337) RPC RESPONSE time seems not exactly accurate when using FSEditLogAsync.
[ https://issues.apache.org/jira/browse/HDFS-17337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shuyan Zhang updated HDFS-17337:
--------------------------------
Target Version/s: 3.5.0
    (was: 3.4.0)

> RPC RESPONSE time seems not exactly accurate when using FSEditLogAsync.
> -----------------------------------------------------------------------
>
> Key: HDFS-17337
> URL: https://issues.apache.org/jira/browse/HDFS-17337
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.3.6
> Reporter: farmmamba
> Assignee: farmmamba
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.0
[jira] [Resolved] (HDFS-17337) RPC RESPONSE time seems not exactly accurate when using FSEditLogAsync.
[ https://issues.apache.org/jira/browse/HDFS-17337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang resolved HDFS-17337. - Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Target Version/s: 3.4.0 (was: 3.5.0) Resolution: Fixed > RPC RESPONSE time seems not exactly accurate when using FSEditLogAsync. > --- > > Key: HDFS-17337 > URL: https://issues.apache.org/jira/browse/HDFS-17337 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.3.6 >Reporter: farmmamba >Assignee: farmmamba >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > Currently, FSEditLogAsync is enabled by default. > The method Server$RpcCall#run contains the following code:
>
> {code:java}
> if (!isResponseDeferred()) {
>   long deltaNanos = Time.monotonicNowNanos() - startNanos;
>   ProcessingDetails details = getProcessingDetails();
>   details.set(Timing.PROCESSING, deltaNanos, TimeUnit.NANOSECONDS);
>   deltaNanos -= details.get(Timing.LOCKWAIT, TimeUnit.NANOSECONDS);
>   deltaNanos -= details.get(Timing.LOCKSHARED, TimeUnit.NANOSECONDS);
>   deltaNanos -= details.get(Timing.LOCKEXCLUSIVE, TimeUnit.NANOSECONDS);
>   details.set(Timing.LOCKFREE, deltaNanos, TimeUnit.NANOSECONDS);
>   startNanos = Time.monotonicNowNanos();
>   setResponseFields(value, responseParams);
>   sendResponse();
>   deltaNanos = Time.monotonicNowNanos() - startNanos;
>   details.set(Timing.RESPONSE, deltaNanos, TimeUnit.NANOSECONDS);
> } else {
>   if (LOG.isDebugEnabled()) {
>     LOG.debug("Deferring response for callId: " + this.callId);
>   }
> }
> {code}
> It computes Timing.RESPONSE of an RpcCall as *Time.monotonicNowNanos() - startNanos*.
> However, with async edit logging the response is not sent here but in FSEditLogAsync.RpcEdit#logSyncNotify, so the Timing.RESPONSE recorded for the RpcCall is not accurate.
> {code:java}
> @Override
> public void logSyncNotify(RuntimeException syncEx) {
>   try {
>     if (syncEx == null) {
>       call.sendResponse();
>     } else {
>       call.abortResponse(syncEx);
>     }
>   } catch (Exception e) {} // don't care if not sent.
> }
> {code}
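The timing gap described in HDFS-17337 can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not the patch that was committed: the class `DeferredResponseTiming` and its methods are hypothetical, and `System.nanoTime()` stands in for `Time.monotonicNowNanos()`. The idea is to take the RESPONSE measurement at the point where the deferred response is actually sent.

```java
/** Toy model of a deferred RPC response; NOT Hadoop's actual Server.RpcCall. */
final class DeferredResponseTiming {
  private long responseNanos = -1L; // -1 means "RESPONSE timing never recorded"

  /**
   * Called where the deferred response is really sent (the role played by
   * logSyncNotify in the description). The timing is recorded around the send.
   */
  public void sendDeferredResponse(Runnable send) {
    long startNanos = System.nanoTime();
    try {
      send.run();
    } finally {
      responseNanos = System.nanoTime() - startNanos;
    }
  }

  public long getResponseNanos() {
    return responseNanos;
  }

  public static void main(String[] args) {
    DeferredResponseTiming call = new DeferredResponseTiming();
    // Before the deferred send, nothing has been recorded.
    System.out.println("before send: " + call.getResponseNanos());
    call.sendDeferredResponse(() -> { /* stand-in for call.sendResponse() */ });
    System.out.println("after send:  " + call.getResponseNanos());
  }
}
```

If the measurement lives only in the non-deferred branch, a deferred call keeps the "never recorded" value, which is the inaccuracy the report points at.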
[jira] [Resolved] (HDFS-17291) DataNode metric bytesWritten is not totally accurate in some situations.
[ https://issues.apache.org/jira/browse/HDFS-17291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang resolved HDFS-17291. - Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Target Version/s: 3.4.0 (was: 3.5.0) Resolution: Fixed > DataNode metric bytesWritten is not totally accurate in some situations. > > > Key: HDFS-17291 > URL: https://issues.apache.org/jira/browse/HDFS-17291 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.3.6 >Reporter: farmmamba >Assignee: farmmamba >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > As the title says, the DataNode metric bytesWritten is not totally accurate in some situations, such as failure recovery and data re-sends. We should fix it.
[jira] [Resolved] (HDFS-17289) Considering the size of non-lastBlocks equals to complete block size can cause append failure.
[ https://issues.apache.org/jira/browse/HDFS-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang resolved HDFS-17289. - Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Target Version/s: 3.4.0 (was: 3.5.0) Resolution: Fixed > Considering the size of non-lastBlocks equals to complete block size can cause append failure. > -- > > Key: HDFS-17289 > URL: https://issues.apache.org/jira/browse/HDFS-17289 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.6 >Reporter: farmmamba >Assignee: farmmamba >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0
[jira] [Resolved] (HDFS-17283) Change the name of variable SECOND in HdfsClientConfigKeys
[ https://issues.apache.org/jira/browse/HDFS-17283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang resolved HDFS-17283. - Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Target Version/s: 3.4.0 (was: 3.5.0) Resolution: Fixed > Change the name of variable SECOND in HdfsClientConfigKeys > -- > > Key: HDFS-17283 > URL: https://issues.apache.org/jira/browse/HDFS-17283 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 3.3.6 >Reporter: farmmamba >Assignee: farmmamba >Priority: Trivial > Labels: pull-request-available > Fix For: 3.4.0
[jira] [Resolved] (HDFS-17275) Judge whether the block has been deleted in the block report
[ https://issues.apache.org/jira/browse/HDFS-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang resolved HDFS-17275. - Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Assignee: lei w Resolution: Fixed > Judge whether the block has been deleted in the block report > > > Key: HDFS-17275 > URL: https://issues.apache.org/jira/browse/HDFS-17275 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: lei w >Assignee: lei w >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > > We now use the asynchronous thread MarkedDeleteBlockScrubber to delete blocks. During a block report, we may do some useless block-related calculations when blocks have not yet been added to invalidateBlocks.
[jira] [Updated] (HDFS-17275) Judge whether the block has been deleted in the block report
[ https://issues.apache.org/jira/browse/HDFS-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang updated HDFS-17275: Summary: Judge whether the block has been deleted in the block report (was: We should determine whether the block has been deleted in the block report) > Judge whether the block has been deleted in the block report > > > Key: HDFS-17275 > URL: https://issues.apache.org/jira/browse/HDFS-17275 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: lei w >Priority: Minor > Labels: pull-request-available > > We now use the asynchronous thread MarkedDeleteBlockScrubber to delete blocks. During a block report, we may do some useless block-related calculations when blocks have not yet been added to invalidateBlocks.
[jira] [Resolved] (HDFS-17152) Fix the documentation of count command in FileSystemShell.md
[ https://issues.apache.org/jira/browse/HDFS-17152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang resolved HDFS-17152. - Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed > Fix the documentation of count command in FileSystemShell.md > > > Key: HDFS-17152 > URL: https://issues.apache.org/jira/browse/HDFS-17152 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.4.0 >Reporter: farmmamba >Assignee: farmmamba >Priority: Trivial > Labels: pull-request-available > Fix For: 3.4.0 > > > count -q means show quotas and usage. > count -u means show quotas. > We should fix this minor documentation error.
[jira] [Updated] (HDFS-17243) Add the parameter storage type for getBlocks method
[ https://issues.apache.org/jira/browse/HDFS-17243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang updated HDFS-17243: Component/s: balancer (was: balancer & mover) > Add the parameter storage type for getBlocks method > --- > > Key: HDFS-17243 > URL: https://issues.apache.org/jira/browse/HDFS-17243 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > When the Balancer is running, many log entries like the following appear: > {code:java} > INFO balancer.Dispatcher (Dispatcher.java:markMovedIfGoodBlock(306)) - No striped internal block on source xxx:50010:SSD, block blk_-xxx_xxx size=982142783. Skipping. > {code} > These logs show that the Balancer cannot balance an SSD-type source, which causes it to fetch blocks from the NN frequently through the getBlocks RPC. > The main reason is that the storage type of the current Source is SSD, but getBlocks currently returns the list of all blocks belonging to the datanode, so we need to add a storage type parameter to the getBlocks method.
[jira] [Updated] (HDFS-17243) Add the parameter storage type for getBlocks method
[ https://issues.apache.org/jira/browse/HDFS-17243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang updated HDFS-17243: Component/s: balancer & mover (was: balamcer) > Add the parameter storage type for getBlocks method > --- > > Key: HDFS-17243 > URL: https://issues.apache.org/jira/browse/HDFS-17243 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer mover >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > When the Balancer is running, many log entries like the following appear: > {code:java} > INFO balancer.Dispatcher (Dispatcher.java:markMovedIfGoodBlock(306)) - No striped internal block on source xxx:50010:SSD, block blk_-xxx_xxx size=982142783. Skipping. > {code} > These logs show that the Balancer cannot balance an SSD-type source, which causes it to fetch blocks from the NN frequently through the getBlocks RPC. > The main reason is that the storage type of the current Source is SSD, but getBlocks currently returns the list of all blocks belonging to the datanode, so we need to add a storage type parameter to the getBlocks method.
[jira] [Resolved] (HDFS-17243) Add the parameter storage type for getBlocks method
[ https://issues.apache.org/jira/browse/HDFS-17243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang resolved HDFS-17243. - Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed > Add the parameter storage type for getBlocks method > --- > > Key: HDFS-17243 > URL: https://issues.apache.org/jira/browse/HDFS-17243 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balamcer >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > When the Balancer is running, many log entries like the following appear: > {code:java} > INFO balancer.Dispatcher (Dispatcher.java:markMovedIfGoodBlock(306)) - No striped internal block on source xxx:50010:SSD, block blk_-xxx_xxx size=982142783. Skipping. > {code} > These logs show that the Balancer cannot balance an SSD-type source, which causes it to fetch blocks from the NN frequently through the getBlocks RPC. > The main reason is that the storage type of the current Source is SSD, but getBlocks currently returns the list of all blocks belonging to the datanode, so we need to add a storage type parameter to the getBlocks method.
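The proposal in HDFS-17243 amounts to filtering a datanode's blocks by storage type before handing them to the Balancer. The following is a minimal sketch of that idea only. The `Replica` class, the local `StorageType` enum, and this `getBlocks` signature are hypothetical stand-ins and do not reproduce the real NameNode RPC.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class StorageTypeFilterSketch {
  /** Local stand-in for Hadoop's storage types; only the values needed here. */
  enum StorageType { DISK, SSD, ARCHIVE }

  /** Hypothetical pairing of a block id with the storage type holding its replica. */
  static final class Replica {
    final String blockId;
    final StorageType type;

    Replica(String blockId, StorageType type) {
      this.blockId = blockId;
      this.type = type;
    }
  }

  /** Returns replicas on the wanted storage type; null means "no filter". */
  static List<Replica> getBlocks(List<Replica> onDatanode, StorageType wanted) {
    List<Replica> result = new ArrayList<>();
    for (Replica r : onDatanode) {
      if (wanted == null || r.type == wanted) {
        result.add(r);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Replica> all = Arrays.asList(
        new Replica("blk_1", StorageType.DISK),
        new Replica("blk_2", StorageType.SSD),
        new Replica("blk_3", StorageType.DISK));
    // Without a filter the caller also receives blocks it cannot use for an SSD source.
    System.out.println(getBlocks(all, null).size());
    // With the filter only the SSD replica is returned.
    System.out.println(getBlocks(all, StorageType.SSD).size());
  }
}
```

Filtering on the serving side avoids the repeated "Skipping" round trips described above, at the cost of one extra parameter on the call.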
[jira] [Updated] (HDFS-17227) EC: Fix bug in choosing targets when racks is not enough.
[ https://issues.apache.org/jira/browse/HDFS-17227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang updated HDFS-17227: Description: *Bug description* If, 1. There is a striped block blockinfo1, which has an excess replica on datanodeA. 2. blockinfo1 has an internal block that needs to be reconstructed. 3. The number of racks is less than the number of internal blocks of blockinfo1. Then, NN may choose datanodeA to reconstruct the internal block, resulting in two internal blocks of blockinfo1 on datanodeA, causing confusion. *Root cause and solution* When we use `BlockPlacementPolicyRackFaultTolerant` for choosing targets and the number of racks is insufficient, `chooseEvenlyFromRemainingRacks` will be called. Currently, `chooseEvenlyFromRemainingRacks` calls `chooseOnce`, and `chooseOnce` takes `newExcludeNodes` as a parameter instead of `excludedNodes`. When we choose targets for reconstructing internal blocks, `newExcludeNodes` only includes datanodes that contain live replicas, and does not include datanodes that have excess replicas. This may result in datanodes with excess replicas being chosen. I think we do not need to use `newExcludeNodes`; just pass `excludedNodes` to `chooseOnce`. was: *Bug description* If, 1. There is a striped block blockinfo1, which has an excess replica on datanodeA. 2. blockinfo1 has an internal block that needs to be reconstruction. 3. The number of racks is less than the number of internal blocks of Blockinfo1. Then, NN may choose datanodeA to reconstruct the internal block, resulting in two internal blocks of blockinfo1 on datanodeA, causing confusion. *Root cause and solution* When we use `BlockPlacementPolicyRackFaultTolerant` for choosing targets and the racks is insufficient, `chooseEvenlyFromRemainingRacks` will be called. Currently, `chooseEvenlyFromRemainingRacks` calls `chooseOnce`, `chooseOnce` use `newExcludeNodes` as parameter instead of `excludedNodes`.
When we choose targets for reconstructing internal blocks, 'newExcludeNodes' only includes those datanodes that contain live replicas, and does not include datanodes that have excess replicas. This may result in datanodes with excess replicas is chosen. I don't think we need to use 'newExcludeNodes', just pass `excludedNodes` to `chooseOnce`. > EC: Fix bug in choosing targets when racks is not enough. > - > > Key: HDFS-17227 > URL: https://issues.apache.org/jira/browse/HDFS-17227 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Shuyan Zhang >Assignee: Shuyan Zhang >Priority: Major > > *Bug description* > If, > 1. There is a striped block blockinfo1, which has an excess replica on datanodeA. > 2. blockinfo1 has an internal block that needs to be reconstructed. > 3. The number of racks is less than the number of internal blocks of blockinfo1. > Then, NN may choose datanodeA to reconstruct the internal block, resulting in two internal blocks of blockinfo1 on datanodeA, causing confusion. > *Root cause and solution* > When we use `BlockPlacementPolicyRackFaultTolerant` for choosing targets and the number of racks is insufficient, `chooseEvenlyFromRemainingRacks` will be called. > Currently, `chooseEvenlyFromRemainingRacks` calls `chooseOnce`, and `chooseOnce` takes `newExcludeNodes` as a parameter instead of `excludedNodes`. When we choose targets for reconstructing internal blocks, `newExcludeNodes` only includes datanodes that contain live replicas, and does not include datanodes that have excess replicas. This may result in datanodes with excess replicas being chosen. > I think we do not need to use `newExcludeNodes`; just pass `excludedNodes` to `chooseOnce`.
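The effect of passing the narrower exclusion set can be shown with a toy selection routine. This is a deliberately simplified sketch, not the code of `BlockPlacementPolicyRackFaultTolerant`: the `chooseOnce` below is a hypothetical stand-in that picks the first candidate not in the excluded set, and datanodes are represented by plain strings.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ExcludeNodesSketch {
  /** Returns the first candidate that is not excluded, or null if none is left. */
  static String chooseOnce(List<String> candidates, Set<String> excluded) {
    for (String node : candidates) {
      if (!excluded.contains(node)) {
        return node;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    List<String> candidates = Arrays.asList("datanodeA", "datanodeB");

    // Plays the role of newExcludeNodes: only nodes holding live replicas.
    Set<String> liveReplicaNodes = new HashSet<>(Arrays.asList("datanodeC"));

    // Plays the role of excludedNodes: live replicas plus the excess replica on datanodeA.
    Set<String> allExcluded = new HashSet<>(Arrays.asList("datanodeC", "datanodeA"));

    // datanodeA already holds an excess replica but is still selectable here.
    System.out.println(chooseOnce(candidates, liveReplicaNodes));
    // With the full exclusion set, datanodeA is skipped.
    System.out.println(chooseOnce(candidates, allExcluded));
  }
}
```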
[jira] [Created] (HDFS-17227) EC: Fix bug in choosing targets when racks is not enough.
Shuyan Zhang created HDFS-17227: --- Summary: EC: Fix bug in choosing targets when racks is not enough. Key: HDFS-17227 URL: https://issues.apache.org/jira/browse/HDFS-17227 Project: Hadoop HDFS Issue Type: Bug Reporter: Shuyan Zhang *Bug description* If, 1. There is a striped block blockinfo1, which has an excess replica on datanodeA. 2. blockinfo1 has an internal block that needs to be reconstructed. 3. The number of racks is less than the number of internal blocks of blockinfo1. Then, NN may choose datanodeA to reconstruct the internal block, resulting in two internal blocks of blockinfo1 on datanodeA, causing confusion. *Root cause and solution* When we use `BlockPlacementPolicyRackFaultTolerant` for choosing targets and the number of racks is insufficient, `chooseEvenlyFromRemainingRacks` will be called. Currently, `chooseEvenlyFromRemainingRacks` calls `chooseOnce`, and `chooseOnce` takes `newExcludeNodes` as a parameter instead of `excludedNodes`. When we choose targets for reconstructing internal blocks, `newExcludeNodes` only includes those datanodes that contain live replicas, and does not include datanodes that have excess replicas. This may result in datanodes with excess replicas being chosen. I don't think we need to use `newExcludeNodes`; just pass `excludedNodes` to `chooseOnce`.
[jira] [Assigned] (HDFS-17227) EC: Fix bug in choosing targets when racks is not enough.
[ https://issues.apache.org/jira/browse/HDFS-17227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang reassigned HDFS-17227: --- Assignee: Shuyan Zhang > EC: Fix bug in choosing targets when racks is not enough. > - > > Key: HDFS-17227 > URL: https://issues.apache.org/jira/browse/HDFS-17227 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Shuyan Zhang >Assignee: Shuyan Zhang >Priority: Major > > *Bug description* > If, > 1. There is a striped block blockinfo1, which has an excess replica on datanodeA. > 2. blockinfo1 has an internal block that needs to be reconstructed. > 3. The number of racks is less than the number of internal blocks of blockinfo1. > Then, NN may choose datanodeA to reconstruct the internal block, resulting in two internal blocks of blockinfo1 on datanodeA, causing confusion. > *Root cause and solution* > When we use `BlockPlacementPolicyRackFaultTolerant` for choosing targets and the number of racks is insufficient, `chooseEvenlyFromRemainingRacks` will be called. > Currently, `chooseEvenlyFromRemainingRacks` calls `chooseOnce`, and `chooseOnce` takes `newExcludeNodes` as a parameter instead of `excludedNodes`. When we choose targets for reconstructing internal blocks, `newExcludeNodes` only includes those datanodes that contain live replicas, and does not include datanodes that have excess replicas. This may result in datanodes with excess replicas being chosen. > I don't think we need to use `newExcludeNodes`; just pass `excludedNodes` to `chooseOnce`.
[jira] [Comment Edited] (HDFS-17218) NameNode should remove its excess blocks from the ExcessRedundancyMap When a DN registers
[ https://issues.apache.org/jira/browse/HDFS-17218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774370#comment-17774370 ] Shuyan Zhang edited comment on HDFS-17218 at 10/12/23 8:21 AM: --- Hi, [~haiyang Hu], your report is very valuable. I think the root cause here is that NameNode has no timeout mechanism for handling excess replicas, like the one PendingReconstructionMonitor provides for block reconstruction. was (Author: zhangshuyan): Hi, [~haiyang Hu], your report is very valuable. I would like to discuss it with you. As you say, {quote}since block1 is not a new block, the processExtraRedundancy logic will not be executed. {quote} Therefore, even if we remove the corresponding excess blocks from the ExcessRedundancyMap when a DN registers, it seems that we cannot avoid this problem because processExtraRedundancy will still not be executed. I think the root cause here is that NameNode has no timeout mechanism for handling excess replicas, like the one PendingReconstructionMonitor provides for block reconstruction. > NameNode should remove its excess blocks from the ExcessRedundancyMap When a DN registers > - > > Key: HDFS-17218 > URL: https://issues.apache.org/jira/browse/HDFS-17218 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namanode >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > Attachments: image-2023-10-12-15-52-52-336.png > > > We found that a DN loses all pending DNA_INVALIDATE blocks if it restarts. > *Root cause* > The DN currently enables asynchronous deletion, so it holds many pending-deletion blocks in memory. > When the DN restarts, these cached blocks may be lost. This causes some blocks in the excess map in the NameNode to be leaked, which results in many blocks having more replicas than expected.
> *Solution* > The NameNode should remove a DN's excess blocks from the ExcessRedundancyMap when the DN registers. > This approach ensures that, when processing the DN's full block report, 'processExtraRedundancy' can be performed according to the actual state of the blocks.
[jira] [Commented] (HDFS-17218) NameNode should remove its excess blocks from the ExcessRedundancyMap When a DN registers
[ https://issues.apache.org/jira/browse/HDFS-17218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774370#comment-17774370 ] Shuyan Zhang commented on HDFS-17218: - Hi, [~haiyang Hu], your report is very valuable. I would like to discuss it with you. As you say, {quote}since block1 is not a new block, the processExtraRedundancy logic will not be executed. {quote} Therefore, even if we remove the corresponding excess blocks from the ExcessRedundancyMap when a DN registers, it seems that we cannot avoid this problem because processExtraRedundancy will still not be executed. I think the root cause here is that NameNode has no timeout mechanism for handling excess replicas, like the one PendingReconstructionMonitor provides for block reconstruction. > NameNode should remove its excess blocks from the ExcessRedundancyMap When a DN registers > - > > Key: HDFS-17218 > URL: https://issues.apache.org/jira/browse/HDFS-17218 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namanode >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > Attachments: image-2023-10-12-15-52-52-336.png > > > We found that a DN loses all pending DNA_INVALIDATE blocks if it restarts. > *Root cause* > The DN currently enables asynchronous deletion, so it holds many pending-deletion blocks in memory. > When the DN restarts, these cached blocks may be lost. This causes some blocks in the excess map in the NameNode to be leaked, which results in many blocks having more replicas than expected. > *Solution* > The NameNode should remove a DN's excess blocks from the ExcessRedundancyMap when the DN registers. > This approach ensures that, when processing the DN's full block report, 'processExtraRedundancy' can be performed according to the actual state of the blocks.
[jira] [Updated] (HDFS-17218) NameNode should remove its excess blocks from the ExcessRedundancyMap When a DN registers
[ https://issues.apache.org/jira/browse/HDFS-17218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang updated HDFS-17218: Attachment: image-2023-10-12-15-52-52-336.png > NameNode should remove its excess blocks from the ExcessRedundancyMap When a DN registers > - > > Key: HDFS-17218 > URL: https://issues.apache.org/jira/browse/HDFS-17218 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namanode >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > Attachments: image-2023-10-12-15-52-52-336.png > > > We found that a DN loses all pending DNA_INVALIDATE blocks if it restarts. > *Root cause* > The DN currently enables asynchronous deletion, so it holds many pending-deletion blocks in memory. > When the DN restarts, these cached blocks may be lost. This causes some blocks in the excess map in the NameNode to be leaked, which results in many blocks having more replicas than expected. > *Solution* > The NameNode should remove a DN's excess blocks from the ExcessRedundancyMap when the DN registers. > This approach ensures that, when processing the DN's full block report, 'processExtraRedundancy' can be performed according to the actual state of the blocks.
[jira] [Created] (HDFS-17204) EC: Reduce unnecessary log when processing excess redundancy.
Shuyan Zhang created HDFS-17204: --- Summary: EC: Reduce unnecessary log when processing excess redundancy. Key: HDFS-17204 URL: https://issues.apache.org/jira/browse/HDFS-17204 Project: Hadoop HDFS Issue Type: Improvement Reporter: Shuyan Zhang This is a follow-up of [HDFS-16964|https://issues.apache.org/jira/browse/HDFS-16964]. We now avoid stale replicas when dealing with redundancy. This may result in redundant replicas not being in the `nonExcess` set when we enter `BlockManager#chooseExcessRedundancyStriped` (because the datanode where the redundant replicas are located has not sent an FBR yet, so those replicas are filtered out and not added to the `nonExcess` set). A further result is that no excess storage type is selected and the log "excess types chosen for block..." is printed. When a failover occurs, a large number of datanodes become stale, which causes NameNodes to print a large number of unnecessary logs. This issue needs to be fixed; otherwise, performance after failover will be affected.
[jira] [Assigned] (HDFS-17204) EC: Reduce unnecessary log when processing excess redundancy.
[ https://issues.apache.org/jira/browse/HDFS-17204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang reassigned HDFS-17204: --- Assignee: Shuyan Zhang > EC: Reduce unnecessary log when processing excess redundancy. > - > > Key: HDFS-17204 > URL: https://issues.apache.org/jira/browse/HDFS-17204 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Shuyan Zhang >Assignee: Shuyan Zhang >Priority: Major > > This is a follow-up of [HDFS-16964|https://issues.apache.org/jira/browse/HDFS-16964]. We now avoid stale replicas when dealing with redundancy. This may result in redundant replicas not being in the `nonExcess` set when we enter `BlockManager#chooseExcessRedundancyStriped` (because the datanode where the redundant replicas are located has not sent an FBR yet, so those replicas are filtered out and not added to the `nonExcess` set). A further result is that no excess storage type is selected and the log "excess types chosen for block..." is printed. When a failover occurs, a large number of datanodes become stale, which causes NameNodes to print a large number of unnecessary logs. > This issue needs to be fixed; otherwise, performance after failover will be affected.
[jira] [Created] (HDFS-17197) Show file replication when listing corrupt files.
Shuyan Zhang created HDFS-17197: --- Summary: Show file replication when listing corrupt files. Key: HDFS-17197 URL: https://issues.apache.org/jira/browse/HDFS-17197 Project: Hadoop HDFS Issue Type: Improvement Reporter: Shuyan Zhang Files with different replication factors have different reliability guarantees. We need to pay particular attention to corrupt files whose specified replication is greater than or equal to 3. So, when listing corrupt files, it would be useful to display the replication of each file.
[jira] [Updated] (HDFS-17190) EC: Fix bug of OIV processing XAttr.
[ https://issues.apache.org/jira/browse/HDFS-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang updated HDFS-17190: Summary: EC: Fix bug of OIV processing XAttr. (was: EC: Fix bug for OIV processing XAttr.) > EC: Fix bug of OIV processing XAttr. > > > Key: HDFS-17190 > URL: https://issues.apache.org/jira/browse/HDFS-17190 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Shuyan Zhang >Priority: Major > > When we need to use OIV to print EC information for a directory, `PBImageTextWriter#getErasureCodingPolicyName` will be called. Currently, this method uses `XATTR_ERASURECODING_POLICY.contains(xattr.getName())` to filter and obtain the EC XAttr, which is very dangerous. If we have an XAttr whose name happens to be a substring of `hdfs.erasurecoding.policy`, then `getErasureCodingPolicyName` will return the wrong result. Our internal production environment has customized some XAttrs, and this bug caused errors in the parsing results of OIV when using the `-ec` option.
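The substring pitfall described in HDFS-17190 is easy to reproduce with plain strings. The sketch below is illustrative only: the class and the two helper methods are hypothetical, the constant value is taken from the description above, and using `equals` is one plausible stricter check rather than a statement of what the committed fix does.

```java
final class XAttrNameMatch {
  /** Value as quoted in the issue description. */
  static final String XATTR_ERASURECODING_POLICY = "hdfs.erasurecoding.policy";

  /** The loose check from the description: true for ANY substring of the constant. */
  static boolean looseMatch(String xattrName) {
    return XATTR_ERASURECODING_POLICY.contains(xattrName);
  }

  /** A stricter alternative: true only for the exact name. */
  static boolean exactMatch(String xattrName) {
    return XATTR_ERASURECODING_POLICY.equals(xattrName);
  }

  public static void main(String[] args) {
    String custom = "policy"; // hypothetical custom XAttr name
    System.out.println(looseMatch(custom)); // true: mistaken for the EC XAttr
    System.out.println(exactMatch(custom)); // false
    System.out.println(exactMatch("hdfs.erasurecoding.policy")); // true
  }
}
```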
[jira] [Updated] (HDFS-17190) EC: Fix bug for OIV processing XAttr.
[ https://issues.apache.org/jira/browse/HDFS-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang updated HDFS-17190: Description: When we need to use OIV to print EC information for a directory, `PBImageTextWriter#getErasureCodingPolicyName` will be called. Currently, this method uses `XATTR_ERASURECODING_POLICY.contains(xattr.getName())` to filter and obtain the EC XAttr, which is very dangerous. If we have an XAttr whose name happens to be a substring of `hdfs.erasurecoding.policy`, then `getErasureCodingPolicyName` will return the wrong result. Our internal production environment has customized some XAttrs, and this bug caused errors in the parsing results of OIV when using the `-ec` option. > EC: Fix bug for OIV processing XAttr. > - > > Key: HDFS-17190 > URL: https://issues.apache.org/jira/browse/HDFS-17190 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Shuyan Zhang >Priority: Major > > When we need to use OIV to print EC information for a directory, `PBImageTextWriter#getErasureCodingPolicyName` will be called. Currently, this method uses `XATTR_ERASURECODING_POLICY.contains(xattr.getName())` to filter and obtain the EC XAttr, which is very dangerous. If we have an XAttr whose name happens to be a substring of `hdfs.erasurecoding.policy`, then `getErasureCodingPolicyName` will return the wrong result. Our internal production environment has customized some XAttrs, and this bug caused errors in the parsing results of OIV when using the `-ec` option.
[jira] [Updated] (HDFS-17190) EC: Fix bug for OIV processing XAttr.
[ https://issues.apache.org/jira/browse/HDFS-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang updated HDFS-17190: Environment: (was: When we need to use OIV to print EC information for a directory, `PBImageTextWriter#getErasureCodingPolicyName` will be called. Currently, this method uses `XATTR_ERASURECODING_POLICY.contains(xattr.getName())` to filter and obtain EC XAttr, which is very dangerous. If we have an XAttr whose name happens to be a substring of `hdfs.erasurecoding.policy`, then `getErasureCodingPolicyName` will return the wrong result. Our internal production environment has customized some XAttrs, and this bug caused errors in the parsing results of OIV when using `-ec` option. ) > EC: Fix bug for OIV processing XAttr. > - > > Key: HDFS-17190 > URL: https://issues.apache.org/jira/browse/HDFS-17190 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Shuyan Zhang >Priority: Major
[jira] [Created] (HDFS-17190) EC: Fix bug for OIV processing XAttr.
Shuyan Zhang created HDFS-17190: --- Summary: EC: Fix bug for OIV processing XAttr. Key: HDFS-17190 URL: https://issues.apache.org/jira/browse/HDFS-17190 Project: Hadoop HDFS Issue Type: Bug Environment: When we need to use OIV to print EC information for a directory, `PBImageTextWriter#getErasureCodingPolicyName` will be called. Currently, this method uses `XATTR_ERASURECODING_POLICY.contains(xattr.getName())` to filter and obtain EC XAttr, which is very dangerous. If we have an XAttr whose name happens to be a substring of `hdfs.erasurecoding.policy`, then `getErasureCodingPolicyName` will return the wrong result. Our internal production environment has customized some XAttrs, and this bug caused errors in the parsing results of OIV when using `-ec` option. Reporter: Shuyan Zhang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-17154) EC: Fix bug in updateBlockForPipeline after failover
[ https://issues.apache.org/jira/browse/HDFS-17154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang reassigned HDFS-17154: --- Assignee: Shuyan Zhang > EC: Fix bug in updateBlockForPipeline after failover > > > Key: HDFS-17154 > URL: https://issues.apache.org/jira/browse/HDFS-17154 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Shuyan Zhang >Assignee: Shuyan Zhang >Priority: Major > > In the method `updateBlockForPipeline`, NameNode uses the > `BlockUnderConstructionFeature` of a BlockInfo to generate the member > `blockIndices` of `LocatedStripedBlock`. > And then, NameNode uses `blockIndices` to generate block tokens for client. > However, if there is a failover, the location info in > BlockUnderConstructionFeature may be incomplete, which results in the absence > of the corresponding block tokens. > When the client receives these incomplete block tokens, it will throw a NPE > because `updatedBlks[i]` is null. > NameNode should just return block tokens for all indices to the client. > Client can pick whichever it likes to use. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-17154) EC: Fix bug in updateBlockForPipeline after failover
Shuyan Zhang created HDFS-17154: --- Summary: EC: Fix bug in updateBlockForPipeline after failover Key: HDFS-17154 URL: https://issues.apache.org/jira/browse/HDFS-17154 Project: Hadoop HDFS Issue Type: Bug Reporter: Shuyan Zhang In the method `updateBlockForPipeline`, NameNode uses the `BlockUnderConstructionFeature` of a BlockInfo to generate the member `blockIndices` of `LocatedStripedBlock`. NameNode then uses `blockIndices` to generate block tokens for the client. However, after a failover, the location info in `BlockUnderConstructionFeature` may be incomplete, so the corresponding block tokens are missing. When the client receives the incomplete set of block tokens, it throws an NPE because `updatedBlks[i]` is null. NameNode should simply return block tokens for all indices, and the client can pick whichever ones it needs.
[jira] [Created] (HDFS-17151) EC: Fix wrong metadata in BlockInfoStriped after recovery
Shuyan Zhang created HDFS-17151: --- Summary: EC: Fix wrong metadata in BlockInfoStriped after recovery Key: HDFS-17151 URL: https://issues.apache.org/jira/browse/HDFS-17151 Project: Hadoop HDFS Issue Type: Bug Reporter: Shuyan Zhang When a datanode completes a block recovery, it calls the `commitBlockSynchronization` method to notify NN of the block's new locations. For an EC block group, NN determines the index of each internal block from the position of its DatanodeID in the parameter `newtargets`. If the internal blocks written by the client do not have contiguous indices, the current datanode code can cause NN to record incorrect block metadata. For simplicity, take RS (3,2) as an example. The timeline of the problem is as follows: 1. The client plans to write internal blocks with indices [0,1,2,3,4] to datanodes [dn0, dn1, dn2, dn3, dn4] respectively. But dn1 cannot be reached, so the client writes data only to the remaining 4 datanodes; 2. The client crashes; 3. NN fails over; 4. Now the content of `uc.getExpectedStorageLocations()` depends entirely on block reports, and now it is ; 5. When the lease reaches its hard limit, NN issues a block recovery command; 6. The datanode that receives the recovery command fills `DatanodeID [] newLocs` with [dn0, null, dn2, dn3, dn4]; 7. The serialization process filters out null values, so the parameters passed to NN become [dn0, dn2, dn3, dn4]; 8. NN mistakenly believes that dn2 stores the internal block with index 1, dn3 stores the internal block with index 2, and so on. The above timeline is just one example; other situations can produce the same error, such as a pipeline update on the client side. We should fix this bug.
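Steps 6 to 8 of the timeline can be illustrated with plain strings standing in for `DatanodeID`. This is a hedged sketch only: `NullFilterIndexShift`, `dropNulls`, and `inferredIndex` are hypothetical names, and the code models the described behaviour rather than the real serialization or NameNode logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class NullFilterIndexShift {
  // Models a serializer that silently drops null entries (step 7).
  static List<String> dropNulls(String[] locs) {
    List<String> out = new ArrayList<>();
    for (String loc : locs) {
      if (loc != null) {
        out.add(loc);
      }
    }
    return out;
  }

  // Models position-based inference (step 8): block index = position in the list.
  static int inferredIndex(List<String> targets, String datanode) {
    return targets.indexOf(datanode);
  }

  public static void main(String[] args) {
    String[] newLocs = {"dn0", null, "dn2", "dn3", "dn4"};
    List<String> sent = dropNulls(newLocs);
    for (String dn : sent) {
      int actual = Arrays.asList(newLocs).indexOf(dn);
      System.out.println(dn + ": actual index " + actual
          + ", inferred index " + inferredIndex(sent, dn));
    }
  }
}
```

Once the null placeholder is gone, every datanode after the gap is assigned an index one lower than the internal block it actually holds.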
[jira] [Created] (HDFS-17150) EC: Fix the bug of failed lease recovery.
Shuyan Zhang created HDFS-17150: --- Summary: EC: Fix the bug of failed lease recovery. Key: HDFS-17150 URL: https://issues.apache.org/jira/browse/HDFS-17150 Project: Hadoop HDFS Issue Type: Bug Reporter: Shuyan Zhang If the client crashes before writing the minimum number of internal blocks required by the EC policy, lease recovery for the corresponding unclosed file may fail indefinitely. Taking the RS(6,3) policy as an example, the timeline is as follows: 1. The client writes some data to only 5 datanodes; 2. The client crashes; 3. NN fails over; 4. Now the result of `uc.getNumExpectedLocations()` depends entirely on block reports, and 5 datanodes report internal blocks; 5. When the lease reaches its hard limit, NN issues a block recovery command; 6. The datanode checks the command and finds that the number of internal blocks is insufficient, resulting in an error and recovery failure; 7. The lease reaches its hard limit again, and NN issues a block recovery command again, but the recovery fails again. When the number of internal blocks written by the client is less than 6, the block group is actually unrecoverable. We should treat this situation like the case where the number of replicas is 0 for a replicated file, i.e., directly remove the last block group and close the file.
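The proposed handling amounts to a simple threshold decision, which can be sketched as follows. All names here (`StripedLeaseRecoveryDecision`, `Action`, `decide`) are hypothetical and chosen for illustration; the sketch only encodes the rule stated in the description and is not the NameNode implementation.

```java
final class StripedLeaseRecoveryDecision {
  enum Action { RECOVER_BLOCK_GROUP, REMOVE_LAST_BLOCK_GROUP_AND_CLOSE }

  // dataUnits is the number of data blocks in the EC policy, e.g. 6 for RS(6,3).
  static Action decide(int reportedInternalBlocks, int dataUnits) {
    // Fewer internal blocks than data units: the block group cannot be decoded.
    if (reportedInternalBlocks < dataUnits) {
      return Action.REMOVE_LAST_BLOCK_GROUP_AND_CLOSE;
    }
    return Action.RECOVER_BLOCK_GROUP;
  }

  public static void main(String[] args) {
    System.out.println("RS(6,3), 5 reported: " + decide(5, 6));
    System.out.println("RS(6,3), 6 reported: " + decide(6, 6));
  }
}
```

Deciding on the NameNode side avoids issuing a recovery command that the datanode is bound to reject on every lease expiry.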
[jira] [Updated] (HDFS-17134) RBF: Fix duplicate results of getListing through Router.
[ https://issues.apache.org/jira/browse/HDFS-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang updated HDFS-17134: Description: The results of `getListing` in NameNode are sorted based on `byte[]`, while the Router side is based on `String`. If there are special characters in path, the sorting result of the router will be inconsistent with the namenode. This may result in duplicate `getListing` results obtained by the client due to wrong `startAfter` parameter. For example, namenode returns [path1, path2, path3] for a `getListing` request, while router returns [path1, path3, path2] to client. Then client will pass `path2` as `startAfter` at the next iteration, so it will receive `path3` again. We need to fix the Router code so that the order of its result is the same as NameNode. was: The results of `getListing` in NameNode are sorted based on `byte[]`, while the Router side is based on `String`. If there are special characters in path, the sorting result of the router is inconsistent with the namenode. This may result in duplicate `getListing` results obtained by the client due to wrong `startAfter` parameter. For example, namenode returns [path1, path2, path3], while router returns [path1, path3, path2] to client. Then client will pass `startAfter` as `path2` at the next iteration, so it will receive `path3` again. We need to fix the Router code so that the order of its results is the same as NameNode. > RBF: Fix duplicate results of getListing through Router. > > > Key: HDFS-17134 > URL: https://issues.apache.org/jira/browse/HDFS-17134 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Shuyan Zhang >Assignee: Shuyan Zhang >Priority: Major > Labels: pull-request-available > > The results of `getListing` in NameNode are sorted based on `byte[]`, while > the Router side is based on `String`. If there are special characters in > path, the sorting result of the router will be inconsistent with the > namenode.
This may result in duplicate `getListing` results obtained by the > client due to wrong `startAfter` parameter. > For example, namenode returns [path1, path2, path3] for a `getListing` > request, while router returns [path1, path3, path2] to client. Then client > will pass `path2` as `startAfter` at the next iteration, so it will receive > `path3` again. > We need to fix the Router code so that the order of its result is the same as > NameNode.
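To see how the two orderings can disagree, consider one name containing a BMP character near the top of the range and one containing a supplementary character. This is a minimal sketch under assumptions: `ListingOrder` and its methods are hypothetical, the sample names are invented, and the byte comparison is assumed to be lexicographic over UTF-8 bytes, which may differ in detail from the comparator HDFS uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ListingOrder {
  // Lexicographic comparison of unsigned bytes (assumed byte[] ordering).
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int x = a[i] & 0xff;
      int y = b[i] & 0xff;
      if (x != y) {
        return x - y;
      }
    }
    return a.length - b.length;
  }

  // Sorts names by their UTF-8 encoding.
  static String[] sortAsBytes(String[] names) {
    String[] out = names.clone();
    Arrays.sort(out, (p, q) -> compareBytes(
        p.getBytes(StandardCharsets.UTF_8), q.getBytes(StandardCharsets.UTF_8)));
    return out;
  }

  // Sorts names with String.compareTo, i.e. by UTF-16 code units.
  static String[] sortAsStrings(String[] names) {
    String[] out = names.clone();
    Arrays.sort(out);
    return out;
  }

  public static void main(String[] args) {
    // U+FF5E is a single BMP char; U+1F600 is a surrogate pair in UTF-16.
    String[] names = {"dir-\uFF5E", "dir-\uD83D\uDE00"};
    System.out.println("byte order first:   " + sortAsBytes(names)[0]);
    System.out.println("String order first: " + sortAsStrings(names)[0]);
  }
}
```

The surrogate pair starts with code unit 0xD83D, which sorts before 0xFF5E as a `String`, while its UTF-8 encoding starts with 0xF0, which sorts after 0xEF. A client paging with `startAfter` across two such orderings can therefore see an entry twice.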
[jira] [Assigned] (HDFS-17134) RBF: Fix duplicate results of getListing through Router.
[ https://issues.apache.org/jira/browse/HDFS-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang reassigned HDFS-17134: --- Assignee: Shuyan Zhang > RBF: Fix duplicate results of getListing through Router. > > > Key: HDFS-17134 > URL: https://issues.apache.org/jira/browse/HDFS-17134 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Shuyan Zhang >Assignee: Shuyan Zhang >Priority: Major > > The results of `getListing` in NameNode are sorted based on `byte[]`, while > the Router side is based on `String`. If there are special characters in > path, the sorting result of the router is inconsistent with the namenode. > This may result in duplicate `getListing` results obtained by the client due > to wrong `startAfter` parameter. > For example, namenode returns [path1, path2, path3], while router returns > [path1, path3, path2] to client. Then client will pass `startAfter` as > `path2` at the next iteration, so it will receive `path3` again. > We need to fix the Router code so that the order of its results is the same > as NameNode.
[jira] [Created] (HDFS-17134) RBF: Fix duplicate results of getListing through Router.
Shuyan Zhang created HDFS-17134: --- Summary: RBF: Fix duplicate results of getListing through Router. Key: HDFS-17134 URL: https://issues.apache.org/jira/browse/HDFS-17134 Project: Hadoop HDFS Issue Type: Bug Reporter: Shuyan Zhang The results of `getListing` in NameNode are sorted based on `byte[]`, while the Router side is based on `String`. If there are special characters in path, the sorting result of the router is inconsistent with the namenode. This may result in duplicate `getListing` results obtained by the client due to wrong `startAfter` parameter. For example, namenode returns [path1, path2, path3], while router returns [path1, path3, path2] to client. Then client will pass `startAfter` as `path2` at the next iteration, so it will receive `path3` again. We need to fix the Router code so that the order of its results is the same as NameNode.
[jira] [Assigned] (HDFS-17112) Show decommission duration in JMX and HTML
[ https://issues.apache.org/jira/browse/HDFS-17112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang reassigned HDFS-17112: --- Assignee: Shuyan Zhang > Show decommission duration in JMX and HTML > -- > > Key: HDFS-17112 > URL: https://issues.apache.org/jira/browse/HDFS-17112 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Shuyan Zhang >Assignee: Shuyan Zhang >Priority: Major > Labels: pull-request-available > > Expose decommission duration time in JMX page. It's a very useful info when > decommissioning a batch of datanodes in a cluster. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-17112) Show decommission duration in JMX and HTML
Shuyan Zhang created HDFS-17112: --- Summary: Show decommission duration in JMX and HTML Key: HDFS-17112 URL: https://issues.apache.org/jira/browse/HDFS-17112 Project: Hadoop HDFS Issue Type: Improvement Reporter: Shuyan Zhang Expose the decommission duration on the JMX page. This is very useful information when decommissioning a batch of datanodes in a cluster.
[jira] [Updated] (HDFS-17094) EC: Fix bug in block recovery when there are stale datanodes
[ https://issues.apache.org/jira/browse/HDFS-17094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang updated HDFS-17094: Description: When a block recovery occurs, `RecoveryTaskStriped` in datanode expects `rBlock.getLocations()` and `rBlock. getBlockIndices()` to be in one-to-one correspondence. However, if there are locations in stale state when NameNode handles heartbeat, this correspondence will be disrupted. In detail, there is no stale location in `recoveryLocations`, but the block indices array is still complete (i.e. contains the indices of all the locations). This will cause `BlockRecoveryWorker.RecoveryTaskStriped#recover` to generate a wrong internal block ID, and the corresponding datanode cannot find the replica, thus making the recovery process fail. This bug needs to be fixed. (was: When a block recovery occurs, `RecoveryTaskStriped` in datanode expects `rBlock.getLocations()` and `rBlock. getBlockIndices()` to be in one-to-one correspondence. However, if there are locations in stale state when NameNode handles heartbeat, this correspondence will be disrupted. In detail, there is no stale location in `recoveryLocations`, but the block indices array is still complete (i.e. contains the indices of all the locations). This will cause `BlockRecoveryWorker.RecoveryTaskStriped#recover` to generate a wrong internal block ID, and the corresponding datanode cannot find the relica, thus making the recovery process fail. This bug needs to be fixed.) > EC: Fix bug in block recovery when there are stale datanodes > > > Key: HDFS-17094 > URL: https://issues.apache.org/jira/browse/HDFS-17094 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Shuyan Zhang >Priority: Major > Labels: pull-request-available > > When a block recovery occurs, `RecoveryTaskStriped` in datanode expects > `rBlock.getLocations()` and `rBlock. getBlockIndices()` to be in one-to-one > correspondence. 
However, if there are locations in stale state when NameNode > handles heartbeat, this correspondence will be disrupted. In detail, there is > no stale location in `recoveryLocations`, but the block indices array is > still complete (i.e. contains the indices of all the locations). This will > cause `BlockRecoveryWorker.RecoveryTaskStriped#recover` to generate a wrong > internal block ID, and the corresponding datanode cannot find the replica, > thus making the recovery process fail. This bug needs to be fixed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
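The broken correspondence can be modelled with two parallel arrays. This is an illustrative sketch only: `AlignedRecoveryLists`, `liveLocations`, and `liveIndices` are hypothetical helpers, and filtering both arrays together is an assumed remedy rather than a description of the committed change.

```java
import java.util.ArrayList;
import java.util.List;

final class AlignedRecoveryLists {
  // Keeps only locations that are not stale (what `recoveryLocations` ends up holding).
  static String[] liveLocations(String[] locations, boolean[] stale) {
    List<String> out = new ArrayList<>();
    for (int i = 0; i < locations.length; i++) {
      if (!stale[i]) {
        out.add(locations[i]);
      }
    }
    return out.toArray(new String[0]);
  }

  // Assumed remedy: drop the block index of every stale location as well.
  static int[] liveIndices(int[] blockIndices, boolean[] stale) {
    List<Integer> kept = new ArrayList<>();
    for (int i = 0; i < blockIndices.length; i++) {
      if (!stale[i]) {
        kept.add(blockIndices[i]);
      }
    }
    int[] out = new int[kept.size()];
    for (int i = 0; i < out.length; i++) {
      out[i] = kept.get(i);
    }
    return out;
  }

  public static void main(String[] args) {
    String[] locations = {"dn0", "dn1", "dn2"};
    int[] blockIndices = {0, 1, 2};
    boolean[] stale = {false, true, false};
    String[] live = liveLocations(locations, stale);
    int[] aligned = liveIndices(blockIndices, stale);
    // Unfiltered indices pair live[1] (dn2) with index 1; aligned indices give 2.
    System.out.println(live[1] + ": unfiltered index " + blockIndices[1]
        + ", aligned index " + aligned[1]);
  }
}
```

If only the locations are filtered, position 1 pairs `dn2` with block index 1, so the derived internal block ID points at a replica that `dn2` does not hold.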
[jira] [Assigned] (HDFS-17094) EC: Fix bug in block recovery when there are stale datanodes
[ https://issues.apache.org/jira/browse/HDFS-17094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang reassigned HDFS-17094: --- Assignee: Shuyan Zhang > EC: Fix bug in block recovery when there are stale datanodes > > > Key: HDFS-17094 > URL: https://issues.apache.org/jira/browse/HDFS-17094 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Shuyan Zhang >Assignee: Shuyan Zhang >Priority: Major > Labels: pull-request-available > > When a block recovery occurs, `RecoveryTaskStriped` in datanode expects > `rBlock.getLocations()` and `rBlock. getBlockIndices()` to be in one-to-one > correspondence. However, if there are locations in stale state when NameNode > handles heartbeat, this correspondence will be disrupted. In detail, there is > no stale location in `recoveryLocations`, but the block indices array is > still complete (i.e. contains the indices of all the locations). This will > cause `BlockRecoveryWorker.RecoveryTaskStriped#recover` to generate a wrong > internal block ID, and the corresponding datanode cannot find the replica, > thus making the recovery process fail. This bug needs to be fixed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-17094) EC: Fix bug in block recovery when there are stale datanodes
Shuyan Zhang created HDFS-17094: --- Summary: EC: Fix bug in block recovery when there are stale datanodes Key: HDFS-17094 URL: https://issues.apache.org/jira/browse/HDFS-17094 Project: Hadoop HDFS Issue Type: Bug Reporter: Shuyan Zhang When a block recovery occurs, `RecoveryTaskStriped` in datanode expects `rBlock.getLocations()` and `rBlock. getBlockIndices()` to be in one-to-one correspondence. However, if there are locations in stale state when NameNode handles heartbeat, this correspondence will be disrupted. In detail, there is no stale location in `recoveryLocations`, but the block indices array is still complete (i.e. contains the indices of all the locations). This will cause `BlockRecoveryWorker.RecoveryTaskStriped#recover` to generate a wrong internal block ID, and the corresponding datanode cannot find the relica, thus making the recovery process fail. This bug needs to be fixed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-17089) Close child file systems in ViewFileSystem when cache is disabled.
[ https://issues.apache.org/jira/browse/HDFS-17089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang updated HDFS-17089: Summary: Close child file systems in ViewFileSystem when cache is disabled. (was: Close child files systems in ViewFileSystem when cache is disabled.) > Close child file systems in ViewFileSystem when cache is disabled. > -- > > Key: HDFS-17089 > URL: https://issues.apache.org/jira/browse/HDFS-17089 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Shuyan Zhang >Priority: Major > Labels: pull-request-available > > When the cache is configured to be disabled (namely, > `fs.viewfs.enable.inner.cache=false` and `fs.*.impl.disable.cache=true`), > the client cannot truly close the child file systems in a ViewFileSystem, > even if `FileSystem.close()` is called. This caused our long-running clients > to leak resources continuously.
[jira] [Created] (HDFS-17089) Close child files systems in ViewFileSystem when cache is disabled.
Shuyan Zhang created HDFS-17089: --- Summary: Close child files systems in ViewFileSystem when cache is disabled. Key: HDFS-17089 URL: https://issues.apache.org/jira/browse/HDFS-17089 Project: Hadoop HDFS Issue Type: Bug Reporter: Shuyan Zhang When the cache is configured to be disabled (namely, `fs.viewfs.enable.inner.cache=false` and `fs.*.impl.disable.cache=true`), the client cannot truly close the child file systems in a ViewFileSystem, even if `FileSystem.close()` is called. This caused our long-running clients to leak resources continuously.
[jira] [Updated] (HDFS-17049) EC: Fix duplicate block group IDs generated by SequentialBlockGroupIdGenerator
[ https://issues.apache.org/jira/browse/HDFS-17049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang updated HDFS-17049: Description: When I used multiple clients to write EC files concurrently, I found that NameNode generated the same block group ID for different files: ``` 2023-06-13 20:09:59,514 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocate blk_-9223372036854697568_14389 for /ec-test/10/4068034329705654124 2023-06-13 20:09:59,514 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocate blk_-9223372036854697568_14390 for /ec-test/19/7042966144171770731 ``` After diving into `SequentialBlockGroupIdGenerator`, I found that the current implementation of `nextValue` is not thread-safe. This problem must be fixed. was: When I used multiple clients to write EC files concurrently, I found that NameNode generated the same block group ID for different files: ``` 2023-06-13 20:09:59,514 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocate blk_-9223372036854697568_14389 for /ec-test/10/4068034329705654124 2023-06-13 20:09:59,514 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocate blk_-9223372036854697568_14390 for /ec-test/19/7042966144171770731 ``` After diving into `SequentialBlockGroupIdGenerator`, I found that the current implementation of `nextValue` is not thread safety. This problem must be fixed. 
> EC: Fix duplicate block group IDs generated by SequentialBlockGroupIdGenerator > -- > > Key: HDFS-17049 > URL: https://issues.apache.org/jira/browse/HDFS-17049 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Shuyan Zhang >Assignee: Shuyan Zhang >Priority: Major > Labels: pull-request-available > > When I used multiple clients to write EC files concurrently, I found that > NameNode generated the same block group ID for different files: > ``` > 2023-06-13 20:09:59,514 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > allocate blk_-9223372036854697568_14389 for /ec-test/10/4068034329705654124 > 2023-06-13 20:09:59,514 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > allocate blk_-9223372036854697568_14390 for /ec-test/19/7042966144171770731 > ``` > After diving into `SequentialBlockGroupIdGenerator`, I found that the current > implementation of `nextValue` is not thread-safe. > This problem must be fixed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
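Why a non-atomic `nextValue` can hand out the same ID twice, and one way to avoid it, can be shown with a generic counter. This sketch is not the `SequentialBlockGroupIdGenerator` code: `AtomicIdGenerator` and `countDistinct` are hypothetical, the increment of 1 is a simplification, and `AtomicLong` is just one of several possible remedies (synchronization would be another).

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class AtomicIdGenerator {
  private final AtomicLong current;

  AtomicIdGenerator(long initial) {
    this.current = new AtomicLong(initial);
  }

  // A single atomic read-modify-write: two threads can never observe the same value.
  long nextValue() {
    return current.incrementAndGet();
  }

  // Draws IDs from several threads and reports how many distinct IDs were seen.
  static int countDistinct(int threads, int perThread) {
    AtomicIdGenerator gen = new AtomicIdGenerator(0L);
    Set<Long> seen = ConcurrentHashMap.newKeySet();
    Thread[] workers = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      workers[t] = new Thread(() -> {
        for (int i = 0; i < perThread; i++) {
          seen.add(gen.nextValue());
        }
      });
      workers[t].start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return seen.size();
  }

  public static void main(String[] args) {
    System.out.println("distinct IDs: " + countDistinct(8, 1000));
  }
}
```

A generator that reads the current value and writes it back in separate steps lets two callers read the same value before either writes, which matches the duplicate `blk_-9223372036854697568` allocations in the log above.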
[jira] [Updated] (HDFS-17049) EC: Fix duplicate block group IDs generated by SequentialBlockGroupIdGenerator
[ https://issues.apache.org/jira/browse/HDFS-17049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang updated HDFS-17049: Summary: EC: Fix duplicate block group IDs generated by SequentialBlockGroupIdGenerator (was: Fix duplicate block group IDs generated by SequentialBlockGroupIdGenerator) > EC: Fix duplicate block group IDs generated by SequentialBlockGroupIdGenerator > -- > > Key: HDFS-17049 > URL: https://issues.apache.org/jira/browse/HDFS-17049 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Shuyan Zhang >Assignee: Shuyan Zhang >Priority: Major > Labels: pull-request-available > > When I used multiple clients to write EC files concurrently, I found that > NameNode generated the same block group ID for different files: > ``` > 2023-06-13 20:09:59,514 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > allocate blk_-9223372036854697568_14389 for /ec-test/10/4068034329705654124 > 2023-06-13 20:09:59,514 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > allocate blk_-9223372036854697568_14390 for /ec-test/19/7042966144171770731 > ``` > After diving into `SequentialBlockGroupIdGenerator`, I found that the current > implementation of `nextValue` is not thread safety. > This problem must be fixed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-17049) Fix duplicate block group IDs generated by SequentialBlockGroupIdGenerator
[ https://issues.apache.org/jira/browse/HDFS-17049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang updated HDFS-17049: Summary: Fix duplicate block group IDs generated by SequentialBlockGroupIdGenerator (was: Fix duplicate block group ids generated by SequentialBlockGroupIdGenerator) > Fix duplicate block group IDs generated by SequentialBlockGroupIdGenerator > -- > > Key: HDFS-17049 > URL: https://issues.apache.org/jira/browse/HDFS-17049 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Shuyan Zhang >Assignee: Shuyan Zhang >Priority: Major > > When I used multiple clients to write EC files concurrently, I found that > NameNode generated the same block group ID for different files: > ``` > 2023-06-13 20:09:59,514 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > allocate blk_-9223372036854697568_14389 for /ec-test/10/4068034329705654124 > 2023-06-13 20:09:59,514 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > allocate blk_-9223372036854697568_14390 for /ec-test/19/7042966144171770731 > ``` > After diving into `SequentialBlockGroupIdGenerator`, I found that the current > implementation of `nextValue` is not thread safety. > This problem must be fixed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-17049) Fix duplicate block group ids generated by SequentialBlockGroupIdGenerator
[ https://issues.apache.org/jira/browse/HDFS-17049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang reassigned HDFS-17049: --- Assignee: Shuyan Zhang > Fix duplicate block group ids generated by SequentialBlockGroupIdGenerator > -- > > Key: HDFS-17049 > URL: https://issues.apache.org/jira/browse/HDFS-17049 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Shuyan Zhang >Assignee: Shuyan Zhang >Priority: Major > > When I used multiple clients to write EC files concurrently, I found that > NameNode generated the same block group ID for different files: > ``` > 2023-06-13 20:09:59,514 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > allocate blk_-9223372036854697568_14389 for /ec-test/10/4068034329705654124 > 2023-06-13 20:09:59,514 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > allocate blk_-9223372036854697568_14390 for /ec-test/19/7042966144171770731 > ``` > After diving into `SequentialBlockGroupIdGenerator`, I found that the current > implementation of `nextValue` is not thread safety. > This problem must be fixed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-17049) Fix duplicate block group ids generated by SequentialBlockGroupIdGenerator
Shuyan Zhang created HDFS-17049: --- Summary: Fix duplicate block group ids generated by SequentialBlockGroupIdGenerator Key: HDFS-17049 URL: https://issues.apache.org/jira/browse/HDFS-17049 Project: Hadoop HDFS Issue Type: Bug Reporter: Shuyan Zhang When I used multiple clients to write EC files concurrently, I found that NameNode generated the same block group ID for different files: ``` 2023-06-13 20:09:59,514 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocate blk_-9223372036854697568_14389 for /ec-test/10/4068034329705654124 2023-06-13 20:09:59,514 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocate blk_-9223372036854697568_14390 for /ec-test/19/7042966144171770731 ``` After diving into `SequentialBlockGroupIdGenerator`, I found that the current implementation of `nextValue` is not thread safety. This problem must be fixed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-17037) Consider nonDfsUsed when running balancer
Shuyan Zhang created HDFS-17037: --- Summary: Consider nonDfsUsed when running balancer Key: HDFS-17037 URL: https://issues.apache.org/jira/browse/HDFS-17037 Project: Hadoop HDFS Issue Type: Bug Reporter: Shuyan Zhang Assignee: Shuyan Zhang When we run the balancer with the `BalancingPolicy.Node` policy, our goal is to balance storage usage across datanodes. But in the current implementation, the balancer does not account for non-DFS storage usage on the datanodes, which can make the situation worse for datanodes that are already short on storage.
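A small arithmetic example shows how ignoring non-DFS usage can mislead the balancer. This is a sketch under assumptions: `NodeUtilization` and its methods are hypothetical, and the two formulas (DFS-only versus DFS plus non-DFS over capacity) are an assumed reading of the issue, not a quotation of the balancer code.

```java
final class NodeUtilization {
  // Utilization when only DFS usage is counted (assumed current behaviour).
  static double dfsOnlyPercent(long dfsUsed, long capacity) {
    return 100.0 * dfsUsed / capacity;
  }

  // Utilization when non-DFS usage is counted as well (assumed improvement).
  static double withNonDfsPercent(long dfsUsed, long nonDfsUsed, long capacity) {
    return 100.0 * (dfsUsed + nonDfsUsed) / capacity;
  }

  public static void main(String[] args) {
    long capacity = 100L;   // arbitrary units
    long dfsUsed = 30L;
    long nonDfsUsed = 60L;
    System.out.println("DFS only:     " + dfsOnlyPercent(dfsUsed, capacity) + "%");
    System.out.println("with non-DFS: " + withNonDfsPercent(dfsUsed, nonDfsUsed, capacity) + "%");
  }
}
```

A node that looks 30% full by DFS usage alone is in fact 90% full on disk, so treating it as under-utilized and moving blocks onto it would add pressure to a node that is already strained.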
[jira] [Resolved] (HDFS-17021) GENSTAMP_MISMATCH replica can not be removed by invalidateCorruptReplicas()
[ https://issues.apache.org/jira/browse/HDFS-17021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang resolved HDFS-17021. - Resolution: Not A Bug > GENSTAMP_MISMATCH replica can not be removed by invalidateCorruptReplicas() > --- > > Key: HDFS-17021 > URL: https://issues.apache.org/jira/browse/HDFS-17021 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Shuyan Zhang >Assignee: Shuyan Zhang >Priority: Major > > If a replica is corrupted due to a generation stamp mismatch, the corresponding > datanode stores the wrong generation stamp, while `invalidateCorruptReplicas()` > sends the correct generation stamp to the datanode. As a result, the check on the > datanode cannot pass, as discussed in > https://github.com/apache/hadoop/pull/5643, and the corrupted replica cannot be > deleted.
[jira] [Updated] (HDFS-17021) GENSTAMP_MISMATCH replica can not be removed by invalidateCorruptReplicas()
[ https://issues.apache.org/jira/browse/HDFS-17021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang updated HDFS-17021: Environment: (was: If a replica is corrupted due to generation stamp mismatch, the corresponding datanode stores a wrong generation stamp while `invalidateCorruptReplicas()` will send right generation stamp to the datanode. Therefore, the check on datanode can not pass successfully as discussion in [https://github.com/apache/hadoop/pull/5643.] resulting in the corrupted replica unable to be successfully deleted. ) > GENSTAMP_MISMATCH replica can not be removed by invalidateCorruptReplicas() > --- > > Key: HDFS-17021 > URL: https://issues.apache.org/jira/browse/HDFS-17021 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Shuyan Zhang >Assignee: Shuyan Zhang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-17021) GENSTAMP_MISMATCH replica can not be removed by invalidateCorruptReplicas()
[ https://issues.apache.org/jira/browse/HDFS-17021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang updated HDFS-17021: Description: If a replica is corrupt because of a generation stamp mismatch, the datanode that holds it stores the wrong generation stamp, while `invalidateCorruptReplicas()` sends the correct generation stamp to that datanode. The check on the datanode therefore fails, as discussed in [https://github.com/apache/hadoop/pull/5643], and the corrupt replica cannot be deleted. > GENSTAMP_MISMATCH replica can not be removed by invalidateCorruptReplicas() > --- > > Key: HDFS-17021 > URL: https://issues.apache.org/jira/browse/HDFS-17021 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: Shuyan Zhang > Assignee: Shuyan Zhang > Priority: Major > > If a replica is corrupt because of a generation stamp mismatch, the datanode that holds it stores the wrong generation stamp, while `invalidateCorruptReplicas()` sends the correct generation stamp to that datanode. The check on the datanode therefore fails, as discussed in [https://github.com/apache/hadoop/pull/5643], and the corrupt replica cannot be deleted.
[jira] [Created] (HDFS-17021) GENSTAMP_MISMATCH replica can not be removed by invalidateCorruptReplicas()
Shuyan Zhang created HDFS-17021: --- Summary: GENSTAMP_MISMATCH replica can not be removed by invalidateCorruptReplicas() Key: HDFS-17021 URL: https://issues.apache.org/jira/browse/HDFS-17021 Project: Hadoop HDFS Issue Type: Bug Environment: If a replica is corrupt because of a generation stamp mismatch, the datanode that holds it stores the wrong generation stamp, while `invalidateCorruptReplicas()` sends the correct generation stamp to that datanode. The check on the datanode therefore fails, as discussed in [https://github.com/apache/hadoop/pull/5643], and the corrupt replica cannot be deleted. Reporter: Shuyan Zhang Assignee: Shuyan Zhang
[jira] [Assigned] (HDFS-16999) Fix wrong use of processFirstBlockReport()
[ https://issues.apache.org/jira/browse/HDFS-16999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang reassigned HDFS-16999: --- Assignee: Shuyan Zhang > Fix wrong use of processFirstBlockReport() > -- > > Key: HDFS-16999 > URL: https://issues.apache.org/jira/browse/HDFS-16999 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: Shuyan Zhang > Assignee: Shuyan Zhang > Priority: Major > > `processFirstBlockReport()` processes the first block report from a datanode. It does not compute a `toRemove` list because it assumes that the namenode holds no metadata about that datanode yet. However, when a datanode re-registers after restarting, its `blockReportCount` is reset to 0, so the first block report after a datanode restart is also processed by `processFirstBlockReport()`. This is wrong, because the namenode already holds metadata for the datanode at that point. If the redundant replica metadata is not removed promptly, blocks with insufficient replicas cannot be reconstructed in time, which increases the risk of missing blocks. In summary, `processFirstBlockReport()` should only be used when the namenode restarts, not when a datanode restarts.
[jira] [Created] (HDFS-16999) Fix wrong use of processFirstBlockReport()
Shuyan Zhang created HDFS-16999: --- Summary: Fix wrong use of processFirstBlockReport() Key: HDFS-16999 URL: https://issues.apache.org/jira/browse/HDFS-16999 Project: Hadoop HDFS Issue Type: Bug Reporter: Shuyan Zhang `processFirstBlockReport()` processes the first block report from a datanode. It does not compute a `toRemove` list because it assumes that the namenode holds no metadata about that datanode yet. However, when a datanode re-registers after restarting, its `blockReportCount` is reset to 0, so the first block report after a datanode restart is also processed by `processFirstBlockReport()`. This is wrong, because the namenode already holds metadata for the datanode at that point. If the redundant replica metadata is not removed promptly, blocks with insufficient replicas cannot be reconstructed in time, which increases the risk of missing blocks. In summary, `processFirstBlockReport()` should only be used when the namenode restarts, not when a datanode restarts.
[jira] [Created] (HDFS-16986) EC: Fix locationBudget in getListing()
Shuyan Zhang created HDFS-16986: --- Summary: EC: Fix locationBudget in getListing() Key: HDFS-16986 URL: https://issues.apache.org/jira/browse/HDFS-16986 Project: Hadoop HDFS Issue Type: Bug Reporter: Shuyan Zhang The current `locationBudget` is estimated from the `block_replication` field in `FileStatus`. This is wrong for EC files, because it counts an EC block as having only one location. We should take the files' ErasureCodingPolicy into account so that `locationBudget` keeps a consistent meaning.
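To illustrate the point about EC files, here is a minimal sketch of how the number of locations per block differs between replicated and erasure-coded files. The class and method names are hypothetical and are not taken from the patch. The only assumption is that an EC block group has one location per data unit plus one per parity unit (for example 6 + 3 = 9 under an RS-6-3 policy).

```java
// Hypothetical helper, not the actual HDFS code: estimates how many
// locations one block of a file contributes to the listing budget.
final class LocationEstimateSketch {
    private LocationEstimateSketch() {}

    /** Replicated file: one location per replica. */
    static int forReplicated(int blockReplication) {
        return blockReplication;
    }

    /** EC file: one location per internal block (data units + parity units). */
    static int forErasureCoded(int numDataUnits, int numParityUnits) {
        return numDataUnits + numParityUnits;
    }
}
```

Counting an RS-6-3 block group as a single location underestimates it by a factor of nine, which is why the budget loses its meaning on EC directories.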
[jira] [Assigned] (HDFS-16974) Consider load of every volume when choosing target
[ https://issues.apache.org/jira/browse/HDFS-16974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang reassigned HDFS-16974: --- Assignee: Shuyan Zhang > Consider load of every volume when choosing target > -- > > Key: HDFS-16974 > URL: https://issues.apache.org/jira/browse/HDFS-16974 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Shuyan Zhang >Assignee: Shuyan Zhang >Priority: Major > > The current target choosing policy only considers the load of the entire > datanode. If both DN1 and DN2 have an `xceiverCount` of 100, but DN1 has 10 > volumes to write to and DN2 only has 1, then the pressure on DN2 is actually > much greater than that on DN1. This patch has added a configuration that > allows us to avoid nodes with too much pressure on a single volume when > choosing targets, so as to avoid overloading datanodes with few volumes or > slowing down writes.
[jira] [Created] (HDFS-16974) Consider load of every volume when choosing target
Shuyan Zhang created HDFS-16974: --- Summary: Consider load of every volume when choosing target Key: HDFS-16974 URL: https://issues.apache.org/jira/browse/HDFS-16974 Project: Hadoop HDFS Issue Type: Improvement Reporter: Shuyan Zhang The current target choosing policy only considers the load of the entire datanode. If both DN1 and DN2 have an `xceiverCount` of 100, but DN1 has 10 volumes to write to and DN2 only has 1, then the pressure on DN2 is actually much greater than that on DN1. This patch has added a configuration that allows us to avoid nodes with too much pressure on a single volume when choosing targets, so as to avoid overloading datanodes with few volumes or slowing down writes.
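The DN1/DN2 example can be made concrete with a small sketch. The names below (`loadPerVolume`, `isOverloaded`, `maxLoadPerVolume`) are hypothetical and do not reflect the configuration key or the exact rule added by the patch. The sketch only assumes that per-volume pressure can be approximated by `xceiverCount` divided by the number of writable volumes.

```java
// Hypothetical illustration of per-volume load; not the actual placement policy code.
final class VolumeLoadSketch {
    private VolumeLoadSketch() {}

    /** Approximate pressure on a single volume of a datanode. */
    static double loadPerVolume(int xceiverCount, int writableVolumes) {
        if (writableVolumes <= 0) {
            // A node without writable volumes cannot take writes at all.
            return Double.POSITIVE_INFINITY;
        }
        return (double) xceiverCount / writableVolumes;
    }

    /** True if the node should be skipped under the assumed per-volume threshold. */
    static boolean isOverloaded(int xceiverCount, int writableVolumes,
                                double maxLoadPerVolume) {
        return loadPerVolume(xceiverCount, writableVolumes) > maxLoadPerVolume;
    }
}
```

With an assumed threshold of 50, DN1 (100 xceivers over 10 volumes) stays eligible while DN2 (100 xceivers on 1 volume) is skipped, although both report the same node-level load.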
[jira] [Assigned] (HDFS-16964) Improve processing of excess redundancy after failover
[ https://issues.apache.org/jira/browse/HDFS-16964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang reassigned HDFS-16964: --- Assignee: Shuyan Zhang > Improve processing of excess redundancy after failover > -- > > Key: HDFS-16964 > URL: https://issues.apache.org/jira/browse/HDFS-16964 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: Shuyan Zhang > Assignee: Shuyan Zhang > Priority: Major > > After failover, a block with excess redundancy cannot be processed until none of its replicas is stale, because the stale ones may already have been deleted. That is, we have to wait for the FBRs (full block reports) of all datanodes on which the block resides before deleting the redundant replicas. This wait is unnecessary: we can bypass stale replicas when dealing with excess replicas and delete non-stale excess replicas in a more timely manner.
[jira] [Created] (HDFS-16964) Improve processing of excess redundancy after failover
Shuyan Zhang created HDFS-16964: --- Summary: Improve processing of excess redundancy after failover Key: HDFS-16964 URL: https://issues.apache.org/jira/browse/HDFS-16964 Project: Hadoop HDFS Issue Type: Improvement Reporter: Shuyan Zhang After failover, a block with excess redundancy cannot be processed until none of its replicas is stale, because the stale ones may already have been deleted. That is, we have to wait for the FBRs (full block reports) of all datanodes on which the block resides before deleting the redundant replicas. This wait is unnecessary: we can bypass stale replicas when dealing with excess replicas and delete non-stale excess replicas in a more timely manner.
[jira] [Created] (HDFS-16958) Fix bug in processing EC excess redundancy
Shuyan Zhang created HDFS-16958: --- Summary: Fix bug in processing EC excess redundancy Key: HDFS-16958 URL: https://issues.apache.org/jira/browse/HDFS-16958 Project: Hadoop HDFS Issue Type: Bug Reporter: Shuyan Zhang Assignee: Shuyan Zhang When processing excess redundancy, the number of internal blocks is computed by traversing `nonExcess`. This is inaccurate, because `nonExcess` excludes replicas in abnormal states, such as corrupt replicas or replicas in maintenance. As a result, `numOfTarget` may be smaller than the actual value, which leads to an inaccurate `excessTypes`.
[jira] [Created] (HDFS-16939) Fix the thread safety bug in LowRedundancyBlocks
Shuyan Zhang created HDFS-16939: --- Summary: Fix the thread safety bug in LowRedundancyBlocks Key: HDFS-16939 URL: https://issues.apache.org/jira/browse/HDFS-16939 Project: Hadoop HDFS Issue Type: Bug Components: namanode Reporter: Shuyan Zhang Assignee: Shuyan Zhang The remove method in LowRedundancyBlocks is not protected by synchronized. This method is private and is called by BlockManager. As a result, priorityQueues has the risk of being accessed concurrently by multiple threads.
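As a rough illustration of the kind of fix described here, the sketch below guards every access to a shared list of priority queues with the same monitor. `PriorityQueuesSketch` and its methods are hypothetical stand-ins and not the real `LowRedundancyBlocks` API. The point is only that `remove` must take the same lock as the other accessors.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a class holding per-priority block queues.
class PriorityQueuesSketch {
    private final List<Set<Long>> priorityQueues = new ArrayList<>();

    PriorityQueuesSketch(int levels) {
        for (int i = 0; i < levels; i++) {
            priorityQueues.add(new HashSet<>());
        }
    }

    synchronized boolean add(long blockId, int priority) {
        return priorityQueues.get(priority).add(blockId);
    }

    // Without "synchronized" here, a caller could mutate a HashSet
    // while another thread iterates or updates it under the lock.
    synchronized boolean remove(long blockId, int priority) {
        return priorityQueues.get(priority).remove(blockId);
    }

    synchronized int size() {
        int total = 0;
        for (Set<Long> queue : priorityQueues) {
            total += queue.size();
        }
        return total;
    }
}
```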
[jira] [Updated] (HDFS-16735) Reduce the number of HeartbeatManager loops
[ https://issues.apache.org/jira/browse/HDFS-16735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang updated HDFS-16735: Description: HeartbeatManager processes only one dead datanode (and failed storage) per round in heartbeatCheck(). If there are ten failed storages, all datanode states must therefore be scanned ten times, which is unnecessary and a waste of resources. This patch makes the number of bad storages processed per scan configurable. (was: HeartbeatManager processes only one dead datanode (and failed storage) per round in heartbeatCheck(). If there are ten failed storages, all datanode states must therefore be scanned ten times, which is unnecessary. This patch makes the number of bad storages processed per scan configurable.) > Reduce the number of HeartbeatManager loops > --- > > Key: HDFS-16735 > URL: https://issues.apache.org/jira/browse/HDFS-16735 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: Shuyan Zhang > Assignee: Shuyan Zhang > Priority: Major > Labels: pull-request-available > > HeartbeatManager processes only one dead datanode (and failed storage) per round in heartbeatCheck(). If there are ten failed storages, all datanode states must therefore be scanned ten times, which is unnecessary and a waste of resources. This patch makes the number of bad storages processed per scan configurable.
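The idea of handling several failures per scan can be sketched as follows. `HeartbeatScanSketch`, `collectDead` and `maxPerScan` are hypothetical names for illustration and are not the code or the configuration key from the patch. The sketch assumes one pass over all nodes that stops collecting once the configured limit is reached.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration of "process up to N dead nodes per scan".
final class HeartbeatScanSketch {
    private HeartbeatScanSketch() {}

    /** One pass over all nodes, collecting at most maxPerScan dead ones. */
    static <T> List<T> collectDead(List<T> nodes, Predicate<T> isDead, int maxPerScan) {
        List<T> dead = new ArrayList<>();
        for (T node : nodes) {
            if (dead.size() >= maxPerScan) {
                break; // limit reached; the rest is left for the next scan
            }
            if (isDead.test(node)) {
                dead.add(node);
            }
        }
        return dead;
    }
}
```

With a limit of 1, ten dead nodes need ten productive passes over the whole node list. With a limit of 10 or more, a single pass finds all of them.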
[jira] [Assigned] (HDFS-16735) Reduce the number of HeartbeatManager loops
[ https://issues.apache.org/jira/browse/HDFS-16735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang reassigned HDFS-16735: --- Assignee: Shuyan Zhang > Reduce the number of HeartbeatManager loops > --- > > Key: HDFS-16735 > URL: https://issues.apache.org/jira/browse/HDFS-16735 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: Shuyan Zhang > Assignee: Shuyan Zhang > Priority: Major > > HeartbeatManager processes only one dead datanode (and failed storage) per round in heartbeatCheck(). If there are ten failed storages, all datanode states must therefore be scanned ten times, which is unnecessary. This patch makes the number of bad storages processed per scan configurable.
[jira] [Created] (HDFS-16735) Reduce the number of HeartbeatManager loops
Shuyan Zhang created HDFS-16735: --- Summary: Reduce the number of HeartbeatManager loops Key: HDFS-16735 URL: https://issues.apache.org/jira/browse/HDFS-16735 Project: Hadoop HDFS Issue Type: Improvement Reporter: Shuyan Zhang HeartbeatManager processes only one dead datanode (and failed storage) per round in heartbeatCheck(). If there are ten failed storages, all datanode states must therefore be scanned ten times, which is unnecessary. This patch makes the number of bad storages processed per scan configurable.
[jira] [Assigned] (HDFS-16683) All method metrics related to the rpc protocol should be initialized
[ https://issues.apache.org/jira/browse/HDFS-16683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang reassigned HDFS-16683: --- Assignee: Shuyan Zhang > All method metrics related to the rpc protocol should be initialized > > > Key: HDFS-16683 > URL: https://issues.apache.org/jira/browse/HDFS-16683 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: Shuyan Zhang > Assignee: Shuyan Zhang > Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > When an RPC protocol is used, the metrics of all protocol-related methods should be initialized; otherwise, the metric information is incomplete. For example, when we call HAServiceProtocol#monitorHealth(), only the metrics of monitorHealth() are initialized, and the metrics of transitionToStandby() are still not reported. This incompleteness caused some trouble for our monitoring system. The root cause is that the parameter passed by RpcEngine to MutableRatesWithAggregation#init(java.lang.Class) is always XXXProtocolPB, which inherits from BlockingInterface and does not declare any methods itself. We should fix this bug.
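The root cause can be reproduced with plain Java reflection. The sketch below assumes that the metrics initialization walks only the methods declared directly on the class it is given. That is an assumption about `MutableRatesWithAggregation#init`, not something this message states. The interfaces and helper names are hypothetical stand-ins for a generated BlockingInterface and its XXXProtocolPB.

```java
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a generated BlockingInterface.
interface BlockingInterfaceSketch {
    void monitorHealth();
    void transitionToStandby();
}

// Hypothetical stand-in for XXXProtocolPB: inherits everything, declares nothing.
interface ProtocolPBSketch extends BlockingInterfaceSketch {
}

final class MetricNamesSketch {
    private MetricNamesSketch() {}

    /** Names of methods declared directly on the given type only. */
    static Set<String> declaredOnly(Class<?> protocol) {
        Set<String> names = new TreeSet<>();
        for (Method m : protocol.getDeclaredMethods()) {
            names.add(m.getName());
        }
        return names;
    }

    /** Names of all public methods, including inherited ones. */
    static Set<String> includingInherited(Class<?> protocol) {
        Set<String> names = new TreeSet<>();
        for (Method m : protocol.getMethods()) {
            names.add(m.getName());
        }
        return names;
    }
}
```

Under that assumption, `declaredOnly(ProtocolPBSketch.class)` finds nothing to pre-register, so a metric only appears once its method has actually been called, while `includingInherited` sees both methods up front.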
[jira] [Updated] (HDFS-16683) All method metrics related to the rpc protocol should be initialized
[ https://issues.apache.org/jira/browse/HDFS-16683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang updated HDFS-16683: Environment: (was: When an RPC protocol is used, the metrics of all protocol-related methods should be initialized; otherwise, the metric information is incomplete. For example, when we call HAServiceProtocol#monitorHealth(), only the metrics of monitorHealth() are initialized, and the metrics of transitionToStandby() are still not reported. This incompleteness caused some trouble for our monitoring system. The root cause is that the parameter passed by RpcEngine to MutableRatesWithAggregation#init(java.lang.Class) is always XXXProtocolPB, which inherits from BlockingInterface and does not declare any methods itself. We should fix this bug.) > All method metrics related to the rpc protocol should be initialized > > > Key: HDFS-16683 > URL: https://issues.apache.org/jira/browse/HDFS-16683 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: Shuyan Zhang > Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h
[jira] [Updated] (HDFS-16683) All method metrics related to the rpc protocol should be initialized
[ https://issues.apache.org/jira/browse/HDFS-16683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang updated HDFS-16683: Description: When an RPC protocol is used, the metrics of all protocol-related methods should be initialized; otherwise, the metric information is incomplete. For example, when we call HAServiceProtocol#monitorHealth(), only the metrics of monitorHealth() are initialized, and the metrics of transitionToStandby() are still not reported. This incompleteness caused some trouble for our monitoring system. The root cause is that the parameter passed by RpcEngine to MutableRatesWithAggregation#init(java.lang.Class) is always XXXProtocolPB, which inherits from BlockingInterface and does not declare any methods itself. We should fix this bug. > All method metrics related to the rpc protocol should be initialized > > > Key: HDFS-16683 > URL: https://issues.apache.org/jira/browse/HDFS-16683 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: Shuyan Zhang > Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > When an RPC protocol is used, the metrics of all protocol-related methods should be initialized; otherwise, the metric information is incomplete. For example, when we call HAServiceProtocol#monitorHealth(), only the metrics of monitorHealth() are initialized, and the metrics of transitionToStandby() are still not reported. This incompleteness caused some trouble for our monitoring system. The root cause is that the parameter passed by RpcEngine to MutableRatesWithAggregation#init(java.lang.Class) is always XXXProtocolPB, which inherits from BlockingInterface and does not declare any methods itself. We should fix this bug.
[jira] [Created] (HDFS-16683) All method metrics related to the rpc protocol should be initialized
Shuyan Zhang created HDFS-16683: --- Summary: All method metrics related to the rpc protocol should be initialized Key: HDFS-16683 URL: https://issues.apache.org/jira/browse/HDFS-16683 Project: Hadoop HDFS Issue Type: Bug Environment: When an RPC protocol is used, the metrics of all protocol-related methods should be initialized; otherwise, the metric information is incomplete. For example, when we call HAServiceProtocol#monitorHealth(), only the metrics of monitorHealth() are initialized, and the metrics of transitionToStandby() are still not reported. This incompleteness caused some trouble for our monitoring system. The root cause is that the parameter passed by RpcEngine to MutableRatesWithAggregation#init(java.lang.Class) is always XXXProtocolPB, which inherits from BlockingInterface and does not declare any methods itself. We should fix this bug. Reporter: Shuyan Zhang
[jira] [Created] (HDFS-16146) All three replicas are lost due to not adding a new DataNode in time
Shuyan Zhang created HDFS-16146: --- Summary: All three replicas are lost due to not adding a new DataNode in time Key: HDFS-16146 URL: https://issues.apache.org/jira/browse/HDFS-16146 Project: Hadoop HDFS Issue Type: Bug Components: datanode, hdfs Reporter: Shuyan Zhang Assignee: Shuyan Zhang
We have a three-replica file, and all replicas of a block were lost when the default datanode replacement strategy was used. It happened like this:
1. addBlock() requests a new block and successfully connects to three datanodes (dn1, dn2 and dn3) to build a pipeline;
2. Data is written;
3. dn1 hits an error and is removed from the pipeline. At this point more than one datanode remains in the pipeline, so according to the replacement strategy there is no need to add a new datanode;
4. After writing completes, the pipeline enters PIPELINE_CLOSE;
5. dn2 hits an error and is removed from the pipeline. Because the pipeline is already in the close phase, addDatanode2ExistingPipeline() decides to hand the task of transferring the replica over to the NameNode. At this point only one datanode is left in the pipeline;
6. dn3 hits an error, and all replicas are lost.
If we added a new datanode in step 5, we could avoid losing all replicas in this case. I think an error in PIPELINE_CLOSE carries the same risk of losing replicas as an error in DATA_STREAMING, so we should not skip adding a new datanode during PIPELINE_CLOSE.
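Steps 3 and 5 can be checked against the DEFAULT replacement policy as it is described for `dfs.client.block.write.replace-datanode-on-failure.policy` in hdfs-default.xml: with replication r and n remaining datanodes, a new datanode is added only if r >= 3 and either floor(r/2) >= n, or r > n and the block is hflushed/appended. The sketch below is a paraphrase of that documented rule with hypothetical names. It is not the client code itself, and it deliberately does not model the PIPELINE_CLOSE shortcut from step 5.

```java
// Hypothetical paraphrase of the documented DEFAULT replace-datanode-on-failure rule.
final class ReplaceDatanodeSketch {
    private ReplaceDatanodeSketch() {}

    /**
     * @param r                replication factor of the file
     * @param n                datanodes still alive in the pipeline
     * @param appendOrHflushed whether the block was appended or hflushed
     * @return true if the rule asks for a replacement datanode
     */
    static boolean defaultPolicySatisfied(int r, int n, boolean appendOrHflushed) {
        return r >= 3 && (n <= r / 2 || (r > n && appendOrHflushed));
    }
}
```

For r = 3, the rule does not ask for a replacement while two datanodes remain (step 3), but it does once only one remains. That is the situation in step 5, where the close-phase shortcut currently bypasses the replacement.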
[jira] [Updated] (HDFS-16099) Make bpServiceToActive to be volatile
[ https://issues.apache.org/jira/browse/HDFS-16099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang updated HDFS-16099: Description: BPOfferService#bpServiceToActive is not volatile, which may cause _CommandProcessingThread_ to get the out-of-date active namenode. When a failover occurs, the old ANN's _CommandProcessingThread_ may read the outdated BPOfferService#bpServiceToActive and execute the NN's command. At this time, if the new ANN's _CommandProcessingThread_ reads the new value of bpServiceToActive, split brain will occur; otherwise, the new ANN's commands cannot be executed normally, which is also unacceptable. was:BPOfferService#bpServiceToActive is not volatile, which may cause _commandProcessingThread_ to get an out-of-date active namenode. > Make bpServiceToActive to be volatile > - > > Key: HDFS-16099 > URL: https://issues.apache.org/jira/browse/HDFS-16099 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Shuyan Zhang >Assignee: Shuyan Zhang >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > BPOfferService#bpServiceToActive is not volatile, which may cause > _CommandProcessingThread_ to get the out-of-date active namenode. > When a failover occurs, the old ANN's _CommandProcessingThread_ may read the > outdated BPOfferService#bpServiceToActive and execute the NN's command. At > this time, if the new ANN's _CommandProcessingThread_ reads the new value of > bpServiceToActive, split brain will occur; otherwise, the new ANN's commands > cannot be executed normally, which is also unacceptable.
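A minimal sketch of the visibility issue, with hypothetical names (`ActiveHolderSketch`, `isFromActive`) rather than the real `BPOfferService` fields and types: marking the field `volatile` guarantees that a write made by the thread handling the failover is seen by later reads in the command-processing threads.

```java
// Hypothetical illustration; the real field holds an actor object, not a String.
final class ActiveHolderSketch {
    // volatile: a write by one thread is visible to subsequent reads by others.
    private volatile String bpServiceToActive;

    void setActive(String actorId) {
        bpServiceToActive = actorId;
    }

    /** True only if the command comes from the currently active namenode. */
    boolean isFromActive(String actorId) {
        String active = bpServiceToActive; // read the field once
        return active != null && active.equals(actorId);
    }
}
```

Without `volatile`, the Java memory model does not guarantee that a command-processing thread ever observes the updated value, so it could keep treating the old namenode as active.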
[jira] [Work started] (HDFS-16099) Make bpServiceToActive to be volatile
[ https://issues.apache.org/jira/browse/HDFS-16099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-16099 started by Shuyan Zhang. --- > Make bpServiceToActive to be volatile > - > > Key: HDFS-16099 > URL: https://issues.apache.org/jira/browse/HDFS-16099 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Shuyan Zhang >Assignee: Shuyan Zhang >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > BPOfferService#bpServiceToActive is not volatile, which may cause > _commandProcessingThread_ to get an out-of-date active namenode.
[jira] [Created] (HDFS-16099) Make bpServiceToActive to be volatile
Shuyan Zhang created HDFS-16099: --- Summary: Make bpServiceToActive to be volatile Key: HDFS-16099 URL: https://issues.apache.org/jira/browse/HDFS-16099 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Shuyan Zhang Assignee: Shuyan Zhang BPOfferService#bpServiceToActive is not volatile, which may cause _commandProcessingThread_ to get an out-of-date active namenode.
[jira] [Updated] (HDFS-14263) Remove unnecessary block file exists check from FsDatasetImpl#getBlockInputStream()
[ https://issues.apache.org/jira/browse/HDFS-14263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang updated HDFS-14263: Description: As discussed in HDFS-10636, {{FsDatasetImpl#getBlockInputStream()}} performs an unnecessary block replica existence check. (was: * As discussed in HDFS-10636, {{FsDatasetImpl#getBlockInputStream()}} performs an unnecessary block replica existence check.) > Remove unnecessary block file exists check from FsDatasetImpl#getBlockInputStream() > --- > > Key: HDFS-14263 > URL: https://issues.apache.org/jira/browse/HDFS-14263 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode > Affects Versions: 3.1.1 > Reporter: Surendra Singh Lilhore > Assignee: Surendra Singh Lilhore > Priority: Major > Fix For: 3.3.0, 3.2.1, 3.1.3 > > Attachments: HDFS-14263.001.patch, HDFS-14263.002.patch > > > As discussed in HDFS-10636, {{FsDatasetImpl#getBlockInputStream()}} performs an unnecessary block replica existence check.
[jira] [Updated] (HDFS-14263) Remove unnecessary block file exists check from FsDatasetImpl#getBlockInputStream()
[ https://issues.apache.org/jira/browse/HDFS-14263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang updated HDFS-14263: Description: * As discussed in HDFS-10636, {{FsDatasetImpl#getBlockInputStream()}} performs an unnecessary block replica existence check. (was: As discussed in HDFS-10636, {{FsDatasetImpl#getBlockInputStream()}} performs an unnecessary block replica existence check.) > Remove unnecessary block file exists check from FsDatasetImpl#getBlockInputStream() > --- > > Key: HDFS-14263 > URL: https://issues.apache.org/jira/browse/HDFS-14263 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode > Affects Versions: 3.1.1 > Reporter: Surendra Singh Lilhore > Assignee: Surendra Singh Lilhore > Priority: Major > Fix For: 3.3.0, 3.2.1, 3.1.3 > > Attachments: HDFS-14263.001.patch, HDFS-14263.002.patch > > > * As discussed in HDFS-10636, {{FsDatasetImpl#getBlockInputStream()}} performs an unnecessary block replica existence check.
[jira] [Updated] (HDFS-15963) Unreleased volume references cause an infinite loop
[ https://issues.apache.org/jira/browse/HDFS-15963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang updated HDFS-15963: Attachment: HDFS-15963.003.patch > Unreleased volume references cause an infinite loop > --- > > Key: HDFS-15963 > URL: https://issues.apache.org/jira/browse/HDFS-15963 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode > Reporter: Shuyan Zhang > Assignee: Shuyan Zhang > Priority: Major > Labels: pull-request-available > Attachments: HDFS-15963.001.patch, HDFS-15963.002.patch, HDFS-15963.003.patch > > Time Spent: 20m > Remaining Estimate: 0h > > When BlockSender throws an exception because the meta-data cannot be found, the volume reference obtained by the thread is not released, which causes the thread trying to remove the volume to wait and fall into an infinite loop.
> {code:java}
> boolean checkVolumesRemoved() {
>   Iterator<FsVolumeImpl> it = volumesBeingRemoved.iterator();
>   while (it.hasNext()) {
>     FsVolumeImpl volume = it.next();
>     if (!volume.checkClosed()) {
>       return false;
>     }
>     it.remove();
>   }
>   return true;
> }
>
> boolean checkClosed() {
>   // Always true once a volume reference has leaked.
>   if (this.reference.getReferenceCount() > 0) {
>     FsDatasetImpl.LOG.debug("The reference count for {} is {}, wait to be 0.",
>         this, reference.getReferenceCount());
>     return false;
>   }
>   return true;
> }
> {code}
> At the same time, because the thread holds checkDirsLock while removing the volume, other threads trying to acquire the same lock will be permanently blocked. Similar problems also occur in RamDiskAsyncLazyPersistService and FsDatasetAsyncDiskService. This patch releases the three previously unreleased volume references.
[jira] [Updated] (HDFS-15963) Unreleased volume references cause an infinite loop
[ https://issues.apache.org/jira/browse/HDFS-15963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang updated HDFS-15963: Attachment: HDFS-15963.002.patch > Unreleased volume references cause an infinite loop > --- > > Key: HDFS-15963 > URL: https://issues.apache.org/jira/browse/HDFS-15963 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode > Reporter: Shuyan Zhang > Assignee: Shuyan Zhang > Priority: Major > Labels: pull-request-available > Attachments: HDFS-15963.001.patch, HDFS-15963.002.patch > > Time Spent: 20m > Remaining Estimate: 0h > > When BlockSender throws an exception because the meta-data cannot be found, the volume reference obtained by the thread is not released, which causes the thread trying to remove the volume to wait and fall into an infinite loop.
> {code:java}
> boolean checkVolumesRemoved() {
>   Iterator<FsVolumeImpl> it = volumesBeingRemoved.iterator();
>   while (it.hasNext()) {
>     FsVolumeImpl volume = it.next();
>     if (!volume.checkClosed()) {
>       return false;
>     }
>     it.remove();
>   }
>   return true;
> }
>
> boolean checkClosed() {
>   // Always true once a volume reference has leaked.
>   if (this.reference.getReferenceCount() > 0) {
>     FsDatasetImpl.LOG.debug("The reference count for {} is {}, wait to be 0.",
>         this, reference.getReferenceCount());
>     return false;
>   }
>   return true;
> }
> {code}
> At the same time, because the thread holds checkDirsLock while removing the volume, other threads trying to acquire the same lock will be permanently blocked. Similar problems also occur in RamDiskAsyncLazyPersistService and FsDatasetAsyncDiskService. This patch releases the three previously unreleased volume references.
[jira] [Updated] (HDFS-15963) Unreleased volume references cause an infinite loop
[ https://issues.apache.org/jira/browse/HDFS-15963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shuyan Zhang updated HDFS-15963:
---
    Attachment: HDFS-15963.001.patch

> Unreleased volume references cause an infinite loop
> ---
>
>                 Key: HDFS-15963
>                 URL: https://issues.apache.org/jira/browse/HDFS-15963
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Shuyan Zhang
>            Priority: Major
>         Attachments: HDFS-15963.001.patch
[jira] [Created] (HDFS-15963) Unreleased volume references cause an infinite loop
Shuyan Zhang created HDFS-15963:
---
             Summary: Unreleased volume references cause an infinite loop
                 Key: HDFS-15963
                 URL: https://issues.apache.org/jira/browse/HDFS-15963
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
            Reporter: Shuyan Zhang

When BlockSender throws an exception because the metadata cannot be found,
the volume reference obtained by the thread is not released. The thread
trying to remove the volume then waits for the reference count to drop to
zero, which never happens, so it falls into an infinite loop.
{code:java}
boolean checkVolumesRemoved() {
  Iterator<FsVolumeImpl> it = volumesBeingRemoved.iterator();
  while (it.hasNext()) {
    FsVolumeImpl volume = it.next();
    if (!volume.checkClosed()) {
      return false;
    }
    it.remove();
  }
  return true;
}

boolean checkClosed() {
  // This condition is always true once a reference has leaked.
  if (this.reference.getReferenceCount() > 0) {
    FsDatasetImpl.LOG.debug("The reference count for {} is {}, wait to be 0.",
        this, reference.getReferenceCount());
    return false;
  }
  return true;
}
{code}
At the same time, because the thread removing the volume holds checkDirsLock
the whole time, other threads trying to acquire the same lock are blocked
permanently.
Similar problems also occur in RamDiskAsyncLazyPersistService and
FsDatasetAsyncDiskService.
This patch releases the three previously unreleased volume references.
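
For readers who want to see the leak in isolation, here is a minimal, self-contained sketch of the failure mode described above. It is not the patch and does not use Hadoop classes: VolumeRefSketch, Volume, obtainReference, sendLeaky, sendSafe and openMeta are all hypothetical names made up for this illustration, and try-with-resources is just one possible way to guarantee the release on the exception path.

```java
import java.io.Closeable;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class VolumeRefSketch {

  /** Hypothetical stand-in for a volume guarded by a reference count. */
  static class Volume {
    private final AtomicInteger refCount = new AtomicInteger();

    /** Hands out a reference; closing it decrements the count. */
    Closeable obtainReference() {
      refCount.incrementAndGet();
      return refCount::decrementAndGet;
    }

    /** Mirrors the check in the report: closed only when no references remain. */
    boolean checkClosed() {
      return refCount.get() == 0;
    }
  }

  /** Simulates the metadata lookup failing. */
  static void openMeta() throws IOException {
    throw new FileNotFoundException("meta file not found");
  }

  /** Leaky pattern: the exception skips the release, so the count stays at 1. */
  static void sendLeaky(Volume volume) throws IOException {
    Closeable ref = volume.obtainReference();
    openMeta();   // throws before the next line runs
    ref.close();
  }

  /** Safe pattern: the reference is released even when openMeta() throws. */
  static void sendSafe(Volume volume) throws IOException {
    try (Closeable ref = volume.obtainReference()) {
      openMeta();
    }
  }

  public static void main(String[] args) {
    Volume leaky = new Volume();
    try {
      sendLeaky(leaky);
    } catch (IOException expected) {
      // The failure itself is expected; the leaked reference is the problem.
    }
    System.out.println("leaky closed: " + leaky.checkClosed());  // prints "leaky closed: false"

    Volume safe = new Volume();
    try {
      sendSafe(safe);
    } catch (IOException expected) {
      // Same failure, but the reference was released on the way out.
    }
    System.out.println("safe closed: " + safe.checkClosed());    // prints "safe closed: true"
  }
}
```

In the leaky case, a remover that polls checkClosed() until it returns true would never make progress, which matches the infinite loop in the report.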