[jira] [Commented] (HDFS-16100) HA: Improve performance of Standby node transition to Active
[ https://issues.apache.org/jira/browse/HDFS-16100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376251#comment-17376251 ] Xiaoqiao He commented on HDFS-16100: [^HDFS-16100.001.patch] looks safe to check in and is an improvement to me. Kindly ping [~weichiu], [~inigoiri], do you mind giving another review? CI does not seem to be working well, so I triggered it manually; refer to: https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/671/ > HA: Improve performance of Standby node transition to Active > - > > Key: HDFS-16100 > URL: https://issues.apache.org/jira/browse/HDFS-16100 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.3.1 >Reporter: wudeyu >Assignee: wudeyu >Priority: Major > Attachments: HDFS-16100.001.patch, HDFS-16100.patch > > > pendingDNMessages in the Standby NameNode is used to hold postponed block > reports. Block reports in pendingDNMessages are processed as follows: > # If the GS of a replica is in the future, the Standby node processes it when the > corresponding edit log entry (e.g. add_block) is loaded. > # If a replica is corrupted, the Standby node processes it when it transitions to > Active. > # If a DataNode is removed, its block reports are removed from > pendingDNMessages. > Obviously, as the number of corrupted replicas grows, the transition takes more > time. In our situation, there were 60 million block reports in > pendingDNMessages before the transition. Processing them took almost 7 minutes, > and the NameNode was killed by ZKFC. The replica state of most of these block reports was RBW > with a wrong GS (less than the stored block's GS on the Standby node). > In my opinion, the Standby node could ignore block reports whose replica state > is RBW with a wrong GS, because the Active node/DataNode will remove them later. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
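The skip condition proposed in the comment above can be sketched as a small predicate. This is only an illustration under stated assumptions: the class, enum, and method names below are hypothetical, not the actual NameNode code, and it assumes the queued report carries the reported replica state and generation stamp (GS) alongside the stored block's GS.

```java
// Hedged sketch of the proposed filter for queued block reports on the
// Standby NameNode. All names here are hypothetical; in real HDFS the
// decision would live in the BlockManager / pending-message handling.
public class StaleRbwReportFilter {

    // Mirrors the HDFS replica states relevant to the discussion.
    enum ReplicaState { FINALIZED, RBW, RWR, RUR, TEMPORARY }

    /**
     * A queued report can be dropped when the replica is RBW and its
     * reported generation stamp is behind the stored block's GS: the
     * Active NameNode (or the DataNode itself) will remove such a
     * replica later anyway, so replaying it at failover is wasted work.
     */
    static boolean canDropQueuedReport(ReplicaState reportedState,
                                       long reportedGenStamp,
                                       long storedGenStamp) {
        return reportedState == ReplicaState.RBW
            && reportedGenStamp < storedGenStamp;
    }
}
```

Reports with a future GS (case 1 above) or corrupt replicas (case 2) would keep their current handling; only the stale-RBW case is short-circuited.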
[jira] [Work logged] (HDFS-16116) Fix Hadoop FedBalance shell and federationBanance markdown bug.
[ https://issues.apache.org/jira/browse/HDFS-16116?focusedWorklogId=619748=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619748 ] ASF GitHub Bot logged work on HDFS-16116: - Author: ASF GitHub Bot Created on: 07/Jul/21 04:28 Start Date: 07/Jul/21 04:28 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3181: URL: https://github.com/apache/hadoop/pull/3181#issuecomment-875267101 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 17m 39s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | shelldocs | 0m 0s | | Shelldocs was not available. | | +0 :ok: | markdownlint | 0m 0s | | markdownlint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 12m 32s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 23m 1s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 49s | | trunk passed | | +1 :green_heart: | shadedclient | 15m 33s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 21s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 1m 19s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | mvnsite | 1m 35s | | the patch passed | | +1 :green_heart: | shellcheck | 0m 8s | | No new issues. 
| | +1 :green_heart: | shadedclient | 15m 33s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 1m 31s | | hadoop-common in the patch passed. | | +1 :green_heart: | unit | 0m 20s | | hadoop-federation-balance in the patch passed. | | +1 :green_heart: | asflicense | 0m 29s | | The patch does not generate ASF License warnings. | | | | 93m 34s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3181/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/3181 | | Optional Tests | dupname asflicense mvnsite unit codespell shellcheck shelldocs markdownlint | | uname | Linux 3dac8065161a 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 981002ac1ee636b739e797256ec1762ee6ebd91c | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3181/3/testReport/ | | Max. process+thread count | 572 (vs. ulimit of 5500) | | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-federation-balance U: . | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3181/3/console | | versions | git=2.25.1 maven=3.6.3 shellcheck=0.7.0 | | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org | This message was automatically generated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 619748) Time Spent: 40m (was: 0.5h) > Fix Hadoop FedBalance shell and federationBanance markdown bug. 
> --- > > Key: HDFS-16116 > URL: https://issues.apache.org/jira/browse/HDFS-16116 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf >Reporter: panlijie >Assignee: panlijie >Priority: Critical > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Fix Hadoop FedBalance shell and federationBanance
[jira] [Work logged] (HDFS-16116) Fix Hadoop FedBalance shell and federationBanance markdown bug.
[ https://issues.apache.org/jira/browse/HDFS-16116?focusedWorklogId=619742=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619742 ] ASF GitHub Bot logged work on HDFS-16116: - Author: ASF GitHub Bot Created on: 07/Jul/21 04:13 Start Date: 07/Jul/21 04:13 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3181: URL: https://github.com/apache/hadoop/pull/3181#issuecomment-875261420 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 12m 7s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | shelldocs | 0m 0s | | Shelldocs was not available. | | +0 :ok: | markdownlint | 0m 0s | | markdownlint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 13m 1s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 20m 19s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 53s | | trunk passed | | +1 :green_heart: | shadedclient | 13m 13s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 28s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 1m 20s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | mvnsite | 1m 34s | | the patch passed | | +1 :green_heart: | shellcheck | 0m 7s | | No new issues. 
| | +1 :green_heart: | shadedclient | 13m 5s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 1m 39s | | hadoop-common in the patch passed. | | +1 :green_heart: | unit | 0m 24s | | hadoop-federation-balance in the patch passed. | | +1 :green_heart: | asflicense | 0m 34s | | The patch does not generate ASF License warnings. | | | | 81m 30s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3181/2/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/3181 | | Optional Tests | dupname asflicense mvnsite unit codespell shellcheck shelldocs markdownlint | | uname | Linux 17f3b7f8e5f8 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 981002ac1ee636b739e797256ec1762ee6ebd91c | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3181/2/testReport/ | | Max. process+thread count | 720 (vs. ulimit of 5500) | | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-federation-balance U: . | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3181/2/console | | versions | git=2.25.1 maven=3.6.3 shellcheck=0.7.0 | | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org | This message was automatically generated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 619742) Time Spent: 0.5h (was: 20m) > Fix Hadoop FedBalance shell and federationBanance markdown bug. 
[jira] [Updated] (HDFS-16094) HDFS balancer process start failed owing to daemon pid file is not cleared in some exception scenario
[ https://issues.apache.org/jira/browse/HDFS-16094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Ma updated HDFS-16094: - Target Version/s: 3.1.1 (was: 3.4.0) > HDFS balancer process start failed owing to daemon pid file is not cleared in > some exception scenario > > > Key: HDFS-16094 > URL: https://issues.apache.org/jira/browse/HDFS-16094 > Project: Hadoop HDFS > Issue Type: Improvement > Components: scripts >Affects Versions: 3.3.1 >Reporter: Daniel Ma >Priority: Major > > HDFS balancer process start failed owing to the daemon pid file not being cleared in > some exception scenarios, but there is no useful information in the log to troubleshoot, as below: > {code:bash} > // code placeholder > hadoop_error "${daemonname} is running as process $(cat "${daemon_pidfile}") > {code} > In fact, the process is not actually running as the error message claims. > Therefore, more explicit information should be printed in the error log to > tell users where the pid file is located and that it should be cleared.
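A more explicit check along the lines the report asks for could look like the sketch below. This is a hypothetical illustration, not the actual function in Hadoop's shell scripts: the function name, message wording, and calling convention are all assumptions.

```shell
# Hypothetical sketch: distinguish a live daemon from a stale pid file,
# and tell the user exactly where the pid file lives so it can be cleared.
hadoop_check_stale_pidfile() {
  local daemonname=$1
  local pidfile=$2
  local pid

  # No pid file at all: nothing blocks startup.
  [[ -f "${pidfile}" ]] || return 0

  pid=$(cat "${pidfile}")
  if kill -0 "${pid}" >/dev/null 2>&1; then
    # The recorded pid really is alive.
    echo "ERROR: ${daemonname} is already running as process ${pid}."
  else
    # The pid in the file is dead: point at the file instead of claiming
    # the daemon is running.
    echo "ERROR: ${daemonname} is not running, but stale pid file ${pidfile} (pid ${pid}) exists; remove it and retry."
  fi
  return 1
}
```

The key difference from the quoted `hadoop_error` call is the `kill -0` probe, which separates "daemon really running" from "leftover pid file after a crash".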
[jira] [Work logged] (HDFS-16116) Fix Hadoop FedBalance shell and federationBanance markdown bug.
[ https://issues.apache.org/jira/browse/HDFS-16116?focusedWorklogId=619702=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619702 ] ASF GitHub Bot logged work on HDFS-16116: - Author: ASF GitHub Bot Created on: 07/Jul/21 02:11 Start Date: 07/Jul/21 02:11 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3181: URL: https://github.com/apache/hadoop/pull/3181#issuecomment-875214589 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 34s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | shelldocs | 0m 0s | | Shelldocs was not available. | | +0 :ok: | markdownlint | 0m 0s | | markdownlint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 12m 55s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 20m 9s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 54s | | trunk passed | | +1 :green_heart: | shadedclient | 13m 22s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 27s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 1m 18s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | mvnsite | 1m 34s | | the patch passed | | +1 :green_heart: | shellcheck | 0m 9s | | No new issues. 
| | +1 :green_heart: | shadedclient | 12m 59s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 1m 38s | | hadoop-common in the patch passed. | | +1 :green_heart: | unit | 0m 25s | | hadoop-federation-balance in the patch passed. | | +1 :green_heart: | asflicense | 0m 34s | | The patch does not generate ASF License warnings. | | | | 69m 40s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3181/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/3181 | | Optional Tests | dupname asflicense mvnsite unit codespell shellcheck shelldocs markdownlint | | uname | Linux c18d15971a47 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 725b76f468575b31fa8539a3b5ea9abf4aca8a71 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3181/1/testReport/ | | Max. process+thread count | 693 (vs. ulimit of 5500) | | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-federation-balance U: . | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3181/1/console | | versions | git=2.25.1 maven=3.6.3 shellcheck=0.7.0 | | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org | This message was automatically generated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 619702) Time Spent: 20m (was: 10m) > Fix Hadoop FedBalance shell and federationBanance markdown bug. 
[jira] [Comment Edited] (HDFS-16115) Asynchronously handle BPServiceActor command mechanism may result in BPServiceActor never fails even CommandProcessingThread is closed with fatal error.
[ https://issues.apache.org/jira/browse/HDFS-16115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376146#comment-17376146 ] Daniel Ma edited comment on HDFS-16115 at 7/7/21, 2:01 AM: --- [~hexiaoqiao] Thanks for your review and tips. Actually, the issue my patch attempts to solve is totally different from the one HDFS-15651 mentions. I had noticed that Jira previously, but it cannot solve my issue completely. What I try to solve in this patch is: 1-Once the CommandProcessingThread catches a non-fatal error or exception, it retries up to 5 times instead of being simply interrupted, and after it reaches the max retry count, we need to stop the corresponding BPServiceActor thread as well. In HDFS-15651, no matter what kind of error occurs, the thread is simply closed; but many non-fatal errors, like "cannot create native thread", can probably recover automatically, and once the OS-level thread is dropped the BPServiceActor service stays dead and cannot recover by itself. 2-In my patch, for non-fatal errors, the BPOfferService thread always runs a periodic task that tries to recover any BPServiceActor thread that died owing to a non-fatal error, which is the essential difference between our patch and HDFS-15651 was (Author: daniel ma): [~hexiaoqiao] Thanks for your review and tips. Actually the issue that my patch attempt to solve is totally different from the one HDFS-15651 mentioned. I have noticed this Jira perviously, but it can not solve my issue perfectly. What I try to solve in this patch is : 1-Once CommandProcess thread is dead, we need to stop the corresponding BPServiceActor thread as well. 
2-In my patch, for non-fatal errors, the BPOfferService thread always runs a periodic task that tries to recover any BPServiceActor thread that died owing to a non-fatal error, which is the essential difference between our patch and HDFS-15651 > Asynchronously handle BPServiceActor command mechanism may result in > BPServiceActor never fails even CommandProcessingThread is closed with fatal > error. > > > Key: HDFS-16115 > URL: https://issues.apache.org/jira/browse/HDFS-16115 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.3.1 >Reporter: Daniel Ma >Priority: Critical > Fix For: 3.3.1 > > Attachments: 0001-HDFS-16115.patch > > > This is an improvement issue; it actually has two sub-issues: > 1- The BPServiceActor thread handles commands from the NameNode asynchronously > (CommandProcessingThread handles the commands), so if any exception or > error in CommandProcessingThread causes the thread to fail and > stop, BPServiceActor is not aware of this and keeps putting commands > from the NameNode into the queue to be handled by CommandProcessingThread, > even though CommandProcessingThread is already dead. 
> 2- The second sub-issue builds on the first one: if CommandProcessingThread > died owing to a non-fatal error such as "cannot create native thread", which > is caused by too many threads existing in the OS, this kind of problem should be > given more tolerance instead of simply shutting the thread down with no > automatic recovery, because such non-fatal errors > can probably be recovered from soon by themselves, > {code:java} > // code placeholder > 2021-07-02 16:26:02,315 | WARN | Command processor | Exception happened when > process queue BPServiceActor.java:1393 > java.lang.OutOfMemoryError: unable to create new native thread > at java.lang.Thread.start0(Native Method) > at java.lang.Thread.start(Thread.java:717) > at > java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1367) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.execute(FsDatasetAsyncDiskService.java:180) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.deleteAsync(FsDatasetAsyncDiskService.java:229) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2315) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2237) > at > org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:752) > at > org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:698) > at >
[jira] [Comment Edited] (HDFS-16115) Asynchronously handle BPServiceActor command mechanism may result in BPServiceActor never fails even CommandProcessingThread is closed with fatal error.
[ https://issues.apache.org/jira/browse/HDFS-16115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17375401#comment-17375401 ] Daniel Ma edited comment on HDFS-16115 at 7/7/21, 1:47 AM: --- Hello [~brahmareddy], [~hemant], could you please help review this patch? Thanks. was (Author: daniel ma): Hello [~brahmareddy],[~hemant] Pls help to review this patch. thanks. > Asynchronously handle BPServiceActor command mechanism may result in > BPServiceActor never fails even CommandProcessingThread is closed with fatal > error. > > > Key: HDFS-16115 > URL: https://issues.apache.org/jira/browse/HDFS-16115 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.3.1 >Reporter: Daniel Ma >Priority: Critical > Fix For: 3.3.1 > > Attachments: 0001-HDFS-16115.patch > > > This is an improvement issue; it actually has two sub-issues: > 1- The BPServiceActor thread handles commands from the NameNode asynchronously > (CommandProcessingThread handles the commands), so if any exception or > error in CommandProcessingThread causes the thread to fail and > stop, BPServiceActor is not aware of this and keeps putting commands > from the NameNode into the queue to be handled by CommandProcessingThread, > even though CommandProcessingThread is already dead. 
> 2- The second sub-issue builds on the first one: if CommandProcessingThread > died owing to a non-fatal error such as "cannot create native thread", which > is caused by too many threads existing in the OS, this kind of problem should be > given more tolerance instead of simply shutting the thread down with no > automatic recovery, because such non-fatal errors > can probably be recovered from soon by themselves, > {code:java} > // code placeholder > 2021-07-02 16:26:02,315 | WARN | Command processor | Exception happened when > process queue BPServiceActor.java:1393 > java.lang.OutOfMemoryError: unable to create new native thread > at java.lang.Thread.start0(Native Method) > at java.lang.Thread.start(Thread.java:717) > at > java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1367) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.execute(FsDatasetAsyncDiskService.java:180) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.deleteAsync(FsDatasetAsyncDiskService.java:229) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2315) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2237) > at > org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:752) > at > org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:698) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processCommand(BPServiceActor.java:1417) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.lambda$enqueue$2(BPServiceActor.java:1463) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1382) > at > 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.run(BPServiceActor.java:1365) > {code} > Currently, the DataNode's BPServiceActor cannot return to normal even after the > non-fatal error is eliminated. > Therefore, this patch does two things: > 1- Adds a retry mechanism to the BPServiceActor thread and CommandProcessingThread, > with a retry count that is 5 by default and configurable; > 2- Adds a periodic monitor thread in BPOfferService: if a BPServiceActor > thread dies owing to too many non-fatal errors, it should not be > simply removed from the BPServiceActor list stored in BPOfferService; instead, > the monitor thread periodically tries to restart these dead > BPServiceActor threads. The interval is also configurable.
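The retry mechanism in point 1 above can be sketched roughly as follows. This is an illustration of the described idea, not the attached patch: the class name, `runCommandWithRetries`, and the `Command` interface are hypothetical, and a real implementation would log each failure and distinguish fatal from non-fatal Throwables.

```java
// Hedged sketch: retry command processing on (assumed non-fatal) failures
// instead of letting the processing thread die on the first error.
public class RetryingCommandRunner {

    // Stand-in for "process one command from the NameNode".
    interface Command { void execute() throws Exception; }

    /**
     * Runs the command, retrying up to maxRetries additional times on
     * failure. Returns true on success; false once retries are exhausted,
     * at which point the caller would also stop the owning BPServiceActor
     * thread (and a monitor could later try to restart it).
     */
    static boolean runCommandWithRetries(Command cmd, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                cmd.execute();
                return true;
            } catch (Exception e) {
                // Treated as non-fatal here (e.g. a transient inability to
                // spawn a native thread): fall through and retry.
            }
        }
        return false;
    }
}
```

Point 2 (the periodic monitor in BPOfferService) would sit above this loop, re-invoking a dead actor once the non-fatal condition, such as OS thread exhaustion, has cleared.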
[jira] [Comment Edited] (HDFS-16115) Asynchronously handle BPServiceActor command mechanism may result in BPServiceActor never fails even CommandProcessingThread is closed with fatal error.
[ https://issues.apache.org/jira/browse/HDFS-16115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17375401#comment-17375401 ] Daniel Ma edited comment on HDFS-16115 at 7/7/21, 1:46 AM: --- Hello [~brahmareddy],[~hemant] Pls help to review this patch. thanks. was (Author: daniel ma): Hello [~brahmareddy], [~ayush] Pls help to review this patch. thanks.
[jira] [Updated] (HDFS-16116) Fix Hadoop FedBalance shell and federationBanance markdown bug.
[ https://issues.apache.org/jira/browse/HDFS-16116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] panlijie updated HDFS-16116: Fix Version/s: 3.4.0
[jira] [Commented] (HDFS-16115) Asynchronously handle BPServiceActor command mechanism may result in BPServiceActor never fails even CommandProcessingThread is closed with fatal error.
[ https://issues.apache.org/jira/browse/HDFS-16115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376146#comment-17376146 ] Daniel Ma commented on HDFS-16115: -- [~hexiaoqiao] Thanks for your review and tips. Actually the issue that my patch attempt to solve is totally different from the one HDFS-15651 mentioned. I have noticed this Jira perviously, but it can not solve my issue perfectly. What I try to solve in this patch is : 1-Once CommandProcess thread is dead, we need to stop the corresponding BPServiceActor thread as well. 2-In my patch, for the non-fatal error, BPOfferService thread always running a periodical thread to try to recover the BPServiceActor thread that is dead owing to non-fatal error, which is the essential difference between our patch and HDFS-15651
> 2. The second sub-issue builds on the first: if the CommandProcessingThread died owing to a non-fatal error such as "unable to create new native thread" (caused by too many threads existing in the OS), this kind of problem should be given much more tolerance instead of simply shutting down the thread with no automatic recovery, because such non-fatal errors can often resolve themselves soon after.
> {code:java}
> // code placeholder
> 2021-07-02 16:26:02,315 | WARN | Command processor | Exception happened when process queue BPServiceActor.java:1393
> java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method)
> at java.lang.Thread.start(Thread.java:717)
> at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
> at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1367)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.execute(FsDatasetAsyncDiskService.java:180)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.deleteAsync(FsDatasetAsyncDiskService.java:229)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2315)
> at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2237)
> at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:752)
> at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:698)
> at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processCommand(BPServiceActor.java:1417)
> at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.lambda$enqueue$2(BPServiceActor.java:1463)
> at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1382)
> at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.run(BPServiceActor.java:1365) > {code} > Currently, the DataNode's BPServiceActor cannot return to normal even after the non-fatal error has been eliminated. > Therefore, this patch does two things: > 1. Adds a retry mechanism to the BPServiceActor thread and the CommandProcessingThread, with a retry limit of 5 by default, configurable; > 2. Adds a periodic monitor thread in BPOfferService: if a BPServiceActor thread dies after too many non-fatal errors, it should not simply be removed from the BPServiceActor list stored in BPOfferService; instead, the monitor thread periodically tries to restart these dead BPServiceActor threads. The interval is also configurable. -- This message was sent by
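The retry-and-monitor scheme proposed in the patch can be sketched roughly as follows. This is a hypothetical illustration, not Hadoop code: `ActorMonitor`, `ActorStub`, and `scanAndRestart` are invented names standing in for BPOfferService, BPServiceActor, and the proposed monitor pass.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the proposed monitor: actors that died from a
// non-fatal error stay in the list and are periodically restarted, up to a
// configurable retry cap (5 by default in the patch's description).
public class ActorMonitor {
    static class ActorStub {
        volatile boolean alive = false; // set false when the thread dies on a non-fatal error
        int restartAttempts = 0;
        void start() { alive = true; restartAttempts++; }
    }

    final List<ActorStub> actors = new CopyOnWriteArrayList<>();
    final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    // One pass of the monitor: restart dead actors that still have retries left.
    void scanAndRestart(int maxRetries) {
        for (ActorStub actor : actors) {
            if (!actor.alive && actor.restartAttempts < maxRetries) {
                actor.start();
            }
        }
    }

    // The patch makes both the retry cap and this interval configurable.
    void startMonitor(int maxRetries, long intervalMs) {
        scheduler.scheduleAtFixedRate(
            () -> scanAndRestart(maxRetries), intervalMs, intervalMs,
            TimeUnit.MILLISECONDS);
    }
}
```

The key design point is that a dead actor is never removed from the list, only skipped once its retry budget is exhausted, which matches the behaviour the patch describes.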
[jira] [Updated] (HDFS-16115) Asynchronously handle BPServiceActor command mechanism may result in BPServiceActor never fails even CommandProcessingThread is closed with fatal error.
[ https://issues.apache.org/jira/browse/HDFS-16115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Ma updated HDFS-16115: - Description: It is an improvement issue. Actually the issue has two sub-issues: 1. The BPServiceActor thread handles commands from the NameNode asynchronously (a CommandProcessingThread handles the commands), so if any exception or error occurs in the CommandProcessingThread and causes it to fail and stop, the BPServiceActor is not aware of this and keeps putting commands from the NameNode into the queue, waiting for them to be handled by a CommandProcessingThread that is already dead. 2. The second sub-issue builds on the first: if the CommandProcessingThread died owing to a non-fatal error such as "unable to create new native thread" (caused by too many threads existing in the OS), this kind of problem should be given much more tolerance instead of simply shutting down the thread with no automatic recovery, because such non-fatal errors can often resolve themselves soon after. {code:java} // code placeholder 2021-07-02 16:26:02,315 | WARN | Command processor | Exception happened when process queue BPServiceActor.java:1393 java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:717) at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957) at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1367) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.execute(FsDatasetAsyncDiskService.java:180) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.deleteAsync(FsDatasetAsyncDiskService.java:229) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2315) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2237) at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:752) at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:698) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processCommand(BPServiceActor.java:1417) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.lambda$enqueue$2(BPServiceActor.java:1463) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1382) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.run(BPServiceActor.java:1365) {code} Currently, the DataNode's BPServiceActor cannot return to normal even after the non-fatal error has been eliminated. Therefore, this patch does two things: 1. Adds a retry mechanism to the BPServiceActor thread and the CommandProcessingThread, with a retry limit of 5 by default, configurable; 2. Adds a periodic monitor thread in BPOfferService: if a BPServiceActor thread dies after too many non-fatal errors, it should not simply be removed from the BPServiceActor list stored in BPOfferService; instead, the monitor thread periodically tries to restart these dead BPServiceActor threads. The interval is also configurable. was: It is an improvement issue. Actually the issue has two sub-issues: 1. The BPServiceActor thread handles commands from the NameNode asynchronously (a CommandProcessingThread handles the commands), so if any exception or error occurs in the CommandProcessingThread and causes it to fail and stop, the BPServiceActor is not aware of this and keeps putting commands from the NameNode into the queue, waiting for them to be handled by a CommandProcessingThread that is already dead. 
2. The second sub-issue builds on the first: if the CommandProcessingThread died owing to a non-fatal error such as "unable to create new native thread" (caused by too many threads existing in the OS), this kind of problem should be given much more tolerance instead of simply shutting down the thread with no automatic recovery, because such non-fatal errors can often resolve themselves soon after. Currently, the DataNode's BPServiceActor cannot return to normal even after the non-fatal error has been eliminated. Therefore, this patch does two things: 1. Adds a retry mechanism to the BPServiceActor thread and the CommandProcessingThread, with a retry limit of 5 by default, configurable; 2. Adds a periodic monitor thread in BPOfferService: if a BPServiceActor thread dies after too many non-fatal errors, it should not simply be removed from the BPServiceActor list stored in BPOfferService; instead, the monitor thread will periodically try to restart these special
[jira] [Updated] (HDFS-16116) Fix Hadoop FedBalance shell and federationBanance markdown bug.
[ https://issues.apache.org/jira/browse/HDFS-16116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-16116: -- Labels: pull-request-available (was: ) > Fix Hadoop FedBalance shell and federationBanance markdown bug. > --- > > Key: HDFS-16116 > URL: https://issues.apache.org/jira/browse/HDFS-16116 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf >Reporter: panlijie >Assignee: panlijie >Priority: Critical > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Fix Hadoop FedBalance shell and federationBanance markdown bug. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16116) Fix Hadoop FedBalance shell and federationBanance markdown bug.
[ https://issues.apache.org/jira/browse/HDFS-16116?focusedWorklogId=619681=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619681 ] ASF GitHub Bot logged work on HDFS-16116: - Author: ASF GitHub Bot Created on: 07/Jul/21 01:00 Start Date: 07/Jul/21 01:00 Worklog Time Spent: 10m Work Description: xiaoxiaopan118 opened a new pull request #3181: URL: https://github.com/apache/hadoop/pull/3181 …n bug. ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HADOOP-X. Fix a typo in YYY.) For more details, please see https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 619681) Remaining Estimate: 0h Time Spent: 10m > Fix Hadoop FedBalance shell and federationBanance markdown bug. > --- > > Key: HDFS-16116 > URL: https://issues.apache.org/jira/browse/HDFS-16116 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf >Reporter: panlijie >Assignee: panlijie >Priority: Critical > Time Spent: 10m > Remaining Estimate: 0h > > Fix Hadoop FedBalance shell and federationBanance markdown bug. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-16116) Fix Hadoop FedBalance shell and federationBanance markdown bug.
[ https://issues.apache.org/jira/browse/HDFS-16116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] panlijie reassigned HDFS-16116: --- Assignee: panlijie > Fix Hadoop FedBalance shell and federationBanance markdown bug. > --- > > Key: HDFS-16116 > URL: https://issues.apache.org/jira/browse/HDFS-16116 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf >Reporter: panlijie >Assignee: panlijie >Priority: Critical > > Fix Hadoop FedBalance shell and federationBanance markdown bug. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16116) Fix Hadoop FedBalance shell and federationBanance markdown bug.
panlijie created HDFS-16116: --- Summary: Fix Hadoop FedBalance shell and federationBanance markdown bug. Key: HDFS-16116 URL: https://issues.apache.org/jira/browse/HDFS-16116 Project: Hadoop HDFS Issue Type: Bug Components: rbf Reporter: panlijie Fix Hadoop FedBalance shell and federationBanance markdown bug. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16114) the balancer parameters print error
[ https://issues.apache.org/jira/browse/HDFS-16114?focusedWorklogId=619550=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619550 ] ASF GitHub Bot logged work on HDFS-16114: - Author: ASF GitHub Bot Created on: 06/Jul/21 18:44 Start Date: 06/Jul/21 18:44 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3179: URL: https://github.com/apache/hadoop/pull/3179#issuecomment-874997298 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 1m 1s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 37m 13s | | trunk passed | | +1 :green_heart: | compile | 1m 43s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | compile | 1m 25s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | checkstyle | 1m 17s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 40s | | trunk passed | | +1 :green_heart: | javadoc | 1m 11s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 41s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 3m 59s | | trunk passed | | +1 :green_heart: | shadedclient | 21m 17s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 28s | | the patch passed | | +1 :green_heart: | compile | 1m 32s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javac | 1m 32s | | the patch passed | | +1 :green_heart: | compile | 1m 22s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | javac | 1m 22s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 1m 16s | | the patch passed | | +1 :green_heart: | mvnsite | 1m 33s | | the patch passed | | +1 :green_heart: | javadoc | 0m 59s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 38s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 4m 8s | | the patch passed | | +1 :green_heart: | shadedclient | 21m 12s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 373m 50s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3179/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 45s | | The patch does not generate ASF License warnings. 
| | | | 478m 53s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor | | | hadoop.hdfs.server.namenode.ha.TestEditLogTailer | | | hadoop.hdfs.server.namenode.TestDecommissioningStatus | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3179/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/3179 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell | | uname | Linux 685d6a45fd47 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / cec6f8fac0014e0c88f299e3729f9402900ddc7f | | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
[jira] [Updated] (HDFS-16101) Remove unuse variable and IoException in ProvidedStorageMap
[ https://issues.apache.org/jira/browse/HDFS-16101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HDFS-16101: Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk. Thanx [~lei w] for the contribution!!! > Remove unuse variable and IoException in ProvidedStorageMap > --- > > Key: HDFS-16101 > URL: https://issues.apache.org/jira/browse/HDFS-16101 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: lei w >Assignee: lei w >Priority: Minor > Fix For: 3.4.0 > > Attachments: HDFS-16101.001.patch > > > Remove unuse variable and IoException in ProvidedStorageMap -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16101) Remove unuse variable and IoException in ProvidedStorageMap
[ https://issues.apache.org/jira/browse/HDFS-16101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17375907#comment-17375907 ] Ayush Saxena commented on HDFS-16101: - +1 > Remove unuse variable and IoException in ProvidedStorageMap > --- > > Key: HDFS-16101 > URL: https://issues.apache.org/jira/browse/HDFS-16101 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: lei w >Assignee: lei w >Priority: Minor > Attachments: HDFS-16101.001.patch > > > Remove unuse variable and IoException in ProvidedStorageMap -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16114) the balancer parameters print error
[ https://issues.apache.org/jira/browse/HDFS-16114?focusedWorklogId=619465=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619465 ] ASF GitHub Bot logged work on HDFS-16114: - Author: ASF GitHub Bot Created on: 06/Jul/21 16:38 Start Date: 06/Jul/21 16:38 Worklog Time Spent: 10m Work Description: hemanthboyina edited a comment on pull request #3179: URL: https://github.com/apache/hadoop/pull/3179#issuecomment-874913168 +1 LGTM, will wait for jenkins -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 619465) Time Spent: 40m (was: 0.5h) > the balancer parameters print error > --- > > Key: HDFS-16114 > URL: https://issues.apache.org/jira/browse/HDFS-16114 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: jiaguodong >Priority: Minor > Labels: balancer, pull-request-available > Time Spent: 40m > Remaining Estimate: 0h >
> {code:java}
> public String toString() {
>   return String.format("%s.%s [%s," + " threshold = %s,"
>       + " max idle iteration = %s," + " #excluded nodes = %s,"
>       + " #included nodes = %s," + " #source nodes = %s,"
>       + " #blockpools = %s," + " run during upgrade = %s,"
>       + " hot block time interval = %s]"
>       + " sort top nodes = %s",
>       Balancer.class.getSimpleName(), getClass().getSimpleName(), policy,
>       threshold, maxIdleIteration, excludedNodes.size(),
>       includedNodes.size(), sourceNodes.size(), blockpools.size(),
>       runDuringUpgrade, sortTopNodes, hotBlockTimeInterval);
> }
> {code}
> The arguments sortTopNodes and hotBlockTimeInterval are passed in the opposite order from the format string, so the two values are printed swapped; the closing ']' is also placed before the last field. 
> -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
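The mismatch reported in HDFS-16114 is a plain String.format argument-order bug: the format string names "hot block time interval" before "sort top nodes", but the arguments are passed the other way around. A minimal standalone illustration (method and class names are invented for the demo, not the Balancer's real API):

```java
// Standalone demo of a String.format argument-order bug like HDFS-16114.
public class BalancerParamsDemo {
    // Buggy: the format string expects hotBlockTimeInterval first,
    // but sortTopNodes is passed first, so the two values swap places.
    static String buggy(boolean sortTopNodes, long hotBlockTimeInterval) {
        return String.format("hot block time interval = %s, sort top nodes = %s",
            sortTopNodes, hotBlockTimeInterval);
    }

    // Fixed: argument order matches the order of the %s placeholders.
    static String fixed(boolean sortTopNodes, long hotBlockTimeInterval) {
        return String.format("hot block time interval = %s, sort top nodes = %s",
            hotBlockTimeInterval, sortTopNodes);
    }

    public static void main(String[] args) {
        System.out.println(buggy(true, 600000L));
        System.out.println(fixed(true, 600000L));
    }
}
```

Because `%s` accepts any argument type, the compiler cannot catch this kind of swap; only the printed output reveals it.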
[jira] [Work logged] (HDFS-16114) the balancer parameters print error
[ https://issues.apache.org/jira/browse/HDFS-16114?focusedWorklogId=619462=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619462 ] ASF GitHub Bot logged work on HDFS-16114: - Author: ASF GitHub Bot Created on: 06/Jul/21 16:37 Start Date: 06/Jul/21 16:37 Worklog Time Spent: 10m Work Description: hemanthboyina commented on pull request #3179: URL: https://github.com/apache/hadoop/pull/3179#issuecomment-874913168 +1 LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 619462) Time Spent: 0.5h (was: 20m) > the balancer parameters print error > --- > > Key: HDFS-16114 > URL: https://issues.apache.org/jira/browse/HDFS-16114 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: jiaguodong >Priority: Minor > Labels: balancer, pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h >
> {code:java}
> public String toString() {
>   return String.format("%s.%s [%s," + " threshold = %s,"
>       + " max idle iteration = %s," + " #excluded nodes = %s,"
>       + " #included nodes = %s," + " #source nodes = %s,"
>       + " #blockpools = %s," + " run during upgrade = %s,"
>       + " hot block time interval = %s]"
>       + " sort top nodes = %s",
>       Balancer.class.getSimpleName(), getClass().getSimpleName(), policy,
>       threshold, maxIdleIteration, excludedNodes.size(),
>       includedNodes.size(), sourceNodes.size(), blockpools.size(),
>       runDuringUpgrade, sortTopNodes, hotBlockTimeInterval);
> }
> {code}
> The arguments sortTopNodes and hotBlockTimeInterval are passed in the opposite order from the format string, so the two values are printed swapped; the closing ']' is also placed before the last field.
> -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16088) Standby NameNode process getLiveDatanodeStorageReport request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-16088?focusedWorklogId=619390=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619390 ] ASF GitHub Bot logged work on HDFS-16088: - Author: ASF GitHub Bot Created on: 06/Jul/21 14:41 Start Date: 06/Jul/21 14:41 Worklog Time Spent: 10m Work Description: tomscut edited a comment on pull request #3140: URL: https://github.com/apache/hadoop/pull/3140#issuecomment-874820122 This failed unit test `TestDecommissioningStatusWithBackoffMonitor` is unrelated to the change. I filed a JIRA [HDFS-16112](https://issues.apache.org/jira/browse/HDFS-16112). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 619390) Time Spent: 3h 50m (was: 3h 40m) > Standby NameNode process getLiveDatanodeStorageReport request to reduce > Active load > --- > > Key: HDFS-16088 > URL: https://issues.apache.org/jira/browse/HDFS-16088 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: pull-request-available > Attachments: standyby-ipcserver.jpg > > Time Spent: 3h 50m > Remaining Estimate: 0h > > As with HDFS-13183, NameNodeConnector#getLiveDatanodeStorageReport() can also > request to SNN to reduce the ANN load. > There are two points that need to be mentioned: > 1. FSNamesystem#getDatanodeStorageReport() is OperationCategory.UNCHECKED, > so we can access SNN directly. > 2. We can share the same UT(testBalancerRequestSBNWithHA) with > NameNodeConnector#getBlocks(). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16088) Standby NameNode process getLiveDatanodeStorageReport request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-16088?focusedWorklogId=619389=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619389 ] ASF GitHub Bot logged work on HDFS-16088: - Author: ASF GitHub Bot Created on: 06/Jul/21 14:40 Start Date: 06/Jul/21 14:40 Worklog Time Spent: 10m Work Description: tomscut commented on pull request #3140: URL: https://github.com/apache/hadoop/pull/3140#issuecomment-874820122 This failed unit test is unrelated to the change. I filed a JIRA [HDFS-16112](https://issues.apache.org/jira/browse/HDFS-16112). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 619389) Time Spent: 3h 40m (was: 3.5h) > Standby NameNode process getLiveDatanodeStorageReport request to reduce > Active load > --- > > Key: HDFS-16088 > URL: https://issues.apache.org/jira/browse/HDFS-16088 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: pull-request-available > Attachments: standyby-ipcserver.jpg > > Time Spent: 3h 40m > Remaining Estimate: 0h > > As with HDFS-13183, NameNodeConnector#getLiveDatanodeStorageReport() can also > request to SNN to reduce the ANN load. > There are two points that need to be mentioned: > 1. FSNamesystem#getDatanodeStorageReport() is OperationCategory.UNCHECKED, > so we can access SNN directly. > 2. We can share the same UT(testBalancerRequestSBNWithHA) with > NameNodeConnector#getBlocks(). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16095) Add lsQuotaList command and getQuotaListing api for hdfs quota
[ https://issues.apache.org/jira/browse/HDFS-16095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17375756#comment-17375756 ] Xiaoqiao He commented on HDFS-16095: [~zhuxiangyi] Thanks for involving me here. {quote}It has a potential to hold the fsn/fsd lock for a long time and cause service outage or delays.{quote} [~kihwal] has left this comment on the GitHub PR, and I fully support it. It is fatal for the NameNode to hold the global lock for a long time. I have heard many reports of the NameNode going out of service because quotaUsage/count/du was invoked on a huge path. So we should avoid adding more heavy operations to the NameNode before the fine-grained locking solution is ready. Thanks again. > Add lsQuotaList command and getQuotaListing api for hdfs quota > -- > > Key: HDFS-16095 > URL: https://issues.apache.org/jira/browse/HDFS-16095 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.4.0 >Reporter: Xiangyi Zhu >Assignee: Xiangyi Zhu >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Currently HDFS does not support obtaining all quota information. The > administrator may need to check which quotas have been added to a certain > directory, or the quotas of the entire cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
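The lock concern above is the classic one: any listing that walks a huge namespace while holding the single fsn/fsd lock blocks all other operations for the duration. One common mitigation (sketched below with invented names; this is not what the patch or HDFS implements, and it trades lock-hold time for a possibly inconsistent snapshot across batches) is to do the traversal in bounded batches, releasing the lock between batches:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: bounded work per lock hold, so writers can interleave
// between batches instead of stalling for the whole scan.
public class BatchedScan {
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    <T> List<T> scan(List<T> entries, int batchSize) {
        List<T> out = new ArrayList<>();
        int i = 0;
        while (i < entries.size()) {
            lock.readLock().lock();
            try {
                int end = Math.min(i + batchSize, entries.size());
                out.addAll(entries.subList(i, end)); // bounded slice per hold
                i = end;
            } finally {
                lock.readLock().unlock(); // writers can proceed between batches
            }
        }
        return out;
    }
}
```

The trade-off is exactly why the comment above prefers to wait for fine-grained locking: releasing the lock mid-scan means the result may mix states from before and after a concurrent modification.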
[jira] [Commented] (HDFS-16115) Asynchronously handle BPServiceActor command mechanism may result in BPServiceActor never fails even CommandProcessingThread is closed with fatal error.
[ https://issues.apache.org/jira/browse/HDFS-16115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17375741#comment-17375741 ] Xiaoqiao He commented on HDFS-16115: [~Daniel Ma] Thanks for your report. I am not sure if HDFS-15651 could solve this issue. IMO, if this thread meets some error, the DataNode process will exit too. Would you mind offering some more information or logs for this issue? Thanks. > Asynchronously handle BPServiceActor command mechanism may result in > BPServiceActor never fails even CommandProcessingThread is closed with fatal > error. > > > Key: HDFS-16115 > URL: https://issues.apache.org/jira/browse/HDFS-16115 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.3.1 >Reporter: Daniel Ma >Priority: Critical > Fix For: 3.3.1 > > Attachments: 0001-HDFS-16115.patch > > > It is an improvement issue. Actually the issue has two sub-issues: > 1. The BPServiceActor thread handles commands from the NameNode asynchronously (a CommandProcessingThread handles the commands), so if any exception or error occurs in the CommandProcessingThread and causes it to fail and stop, the BPServiceActor is not aware of this and keeps putting commands from the NameNode into the queue, waiting for them to be handled by a CommandProcessingThread that is already dead. > 2. The second sub-issue builds on the first: if the CommandProcessingThread died owing to a non-fatal error such as "unable to create new native thread" (caused by too many threads existing in the OS), this kind of problem should be given much more tolerance instead of simply shutting down the thread with no automatic recovery, because such non-fatal errors can often resolve themselves soon after. > Currently, the DataNode's BPServiceActor cannot return to normal even after the non-fatal error has been eliminated. 
> Therefore, this patch does two things: > 1. Adds a retry mechanism to the BPServiceActor thread and the CommandProcessingThread, with a retry limit of 5 by default, configurable; > 2. Adds a periodic monitor thread in BPOfferService: if a BPServiceActor thread dies after too many non-fatal errors, it should not simply be removed from the BPServiceActor list stored in BPOfferService; instead, the monitor thread periodically tries to restart these dead BPServiceActor threads. The interval is also configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16088) Standby NameNode process getLiveDatanodeStorageReport request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-16088?focusedWorklogId=619376=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619376 ] ASF GitHub Bot logged work on HDFS-16088: - Author: ASF GitHub Bot Created on: 06/Jul/21 14:07 Start Date: 06/Jul/21 14:07 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3140: URL: https://github.com/apache/hadoop/pull/3140#issuecomment-874792521 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 46s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 33m 18s | | trunk passed | | +1 :green_heart: | compile | 1m 23s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | compile | 1m 14s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | checkstyle | 1m 3s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 23s | | trunk passed | | +1 :green_heart: | javadoc | 0m 54s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 28s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 3m 21s | | trunk passed | | +1 :green_heart: | shadedclient | 18m 55s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 18s | | the patch passed | | +1 :green_heart: | compile | 1m 18s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javac | 1m 18s | | the patch passed | | +1 :green_heart: | compile | 1m 11s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | javac | 1m 11s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 0m 54s | | the patch passed | | +1 :green_heart: | mvnsite | 1m 17s | | the patch passed | | +1 :green_heart: | javadoc | 0m 49s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 19s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 3m 22s | | the patch passed | | +1 :green_heart: | shadedclient | 18m 42s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 360m 59s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3140/6/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 54s | | The patch does not generate ASF License warnings. 
| | | | 453m 3s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3140/6/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/3140 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell | | uname | Linux 9663799cf29b 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / d80028d40add35d007191d18a972441307f7a814 | | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3140/6/testReport/ | | Max. process+thread count | 1956 (vs. ulimit of 5500) | | modules | C:
[jira] [Work logged] (HDFS-16087) RBF balance process is stuck at DisableWrite stage
[ https://issues.apache.org/jira/browse/HDFS-16087?focusedWorklogId=619342=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619342 ] ASF GitHub Bot logged work on HDFS-16087: - Author: ASF GitHub Bot Created on: 06/Jul/21 13:06 Start Date: 06/Jul/21 13:06 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3141: URL: https://github.com/apache/hadoop/pull/3141#issuecomment-874743045 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 1m 3s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 12m 44s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 25m 4s | | trunk passed | | +1 :green_heart: | compile | 27m 37s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | compile | 24m 6s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | checkstyle | 4m 29s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 43s | | trunk passed | | +1 :green_heart: | javadoc | 1m 35s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 52s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 2m 34s | | trunk passed | | +1 :green_heart: | shadedclient | 19m 28s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 27s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 1m 5s | | the patch passed | | +1 :green_heart: | compile | 26m 47s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javac | 26m 47s | | the patch passed | | +1 :green_heart: | compile | 23m 1s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | javac | 23m 1s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 4m 28s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3141/3/artifact/out/results-checkstyle-root.txt) | root: The patch generated 41 new + 1 unchanged - 0 fixed = 42 total (was 1) | | +1 :green_heart: | mvnsite | 1m 36s | | the patch passed | | +1 :green_heart: | javadoc | 1m 28s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 52s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 2m 55s | | the patch passed | | +1 :green_heart: | shadedclient | 18m 39s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 20m 53s | [/patch-unit-hadoop-tools_hadoop-federation-balance.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3141/3/artifact/out/patch-unit-hadoop-tools_hadoop-federation-balance.txt) | hadoop-federation-balance in the patch passed. | | -1 :x: | unit | 40m 22s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3141/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) | hadoop-hdfs-rbf in the patch passed. | | +1 :green_heart: | asflicense | 1m 8s | | The patch does not generate ASF License warnings. 
| | | | 271m 6s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.tools.fedbalance.procedure.TestBalanceProcedureScheduler | | | hadoop.hdfs.rbfbalance.TestRouterDistCpProcedure | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3141/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/3141 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell | | uname | Linux
[jira] [Work logged] (HDFS-16087) RBF balance process is stuck at DisableWrite stage
[ https://issues.apache.org/jira/browse/HDFS-16087?focusedWorklogId=619327=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619327 ] ASF GitHub Bot logged work on HDFS-16087: - Author: ASF GitHub Bot Created on: 06/Jul/21 12:21 Start Date: 06/Jul/21 12:21 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3141: URL: https://github.com/apache/hadoop/pull/3141#issuecomment-874712397 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 32s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 12m 33s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 20m 14s | | trunk passed | | +1 :green_heart: | compile | 23m 44s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | compile | 19m 23s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | checkstyle | 3m 49s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 45s | | trunk passed | | +1 :green_heart: | javadoc | 1m 30s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 47s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 2m 21s | | trunk passed | | +1 :green_heart: | shadedclient | 15m 14s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 27s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 0m 59s | | the patch passed | | +1 :green_heart: | compile | 21m 7s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javac | 21m 7s | | the patch passed | | +1 :green_heart: | compile | 18m 42s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | javac | 18m 42s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 3m 50s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3141/5/artifact/out/results-checkstyle-root.txt) | root: The patch generated 39 new + 1 unchanged - 0 fixed = 40 total (was 1) | | +1 :green_heart: | mvnsite | 1m 40s | | the patch passed | | +1 :green_heart: | javadoc | 1m 35s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 44s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 2m 49s | | the patch passed | | +1 :green_heart: | shadedclient | 15m 17s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 6m 50s | | hadoop-federation-balance in the patch passed. | | +1 :green_heart: | unit | 20m 39s | | hadoop-hdfs-rbf in the patch passed. | | +1 :green_heart: | asflicense | 0m 58s | | The patch does not generate ASF License warnings. 
| | | | 203m 56s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3141/5/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/3141 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell | | uname | Linux 7beea82002b2 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 97f2f3502975b60a8304b87bc7e4ca9b9914db0c | | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Test Results |
[jira] [Work logged] (HDFS-16087) RBF balance process is stuck at DisableWrite stage
[ https://issues.apache.org/jira/browse/HDFS-16087?focusedWorklogId=619326=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619326 ] ASF GitHub Bot logged work on HDFS-16087: - Author: ASF GitHub Bot Created on: 06/Jul/21 12:19 Start Date: 06/Jul/21 12:19 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3141: URL: https://github.com/apache/hadoop/pull/3141#issuecomment-874710808 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 40s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 12m 35s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 22m 10s | | trunk passed | | +1 :green_heart: | compile | 23m 17s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | compile | 19m 0s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | checkstyle | 3m 51s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 29s | | trunk passed | | +1 :green_heart: | javadoc | 1m 30s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 43s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 2m 19s | | trunk passed | | +1 :green_heart: | shadedclient | 15m 19s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 28s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 1m 9s | | the patch passed | | +1 :green_heart: | compile | 22m 27s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javac | 22m 27s | | the patch passed | | +1 :green_heart: | compile | 19m 11s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | javac | 19m 11s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 3m 46s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3141/4/artifact/out/results-checkstyle-root.txt) | root: The patch generated 39 new + 1 unchanged - 0 fixed = 40 total (was 1) | | +1 :green_heart: | mvnsite | 1m 44s | | the patch passed | | +1 :green_heart: | javadoc | 1m 35s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 52s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 2m 52s | | the patch passed | | +1 :green_heart: | shadedclient | 16m 51s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 7m 19s | | hadoop-federation-balance in the patch passed. | | +1 :green_heart: | unit | 21m 57s | | hadoop-hdfs-rbf in the patch passed. | | +1 :green_heart: | asflicense | 0m 59s | | The patch does not generate ASF License warnings. 
| | | | 210m 20s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3141/4/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/3141 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell | | uname | Linux cd3278928f94 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 83fa66c987843b8549d997b7213d3f3f3c3e0957 | | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Test Results |
[jira] [Updated] (HDFS-16100) HA: Improve performance of Standby node transition to Active
[ https://issues.apache.org/jira/browse/HDFS-16100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wudeyu updated HDFS-16100: -- Attachment: HDFS-16100.001.patch > HA: Improve performance of Standby node transition to Active > - > > Key: HDFS-16100 > URL: https://issues.apache.org/jira/browse/HDFS-16100 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.3.1 >Reporter: wudeyu >Assignee: wudeyu >Priority: Major > Attachments: HDFS-16100.001.patch, HDFS-16100.patch > > > pendingDNMessages in the Standby node holds postponed block reports. Block > reports in pendingDNMessages are processed as follows: > # If the GS of a replica is in the future, the Standby node processes it when the > corresponding edit log (e.g. add_block) is loaded. > # If a replica is corrupted, the Standby node processes it while it transitions to > Active. > # If a DataNode is removed, its block reports are removed from > pendingDNMessages. > Obviously, as the number of corrupted replicas grows, the transition takes longer. > In our situation, there were 60 million block reports in pendingDNMessages > before the transition. Processing them took almost 7 minutes, and the NameNode > was killed by zkfc. Most of these block reports had replica state RBW with a > wrong GS (less than the stored block's GS on the Standby node). > In my opinion, the Standby node could ignore block reports whose replica state > is RBW with a wrong GS, because the Active node/DataNode will remove them later.
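The suggestion in the final paragraph of the description — skip queued RBW reports whose GS is behind the stored block's GS — can be sketched as a filter pass over the queue. This is a hypothetical illustration only: QueuedReport and dropStaleRbw are invented stand-ins, not the real PendingDataNodeMessages API:

```java
import java.util.ArrayList;
import java.util.List;

public class StaleRbwFilter {
    // Hypothetical stand-in for a queued replica report: state plus generation stamp.
    static final class QueuedReport {
        final String state;
        final long gs;
        QueuedReport(String state, long gs) { this.state = state; this.gs = gs; }
    }

    // Keep a queued report unless it is RBW with a GS older than the stored
    // block's GS on the Standby node -- the case the Active node or the
    // DataNode would discard later anyway.
    static List<QueuedReport> dropStaleRbw(List<QueuedReport> queued, long storedGs) {
        List<QueuedReport> kept = new ArrayList<>();
        for (QueuedReport r : queued) {
            boolean staleRbw = "RBW".equals(r.state) && r.gs < storedGs;
            if (!staleRbw) {
                kept.add(r);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<QueuedReport> queued = List.of(
                new QueuedReport("RBW", 5),       // stale GS: dropped
                new QueuedReport("RBW", 12),      // future GS: kept for edit-log replay
                new QueuedReport("FINALIZED", 5)  // not RBW: kept
        );
        System.out.println(dropStaleRbw(queued, 10).size()); // prints 2
    }
}
```
Dropping the stale-RBW entries up front would shrink the queue processed during the transition, which is the 7-minute cost the reporter measured.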
[jira] [Work logged] (HDFS-16087) RBF balance process is stuck at DisableWrite stage
[ https://issues.apache.org/jira/browse/HDFS-16087?focusedWorklogId=619310=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619310 ] ASF GitHub Bot logged work on HDFS-16087: - Author: ASF GitHub Bot Created on: 06/Jul/21 11:51 Start Date: 06/Jul/21 11:51 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3141: URL: https://github.com/apache/hadoop/pull/3141#issuecomment-874693976 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 45s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 12m 45s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 20m 40s | | trunk passed | | +1 :green_heart: | compile | 22m 39s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | compile | 19m 45s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | checkstyle | 3m 50s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 28s | | trunk passed | | +1 :green_heart: | javadoc | 1m 26s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 44s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 2m 27s | | trunk passed | | +1 :green_heart: | shadedclient | 16m 6s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 29s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 0m 59s | | the patch passed | | +1 :green_heart: | compile | 21m 28s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javac | 21m 28s | | the patch passed | | +1 :green_heart: | compile | 20m 8s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | javac | 20m 8s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 3m 46s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3141/2/artifact/out/results-checkstyle-root.txt) | root: The patch generated 41 new + 1 unchanged - 0 fixed = 42 total (was 1) | | +1 :green_heart: | mvnsite | 1m 29s | | the patch passed | | +1 :green_heart: | javadoc | 1m 27s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 47s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 2m 51s | | the patch passed | | +1 :green_heart: | shadedclient | 15m 59s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 7m 0s | | hadoop-federation-balance in the patch passed. | | +1 :green_heart: | unit | 22m 9s | | hadoop-hdfs-rbf in the patch passed. | | -1 :x: | asflicense | 1m 0s | [/results-asflicense.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3141/2/artifact/out/results-asflicense.txt) | The patch generated 1 ASF License warnings. 
| | | | 208m 7s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3141/2/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/3141 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell | | uname | Linux 7cd7d12d662d 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / c5486937397250a74b25e7bd7954af3313fa8644 | | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
[jira] [Work logged] (HDFS-16110) Remove unused method reportChecksumFailure in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-16110?focusedWorklogId=619282=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619282 ] ASF GitHub Bot logged work on HDFS-16110: - Author: ASF GitHub Bot Created on: 06/Jul/21 11:42 Start Date: 06/Jul/21 11:42 Worklog Time Spent: 10m Work Description: jojochuang merged pull request #3174: URL: https://github.com/apache/hadoop/pull/3174 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 619282) Time Spent: 1.5h (was: 1h 20m) > Remove unused method reportChecksumFailure in DFSClient > --- > > Key: HDFS-16110 > URL: https://issues.apache.org/jira/browse/HDFS-16110 > Project: Hadoop HDFS > Issue Type: Wish >Reporter: tomscut >Assignee: tomscut >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Remove unused method reportChecksumFailure and fix some code styles by the > way in DFSClient.
[jira] [Work logged] (HDFS-16088) Standby NameNode process getLiveDatanodeStorageReport request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-16088?focusedWorklogId=619221=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619221 ] ASF GitHub Bot logged work on HDFS-16088: - Author: ASF GitHub Bot Created on: 06/Jul/21 11:33 Start Date: 06/Jul/21 11:33 Worklog Time Spent: 10m Work Description: tomscut commented on pull request #3140: URL: https://github.com/apache/hadoop/pull/3140#issuecomment-874412101 Issue Time Tracking --- Worklog Id: (was: 619221) Time Spent: 3h 20m (was: 3h 10m) > Standby NameNode process getLiveDatanodeStorageReport request to reduce > Active load > --- > > Key: HDFS-16088 > URL: https://issues.apache.org/jira/browse/HDFS-16088 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: pull-request-available > Attachments: standyby-ipcserver.jpg > > Time Spent: 3h 20m > Remaining Estimate: 0h > > As with HDFS-13183, NameNodeConnector#getLiveDatanodeStorageReport() can also > request to SNN to reduce the ANN load. > There are two points that need to be mentioned: > 1. FSNamesystem#getDatanodeStorageReport() is OperationCategory.UNCHECKED, > so we can access SNN directly. > 2. We can share the same UT(testBalancerRequestSBNWithHA) with > NameNodeConnector#getBlocks().
[jira] [Work logged] (HDFS-16114) the balancer parameters print error
[ https://issues.apache.org/jira/browse/HDFS-16114?focusedWorklogId=619211=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619211 ] ASF GitHub Bot logged work on HDFS-16114: - Author: ASF GitHub Bot Created on: 06/Jul/21 11:32 Start Date: 06/Jul/21 11:32 Worklog Time Spent: 10m Work Description: JiaguodongF opened a new pull request #3179: URL: https://github.com/apache/hadoop/pull/3179 public String toString() { return String.format("%s.%s [%s," + " threshold = %s," + " max idle iteration = %s," + " #excluded nodes = %s," + " #included nodes = %s," + " #source nodes = %s," + " #blockpools = %s," + " run during upgrade = %s," + " hot block time interval = %s]" + " sort top nodes = %s", Balancer.class.getSimpleName(), getClass().getSimpleName(), policy, threshold, maxIdleIteration, excludedNodes.size(), includedNodes.size(), sourceNodes.size(), blockpools.size(), runDuringUpgrade, sortTopNodes, hotBlockTimeInterval); } print error. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 619211) Time Spent: 20m (was: 10m) > the balancer parameters print error > --- > > Key: HDFS-16114 > URL: https://issues.apache.org/jira/browse/HDFS-16114 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: jiaguodong >Priority: Minor > Labels: balancer, pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > public String toString() { > return String.format("%s.%s [%s," + " threshold = %s," > + " max idle iteration = %s," + " #excluded nodes = %s," > + " #included nodes = %s," + " #source nodes = %s," > + " #blockpools = %s," + " run during upgrade = %s," > + " hot block time interval = %s]" > + " sort top nodes = %s", > Balancer.class.getSimpleName(), getClass().getSimpleName(), policy, > threshold, maxIdleIteration, excludedNodes.size(), > includedNodes.size(), sourceNodes.size(), blockpools.size(), > runDuringUpgrade, sortTopNodes, hotBlockTimeInterval); > } > The last two format specifiers ("hot block time interval", "sort top nodes") and > the last two arguments (sortTopNodes, hotBlockTimeInterval) appear in opposite > orders, so the two values are printed under the wrong labels.
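For readers skimming the report: the defect is a String.format argument-order mismatch — the format string names "hot block time interval" before "sort top nodes", while the argument list passes sortTopNodes before hotBlockTimeInterval, so each value is printed under the other's label. A minimal standalone reproduction (variable names borrowed from the snippet, values invented):

```java
public class FormatArgSwap {
    // Renders the tail of the Balancer parameters string either with the
    // swapped argument order from the bug report or with the corrected order.
    static String render(boolean buggy) {
        boolean sortTopNodes = true;
        long hotBlockTimeInterval = 3600L; // invented value for illustration
        String fmt = "hot block time interval = %s, sort top nodes = %s";
        return buggy
                ? String.format(fmt, sortTopNodes, hotBlockTimeInterval) // labels and values swap
                : String.format(fmt, hotBlockTimeInterval, sortTopNodes);
    }

    public static void main(String[] args) {
        System.out.println("buggy: " + render(true));  // hot block time interval = true, ...
        System.out.println("fixed: " + render(false)); // hot block time interval = 3600, ...
    }
}
```
Because %s accepts any Object, the compiler cannot catch the swap; the bug only shows up in the printed log line.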
[jira] [Work logged] (HDFS-16110) Remove unused method reportChecksumFailure in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-16110?focusedWorklogId=619180=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619180 ] ASF GitHub Bot logged work on HDFS-16110: - Author: ASF GitHub Bot Created on: 06/Jul/21 11:28 Start Date: 06/Jul/21 11:28 Worklog Time Spent: 10m Work Description: tomscut commented on pull request #3174: URL: https://github.com/apache/hadoop/pull/3174#issuecomment-874394011 Issue Time Tracking --- Worklog Id: (was: 619180) Time Spent: 1h 20m (was: 1h 10m) > Remove unused method reportChecksumFailure in DFSClient > --- > > Key: HDFS-16110 > URL: https://issues.apache.org/jira/browse/HDFS-16110 > Project: Hadoop HDFS > Issue Type: Wish >Reporter: tomscut >Assignee: tomscut >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Remove unused method reportChecksumFailure and fix some code styles by the > way in DFSClient.
[jira] [Work logged] (HDFS-16088) Standby NameNode process getLiveDatanodeStorageReport request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-16088?focusedWorklogId=619157=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619157 ] ASF GitHub Bot logged work on HDFS-16088: - Author: ASF GitHub Bot Created on: 06/Jul/21 11:24 Start Date: 06/Jul/21 11:24 Worklog Time Spent: 10m Work Description: Hexiaoqiao commented on a change in pull request #3140: URL: https://github.com/apache/hadoop/pull/3140#discussion_r664253973 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancerWithHANameNodes.java ## @@ -236,4 +241,93 @@ private void testBalancerWithObserver(boolean withObserverFailure) } } } + + /** + * Comparing the results of getLiveDatanodeStorageReport() + * from the active and standby NameNodes, + * the results should be the same. + */ + @Test(timeout = 6) + public void testGetLiveDatanodeStorageReport() throws Exception { +Configuration conf = new HdfsConfiguration(); +TestBalancer.initConf(conf); +assertEquals(TEST_CAPACITIES.length, TEST_RACKS.length); +NNConf nn1Conf = new MiniDFSNNTopology.NNConf("nn1"); +nn1Conf.setIpcPort(HdfsClientConfigKeys.DFS_NAMENODE_RPC_PORT_DEFAULT); +Configuration copiedConf = new Configuration(conf); +// Try capture NameNodeConnector log. 
+LogCapturer log =LogCapturer.captureLogs( +LoggerFactory.getLogger(NameNodeConnector.class)); +// We needs to assert datanode info from ANN and SNN, so the +// heartbeat should disabled for the duration of method execution +copiedConf.setInt(DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY, 6); +cluster = new MiniDFSCluster.Builder(copiedConf) +.nnTopology(MiniDFSNNTopology.simpleHATopology()) +.numDataNodes(TEST_CAPACITIES.length) +.racks(TEST_RACKS) +.simulatedCapacities(TEST_CAPACITIES) +.build(); +HATestUtil.setFailoverConfigurations(cluster, conf); +try { + cluster.waitActive(); + cluster.transitionToActive(0); + URI namenode = (URI) DFSUtil.getInternalNsRpcUris(conf) + .toArray()[0]; + String nsId = DFSUtilClient.getNameServiceIds(conf) + .toArray()[0].toString(); + + // request to active namenode + NameNodeConnector nncActive = new NameNodeConnector( + "nncActive", namenode, + nsId, new Path("/test"), + null, conf, NameNodeConnector.DEFAULT_MAX_IDLE_ITERATIONS); + DatanodeStorageReport[] ldspFromAnn = Review comment: `ldspFromAnn` here is not very explicit IMO, is `datanodeStorageReports` more clear here? ## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancerWithHANameNodes.java ## @@ -236,4 +241,93 @@ private void testBalancerWithObserver(boolean withObserverFailure) } } } + + /** + * Comparing the results of getLiveDatanodeStorageReport() + * from the active and standby NameNodes, + * the results should be the same. + */ + @Test(timeout = 6) + public void testGetLiveDatanodeStorageReport() throws Exception { +Configuration conf = new HdfsConfiguration(); +TestBalancer.initConf(conf); +assertEquals(TEST_CAPACITIES.length, TEST_RACKS.length); +NNConf nn1Conf = new MiniDFSNNTopology.NNConf("nn1"); +nn1Conf.setIpcPort(HdfsClientConfigKeys.DFS_NAMENODE_RPC_PORT_DEFAULT); +Configuration copiedConf = new Configuration(conf); +// Try capture NameNodeConnector log. 
+LogCapturer log =LogCapturer.captureLogs( +LoggerFactory.getLogger(NameNodeConnector.class)); +// We needs to assert datanode info from ANN and SNN, so the +// heartbeat should disabled for the duration of method execution +copiedConf.setInt(DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY, 6); +cluster = new MiniDFSCluster.Builder(copiedConf) +.nnTopology(MiniDFSNNTopology.simpleHATopology()) +.numDataNodes(TEST_CAPACITIES.length) +.racks(TEST_RACKS) +.simulatedCapacities(TEST_CAPACITIES) +.build(); +HATestUtil.setFailoverConfigurations(cluster, conf); +try { + cluster.waitActive(); + cluster.transitionToActive(0); + URI namenode = (URI) DFSUtil.getInternalNsRpcUris(conf) + .toArray()[0]; + String nsId = DFSUtilClient.getNameServiceIds(conf) + .toArray()[0].toString(); + + // request to active namenode Review comment: It is better to begin with uppercase character and end with period for annotation. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at:
[jira] [Commented] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally
[ https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17375453#comment-17375453 ] Stephen O'Donnell commented on HDFS-15796: -- [~Daniel Ma] I understand how a ConcurrentModificationException occurs, but targets and excluded nodes are local variables to the method, and the calls to getTargets are already synchronized (at least on trunk). Can you please highlight the line the exception occurs on so we can see the exact call which is giving the problem? Thanks. > ConcurrentModificationException error happens on NameNode occasionally > -- > > Key: HDFS-15796 > URL: https://issues.apache.org/jira/browse/HDFS-15796 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.1 >Reporter: Daniel Ma >Priority: Critical > Attachments: 0001-HDFS-15796.patch > > > ConcurrentModificationException error happens on NameNode occasionally. > > {code:java} > 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor > thread received Runtime exception. | BlockManager.java:4746 > java.util.ConcurrentModificationException > at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909) > at java.util.ArrayList$Itr.next(ArrayList.java:859) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729) > at java.lang.Thread.run(Thread.java:748) > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16114) the balancer parameters print error
[ https://issues.apache.org/jira/browse/HDFS-16114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-16114: -- Labels: balancer pull-request-available (was: balancer) > the balancer parameters print error > --- > > Key: HDFS-16114 > URL: https://issues.apache.org/jira/browse/HDFS-16114 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: jiaguodong >Priority: Minor > Labels: balancer, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > public String toString() { > return String.format("%s.%s [%s," + " threshold = %s," > + " max idle iteration = %s," + " #excluded nodes = %s," > + " #included nodes = %s," + " #source nodes = %s," > + " #blockpools = %s," + " run during upgrade = %s," > + " hot block time interval = %s]" > + " sort top nodes = %s", > Balancer.class.getSimpleName(), getClass().getSimpleName(), policy, > threshold, maxIdleIteration, excludedNodes.size(), > includedNodes.size(), sourceNodes.size(), blockpools.size(), > runDuringUpgrade, sortTopNodes, hotBlockTimeInterval); > } > print error. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16114) the balancer parameters print error
[ https://issues.apache.org/jira/browse/HDFS-16114?focusedWorklogId=618990=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-618990 ] ASF GitHub Bot logged work on HDFS-16114: - Author: ASF GitHub Bot Created on: 06/Jul/21 10:18 Start Date: 06/Jul/21 10:18 Worklog Time Spent: 10m Work Description: JiaguodongF opened a new pull request #3179: URL: https://github.com/apache/hadoop/pull/3179 public String toString() { return String.format("%s.%s [%s," + " threshold = %s," + " max idle iteration = %s," + " #excluded nodes = %s," + " #included nodes = %s," + " #source nodes = %s," + " #blockpools = %s," + " run during upgrade = %s," + " hot block time interval = %s]" + " sort top nodes = %s", Balancer.class.getSimpleName(), getClass().getSimpleName(), policy, threshold, maxIdleIteration, excludedNodes.size(), includedNodes.size(), sourceNodes.size(), blockpools.size(), runDuringUpgrade, sortTopNodes, hotBlockTimeInterval); } print error. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 618990) Remaining Estimate: 0h Time Spent: 10m > the balancer parameters print error > --- > > Key: HDFS-16114 > URL: https://issues.apache.org/jira/browse/HDFS-16114 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: jiaguodong >Priority: Minor > Labels: balancer > Time Spent: 10m > Remaining Estimate: 0h > > public String toString() { > return String.format("%s.%s [%s," + " threshold = %s," > + " max idle iteration = %s," + " #excluded nodes = %s," > + " #included nodes = %s," + " #source nodes = %s," > + " #blockpools = %s," + " run during upgrade = %s," > + " hot block time interval = %s]" > + " sort top nodes = %s", > Balancer.class.getSimpleName(), getClass().getSimpleName(), policy, > threshold, maxIdleIteration, excludedNodes.size(), > includedNodes.size(), sourceNodes.size(), blockpools.size(), > runDuringUpgrade, sortTopNodes, hotBlockTimeInterval); > } > print error. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
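The defect quoted above is an argument-order mismatch: the format string lists the "hot block time interval" placeholder before "sort top nodes", but the arguments are supplied in the opposite order, so each value is printed under the other's label. A minimal standalone sketch of the bug and its fix (simplified labels and a hypothetical demo class, not the actual Balancer code):

```java
// Hypothetical demo class illustrating the argument-order bug reported in
// HDFS-16114. The placeholders read left to right, so the arguments must
// be supplied in the same order.
public class BalancerParametersDemo {

    // Buggy ordering, mirroring the reported snippet: sortTopNodes is
    // printed under the "hot block time interval" label and vice versa.
    static String buggy(boolean sortTopNodes, long hotBlockTimeInterval) {
        return String.format("hot block time interval = %s, sort top nodes = %s",
                sortTopNodes, hotBlockTimeInterval); // arguments swapped
    }

    // Fixed ordering: arguments match the placeholders left to right.
    static String fixed(boolean sortTopNodes, long hotBlockTimeInterval) {
        return String.format("hot block time interval = %s, sort top nodes = %s",
                hotBlockTimeInterval, sortTopNodes);
    }

    public static void main(String[] args) {
        System.out.println(buggy(true, 600000L)); // values appear under the wrong labels
        System.out.println(fixed(true, 600000L)); // values appear under the right labels
    }
}
```

Because `%s` accepts any argument type, the compiler cannot catch this kind of swap; only reading the output (or a unit test on `toString()`) reveals it.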
[jira] [Commented] (HDFS-16098) ERROR tools.DiskBalancerCLI: java.lang.IllegalArgumentException
[ https://issues.apache.org/jira/browse/HDFS-16098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17375427#comment-17375427 ] Daniel Ma commented on HDFS-16098: -- [~wangyanfu] Thanks for reporting this issue. Could you please share more details about the error stack? > ERROR tools.DiskBalancerCLI: java.lang.IllegalArgumentException > --- > > Key: HDFS-16098 > URL: https://issues.apache.org/jira/browse/HDFS-16098 > Project: Hadoop HDFS > Issue Type: Bug > Components: diskbalancer >Affects Versions: 2.6.0 > Environment: VERSION info: > Hadoop 2.6.0-cdh5.14.4 >Reporter: wangyanfu >Priority: Blocker > Labels: diskbalancer > Fix For: 2.6.0 > > Attachments: image-2021-07-01-18-34-54-905.png, on-branch-3.1.jpg > > Original Estimate: 504h > Remaining Estimate: 504h > > When I tried to run > hdfs diskbalancer -plan $(hostname -f) > > > > I got this notice: > 21/06/30 11:30:41 ERROR tools.DiskBalancerCLI: > java.lang.IllegalArgumentException > > Then I tried writing the real hostname into my command; it did not work and gave the same > error notice. > I also tried using --plan instead of -plan; it did not work and gave the same error notice. > I found this > [link|https://community.cloudera.com/t5/Support-Questions/Error-trying-to-balance-disks-on-node/m-p/59989#M54850] > but there's no resolution there. Can somebody help me? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally
[ https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17375421#comment-17375421 ] Daniel Ma edited comment on HDFS-15796 at 7/6/21, 9:57 AM: --- [~sodonnell] Thanks for reviewing. Actually you missed the for loop here: {code:java}
// code placeholder
synchronized (pendingReconstruction) {
  List<DatanodeStorageInfo> targets = pendingReconstruction
      .getTargets(rw.getBlock());
  if (targets != null) {
    for (DatanodeStorageInfo dn : targets) {
      if (!excludedNodes.contains(dn.getDatanodeDescriptor())) {
        excludedNodes.add(dn.getDatanodeDescriptor());
      }
    }
  }
}
{code} The problem happens when the code above iterates over the DataNodes stored in the pendingReconstruction object while the DataNode list is also being modified elsewhere. In other words, if you modify a List (delete or add an element) and visit it at the same time, a ConcurrentModificationException will be thrown. was (Author: daniel ma): [~sodonnell] Thanks for reviewing. Actually you missed the for loop here: {code:java}
// code placeholder
synchronized (pendingReconstruction) {
  List<DatanodeStorageInfo> targets = pendingReconstruction
      .getTargets(rw.getBlock());
  if (targets != null) {
    for (DatanodeStorageInfo dn : targets) {
      if (!excludedNodes.contains(dn.getDatanodeDescriptor())) {
        excludedNodes.add(dn.getDatanodeDescriptor());
      }
    }
  }
}
{code} The problem happens when the code above iterates over the DataNodes stored in the pendingReconstruction object while the DataNode list is also being modified elsewhere. In other words, if you modify a List (delete or add an element) and visit it at the same time, a ConcurrentModificationException will be thrown.
> ConcurrentModificationException error happens on NameNode occasionally > -- > > Key: HDFS-15796 > URL: https://issues.apache.org/jira/browse/HDFS-15796 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.1 >Reporter: Daniel Ma >Priority: Critical > Attachments: 0001-HDFS-15796.patch > > > ConcurrentModificationException error happens on NameNode occasionally. > > {code:java} > 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor > thread received Runtime exception. | BlockManager.java:4746 > java.util.ConcurrentModificationException > at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909) > at java.util.ArrayList$Itr.next(ArrayList.java:859) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729) > at java.lang.Thread.run(Thread.java:748) > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally
[ https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17375421#comment-17375421 ] Daniel Ma commented on HDFS-15796: -- [~sodonnell] Thanks for reviewing. Actually you missed the for loop here: {code:java}
// code placeholder
synchronized (pendingReconstruction) {
  List<DatanodeStorageInfo> targets = pendingReconstruction
      .getTargets(rw.getBlock());
  if (targets != null) {
    for (DatanodeStorageInfo dn : targets) {
      if (!excludedNodes.contains(dn.getDatanodeDescriptor())) {
        excludedNodes.add(dn.getDatanodeDescriptor());
      }
    }
  }
}
{code} The problem happens when the code above iterates over the DataNodes stored in the pendingReconstruction object while the DataNode list is also being modified elsewhere. In other words, if you modify a List (delete or add an element) and visit it at the same time, a ConcurrentModificationException will be thrown. > ConcurrentModificationException error happens on NameNode occasionally > -- > > Key: HDFS-15796 > URL: https://issues.apache.org/jira/browse/HDFS-15796 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.1 >Reporter: Daniel Ma >Priority: Critical > Attachments: 0001-HDFS-15796.patch > > > ConcurrentModificationException error happens on NameNode occasionally. > > {code:java} > 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor > thread received Runtime exception.
| BlockManager.java:4746 > java.util.ConcurrentModificationException > at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909) > at java.util.ArrayList$Itr.next(ArrayList.java:859) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729) > at java.lang.Thread.run(Thread.java:748) > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
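The failure mode under discussion is easy to reproduce in isolation: `java.util.ArrayList` iterators are fail-fast, so a structural modification while an iterator is live throws ConcurrentModificationException on the next iteration step. A small self-contained sketch (illustrative only, not HDFS code; names like `triggersCme` are invented for the demo):

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

public class CmeDemo {

    // Returns true if mutating the list while a for-each iterator is live
    // raises ConcurrentModificationException, as the fail-fast iterator
    // detects the structural change on the next call to next().
    static boolean triggersCme() {
        List<String> targets = new ArrayList<>(List.of("dn1", "dn2", "dn3"));
        try {
            for (String dn : targets) {
                targets.remove(dn); // structural modification mid-iteration
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    // One safe pattern: iterate over a defensive copy, so the live list
    // can be mutated freely while we walk the snapshot.
    static boolean safeWithCopy() {
        List<String> targets = new ArrayList<>(List.of("dn1", "dn2", "dn3"));
        for (String dn : new ArrayList<>(targets)) {
            targets.remove(dn);
        }
        return targets.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(triggersCme());  // true
        System.out.println(safeWithCopy()); // true
    }
}
```

Note the demo needs only one thread to fail; in a truly multi-threaded case like the NameNode's, the other common fix is to guard both the iteration and every mutation with the same lock, which is why the discussion above centers on whether all paths touching the targets list are synchronized.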
[jira] [Updated] (HDFS-16115) Asynchronously handle BPServiceActor command mechanism may result in BPServiceActor never fails even CommandProcessingThread is closed with fatal error.
[ https://issues.apache.org/jira/browse/HDFS-16115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Ma updated HDFS-16115: - Description: It is an improvement issue. Actually the issue has two sub-issues: 1. The BPServiceActor thread handles commands from the NameNode asynchronously (CommandProcessingThread handles the commands), so if any exception or error in CommandProcessingThread causes that thread to fail and stop, the BPServiceActor is not aware of it and keeps putting commands from the NameNode into queues waiting to be handled by CommandProcessingThread, even though CommandProcessingThread is already dead. 2. The second sub-issue builds on the first: if CommandProcessingThread died from a non-fatal error such as "cannot create native thread" (caused by too many threads in the OS), this kind of problem should be given more tolerance instead of simply shutting the thread down with no automatic recovery, because such non-fatal errors can often resolve themselves soon after; currently, the DataNode's BPServiceActor cannot return to normal even after the non-fatal error is eliminated. Therefore, this patch does two things: 1. Adds a retry mechanism to the BPServiceActor and CommandProcessingThread threads, with a retry limit that is 5 by default and configurable; 2. Adds a periodic monitor thread to BPOfferService: if a BPServiceActor thread dies after too many non-fatal errors, it is not simply removed from the BPServiceActor list stored in BPOfferService; instead, the monitor thread periodically tries to restart these dead BPServiceActor threads. The interval is also configurable. was: It is an improvement issue.
Actually the issue has two sub-issues: 1. The BPServiceActor thread handles commands from the NameNode asynchronously (CommandProcessingThread handles the commands), so if any exception or error in CommandProcessingThread causes that thread to fail and stop, the BPServiceActor is not aware of it and keeps putting commands from the NameNode into queues waiting to be handled by CommandProcessingThread, even though CommandProcessingThread is already dead. 2. The second sub-issue builds on the first: if CommandProcessingThread died from a non-fatal error such as "cannot create native thread" (caused by too many threads in the OS), this kind of problem should be given more tolerance instead of simply shutting the thread down with no automatic recovery, because such non-fatal errors can often resolve themselves soon after; currently, the DataNode's BPServiceActor cannot return to normal even after the non-fatal error is eliminated. Therefore, this patch does two things: 1. Adds a retry mechanism to the BPServiceActor and CommandProcessingThread threads, with a retry limit that is 5 by default and configurable; 2. Adds a periodic monitor thread to BPOfferService: if a BPServiceActor thread dies after too many non-fatal errors, it is not simply removed from the BPServiceActor list stored in BPOfferService; instead, the monitor thread periodically tries to restart these dead BPServiceActor threads. The interval is also configurable. > Asynchronously handle BPServiceActor command mechanism may result in > BPServiceActor never fails even CommandProcessingThread is closed with fatal > error. > > > Key: HDFS-16115 > URL: https://issues.apache.org/jira/browse/HDFS-16115 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.3.1 >Reporter: Daniel Ma >Priority: Critical > Fix For: 3.3.1 > > Attachments: 0001-HDFS-16115.patch > > > It is an improvement issue.
Actually the issue has two sub-issues: > 1. The BPServiceActor thread handles commands from the NameNode asynchronously > (CommandProcessingThread handles the commands), so if any exception or > error in CommandProcessingThread causes that thread to fail and > stop, the BPServiceActor is not aware of it and keeps putting commands > from the NameNode into queues waiting to be handled by CommandProcessingThread, > even though CommandProcessingThread is already dead. > 2. The second sub-issue builds on the first: if CommandProcessingThread > died from a non-fatal error such as "cannot create native thread" (caused > by too many threads in the OS), this kind of problem should be > given more tolerance instead of simply shutting the thread down with no > automatic recovery, because such non-fatal errors can often > resolve themselves soon after. > Currently, the Datanode BPServiceActor
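The retry-plus-monitor proposal described in the update above can be sketched roughly as follows. This is a hypothetical simplification, not the actual HDFS patch: the class `RetryingWorker` and its methods are invented for illustration, and only the retry half of the design is shown.

```java
// Hypothetical sketch of the proposed retry mechanism -- NOT the actual
// HDFS-16115 patch. A worker tolerates up to maxRetries non-fatal
// failures (the proposal suggests a default of 5, configurable) instead
// of dying on the first one.
public class RetryingWorker implements Runnable {
    private final Runnable task;
    private final int maxRetries;
    private volatile int failures = 0;

    public RetryingWorker(Runnable task, int maxRetries) {
        this.task = task;
        this.maxRetries = maxRetries;
    }

    @Override
    public void run() {
        while (failures < maxRetries) {
            try {
                task.run();
                return;            // task completed normally
            } catch (RuntimeException | Error e) {
                failures++;        // non-fatal error: count it and retry
            }
        }
        // Retry budget exhausted. Per the proposal, the worker would not
        // be removed permanently; a periodic monitor thread in
        // BPOfferService would attempt to restart it later.
    }

    public int failureCount() {
        return failures;
    }
}
```

The point of the design is that a transient condition such as "cannot create native thread" clears on its own once the OS thread count drops, so a bounded retry plus an external restart loop recovers automatically where a fail-fast shutdown would leave the actor dead forever.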
[jira] [Updated] (HDFS-16115) Asynchronously handle BPServiceActor command mechanism may result in BPServiceActor never fails even CommandProcessingThread is closed with fatal error.
[ https://issues.apache.org/jira/browse/HDFS-16115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Ma updated HDFS-16115: - Description: It is an improvement issue. Actually the issue has two sub issues: 1- BPServerActor thread handle commands from NameNode in aysnchronous way ( CommandProcessThread handle commands ), so if there are any exceptions or errors happen in thread CommandProcessthread resulting the thread fails and stop, of which BPServiceActor cannot aware and still keep putting commands from namenode into queues waiting to be handled by CommandProcessThread, actually CommandProcessThread was dead already. 2-the second sub issue is based on the first one, if CommandProcessThread was dead owing to some non-fatal errors like "can not create native thread" which is caused by too many threads existed in OS, this kind of problem should be given much more torlerance instead of simply shudown the thread and never recover automatically, because the non-fatal errors mentioned above probably can be recovered soon by itself, currently, Datanode BPServiceActor cannot turn to normal even when the non-fatal error was eliminated. Therefore, in this patch, two things will be done: 1-Add retry mechanism in BPServiceActor thread and CommandProcessThread thread which is 5 by default and configurable; 2-Add a monitor periodical thread in BPOfferService, if a BPServiceActor thread is dead owing to too many times non-fatal error, it should not be simply removed from BPServviceActor lists stored in BPOfferService, instead, the monitor thread will periodically try to start these special dead BPService Actor thread. the interval is also configurable. was: It is an improvement issue. 
Actually the issue has two sub issues: 1- BPServerActor thread handle commands from NameNode in aysnchronous way ( CommandProcessThread handle commands ), so if there are any exceptions or errors happen in thread CommandProcessthread resulting the thread fails and stop, of which BPServiceActor cannot aware and still keep putting commands from namenode into queues waiting to be handled by CommandProcessThread, actually CommandProcessThread was dead already. 2-the second sub issue is based on the first one, if CommandProcessThread was dead owing to some non-fatal errors like "can not create native thread" which is caused by too many threads existed in OS, this kind of problem should be given much more torlerance instead of simply shudown the thread and never recover automatically, because the non-fatal errors mentioned above probably can be recovered soon by itself, currently, Datanode BPServiceActor cannot turn to normal even when the non-fatal error was eliminated. Therefore, in this patch, two things will be done: 1-Add retry mechanism in BPServiceActor thread and CommandProcessThread thread which is 5 by default and configurable; 2-Add a monitor periodical thread in BPOfferService, if a BPServiceActor thread is dead owing to too many times non-fatal error, it should not be simply remove from BPServviceActor lists stored in BPOfferService, instead, the monitor thread will periodically try to start these special dead BPService Actor thread. the interval is also configurable. > Asynchronously handle BPServiceActor command mechanism may result in > BPServiceActor never fails even CommandProcessingThread is closed with fatal > error. > > > Key: HDFS-16115 > URL: https://issues.apache.org/jira/browse/HDFS-16115 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.3.1 >Reporter: Daniel Ma >Priority: Critical > Fix For: 3.3.1 > > Attachments: 0001-HDFS-16115.patch > > > It is an improvement issue. 
Actually the issue has two sub issues: > 1- BPServerActor thread handle commands from NameNode in aysnchronous way ( > CommandProcessThread handle commands ), so if there are any exceptions or > errors happen in thread CommandProcessthread resulting the thread fails and > stop, of which BPServiceActor cannot aware and still keep putting commands > from namenode into queues waiting to be handled by CommandProcessThread, > actually CommandProcessThread was dead already. > 2-the second sub issue is based on the first one, if CommandProcessThread was > dead owing to some non-fatal errors like "can not create native thread" which > is caused by too many threads existed in OS, this kind of problem should be > given much more torlerance instead of simply shudown the thread and never > recover automatically, because the non-fatal errors mentioned above probably > can be recovered soon by itself, > currently, Datanode BPServiceActor
[jira] [Updated] (HDFS-16115) Asynchronously handle BPServiceActor command mechanism may result in BPServiceActor never fails even CommandProcessingThread is closed with fatal error.
[ https://issues.apache.org/jira/browse/HDFS-16115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Ma updated HDFS-16115: - Description: It is an improvement issue. Actually the issue has two sub issues: 1- BPServerActor thread handle commands from NameNode in aysnchronous way ( CommandProcessThread handle commands ), so if there are any exceptions or errors happen in thread CommandProcessthread resulting the thread fails and stop, of which BPServiceActor cannot aware and still keep putting commands from namenode into queues waiting to be handled by CommandProcessThread, actually CommandProcessThread was dead already. 2-the second sub issue is based on the first one, if CommandProcessThread was dead owing to some non-fatal errors like "can not create native thread" which is caused by too many threads existed in OS, this kind of problem should be given much more torlerance instead of simply shudown the thread and never recover automatically, because the non-fatal errors mentioned above probably can be recovered soon by itself, currently, Datanode BPServiceActor cannot turn to normal even when the non-fatal error was eliminated. Therefore, in this patch, two things will be done: 1-Add retry mechanism in BPServiceActor thread and CommandProcessThread thread which is 5 by default and configurable; 2-Add a monitor periodical thread in BPOfferService, if a BPServiceActor thread is dead owing to too many times non-fatal error, it should not be simply remove from BPServviceActor lists stored in BPOfferService, instead, the monitor thread will periodically try to start these special dead BPService Actor thread. the interval is also configurable. was: It is an improvement issue. 
Actually the issue has two sub issues: 1- BPServerActor thread handle commands from NameNode in aysnchronous way ( CommandProcessThread handle commands ), so if there are any exceptions or errors happen in thread CommandProcessthread resulting the thread fails and stop, of which BPServiceActor cannot aware and still keep putting commands from namenode into queues waiting to be handled by CommandProcessThread, actually CommandProcessThread was dead already. 2-the second sub issue is based on the first one, if CommandProcessThread was dead owing to some non-fatal errors like "can not create native thread" which is caused by too many threads existed in OS, this kind of problem should be given much more torlerance instead of simply shudown the thread and never recover automatically, because the non-fatal errors mentioned above probably can be recovered soon by itself, currently, Datanode BPServiceActor cannot turn to normal even when the non-fatal error was eliminated. Therefor, in this patch, two things was be done: 1-Add retry mechanism in BPServiceActor thread and CommandProcessThread thread which is 5 by default and configurable; 2-Add a monitor periodical thread in BPOfferService, if a BPServiceActor thread is dead owing to too many times non-fatal error, it should not be simply remove from BPServviceActor lists stored in BPOfferService, instead, the monitor thread will periodically try to start these special dead BPService Actor thread. the interval is also configurable. > Asynchronously handle BPServiceActor command mechanism may result in > BPServiceActor never fails even CommandProcessingThread is closed with fatal > error. > > > Key: HDFS-16115 > URL: https://issues.apache.org/jira/browse/HDFS-16115 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.3.1 >Reporter: Daniel Ma >Priority: Critical > Fix For: 3.3.1 > > Attachments: 0001-HDFS-16115.patch > > > It is an improvement issue. 
Actually the issue has two sub issues: > 1- BPServerActor thread handle commands from NameNode in aysnchronous way ( > CommandProcessThread handle commands ), so if there are any exceptions or > errors happen in thread CommandProcessthread resulting the thread fails and > stop, of which BPServiceActor cannot aware and still keep putting commands > from namenode into queues waiting to be handled by CommandProcessThread, > actually CommandProcessThread was dead already. > 2-the second sub issue is based on the first one, if CommandProcessThread was > dead owing to some non-fatal errors like "can not create native thread" which > is caused by too many threads existed in OS, this kind of problem should be > given much more torlerance instead of simply shudown the thread and never > recover automatically, because the non-fatal errors mentioned above probably > can be recovered soon by itself, > currently, Datanode BPServiceActor
[jira] [Updated] (HDFS-16115) Asynchronously handle BPServiceActor command mechanism may result in BPServiceActor never fails even CommandProcessingThread is closed with fatal error.
[ https://issues.apache.org/jira/browse/HDFS-16115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Ma updated HDFS-16115: - Description: It is an improvement issue. Actually the issue has two sub-issues: 1- The BPServiceActor thread hands commands from the NameNode to CommandProcessingThread for asynchronous handling, so if any exception or error occurs in CommandProcessingThread and that thread fails and stops, BPServiceActor is not aware of it and keeps putting commands from the NameNode into queues waiting to be handled by CommandProcessingThread, even though CommandProcessingThread is already dead. 2- The second sub-issue builds on the first: if CommandProcessingThread died owing to a non-fatal error such as "cannot create native thread" (caused by too many threads existing in the OS), this kind of problem should be given more tolerance instead of simply shutting the thread down with no automatic recovery, because such non-fatal errors can often resolve themselves soon; currently the DataNode's BPServiceActor cannot return to normal even after the non-fatal error is eliminated. Therefore, this patch does two things: 1- Add a retry mechanism to the BPServiceActor and CommandProcessingThread threads, with a retry limit that is 5 by default and configurable; 2- Add a periodic monitor thread in BPOfferService: a BPServiceActor thread that died after too many non-fatal errors should not simply be removed from the BPServiceActor list stored in BPOfferService; instead, the monitor thread periodically tries to restart these dead BPServiceActor threads. The interval is also configurable. was: (the same description, with "existed on the node" rather than "existed in the OS")
> Asynchronously handle BPServiceActor command mechanism may result in > BPServiceActor never fails even CommandProcessingThread is closed with fatal > error. > > > Key: HDFS-16115 > URL: https://issues.apache.org/jira/browse/HDFS-16115 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode > Affects Versions: 3.3.1 > Reporter: Daniel Ma > Priority: Critical > Fix For: 3.3.1 > > Attachments: 0001-HDFS-16115.patch > > > (quoted issue description as above)
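The first proposed change, a bounded retry on non-fatal errors (5 attempts by default, configurable), can be sketched roughly as follows. This is a hypothetical illustration, not the attached patch's code; the class name `RetryingProcessor` and its API are invented for the demo:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the retry idea described above: a command-processing task that
// tolerates a bounded number of non-fatal failures (default 5, configurable)
// instead of dying permanently on the first error.
public class RetryingProcessor {
    private final int maxRetries;

    public RetryingProcessor(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    /** Runs the task, retrying on failure; returns true if it eventually succeeded. */
    public boolean runWithRetry(Runnable task) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                task.run();
                return true;  // succeeded, possibly after transient errors
            } catch (RuntimeException | Error e) {
                // Non-fatal error (e.g. "unable to create native thread"):
                // fall through and retry instead of giving up immediately.
            }
        }
        return false;  // retries exhausted: only now report the thread as dead
    }

    public static void main(String[] args) {
        AtomicInteger failures = new AtomicInteger(3);  // fail 3 times, then succeed
        RetryingProcessor p = new RetryingProcessor(5);
        boolean ok = p.runWithRetry(() -> {
            if (failures.getAndDecrement() > 0) {
                throw new RuntimeException("transient error");
            }
        });
        System.out.println(ok);  // prints "true": recovered within 5 retries
    }
}
```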
[jira] [Updated] (HDFS-16115) Asynchronously handle BPServiceActor command mechanism may result in BPServiceActor never fails even CommandProcessingThread is closed with fatal error.
[ https://issues.apache.org/jira/browse/HDFS-16115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Ma updated HDFS-16115: - Description: (the same description, now asking that this kind of problem be given "much more tolerance" rather than "much tolerance")
[jira] [Updated] (HDFS-16115) Asynchronously handle BPServiceActor command mechanism may result in BPServiceActor never fails even CommandProcessingThread is closed with fatal error.
[ https://issues.apache.org/jira/browse/HDFS-16115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Ma updated HDFS-16115: - Description: (the same description, with the closing sentence "the interval is also configurable." added)
[jira] [Updated] (HDFS-16115) Asynchronously handle BPServiceActor command mechanism may result in BPServiceActor never fails even CommandProcessingThread is closed with fatal error.
[ https://issues.apache.org/jira/browse/HDFS-16115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Ma updated HDFS-16115: - Description: (the same problem description, now extended with the patch plan: 1- add a retry mechanism, 5 retries by default and configurable, to the BPServiceActor and CommandProcessingThread threads; 2- add a periodic monitor thread in BPOfferService that tries to restart BPServiceActor threads that died from repeated non-fatal errors, instead of removing them from the BPServiceActor list)
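The monitor-thread idea in the patch plan can be sketched as follows. This is a hypothetical illustration, not the attached patch's code; the worker body, the `ActorMonitor` class, and the 100 ms interval are invented for the demo (the description says the real interval is configurable):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch: a periodic monitor that restarts a dead worker thread instead of
// dropping it permanently, mirroring the BPOfferService monitor described above.
public class ActorMonitor {
    static volatile Thread worker;
    static final Runnable workerBody = () -> {
        // In the real DataNode this loop would process NameNode commands
        // until a non-fatal error kills the thread.
    };

    public static void main(String[] args) throws Exception {
        worker = new Thread(workerBody, "actor-worker");
        worker.start();
        worker.join();  // simulate the worker dying after an error

        ScheduledExecutorService monitor = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch restarted = new CountDownLatch(1);
        monitor.scheduleAtFixedRate(() -> {
            if (!worker.isAlive()) {
                // Dead actor found: start a replacement rather than removing it.
                worker = new Thread(workerBody, "actor-worker");
                worker.start();
                restarted.countDown();
            }
        }, 0, 100, TimeUnit.MILLISECONDS);  // interval would be configurable

        restarted.await(5, TimeUnit.SECONDS);
        System.out.println("restarted=" + restarted.getCount());  // prints "restarted=0"
        monitor.shutdownNow();
    }
}
```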
[jira] [Updated] (HDFS-16115) Asynchronously handle BPServiceActor command mechanism may result in BPServiceActor never fails even CommandProcessingThread is closed with fatal error.
[ https://issues.apache.org/jira/browse/HDFS-16115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Ma updated HDFS-16115: - Description: (wording fixes to the problem description only; no substantive change)
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-16115) Asynchronously handle BPServiceActor command mechanism may result in BPServiceActor never fails even CommandProcessingThread is closed with fatal error.
[ https://issues.apache.org/jira/browse/HDFS-16115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17375401#comment-17375401 ] Daniel Ma edited comment on HDFS-16115 at 7/6/21, 9:33 AM: --- Hello [~brahmareddy], [~ayush], please help to review this patch. Thanks. was (Author: daniel ma): [~brahmareddy] [~ayush] Please help to review this patch.
[jira] [Commented] (HDFS-16115) Asynchronously handle BPServiceActor command mechanism may result in BPServiceActor never fails even CommandProcessingThread is closed with fatal error.
[ https://issues.apache.org/jira/browse/HDFS-16115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17375401#comment-17375401 ] Daniel Ma commented on HDFS-16115: -- [~brahmareddy] [~ayush] Please help to review this patch.
[jira] [Updated] (HDFS-16114) the balancer parameters print error
[ https://issues.apache.org/jira/browse/HDFS-16114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaguodong updated HDFS-16114: -- Description: In the Balancer's toString():

public String toString() {
  return String.format("%s.%s [%s," + " threshold = %s,"
      + " max idle iteration = %s," + " #excluded nodes = %s,"
      + " #included nodes = %s," + " #source nodes = %s,"
      + " #blockpools = %s," + " run during upgrade = %s,"
      + " hot block time interval = %s]"
      + " sort top nodes = %s",
      Balancer.class.getSimpleName(), getClass().getSimpleName(), policy,
      threshold, maxIdleIteration, excludedNodes.size(),
      includedNodes.size(), sourceNodes.size(), blockpools.size(),
      runDuringUpgrade, sortTopNodes, hotBlockTimeInterval);
}

The format string lists "hot block time interval" before "sort top nodes" (and closes the bracket after the former), but the arguments are passed in the opposite order (sortTopNodes, then hotBlockTimeInterval), so the two values are printed swapped.
> the balancer parameters print error > --- > > Key: HDFS-16114 > URL: https://issues.apache.org/jira/browse/HDFS-16114 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: jiaguodong > Priority: Minor > Labels: balancer
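The mismatch can be reproduced in isolation. A minimal sketch (the values are invented; only the placeholder/argument ordering is taken from the snippet above):

```java
// Demonstrates the String.format argument-order bug reported in HDFS-16114:
// the placeholders name "hot block time interval" first, but the arguments
// are passed as (sortTopNodes, hotBlockTimeInterval).
public class FormatOrder {
    public static void main(String[] args) {
        long hotBlockTimeInterval = 1000L;  // illustrative value
        boolean sortTopNodes = true;        // illustrative value

        // Buggy ordering, as in the reported toString(): values land under
        // the wrong labels.
        String buggy = String.format(
            "hot block time interval = %s, sort top nodes = %s",
            sortTopNodes, hotBlockTimeInterval);

        // Fixed ordering: arguments match the placeholders.
        String fixed = String.format(
            "hot block time interval = %s, sort top nodes = %s",
            hotBlockTimeInterval, sortTopNodes);

        System.out.println(buggy);  // hot block time interval = true, sort top nodes = 1000
        System.out.println(fixed);  // hot block time interval = 1000, sort top nodes = true
    }
}
```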
[jira] [Updated] (HDFS-16115) Asynchronously handle BPServiceActor command mechanism may result in BPServiceActor never fails even CommandProcessingThread is closed with fatal error.
[ https://issues.apache.org/jira/browse/HDFS-16115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Ma updated HDFS-16115: - Description: (rewording of the problem description, making explicit that BPServiceActor keeps queuing commands for a CommandProcessingThread that is already dead)
[jira] [Updated] (HDFS-16114) the balancer parameters print error
[ https://issues.apache.org/jira/browse/HDFS-16114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaguodong updated HDFS-16114: -- Labels: balancer (was: )
[jira] [Updated] (HDFS-16114) the balancer parameters print error
[ https://issues.apache.org/jira/browse/HDFS-16114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaguodong updated HDFS-16114: -- Priority: Minor (was: Major)
[jira] [Updated] (HDFS-16114) the balancer parameters print error
[ https://issues.apache.org/jira/browse/HDFS-16114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaguodong updated HDFS-16114: -- Summary: the balancer parameters print error (was: the balancer will exit)
[jira] [Updated] (HDFS-16115) Asynchronously handle BPServiceActor command mechanism may result in BPServiceActor never fails even CommandProcessingThread is closed with fatal error.
[ https://issues.apache.org/jira/browse/HDFS-16115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Ma updated HDFS-16115: - Attachment: 0001-HDFS-16115.patch
[jira] [Created] (HDFS-16115) Asynchronously handle BPServiceActor command mechanism may result in BPServiceActor never fails even CommandProcessingThread is closed with fatal error.
Daniel Ma created HDFS-16115: Summary: Asynchronously handle BPServiceActor command mechanism may result in BPServiceActor never fails even CommandProcessingThread is closed with fatal error. Key: HDFS-16115 URL: https://issues.apache.org/jira/browse/HDFS-16115 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 3.3.1 Reporter: Daniel Ma Fix For: 3.3.1 This is an improvement issue; it actually covers two sub-issues: 1. The BPServiceActor thread handles commands from the NameNode asynchronously (CommandProcessingThread processes the commands), so if any exception or error occurs in CommandProcessingThread, that thread fails and stops. BPServiceActor is not aware of this and keeps putting commands from the NameNode into the queue to be handled by CommandProcessingThread. 2. The second sub-issue builds on the first: if CommandProcessingThread fails owing to a non-fatal error like "cannot create native thread" (caused by too many threads existing on the node), this kind of problem should be tolerated instead of simply shutting down the thread with no automatic recovery, because such a non-fatal error may soon clear by itself. Currently, the DataNode's BPServiceActor cannot return to normal even after the non-fatal error has been eliminated. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
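The failure mode described above can be illustrated with a minimal, self-contained sketch. This is toy code, not the actual Hadoop classes: the queue, the command strings, and the simulated "cannot create native thread" error are all assumptions made for illustration. It shows the tolerant behavior the report asks for: instead of letting the processing thread die on a transient error, the consumer re-queues the command and retries.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class TolerantConsumerSketch {
    public static void main(String[] args) throws Exception {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        AtomicInteger processed = new AtomicInteger();
        AtomicInteger attempts = new AtomicInteger();

        // Consumer that tolerates transient errors: instead of letting the
        // thread terminate on the first failure, it re-queues the command,
        // logs the error, and keeps running.
        Thread consumer = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    String cmd = queue.poll(100, TimeUnit.MILLISECONDS);
                    if (cmd == null) {
                        continue;
                    }
                    // Simulate a transient non-fatal error on the first attempt.
                    if (attempts.incrementAndGet() == 1) {
                        queue.put(cmd); // re-queue so the command is not lost
                        throw new RuntimeException("transient: cannot create native thread");
                    }
                    processed.incrementAndGet();
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // exit the loop cleanly
                } catch (RuntimeException nonFatal) {
                    // Tolerate and retry rather than terminating the thread.
                    System.out.println("tolerated: " + nonFatal.getMessage());
                }
            }
        });
        consumer.start();

        queue.put("cmd-1");
        queue.put("cmd-2");
        Thread.sleep(500); // give the consumer time to drain the queue
        consumer.interrupt();
        consumer.join();
        System.out.println("processed=" + processed.get());
    }
}
```

The point of the sketch is the second catch clause: without it, the first `RuntimeException` would end the consumer thread while the producer keeps enqueueing, which is exactly the silent-failure situation the report describes.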
[jira] [Created] (HDFS-16114) the balancer will exit
jiaguodong created HDFS-16114: - Summary: the balancer will exit Key: HDFS-16114 URL: https://issues.apache.org/jira/browse/HDFS-16114 Project: Hadoop HDFS Issue Type: Improvement Reporter: jiaguodong -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally
[ https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17375386#comment-17375386 ] Stephen O'Donnell commented on HDFS-15796: -- I don't think the patch will help anything here. All the methods on PendingReconstruction are already synchronized, e.g.:
{code}
List<DatanodeStorageInfo> getTargets(BlockInfo block) {
  synchronized (pendingReconstructions) {
    PendingBlockInfo found = pendingReconstructions.get(block);
    if (found != null) {
      return found.targets;
    }
  }
  return null;
}
{code}
You posted a code snippet above, but there are no line numbers with it, so I cannot see on which line the exception was thrown. Can you tell me which line is 1907 in your code snippet above? > ConcurrentModificationException error happens on NameNode occasionally > -- > > Key: HDFS-15796 > URL: https://issues.apache.org/jira/browse/HDFS-15796 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs > Affects Versions: 3.1.1 > Reporter: Daniel Ma > Priority: Critical > Attachments: 0001-HDFS-15796.patch > > > ConcurrentModificationException error happens on NameNode occasionally. > > {code:java} > 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor thread received Runtime exception. | BlockManager.java:4746 > java.util.ConcurrentModificationException > at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909) > at java.util.ArrayList$Itr.next(ArrayList.java:859) > at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907) > at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859) > at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862) > at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729) > at java.lang.Thread.run(Thread.java:748) > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
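One way a ConcurrentModificationException can still occur despite every accessor being synchronized is when a synchronized getter hands out a reference to an internal ArrayList, which callers then iterate outside any lock while it is modified. The following is a minimal, hypothetical sketch (not the actual Hadoop classes; all names are made up for illustration) of that leaked-reference pattern:

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

public class LeakedListSketch {
    private final List<String> targets = new ArrayList<>();

    // Synchronized, but returns the internal list itself, not a copy,
    // so the caller's iteration is NOT protected by this lock.
    synchronized List<String> getTargets() {
        return targets;
    }

    synchronized void addTarget(String t) {
        targets.add(t);
    }

    public static void main(String[] args) {
        LeakedListSketch holder = new LeakedListSketch();
        holder.addTarget("dn1");
        holder.addTarget("dn2");

        List<String> leaked = holder.getTargets();
        try {
            for (String t : leaked) {      // iterating the internal list...
                holder.addTarget("dn3");   // ...while it is structurally modified
            }
        } catch (ConcurrentModificationException e) {
            // The fail-fast iterator detects the co-modification on next().
            System.out.println("caught ConcurrentModificationException");
        }
    }
}
```

Returning a defensive copy from the getter (or requiring callers to iterate under the same lock) avoids the exception; whether that is what happened at BlockManager.java:1907 is exactly what the line-number question above is trying to establish.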
[jira] [Commented] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally
[ https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17375331#comment-17375331 ] Daniel Ma commented on HDFS-15796: -- [~weichiu],[~hexiaoqiao] Pls help to review this patch, thanks > ConcurrentModificationException error happens on NameNode occasionally > -- > > Key: HDFS-15796 > URL: https://issues.apache.org/jira/browse/HDFS-15796 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.1 >Reporter: Daniel Ma >Priority: Critical > Attachments: 0001-HDFS-15796.patch > > > ConcurrentModificationException error happens on NameNode occasionally. > > {code:java} > 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor > thread received Runtime exception. | BlockManager.java:4746 > java.util.ConcurrentModificationException > at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909) > at java.util.ArrayList$Itr.next(ArrayList.java:859) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729) > at java.lang.Thread.run(Thread.java:748) > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally
[ https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Ma updated HDFS-15796: - Attachment: 0001-HDFS-15796.patch > ConcurrentModificationException error happens on NameNode occasionally > -- > > Key: HDFS-15796 > URL: https://issues.apache.org/jira/browse/HDFS-15796 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.1 >Reporter: Daniel Ma >Priority: Critical > Attachments: 0001-HDFS-15796.patch > > > ConcurrentModificationException error happens on NameNode occasionally. > > {code:java} > 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor > thread received Runtime exception. | BlockManager.java:4746 > java.util.ConcurrentModificationException > at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909) > at java.util.ArrayList$Itr.next(ArrayList.java:859) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729) > at java.lang.Thread.run(Thread.java:748) > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally
[ https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Ma updated HDFS-15796: - Target Version/s: 3.3.1 (was: 3.4.0) > ConcurrentModificationException error happens on NameNode occasionally > -- > > Key: HDFS-15796 > URL: https://issues.apache.org/jira/browse/HDFS-15796 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.1 >Reporter: Daniel Ma >Priority: Critical > Attachments: 0001-HDFS-15796.patch > > > ConcurrentModificationException error happens on NameNode occasionally. > > {code:java} > 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor > thread received Runtime exception. | BlockManager.java:4746 > java.util.ConcurrentModificationException > at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909) > at java.util.ArrayList$Itr.next(ArrayList.java:859) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729) > at java.lang.Thread.run(Thread.java:748) > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16095) Add lsQuotaList command and getQuotaListing api for hdfs quota
[ https://issues.apache.org/jira/browse/HDFS-16095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17375286#comment-17375286 ] Xiangyi Zhu commented on HDFS-16095: [~weichiu],[~ayushtkn],[~hexiaoqiao],[~kihwal] Looking forward to your comments. > Add lsQuotaList command and getQuotaListing api for hdfs quota > -- > > Key: HDFS-16095 > URL: https://issues.apache.org/jira/browse/HDFS-16095 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs > Affects Versions: 3.4.0 > Reporter: Xiangyi Zhu > Assignee: Xiangyi Zhu > Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Currently, HDFS does not support obtaining all quota information. The administrator may need to check which quotas have been added to a certain directory, or the quotas of the entire cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
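To make the requested behavior concrete, here is a hypothetical sketch of what a quota listing could compute: walk a directory tree and collect every directory that has a quota set, together with its limit. The `Dir` class, field names, and quota representation are all invented for this illustration and are not the proposed HDFS API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class QuotaListingSketch {
    // Toy stand-in for a namespace directory; nsQuota == null means no quota set.
    static class Dir {
        final String path;
        final Long nsQuota;
        final List<Dir> children = new ArrayList<>();
        Dir(String path, Long nsQuota) { this.path = path; this.nsQuota = nsQuota; }
    }

    // Depth-first collection of all quota'd directories under root.
    static Map<String, Long> listQuotas(Dir root) {
        Map<String, Long> out = new TreeMap<>(); // sorted for stable output
        if (root.nsQuota != null) {
            out.put(root.path, root.nsQuota);
        }
        for (Dir child : root.children) {
            out.putAll(listQuotas(child));
        }
        return out;
    }

    public static void main(String[] args) {
        Dir root = new Dir("/", null);
        Dir user = new Dir("/user", 100_000L);
        user.children.add(new Dir("/user/alice", 1_000L));
        root.children.add(user);
        root.children.add(new Dir("/tmp", null)); // no quota, should be skipped
        System.out.println(listQuotas(root));
    }
}
```

Without an API like this, an administrator has to probe directories one by one, which is the pain point the issue describes.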
[jira] [Work logged] (HDFS-16088) Standby NameNode process getLiveDatanodeStorageReport request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-16088?focusedWorklogId=618891=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-618891 ] ASF GitHub Bot logged work on HDFS-16088: - Author: ASF GitHub Bot Created on: 06/Jul/21 06:35 Start Date: 06/Jul/21 06:35 Worklog Time Spent: 10m Work Description: tomscut commented on pull request #3140: URL: https://github.com/apache/hadoop/pull/3140#issuecomment-874499453 Thanks @Hexiaoqiao for your comments and suggestions. I fixed the problems. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 618891) Time Spent: 3h (was: 2h 50m) > Standby NameNode process getLiveDatanodeStorageReport request to reduce > Active load > --- > > Key: HDFS-16088 > URL: https://issues.apache.org/jira/browse/HDFS-16088 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: tomscut > Assignee: tomscut > Priority: Major > Labels: pull-request-available > Attachments: standyby-ipcserver.jpg > > Time Spent: 3h > Remaining Estimate: 0h > > As with HDFS-13183, NameNodeConnector#getLiveDatanodeStorageReport() can also send its request to the SNN to reduce the load on the ANN. > There are two points worth mentioning: > 1. FSNamesystem#getDatanodeStorageReport() is OperationCategory.UNCHECKED, so we can access the SNN directly. > 2. We can share the same UT (testBalancerRequestSBNWithHA) with NameNodeConnector#getBlocks(). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
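The reason the UNCHECKED category matters is that a Standby NameNode rejects operations whose category it cannot serve, while UNCHECKED operations pass the state check. The toy sketch below illustrates that gating logic; the enums and the check are simplified assumptions for illustration, not the actual FSNamesystem implementation (real HDFS has more states and also allows reads on Observer nodes).

```java
public class OperationCategorySketch {
    enum HAState { ACTIVE, STANDBY }
    enum OperationCategory { READ, WRITE, UNCHECKED }

    // Simplified stand-in for the NameNode's operation-category check:
    // in this sketch, a Standby serves only UNCHECKED operations.
    static void checkOperation(HAState state, OperationCategory op) {
        if (state == HAState.STANDBY && op != OperationCategory.UNCHECKED) {
            throw new IllegalStateException(
                "StandbyException: operation category " + op
                + " is not supported in state " + state);
        }
    }

    public static void main(String[] args) {
        // getDatanodeStorageReport() is UNCHECKED, so the Standby serves it.
        checkOperation(HAState.STANDBY, OperationCategory.UNCHECKED);
        System.out.println("UNCHECKED allowed on STANDBY");

        // A WRITE against the Standby is rejected and would be retried
        // against the Active by the client's failover proxy.
        try {
            checkOperation(HAState.STANDBY, OperationCategory.WRITE);
        } catch (IllegalStateException e) {
            System.out.println("WRITE rejected on STANDBY");
        }
    }
}
```

This is why redirecting getLiveDatanodeStorageReport() to the SNN is safe: the operation's category never requires the Active's state.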