[jira] [Updated] (HDFS-17287) Parallel Maven Build Support for Apache Hadoop
[ https://issues.apache.org/jira/browse/HDFS-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

caijialiang updated HDFS-17287:
-------------------------------
    Attachment: patch11-HDFS-17287.diff
        Status: Patch Available  (was: Open)

> Parallel Maven Build Support for Apache Hadoop
> ----------------------------------------------
>
>                 Key: HDFS-17287
>                 URL: https://issues.apache.org/jira/browse/HDFS-17287
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: build
>    Affects Versions: 3.3.6
>            Reporter: caijialiang
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: patch11-HDFS-17287.diff
>
> Why compilation is slow: the Hadoop project has many modules, and the inability to build them in parallel makes the process slow. A first build of Hadoop can take several hours, and even with a warm local Maven repository a subsequent build can still take close to 40 minutes.
> How to solve it: use {{mvn dependency:tree}} and {{maven-to-plantuml}} to investigate the dependency issues that prevent parallel compilation.
> * Investigate the dependencies between project modules.
> * Analyze the dependencies in multi-module Maven projects.
> * Download {{maven-to-plantuml}}:
> {{wget https://github.com/phxql/maven-to-plantuml/releases/download/v1.0/maven-to-plantuml-1.0.jar}}
> * Generate a dependency tree:
> {{mvn dependency:tree > dep.txt}}
> * Generate a UML diagram from the dependency tree:
> {{java -jar maven-to-plantuml.jar --input dep.txt --output dep.puml}}
> For more information, see the [maven-to-plantuml GitHub repository|https://github.com/phxql/maven-to-plantuml/tree/master].
> *Hadoop Parallel Compilation Submission Logic*
> # Reasons for Parallel Compilation Failure
> ** In sequential compilation, modules are built one by one in order, so no errors occur: the build simply follows the module sequence.
> ** In parallel compilation, however, many modules are compiled simultaneously. The build order in a multi-module parallel build is driven by inter-module dependencies: if Module A depends on Module B, Module B is built before Module A, so the build order still respects the declared dependencies.
> But when Hadoop builds in parallel, for example {{hadoop-yarn-project}}, the inter-module dependencies themselves are correct; the problem appears at the dist-packaging stage, where {{dist}} packages all the other compiled modules.
> *Behavior of {{hadoop-yarn-project}} in serial compilation:*
> ** Serial compilation builds the modules listed in the pom one by one, in order. After all modules are built, {{hadoop-yarn-project}} itself is built; during its {{prepare-package}} phase the {{maven-assembly-plugin}} runs, and everything is repackaged according to {{hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml}}.
> *Behavior of {{hadoop-yarn-project}} in parallel compilation:*
> ** Parallel compilation builds modules according to the dependency order among them; modules that do not declare dependencies on each other through {{dependency}} entries are built in parallel. Per the pom of {{hadoop-yarn-project}}, its declared dependencies are built first, then {{hadoop-yarn-project}} runs its {{maven-assembly-plugin}}.
> ** However, not all of the modules that {{hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml}} needs for packaging are listed in the {{dependency}} section of {{hadoop-yarn-project}}. So when {{hadoop-yarn-project}} runs the {{maven-assembly-plugin}}, some required modules may not have been built yet, which makes the parallel build fail.
> *Solution:*
> ** The fix is straightforward: collect all the modules referenced by {{hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml}} and declare them as dependencies in the pom of {{hadoop-yarn-project}}.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
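The race described in this issue can be sketched as a small model (illustration only: this is not Maven or Hadoop code, and the module names `A`, `B`, `C`, `dist` are hypothetical stand-ins). In a parallel build, the only modules guaranteed to be finished when a module starts are its declared dependencies, taken transitively, so an assembly module that packages artifacts it never declares can run before they exist:

```python
def guaranteed_done(module, declared):
    """Transitive closure of a module's declared dependencies: the only
    modules a parallel scheduler guarantees are built before `module`."""
    done, stack = set(), list(declared[module])
    while stack:
        m = stack.pop()
        if m not in done:
            done.add(m)
            stack.extend(declared[m])
    return done

# The assembly descriptor (hadoop-yarn-dist.xml in the issue) packages
# artifacts from A, B and C...
needed_by_assembly = {"A", "B", "C"}

# ...but the broken pom only declares a dependency on A, so B and C may
# still be building when 'dist' starts assembling:
broken = {"A": [], "B": [], "C": [], "dist": ["A"]}
print(sorted(needed_by_assembly - guaranteed_done("dist", broken)))  # ['B', 'C']

# The proposed fix declares every packaged module as a dependency,
# leaving nothing unaccounted for when the assembly runs:
fixed = {"A": [], "B": [], "C": [], "dist": ["A", "B", "C"]}
print(sorted(needed_by_assembly - guaranteed_done("dist", fixed)))   # []
```

This is why serial builds never hit the problem: reactor order happens to build everything before `dist`, so the missing declarations are masked until `-T` parallelism is enabled.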
[jira] [Commented] (HDFS-17287) Parallel Maven Build Support for Apache Hadoop
[ https://issues.apache.org/jira/browse/HDFS-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799275#comment-17799275 ]

ASF GitHub Bot commented on HDFS-17287:
---------------------------------------

JiaLiangC opened a new pull request, #6373:
URL: https://github.com/apache/hadoop/pull/6373

### Description of PR
https://issues.apache.org/jira/browse/HDFS-17287

### How was this patch tested?
Manual test on CentOS 8.
![image](https://github.com/apache/hadoop/assets/18082602/2f95c1df-6aeb-42fd-98d8-7fe9e47e9401)

### For code changes:
- [ ] Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
- [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
- [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?
[jira] [Updated] (HDFS-17287) Parallel Maven Build Support for Apache Hadoop
[ https://issues.apache.org/jira/browse/HDFS-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDFS-17287:
----------------------------------
    Labels: pull-request-available  (was: )
[jira] [Resolved] (HDFS-17285) RBF: Add a safe mode check period configuration
[ https://issues.apache.org/jira/browse/HDFS-17285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan resolved HDFS-17285.
-------------------------------
    Fix Version/s: 3.4.0
     Hadoop Flags: Reviewed
 Target Version/s: 3.4.0
         Assignee: liuguanghua
       Resolution: Fixed

> RBF: Add a safe mode check period configuration
> -----------------------------------------------
>
>                 Key: HDFS-17285
>                 URL: https://issues.apache.org/jira/browse/HDFS-17285
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: liuguanghua
>            Assignee: liuguanghua
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
> When the dfsrouter starts, it enters safe mode, and it takes about 1 min to leave. The log is below:
> 14:35:23,717 INFO org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Leave startup safe mode after 3 ms
> 14:35:23,717 INFO org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Enter safe mode after 18 ms without reaching the State Store
> 14:35:23,717 INFO org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Entering safe mode
> 14:35:24,996 INFO org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Delaying safemode exit for 28721 milliseconds...
> 14:36:25,037 INFO org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Leaving safe mode after 61319 milliseconds
> This behavior depends on these configs:
> DFS_ROUTER_SAFEMODE_EXTENSION 30s
> DFS_ROUTER_SAFEMODE_EXPIRATION 3min
> DFS_ROUTER_CACHE_TIME_TO_LIVE_MS 1min (this also serves as the safe mode check period)
> Because the dfsrouter rejects write requests while in safe mode, the check period should be shorter once refreshCaches is done, and we should remove DFS_ROUTER_CACHE_TIME_TO_LIVE_MS from RouterSafemodeService.
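The decoupling this issue proposes can be sketched as follows (a minimal toy model with hypothetical names, not the actual RouterSafemodeService code): the safe-mode re-check interval becomes its own setting instead of reusing the State Store cache TTL, so the router can leave safe mode shortly after the startup extension elapses and the cache has refreshed, rather than waiting up to a full cache-TTL tick:

```python
class SafemodeChecker:
    """Toy model: decides whether a router is still in startup safe mode."""

    def __init__(self, extension_ms, check_period_ms):
        self.extension_ms = extension_ms        # e.g. DFS_ROUTER_SAFEMODE_EXTENSION (30s)
        self.check_period_ms = check_period_ms  # new dedicated setting, no longer the 1 min cache TTL
        self.in_safemode = True

    def next_check_delay(self):
        # How long the service sleeps before re-evaluating safe mode;
        # with the old 1 min period this alone could add ~60 s of latency.
        return self.check_period_ms

    def tick(self, now_ms, start_ms, cache_refreshed):
        # Leave safe mode once the startup extension has elapsed and the
        # State Store cache has been refreshed at least once.
        if cache_refreshed and now_ms - start_ms >= self.extension_ms:
            self.in_safemode = False
        return self.in_safemode

# With a 1 s check period the router exits promptly after the 30 s extension:
svc = SafemodeChecker(extension_ms=30_000, check_period_ms=1_000)
assert svc.tick(now_ms=10_000, start_ms=0, cache_refreshed=True) is True   # still inside extension
assert svc.tick(now_ms=31_000, start_ms=0, cache_refreshed=True) is False  # exits safe mode
```

The design point is simply that the exit latency is bounded by `extension + check_period`, so shrinking the check period (without touching the cache TTL) shortens the window in which writes are rejected.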
[jira] [Updated] (HDFS-17285) RBF: Add a safe mode check period configuration
[ https://issues.apache.org/jira/browse/HDFS-17285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-17285:
------------------------------
    Affects Version/s: 3.4.0
[jira] [Commented] (HDFS-17285) RBF: Add a safe mode check period configuration
[ https://issues.apache.org/jira/browse/HDFS-17285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799251#comment-17799251 ]

ASF GitHub Bot commented on HDFS-17285:
---------------------------------------

slfan1989 commented on PR #6347:
URL: https://github.com/apache/hadoop/pull/6347#issuecomment-1865456100

@LiuGuH Thank you for your contribution! @goiri @ayushtkn Thanks for helping with the review!
[jira] [Commented] (HDFS-17285) RBF: Add a safe mode check period configuration
[ https://issues.apache.org/jira/browse/HDFS-17285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799250#comment-17799250 ]

ASF GitHub Bot commented on HDFS-17285:
---------------------------------------

slfan1989 merged PR #6347:
URL: https://github.com/apache/hadoop/pull/6347
[jira] [Commented] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue
[ https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799248#comment-17799248 ]

ASF GitHub Bot commented on HDFS-17290:
---------------------------------------

hadoop-yetus commented on PR #6359:
URL: https://github.com/apache/hadoop/pull/6359#issuecomment-1865450057

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|:--------|:-------:|:-------:|
| +0 :ok: | reexec | 0m 46s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | markdownlint | 0m 0s | | markdownlint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 46m 16s | | trunk passed |
| +1 :green_heart: | compile | 18m 15s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 16m 33s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | checkstyle | 1m 17s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 38s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 11s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 50s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 2m 36s | | trunk passed |
| +1 :green_heart: | shadedclient | 40m 50s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 53s | | the patch passed |
| +1 :green_heart: | compile | 17m 54s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 17m 54s | | the patch passed |
| +1 :green_heart: | compile | 16m 41s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 16m 41s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 1m 13s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6359/6/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) | hadoop-common-project/hadoop-common: The patch generated 2 new + 197 unchanged - 0 fixed = 199 total (was 197) |
| +1 :green_heart: | mvnsite | 1m 37s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 7s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 50s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 2m 40s | | the patch passed |
| +1 :green_heart: | shadedclient | 39m 10s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 19m 10s | | hadoop-common in the patch passed. |
| +1 :green_heart: | asflicense | 0m 57s | | The patch does not generate ASF License warnings. |
| | | | 235m 5s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6359/6/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6359 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint |
| uname | Linux cc8ddfa74604 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 2d761dddf33cf482c42d003e574afd858860d456 |
| Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6359/6/testReport/ |
| Max.
[jira] [Commented] (HDFS-17291) DataNode metric bytesWritten is not totally accurate in some situations.
[ https://issues.apache.org/jira/browse/HDFS-17291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799080#comment-17799080 ]

ASF GitHub Bot commented on HDFS-17291:
---------------------------------------

hadoop-yetus commented on PR #6360:
URL: https://github.com/apache/hadoop/pull/6360#issuecomment-1864737180

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|:--------|:-------:|:-------:|
| +0 :ok: | reexec | 0m 23s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 32m 33s | | trunk passed |
| +1 :green_heart: | compile | 0m 41s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 0m 37s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | checkstyle | 0m 35s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 42s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 36s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 1m 8s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 1m 48s | | trunk passed |
| +1 :green_heart: | shadedclient | 24m 46s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 33s | | the patch passed |
| +1 :green_heart: | compile | 0m 35s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 0m 35s | | the patch passed |
| +1 :green_heart: | compile | 0m 33s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 0m 33s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 29s | | the patch passed |
| +1 :green_heart: | mvnsite | 0m 39s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 30s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 56s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 1m 44s | | the patch passed |
| +1 :green_heart: | shadedclient | 22m 47s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| -1 :x: | unit | 198m 15s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6360/4/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 27s | | The patch does not generate ASF License warnings. |
| | | | 291m 46s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.TestReconstructStripedFile |
| | hadoop.hdfs.server.namenode.sps.TestStoragePolicySatisfierWithStripedFile |
| | hadoop.hdfs.server.namenode.TestAuditLogger |
| | hadoop.hdfs.server.namenode.TestRefreshBlockPlacementPolicy |
| | hadoop.hdfs.server.namenode.TestReconstructStripedBlocks |
| | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6360/4/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6360 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 0bb4b257c314 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 05e31cdc69015aaac8a95a8687c931ed5aa55624 |
| Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Multi-JDK versions |
[jira] [Commented] (HDFS-17289) Considering the size of non-lastBlocks equals to complete block size can cause append failure.
[ https://issues.apache.org/jira/browse/HDFS-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799054#comment-17799054 ] ASF GitHub Bot commented on HDFS-17289: --- hadoop-yetus commented on PR #6357: URL: https://github.com/apache/hadoop/pull/6357#issuecomment-1864644282 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 20s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 13m 38s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 19m 17s | | trunk passed | | +1 :green_heart: | compile | 2m 50s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | compile | 2m 51s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 0m 43s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 18s | | trunk passed | | +1 :green_heart: | javadoc | 1m 4s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 1m 30s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 3m 3s | | trunk passed | | +1 :green_heart: | shadedclient | 20m 49s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 20s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 1m 4s | | the patch passed | | +1 :green_heart: | compile | 2m 59s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javac | 2m 59s | | the patch passed | | +1 :green_heart: | compile | 2m 47s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | javac | 2m 47s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 0m 38s | | the patch passed | | +1 :green_heart: | mvnsite | 1m 8s | | the patch passed | | +1 :green_heart: | javadoc | 0m 52s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 1m 26s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 3m 13s | | the patch passed | | +1 :green_heart: | shadedclient | 21m 47s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 1m 46s | | hadoop-hdfs-client in the patch passed. | | -1 :x: | unit | 200m 10s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6357/6/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 24s | | The patch does not generate ASF License warnings. 
| | | | 306m 46s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.TestDFSStripedInputStream | | | hadoop.hdfs.server.namenode.ha.TestHAAppend | | | hadoop.hdfs.TestDecommissionWithStripedBackoffMonitor | | | hadoop.hdfs.TestFileCreation | | | hadoop.hdfs.TestFileChecksum | | | hadoop.hdfs.TestDFSStripedOutputStream | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6357/6/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6357 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 712d4d7c0d1c 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 4372ca350b3de12e01b2e4b4158fd608919739b8 | | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Multi-JDK versions |
[jira] [Commented] (HDFS-17285) RBF: Add a safe mode check period configuration
[ https://issues.apache.org/jira/browse/HDFS-17285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798986#comment-17798986 ] ASF GitHub Bot commented on HDFS-17285: --- hadoop-yetus commented on PR #6347: URL: https://github.com/apache/hadoop/pull/6347#issuecomment-1864425525 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 17m 36s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +0 :ok: | xmllint | 0m 1s | | xmllint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 46m 40s | | trunk passed | | +1 :green_heart: | compile | 0m 41s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | compile | 0m 36s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 0m 31s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 41s | | trunk passed | | +1 :green_heart: | javadoc | 0m 42s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 30s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 1m 21s | | trunk passed | | +1 :green_heart: | shadedclient | 38m 8s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 31s | | the patch passed | | +1 :green_heart: | compile | 0m 33s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javac | 0m 33s | | the patch passed | | +1 :green_heart: | compile | 0m 30s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | javac | 0m 30s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 0m 18s | | the patch passed | | +1 :green_heart: | mvnsite | 0m 31s | | the patch passed | | +1 :green_heart: | javadoc | 0m 28s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 24s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 1m 21s | | the patch passed | | +1 :green_heart: | shadedclient | 37m 53s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 22m 51s | | hadoop-hdfs-rbf in the patch passed. | | +1 :green_heart: | asflicense | 0m 35s | | The patch does not generate ASF License warnings. 
| | | | 177m 21s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6347/2/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6347 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint | | uname | Linux 6fc9ed377839 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / daa3b80e837400c69e1afd929b3dabffed742d75 | | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6347/2/testReport/ | | Max. process+thread count | 2230 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6347/2/console | | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 | | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org | This message was automatically generated. >
[jira] [Updated] (HDFS-17298) Fix NPE in DataNode.handleBadBlock and BlockSender
[ https://issues.apache.org/jira/browse/HDFS-17298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haiyang Hu updated HDFS-17298: -- Description: There are some NPE issues on the DataNode side of our online environment. The detailed exception information is: {code:java} 2023-12-20 13:58:25,449 ERROR datanode.DataNode (DataXceiver.java:run(330)) [DataXceiver for client DFSClient_NONMAPREDUCE_xxx at /xxx:41452 [Sending block BP-xxx:blk_xxx]] - xxx:50010:DataXceiver error processing READ_BLOCK operation src: /xxx:41452 dst: /xxx:50010 java.lang.NullPointerException at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:301) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:607) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:152) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:104) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:298) at java.lang.Thread.run(Thread.java:748) {code} NPE Code logic: {code:java} if (!fromScanner && blockScanner.isEnabled()) { // data.getVolume(block) is null blockScanner.markSuspectBlock(data.getVolume(block).getStorageID(), block); } {code} {code:java} 2023-12-20 13:52:18,844 ERROR datanode.DataNode (DataXceiver.java:run(330)) [DataXceiver for client /xxx:61052 [Copying block BP-xxx:blk_xxx]] - xxx:50010:DataXceiver error processing COPY_BLOCK operation src: /xxx:61052 dst: /xxx:50010 java.lang.NullPointerException at org.apache.hadoop.hdfs.server.datanode.DataNode.handleBadBlock(DataNode.java:4045) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.copyBlock(DataXceiver.java:1163) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opCopyBlock(Receiver.java:291) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:113) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:298) at java.lang.Thread.run(Thread.java:748) 
{code} NPE Code logic: {code:java} // Obtain a reference before reading data volumeRef = datanode.data.getVolume(block).obtainReference(); //datanode.data.getVolume(block) is null {code} We need to fix it. > Fix NPE in DataNode.handleBadBlock and BlockSender > -- > > Key: HDFS-17298 > URL: https://issues.apache.org/jira/browse/HDFS-17298 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > > There are some NPE issues on the DataNode side of our online environment. > The detailed exception information is > {code:java} > 2023-12-20 13:58:25,449 ERROR datanode.DataNode (DataXceiver.java:run(330)) > [DataXceiver for client DFSClient_NONMAPREDUCE_xxx at /xxx:41452 [Sending > block BP-xxx:blk_xxx]] - xxx:50010:DataXceiver error processing READ_BLOCK > operation src: /xxx:41452 dst: /xxx:50010 > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:301) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:607) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:152) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:104) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:298) > at java.lang.Thread.run(Thread.java:748) > {code} > NPE Code logic: > {code:java} > if (!fromScanner && blockScanner.isEnabled()) { > // data.getVolume(block) is null > blockScanner.markSuspectBlock(data.getVolume(block).getStorageID(), > block); > } > {code} > {code:java} > 2023-12-20 13:52:18,844 ERROR datanode.DataNode (DataXceiver.java:run(330)) > [DataXceiver for client /xxx:61052 [Copying block BP-xxx:blk_xxx]] - > xxx:50010:DataXceiver error processing COPY_BLOCK operation src: /xxx:61052 > dst: /xxx:50010 > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.datanode.DataNode.handleBadBlock(DataNode.java:4045) > at > 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.copyBlock(DataXceiver.java:1163) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opCopyBlock(Receiver.java:291) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:113) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:298) > at java.lang.Thread.run(Thread.java:748) > {code} > NPE Code logic: > {code:java} > // Obtain a reference before reading data > volumeRef = datanode.data.getVolume(block).obtainReference(); > //datanode.data.getVolume(block) is null >
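Both stack traces above come down to `data.getVolume(block)` returning null and being dereferenced immediately. A minimal, self-contained sketch of the null-guard pattern the fix implies, using stand-in classes (`Dataset`, `Volume`) rather than the real Hadoop `FsDatasetSpi`/`FsVolumeSpi` types:

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for an FsVolumeSpi-like volume; hypothetical, for illustration only.
class Volume {
    private final String storageId;
    Volume(String storageId) { this.storageId = storageId; }
    String getStorageID() { return storageId; }
}

// Stand-in for the DataNode's dataset: getVolume() may return null
// when the block's volume has been removed concurrently.
class Dataset {
    private final Map<Long, Volume> blockToVolume = new HashMap<>();
    void put(long blockId, Volume v) { blockToVolume.put(blockId, v); }
    Volume getVolume(long blockId) { return blockToVolume.get(blockId); }
}

public class HandleBadBlockSketch {
    // Guarded lookup: returns the storage ID, or null instead of
    // throwing NullPointerException when the volume is gone.
    static String storageIdOrNull(Dataset data, long blockId) {
        Volume volume = data.getVolume(blockId);
        if (volume == null) {
            return null; // volume removed; skip marking the block suspect
        }
        return volume.getStorageID();
    }

    public static void main(String[] args) {
        Dataset data = new Dataset();
        data.put(1L, new Volume("DS-1"));
        System.out.println(storageIdOrNull(data, 1L)); // DS-1
        System.out.println(storageIdOrNull(data, 2L)); // null, no NPE
    }
}
```

The same guard applies to the `obtainReference()` call site in BlockSender: look the volume up once, check the result for null, and only then dereference it.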
[jira] [Created] (HDFS-17298) Fix NPE in DataNode.handleBadBlock and BlockSender
Haiyang Hu created HDFS-17298: - Summary: Fix NPE in DataNode.handleBadBlock and BlockSender Key: HDFS-17298 URL: https://issues.apache.org/jira/browse/HDFS-17298 Project: Hadoop HDFS Issue Type: Bug Reporter: Haiyang Hu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17297) The NameNode should remove block from the BlocksMap if the block is marked as deleted.
[ https://issues.apache.org/jira/browse/HDFS-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798951#comment-17798951 ] ASF GitHub Bot commented on HDFS-17297: --- haiyang1987 commented on PR #6369: URL: https://github.com/apache/hadoop/pull/6369#issuecomment-1864355241 Hi @ayushtkn @Hexiaoqiao @ZanderXu @zhangshuyan0 @tomscut Could you please help review this PR when you have free time? Thank you very much. > The NameNode should remove block from the BlocksMap if the block is marked as > deleted. > -- > > Key: HDFS-17297 > URL: https://issues.apache.org/jira/browse/HDFS-17297 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > > When calling the internalReleaseLease method: > {code:java} > boolean internalReleaseLease( > ... > int minLocationsNum = 1; > if (lastBlock.isStriped()) { > minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum(); > } > if (uc.getNumExpectedLocations() < minLocationsNum && > lastBlock.getNumBytes() == 0) { > // There is no datanode reported to this block. > // may be client have crashed before writing data to pipeline. > // This blocks doesn't need any recovery. > // We can remove this block and close the file. > pendingFile.removeLastBlock(lastBlock); > finalizeINodeFileUnderConstruction(src, pendingFile, > iip.getLatestSnapshotId(), false); > ... > } > {code} > If the condition `uc.getNumExpectedLocations() < minLocationsNum && > lastBlock.getNumBytes() == 0` is met during the execution of UNDER_RECOVERY > logic, the block is removed from the block list in the inode file and marked > as deleted. > However, it is not removed from the BlocksMap, which may cause a memory leak. > Therefore, it is necessary to remove the block from the BlocksMap at this > point as well. 
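The invariant this issue restores can be sketched with stand-in types (not the real NameNode `BlocksMap`/`INodeFile` classes): whenever the last block is dropped from the file's block list, it must also be removed from the blocks map, otherwise the map entry leaks:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for the NameNode's BlocksMap, keyed by block ID.
class BlocksMap {
    private final Map<Long, String> map = new HashMap<>();
    void add(long blockId, String info) { map.put(blockId, info); }
    void remove(long blockId) { map.remove(blockId); }
    int size() { return map.size(); }
}

// Stand-in for an under-construction file's block list.
class FileBlocks {
    final List<Long> blocks = new ArrayList<>();
}

public class RemoveLastBlockSketch {
    // Remove the last block from the file AND from the blocks map;
    // skipping the second step is the leak this issue describes.
    static void removeLastBlock(FileBlocks file, BlocksMap blocksMap) {
        if (file.blocks.isEmpty()) {
            return;
        }
        long last = file.blocks.remove(file.blocks.size() - 1);
        blocksMap.remove(last); // without this, the entry lingers forever
    }

    public static void main(String[] args) {
        BlocksMap blocksMap = new BlocksMap();
        FileBlocks file = new FileBlocks();
        file.blocks.add(42L);
        blocksMap.add(42L, "blk_42");
        removeLastBlock(file, blocksMap);
        System.out.println(blocksMap.size()); // 0: nothing leaked
    }
}
```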
[jira] [Commented] (HDFS-17291) DataNode metric bytesWritten is not totally accurate in some situations.
[ https://issues.apache.org/jira/browse/HDFS-17291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798904#comment-17798904 ] ASF GitHub Bot commented on HDFS-17291: --- zhangshuyan0 commented on code in PR #6360: URL: https://github.com/apache/hadoop/pull/6360#discussion_r1432538651 ## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java: ## @@ -747,7 +747,6 @@ private int receivePacket() throws IOException { // Actual number of data bytes to write. int numBytesToDisk = (int)(offsetInBlock-onDiskLen); - Review Comment: Please undo this. > DataNode metric bytesWritten is not totally accurate in some situations. > > > Key: HDFS-17291 > URL: https://issues.apache.org/jira/browse/HDFS-17291 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.3.6 >Reporter: farmmamba >Assignee: farmmamba >Priority: Major > Labels: pull-request-available > > As the title describes, the DataNode metric bytesWritten is not totally accurate > in some situations, such as failure recovery and data re-sends. We should fix it.
[jira] [Commented] (HDFS-17289) Considering the size of non-lastBlocks equals to complete block size can cause append failure.
[ https://issues.apache.org/jira/browse/HDFS-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798897#comment-17798897 ] ASF GitHub Bot commented on HDFS-17289: --- hfutatzhanghb commented on PR #6357: URL: https://github.com/apache/hadoop/pull/6357#issuecomment-1864187912 @zhangshuyan0 Sir, thanks a lot for your review. Will fix it using `try with` soon. > Considering the size of non-lastBlocks equals to complete block size can > cause append failure. > -- > > Key: HDFS-17289 > URL: https://issues.apache.org/jira/browse/HDFS-17289 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.6 >Reporter: farmmamba >Assignee: farmmamba >Priority: Major > Labels: pull-request-available > 
[jira] [Commented] (HDFS-17289) Considering the size of non-lastBlocks equals to complete block size can cause append failure.
[ https://issues.apache.org/jira/browse/HDFS-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798894#comment-17798894 ] ASF GitHub Bot commented on HDFS-17289: --- zhangshuyan0 commented on PR #6357: URL: https://github.com/apache/hadoop/pull/6357#issuecomment-1864184821 LGTM +1. Need to fix checkstyle. > Considering the size of non-lastBlocks equals to complete block size can > cause append failure. > -- > > Key: HDFS-17289 > URL: https://issues.apache.org/jira/browse/HDFS-17289 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.6 >Reporter: farmmamba >Assignee: farmmamba >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17285) RBF: Add a safe mode check period configuration
[ https://issues.apache.org/jira/browse/HDFS-17285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798889#comment-17798889 ] ASF GitHub Bot commented on HDFS-17285: --- LiuGuH commented on code in PR #6347: URL: https://github.com/apache/hadoop/pull/6347#discussion_r1432498292 ## hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RBFConfigKeys.java: ## @@ -279,6 +279,10 @@ public class RBFConfigKeys extends CommonConfigurationKeysPublic { FEDERATION_ROUTER_PREFIX + "safemode.expiration"; public static final long DFS_ROUTER_SAFEMODE_EXPIRATION_DEFAULT = 3 * DFS_ROUTER_CACHE_TIME_TO_LIVE_MS_DEFAULT; + public static final String DFS_ROUTER_SAFEMODE_CHECKPERIOD = Review Comment: @slfan1989 Thanks for your advice. Added the MS suffix to the variable. > RBF: Add a safe mode check period configuration > --- > > Key: HDFS-17285 > URL: https://issues.apache.org/jira/browse/HDFS-17285 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: liuguanghua >Priority: Minor > Labels: pull-request-available > > When the dfsrouter starts, it enters safe mode, and it takes about one minute to leave. > The log is below: > 14:35:23,717 INFO > org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Leave > startup safe mode after 3 ms > 14:35:23,717 INFO > org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Enter > safe mode after 18 ms without reaching the State Store > 14:35:23,717 INFO > org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: > Entering safe mode > 14:35:24,996 INFO > org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: > Delaying safemode exit for 28721 milliseconds... > 14:36:25,037 INFO > org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: > Leaving safe mode after 61319 milliseconds > It depends on these configs. 
> DFS_ROUTER_SAFEMODE_EXTENSION 30s > DFS_ROUTER_SAFEMODE_EXPIRATION 3min > DFS_ROUTER_CACHE_TIME_TO_LIVE_MS 1min (this is the period for checking safe mode) > Because the dfsrouter rejects write requests while in safe mode, the check period > should be shorter once refreshCaches is done, and we should remove > DFS_ROUTER_CACHE_TIME_TO_LIVE_MS from RouterSafemodeService.
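Presumably the new check period would be exposed as a Router configuration key under the `dfs.federation.router.` prefix used by `RBFConfigKeys`. The key name and default below are assumptions for illustration, not necessarily what the final patch uses:

```xml
<!-- hdfs-rbf-site.xml (hypothetical key name and default value) -->
<property>
  <name>dfs.federation.router.safemode.checkperiod.ms</name>
  <value>5000</value>
  <description>How often the RouterSafemodeService re-checks whether the
    Router can leave startup safe mode, decoupled from the State Store
    cache refresh interval (dfs.federation.router.cache.ttl).</description>
</property>
```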
[jira] [Commented] (HDFS-17292) Show the number of times the slowPeerCollectorDaemon thread has collected SlowNodes.
[ https://issues.apache.org/jira/browse/HDFS-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798862#comment-17798862 ] ASF GitHub Bot commented on HDFS-17292: --- huangzhaobo99 commented on PR #6364: URL: https://github.com/apache/hadoop/pull/6364#issuecomment-1864057960 Hi! @ayushtkn @slfan1989 @tomscut, I have re-pushed this PR. Please help review it when you are available. Thanks very much. > Show the number of times the slowPeerCollectorDaemon thread has collected > SlowNodes. > > > Key: HDFS-17292 > URL: https://issues.apache.org/jira/browse/HDFS-17292 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: huangzhaobo99 >Assignee: huangzhaobo99 >Priority: Major > Labels: pull-request-available > 
[jira] [Commented] (HDFS-17292) Show the number of times the slowPeerCollectorDaemon thread has collected SlowNodes.
[ https://issues.apache.org/jira/browse/HDFS-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798859#comment-17798859 ] ASF GitHub Bot commented on HDFS-17292: --- hadoop-yetus commented on PR #6364: URL: https://github.com/apache/hadoop/pull/6364#issuecomment-1864042970 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 31s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 14m 33s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 31m 30s | | trunk passed | | +1 :green_heart: | compile | 5m 42s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | compile | 6m 4s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 1m 22s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 58s | | trunk passed | | +1 :green_heart: | javadoc | 1m 36s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 2m 10s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 4m 39s | | trunk passed | | +1 :green_heart: | shadedclient | 36m 53s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 32s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 1m 51s | | the patch passed | | +1 :green_heart: | compile | 5m 52s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javac | 5m 52s | | the patch passed | | +1 :green_heart: | compile | 5m 55s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | javac | 5m 55s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 1m 15s | | the patch passed | | +1 :green_heart: | mvnsite | 1m 48s | | the patch passed | | +1 :green_heart: | javadoc | 1m 23s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 1m 54s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 5m 2s | | the patch passed | | +1 :green_heart: | shadedclient | 37m 24s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 212m 4s | | hadoop-hdfs in the patch passed. | | +1 :green_heart: | unit | 22m 8s | | hadoop-hdfs-rbf in the patch passed. | | +1 :green_heart: | asflicense | 0m 41s | | The patch does not generate ASF License warnings. 
| | | | 409m 9s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6364/7/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6364 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 26286a49a545 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / e3fb4883e6f03da8d7cd22062eb88b83e7194d6e | | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6364/7/testReport/ | | Max. process+thread count | 3283 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6364/7/console | |