[jira] [Commented] (HDFS-16732) [SBN READ] Avoid get location from observer when the block report is delayed.
[ https://issues.apache.org/jira/browse/HDFS-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17636765#comment-17636765 ] Erik Krogen commented on HDFS-16732: Note that a bug with this change was reported, and now fixed, in HDFS-16832.

> [SBN READ] Avoid get location from observer when the block report is delayed.
>
> Key: HDFS-16732
> URL: https://issues.apache.org/jira/browse/HDFS-16732
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 3.2.1
> Reporter: zhengchenyu
> Assignee: zhengchenyu
> Priority: Critical
> Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
> Hive on Tez applications fail occasionally after the observer is enabled; the log is shown below.
> {code:java}
> 2022-08-18 15:22:06,914 [ERROR] [Dispatcher thread {Central}] |impl.VertexImpl|: Vertex Input: namenodeinfo_stg initializer failed, vertex=vertex_1660618571916_4839_1_00 [Map 1]
> org.apache.tez.dag.app.dag.impl.AMUserCodeException: java.lang.ArrayIndexOutOfBoundsException: 0
>     at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallback.onFailure(RootInputInitializerManager.java:329)
>     at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1056)
>     at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
>     at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1138)
>     at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958)
>     at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:748)
>     at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.afterRanInterruptibly(TrustedListenableFutureTask.java:133)
>     at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:80)
>     at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
>     at org.apache.hadoop.mapred.FileInputFormat.identifyHosts(FileInputFormat.java:748)
>     at org.apache.hadoop.mapred.FileInputFormat.getSplitHostsAndCachedHosts(FileInputFormat.java:714)
>     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:378)
>     at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:306)
>     at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:408)
>     at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:159)
>     at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:279)
>     at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:270)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:270)
>     at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:254)
>     at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
>     at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
>     ... 4 more
> {code}
> As described in MAPREDUCE-7082, this exception is thrown when a block is missing, but my cluster had no missing blocks.
> In this example, I found that getListing returns location information. When the observer's block report is delayed, it returns blocks without locations.
> HDFS-13924 was introduced to solve this problem, but it only considers getBlockLocations.
> On the observer node, every method that may return locations should check whether the locations are empty.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail:
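The last two paragraphs of the issue description can be made concrete with a small, self-contained model. This is a sketch under stated assumptions, not the code merged in PR #4756: `ObserverLocationGuard`, `BlockStub`, `FileStatusStub`, `RetryOnActiveException`, `checkLocations`, `getListing` and `firstHost` are hypothetical stand-ins, not Hadoop classes or methods. It shows why a block whose host array is empty breaks split computation on the client (the `hosts[0]` access only loosely mirrors the `ArrayIndexOutOfBoundsException: 0` in the trace; the real `FileInputFormat.identifyHosts` logic is more involved), and how an observer could apply one emptiness check to every call that returns locations, here a listing, rather than only to `getBlockLocations`.

```java
import java.util.List;

/**
 * Hypothetical, simplified model of the observer-side check described in
 * HDFS-16732. None of these types are Hadoop classes.
 */
public class ObserverLocationGuard {

  /** Stand-in for a located block; only the DataNode hosts matter here. */
  static final class BlockStub {
    final String blockId;
    final String[] hosts; // empty while the observer has not processed the block report

    BlockStub(String blockId, String[] hosts) {
      this.blockId = blockId;
      this.hosts = hosts;
    }
  }

  /** Stand-in for a file status that carries block locations, as a listing can. */
  static final class FileStatusStub {
    final String path;
    final List<BlockStub> blocks;

    FileStatusStub(String path, List<BlockStub> blocks) {
      this.path = path;
      this.blocks = blocks;
    }
  }

  /** Stand-in signal meaning "re-send this call to the active NameNode". */
  static final class RetryOnActiveException extends RuntimeException {
    RetryOnActiveException(String msg) {
      super(msg);
    }
  }

  /** On an observer, refuse to return any block that has no locations. */
  static void checkLocations(boolean isObserver, String src, List<BlockStub> blocks) {
    if (!isObserver) {
      return; // the active NameNode's view is treated as authoritative
    }
    for (BlockStub b : blocks) {
      if (b.hosts.length == 0) {
        throw new RetryOnActiveException(
            "observer has no locations for " + b.blockId + " of " + src);
      }
    }
  }

  /** The listing path applies the same check to every entry it returns. */
  static List<FileStatusStub> getListing(boolean isObserver, List<FileStatusStub> entries) {
    for (FileStatusStub f : entries) {
      checkLocations(isObserver, f.path, f.blocks);
    }
    return entries;
  }

  /** Mimics an unguarded client path: taking the first host of the array. */
  static String firstHost(BlockStub b) {
    return b.hosts[0]; // throws ArrayIndexOutOfBoundsException when hosts is empty
  }

  public static void main(String[] args) {
    BlockStub delayed = new BlockStub("blk_1", new String[0]);
    List<FileStatusStub> listing = List.of(new FileStatusStub("/tmp/f", List.of(delayed)));

    try {
      firstHost(delayed);
    } catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("without the guard the client fails: " + e);
    }
    try {
      getListing(true, listing);
    } catch (RetryOnActiveException e) {
      System.out.println("with the guard the observer redirects: " + e.getMessage());
    }
  }
}
```

The design choice here is to fail the whole call on the observer as soon as one block lacks locations, so the client never sees a partially located listing; a finer-grained policy is possible but is not what the description asks for.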
[jira] [Commented] (HDFS-16732) [SBN READ] Avoid get location from observer when the block report is delayed.
[ https://issues.apache.org/jira/browse/HDFS-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584962#comment-17584962 ] ASF GitHub Bot commented on HDFS-16732: --- xkrogen commented on PR #4756: URL: https://github.com/apache/hadoop/pull/4756#issuecomment-1227581149 New test is very clean :) Many thanks for the contribution @zhengchenyu ! I've merged this to trunk and branch-3.3.
[jira] [Commented] (HDFS-16732) [SBN READ] Avoid get location from observer when the block report is delayed.
[ https://issues.apache.org/jira/browse/HDFS-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584955#comment-17584955 ] ASF GitHub Bot commented on HDFS-16732: --- xkrogen merged PR #4756: URL: https://github.com/apache/hadoop/pull/4756
[jira] [Commented] (HDFS-16732) [SBN READ] Avoid get location from observer when the block report is delayed.
[ https://issues.apache.org/jira/browse/HDFS-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584779#comment-17584779 ] ASF GitHub Bot commented on HDFS-16732: --- hadoop-yetus commented on PR #4756: URL: https://github.com/apache/hadoop/pull/4756#issuecomment-1227113493

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 0m 33s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. |
| | | | | _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 40m 27s | | trunk passed |
| +1 :green_heart: | compile | 1m 46s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | compile | 1m 39s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | checkstyle | 1m 23s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 44s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 22s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javadoc | 1m 45s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 4m 8s | | trunk passed |
| +1 :green_heart: | shadedclient | 26m 54s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 33s | | the patch passed |
| +1 :green_heart: | compile | 1m 38s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javac | 1m 38s | | the patch passed |
| +1 :green_heart: | compile | 1m 26s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | javac | 1m 26s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 4s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 32s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 3s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javadoc | 1m 36s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 3m 38s | | the patch passed |
| +1 :green_heart: | shadedclient | 26m 27s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 :green_heart: | unit | 248m 54s | | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 1m 1s | | The patch does not generate ASF License warnings. |
| | | 369m 7s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4756/4/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/4756 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux bcf6b87183ae 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / bae43f8dba103156707a6d077f13e32dc8a5b551 |
| Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4756/4/testReport/ |
| Max. process+thread count | 3545 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4756/4/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-16732) [SBN READ] Avoid get location from observer when the block report is delayed.
[ https://issues.apache.org/jira/browse/HDFS-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584607#comment-17584607 ] ASF GitHub Bot commented on HDFS-16732: --- zhengchenyu commented on code in PR #4756: URL: https://github.com/apache/hadoop/pull/4756#discussion_r954499783

## hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestObserverNodeWhenReportDelay.java: ##
@@ -0,0 +1,153 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdfs.server.namenode.ha;
+
+import static org.apache.hadoop.hdfs.DFSConfigKeys.DFS_BLOCK_SIZE_KEY;
+import static org.apache.hadoop.hdfs.DFSConfigKeys.DFS_BYTES_PER_CHECKSUM_KEY;
+import static org.apache.hadoop.hdfs.DFSConfigKeys.DFS_NAMENODE_STATE_CONTEXT_ENABLED_KEY;
+import static org.apache.hadoop.hdfs.DFSConfigKeys.DFS_REPLICATION_KEY;
+import static org.apache.hadoop.hdfs.server.namenode.NameNodeAdapter.getServiceState;
+import static org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider.OBSERVER_PROBE_RETRY_PERIOD_KEY;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+
+import java.io.IOException;
+import java.util.List;
+
+import org.apache.commons.lang3.ArrayUtils;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.ha.HAServiceProtocol.HAServiceState;
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.hadoop.hdfs.protocol.DirectoryListing;
+import org.apache.hadoop.hdfs.protocol.HdfsLocatedFileStatus;
+import org.apache.hadoop.hdfs.protocol.LocatedBlock;
+import org.apache.hadoop.hdfs.protocol.LocatedBlocks;
+import org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster;
+import org.apache.hadoop.hdfs.server.datanode.DataNode;
+import org.junit.After;
+import org.junit.AfterClass;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class TestObserverNodeWhenReportDelay {

Review Comment: Thanks for the careful review; the new test is indeed overkill. I tested it, and testObserverNodeBlockMissingRetry can also reproduce this bug with the code below:
```
dfs.getClient().listPaths("/", new byte[0], true);
assertSentTo(0);
dfs.getClient().getLocatedFileInfo(testPath.toString(), false);
assertSentTo(0);
```
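The test lines in the review comment above assert that, while the observer's block report is delayed, `listPaths` and `getLocatedFileInfo` end up being served by NameNode index 0, which the test presumably treats as the active. A hypothetical client-side sketch of that routing follows: each call goes to the observer first and is re-sent to the active NameNode when the observer signals that it cannot answer safely. `ObserverFallbackSketch`, `invoke`, `RetryOnActiveException`, and the NameNode indices are assumptions for illustration only; this is not how `ObserverReadProxyProvider` is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

/**
 * Hypothetical sketch of observer-first routing with fallback to the active
 * NameNode. It is not Hadoop's ObserverReadProxyProvider.
 */
public class ObserverFallbackSketch {

  static final int ACTIVE = 0;   // assumed index of the active NameNode
  static final int OBSERVER = 2; // assumed index of the observer

  /** Stand-in signal meaning "re-send this call to the active NameNode". */
  static final class RetryOnActiveException extends RuntimeException {
    RetryOnActiveException(String msg) {
      super(msg);
    }
  }

  /** Index of the NameNode that served each call, similar in spirit to assertSentTo. */
  final List<Integer> servedBy = new ArrayList<>();

  /** Try the observer first; on a retry-on-active signal, re-send to the active. */
  <T> T invoke(IntFunction<T> call) {
    try {
      T result = call.apply(OBSERVER);
      servedBy.add(OBSERVER);
      return result;
    } catch (RetryOnActiveException e) {
      T result = call.apply(ACTIVE);
      servedBy.add(ACTIVE);
      return result;
    }
  }

  int lastServedBy() {
    return servedBy.get(servedBy.size() - 1);
  }

  public static void main(String[] args) {
    ObserverFallbackSketch client = new ObserverFallbackSketch();
    boolean blockReportDelayed = true; // observer has not processed the report yet

    // A listing with locations: the observer refuses, so the active answers.
    String listing = client.invoke(nn -> {
      if (nn == OBSERVER && blockReportDelayed) {
        throw new RetryOnActiveException("observer has no locations yet");
      }
      return "listing from NN" + nn;
    });
    System.out.println(listing + "; last call served by NN" + client.lastServedBy());
  }
}
```

Run as-is, `main` reports NN0 both as the source of the listing and as the last NameNode called. A real client would also have to bound retries and tell this signal apart from genuine errors, which the sketch leaves out.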
[jira] [Commented] (HDFS-16732) [SBN READ] Avoid get location from observer when the block report is delayed.
[ https://issues.apache.org/jira/browse/HDFS-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584604#comment-17584604 ] ASF GitHub Bot commented on HDFS-16732: --- zhengchenyu commented on code in PR #4756: URL: https://github.com/apache/hadoop/pull/4756#discussion_r954478351

## hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestObserverNodeWhenReportDelay.java: ##
+public class TestObserverNodeWhenReportDelay {

Review Comment: I added some new configuration in TestObserverNodeWhenReportDelay and was worried that it would affect other unit tests in TestObserverNode. I will try to add this new test to TestObserverNode.
[jira] [Commented] (HDFS-16732) [SBN READ] Avoid get location from observer when the block report is delayed.
[ https://issues.apache.org/jira/browse/HDFS-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584585#comment-17584585 ] ASF GitHub Bot commented on HDFS-16732: --- zhengchenyu commented on code in PR #4756: URL: https://github.com/apache/hadoop/pull/4756#discussion_r954478351

## hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestObserverNodeWhenReportDelay.java: ##
+public class TestObserverNodeWhenReportDelay {

Review Comment: I added some new configuration in TestObserverNodeWhenReportDelay and was worried that it would affect other unit tests in TestObserverNode. I will try to add this new test to TestObserverNode.
[jira] [Commented] (HDFS-16732) [SBN READ] Avoid get location from observer when the block report is delayed.
[ https://issues.apache.org/jira/browse/HDFS-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584507#comment-17584507 ]

ASF GitHub Bot commented on HDFS-16732:
---

xkrogen commented on code in PR #4756:
URL: https://github.com/apache/hadoop/pull/4756#discussion_r954390770

##
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestObserverNodeWhenReportDelay.java:
##
@@ -0,0 +1,153 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdfs.server.namenode.ha;
+
+import static org.apache.hadoop.hdfs.DFSConfigKeys.DFS_BLOCK_SIZE_KEY;
+import static org.apache.hadoop.hdfs.DFSConfigKeys.DFS_BYTES_PER_CHECKSUM_KEY;
+import static org.apache.hadoop.hdfs.DFSConfigKeys.DFS_NAMENODE_STATE_CONTEXT_ENABLED_KEY;
+import static org.apache.hadoop.hdfs.DFSConfigKeys.DFS_REPLICATION_KEY;
+import static org.apache.hadoop.hdfs.server.namenode.NameNodeAdapter.getServiceState;
+import static org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider.OBSERVER_PROBE_RETRY_PERIOD_KEY;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+
+import java.io.IOException;
+import java.util.List;
+
+import org.apache.commons.lang3.ArrayUtils;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.ha.HAServiceProtocol.HAServiceState;
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.hadoop.hdfs.protocol.DirectoryListing;
+import org.apache.hadoop.hdfs.protocol.HdfsLocatedFileStatus;
+import org.apache.hadoop.hdfs.protocol.LocatedBlock;
+import org.apache.hadoop.hdfs.protocol.LocatedBlocks;
+import org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster;
+import org.apache.hadoop.hdfs.server.datanode.DataNode;
+import org.junit.After;
+import org.junit.AfterClass;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class TestObserverNodeWhenReportDelay {

Review Comment:
Can we just add new tests in `TestObserverNode` similar to `testObserverNodeBlockMissingRetry`? I am wondering if this new test might be overkill.

> [SBN READ] Avoid get location from observer when the block report is delayed.
[jira] [Commented] (HDFS-16732) [SBN READ] Avoid get location from observer when the block report is delayed.
[ https://issues.apache.org/jira/browse/HDFS-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582235#comment-17582235 ]

ASF GitHub Bot commented on HDFS-16732:
---

hadoop-yetus commented on PR #4756:
URL: https://github.com/apache/hadoop/pull/4756#issuecomment-1221327075

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 36s |  | Docker mode activated. |
|  |  |  |  | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s |  | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s |  | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s |  | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s |  | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s |  | The patch appears to include 1 new or modified test files. |
|  |  |  |  | _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 38m 58s |  | trunk passed |
| +1 :green_heart: | compile | 1m 36s |  | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | compile | 1m 27s |  | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | checkstyle | 1m 15s |  | trunk passed |
| +1 :green_heart: | mvnsite | 1m 35s |  | trunk passed |
| +1 :green_heart: | javadoc | 1m 15s |  | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javadoc | 1m 42s |  | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 3m 43s |  | trunk passed |
| +1 :green_heart: | shadedclient | 25m 45s |  | branch has no errors when building and testing our client artifacts. |
|  |  |  |  | _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 37s |  | the patch passed |
| +1 :green_heart: | compile | 1m 36s |  | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javac | 1m 36s |  | the patch passed |
| +1 :green_heart: | compile | 1m 25s |  | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | javac | 1m 25s |  | the patch passed |
| +1 :green_heart: | blanks | 0m 0s |  | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 8s |  | the patch passed |
| +1 :green_heart: | mvnsite | 1m 36s |  | the patch passed |
| +1 :green_heart: | javadoc | 1m 2s |  | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javadoc | 1m 41s |  | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 3m 49s |  | the patch passed |
| +1 :green_heart: | shadedclient | 26m 43s |  | patch has no errors when building and testing our client artifacts. |
|  |  |  |  | _ Other Tests _ |
| +1 :green_heart: | unit | 253m 2s |  | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 1m 4s |  | The patch does not generate ASF License warnings. |
|  |  | 370m 2s |  |  |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4756/3/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/4756 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 0ece423b518d 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 140b9e2fb3d9b12f81ae575a0119c5f9f156af69 |
| Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4756/3/testReport/ |
| Max. process+thread count | 3005 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4756/3/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

> [SBN READ]
[jira] [Commented] (HDFS-16732) [SBN READ] Avoid get location from observer when the block report is delayed.
[ https://issues.apache.org/jira/browse/HDFS-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582143#comment-17582143 ]

ASF GitHub Bot commented on HDFS-16732:
---

zhengchenyu commented on code in PR #4756:
URL: https://github.com/apache/hadoop/pull/4756#discussion_r950658450

##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java:
##
@@ -3470,6 +3471,16 @@ HdfsFileStatus getFileInfo(final String src, boolean resolveLink,
       logAuditEvent(false, operationName, src);
       throw e;
     }
+    if (needLocation && haEnabled && haContext != null &&
+        haContext.getState().getServiceState() == OBSERVER &&
+        stat instanceof HdfsLocatedFileStatus) {
+      LocatedBlocks lbs = ((HdfsLocatedFileStatus) stat).getLocatedBlocks();
+      for (LocatedBlock b : lbs.getLocatedBlocks()) {
+        if (b.getLocations() == null || b.getLocations().length == 0) {
+          throw new ObserverRetryOnActiveException("Zero blocklocations for " + src);
+        }
+      }
+    }

Review Comment:
> Can we pull this into a common method like:
>
> ```java
> private void checkBlockLocationsIfObserver(Iterator<LocatedBlock> blocksIter) throws ObserverRetryOnActiveException {
>   if (haEnabled && haContext != null && haContext.getState().getServiceState() == OBSERVER) {
>     ...
>   }
> }
> ```
>
> or two methods like
>
> ```java
> private boolean isObserver() { return haEnabled && haContext != null && haContext.getState().getServiceState() == OBSERVER; }
> private void checkBlockLocationsForObserver(LocatedBlocks blocks) throws ObserverRetryOnActiveException { ... }
> ```
>
> The point being that we have 3 places with almost identical logic here; we should try to consolidate.

Good idea!

> [SBN READ] Avoid get location from observer when the block report is delayed.
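The check in the diff above throws `ObserverRetryOnActiveException` instead of handing back blocks with empty locations. As we understand the HDFS-13924 design, the client's `ObserverReadProxyProvider` reacts to that exception by retrying the same call on the active NameNode. The sketch below models only that routing decision under stated assumptions: `NameNodeStub` and `ObserverFirstReader` are hypothetical names, and the exception class is a local stand-in for `org.apache.hadoop.ipc.ObserverRetryOnActiveException`, not the Hadoop class itself.

```java
import java.io.IOException;

// Local stand-in for org.apache.hadoop.ipc.ObserverRetryOnActiveException.
class ObserverRetryOnActiveException extends IOException {
  ObserverRetryOnActiveException(String message) {
    super(message);
  }
}

// Hypothetical, minimal view of a NameNode that can return block hosts.
interface NameNodeStub {
  String[] getBlockHosts(String src) throws IOException;
}

// Hypothetical router: read from the observer first and fall back to the
// active NameNode only when the observer signals its locations are stale.
final class ObserverFirstReader {
  private final NameNodeStub observer;
  private final NameNodeStub active;

  ObserverFirstReader(NameNodeStub observer, NameNodeStub active) {
    this.observer = observer;
    this.active = active;
  }

  String[] getBlockHosts(String src) throws IOException {
    try {
      return observer.getBlockHosts(src);
    } catch (ObserverRetryOnActiveException e) {
      // The observer has not yet received the block reports; retry the same
      // call on the active NameNode. Any other IOException propagates as-is.
      return active.getBlockHosts(src);
    }
  }
}
```

The point of a dedicated exception type is that the client can tell "this observer is behind on block reports" apart from a genuine failure, and only the former triggers the fallback.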
[jira] [Commented] (HDFS-16732) [SBN READ] Avoid get location from observer when the block report is delayed.
[ https://issues.apache.org/jira/browse/HDFS-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581969#comment-17581969 ]

ASF GitHub Bot commented on HDFS-16732:
---

xkrogen commented on code in PR #4756:
URL: https://github.com/apache/hadoop/pull/4756#discussion_r950420437

##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java:
##
@@ -3470,6 +3471,16 @@ HdfsFileStatus getFileInfo(final String src, boolean resolveLink,
       logAuditEvent(false, operationName, src);
       throw e;
     }
+    if (needLocation && haEnabled && haContext != null &&
+        haContext.getState().getServiceState() == OBSERVER &&
+        stat instanceof HdfsLocatedFileStatus) {
+      LocatedBlocks lbs = ((HdfsLocatedFileStatus) stat).getLocatedBlocks();
+      for (LocatedBlock b : lbs.getLocatedBlocks()) {
+        if (b.getLocations() == null || b.getLocations().length == 0) {
+          throw new ObserverRetryOnActiveException("Zero blocklocations for " + src);
+        }
+      }
+    }

Review Comment:
Can we pull this into a common method like:

```java
private void checkBlockLocationsIfObserver(Iterator<LocatedBlock> blocksIter) throws ObserverRetryOnActiveException {
  if (haEnabled && haContext != null && haContext.getState().getServiceState() == OBSERVER) {
    ...
  }
}
```

or two methods like

```java
private boolean isObserver() { return haEnabled && haContext != null && haContext.getState().getServiceState() == OBSERVER; }
private void checkBlockLocationsForObserver(LocatedBlocks blocks) throws ObserverRetryOnActiveException { ... }
```

The point being that we have 3 places with almost identical logic here; we should try to consolidate.

> [SBN READ] Avoid get location from observer when the block report is delayed.
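The second option in the review (an `isObserver()` predicate plus one shared location check) can be sketched as follows. This is a minimal, self-contained model under stated assumptions, not the code that was merged: `LocatedBlock`, `ServiceState`, and `ObserverRetryOnActiveException` are simplified local stand-ins for the HDFS classes, and `ObserverLocationCheck` is a hypothetical holder for the `haEnabled`/`haContext` state that `FSNamesystem` keeps. Each of the three call sites the review mentions (presumably `getBlockLocations`, `getFileInfo`, and `getListing`) would pass its blocks to the same helper.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins for HDFS types; NOT the real Hadoop classes.
enum ServiceState { ACTIVE, STANDBY, OBSERVER }

final class LocatedBlock {
  private final String[] locations;

  LocatedBlock(String... locations) {
    this.locations = locations;
  }

  String[] getLocations() {
    return locations;
  }
}

class ObserverRetryOnActiveException extends IOException {
  ObserverRetryOnActiveException(String message) {
    super(message);
  }
}

// Hypothetical holder for the HA state; a null serviceState plays the role
// of haContext == null in FSNamesystem.
final class ObserverLocationCheck {
  private final boolean haEnabled;
  private final ServiceState serviceState;

  ObserverLocationCheck(boolean haEnabled, ServiceState serviceState) {
    this.haEnabled = haEnabled;
    this.serviceState = serviceState;
  }

  // Mirrors the suggested isObserver() predicate (null-safe via ==).
  boolean isObserver() {
    return haEnabled && serviceState == ServiceState.OBSERVER;
  }

  // Shared check for every location-returning call: on an observer, any
  // block without locations sends the client back to the active NameNode.
  void checkBlockLocationsForObserver(String src, List<LocatedBlock> blocks)
      throws ObserverRetryOnActiveException {
    if (!isObserver() || blocks == null) {
      return;
    }
    for (LocatedBlock b : blocks) {
      String[] locations = b.getLocations();
      if (locations == null || locations.length == 0) {
        throw new ObserverRetryOnActiveException("Zero blocklocations for " + src);
      }
    }
  }
}
```

Keeping the HA-state test inside `isObserver()` means the active and standby paths only pay for a boolean check; the block scan runs solely on an observer.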
[jira] [Commented] (HDFS-16732) [SBN READ] Avoid get location from observer when the block report is delayed.
[ https://issues.apache.org/jira/browse/HDFS-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581765#comment-17581765 ]

ASF GitHub Bot commented on HDFS-16732:
---

hadoop-yetus commented on PR #4756:
URL: https://github.com/apache/hadoop/pull/4756#issuecomment-1220516674

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 35s |  | Docker mode activated. |
|  |  |  |  | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s |  | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s |  | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s |  | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s |  | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s |  | The patch appears to include 1 new or modified test files. |
|  |  |  |  | _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 39m 31s |  | trunk passed |
| +1 :green_heart: | compile | 1m 46s |  | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | compile | 1m 34s |  | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | checkstyle | 1m 16s |  | trunk passed |
| +1 :green_heart: | mvnsite | 1m 47s |  | trunk passed |
| +1 :green_heart: | javadoc | 1m 13s |  | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javadoc | 1m 39s |  | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 3m 34s |  | trunk passed |
| +1 :green_heart: | shadedclient | 23m 16s |  | branch has no errors when building and testing our client artifacts. |
|  |  |  |  | _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 28s |  | the patch passed |
| +1 :green_heart: | compile | 1m 29s |  | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javac | 1m 29s |  | the patch passed |
| +1 :green_heart: | compile | 1m 20s |  | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | javac | 1m 20s |  | the patch passed |
| +1 :green_heart: | blanks | 0m 0s |  | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 59s |  | the patch passed |
| +1 :green_heart: | mvnsite | 1m 27s |  | the patch passed |
| +1 :green_heart: | javadoc | 0m 55s |  | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javadoc | 1m 26s |  | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 3m 21s |  | the patch passed |
| +1 :green_heart: | shadedclient | 23m 23s |  | patch has no errors when building and testing our client artifacts. |
|  |  |  |  | _ Other Tests _ |
| +1 :green_heart: | unit | 253m 23s |  | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 1m 14s |  | The patch does not generate ASF License warnings. |
|  |  | 364m 52s |  |  |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4756/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/4756 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 4e3da7d52423 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / c54f61799d318ee7b22e0c159576c054dbdd66e8 |
| Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4756/2/testReport/ |
| Max. process+thread count | 3480 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4756/2/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

> [SBN READ]
[jira] [Commented] (HDFS-16732) [SBN READ] Avoid get location from observer when the block report is delayed.
[ https://issues.apache.org/jira/browse/HDFS-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581645#comment-17581645 ]

zhengchenyu commented on HDFS-16732:

[~sunchao] [~xkrogen] [~zero45] Can you please review this?

> [SBN READ] Avoid get location from observer when the block report is delayed.
> -
>
> Key: HDFS-16732
> URL: https://issues.apache.org/jira/browse/HDFS-16732
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Affects Versions: 3.2.1
> Reporter: zhengchenyu
> Assignee: zhengchenyu
> Priority: Critical
> Labels: pull-request-available
>
> Hive on Tez applications fail occasionally after the observer is enabled; the log is shown below.
> {code:java}
> 2022-08-18 15:22:06,914 [ERROR] [Dispatcher thread {Central}] |impl.VertexImpl|: Vertex Input: namenodeinfo_stg initializer failed, vertex=vertex_1660618571916_4839_1_00 [Map 1]
> org.apache.tez.dag.app.dag.impl.AMUserCodeException: java.lang.ArrayIndexOutOfBoundsException: 0
> 	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallback.onFailure(RootInputInitializerManager.java:329)
> 	at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1056)
> 	at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
> 	at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1138)
> 	at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958)
> 	at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:748)
> 	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.afterRanInterruptibly(TrustedListenableFutureTask.java:133)
> 	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:80)
> 	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
> 	at org.apache.hadoop.mapred.FileInputFormat.identifyHosts(FileInputFormat.java:748)
> 	at org.apache.hadoop.mapred.FileInputFormat.getSplitHostsAndCachedHosts(FileInputFormat.java:714)
> 	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:378)
> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:306)
> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:408)
> 	at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:159)
> 	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:279)
> 	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:270)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> 	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:270)
> 	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:254)
> 	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
> 	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
> 	... 4 more
> {code}
> As described in MAPREDUCE-7082, this exception is thrown when a block is missing, but my cluster had no missing blocks.
> In this example, I found that getListing returns location information. When the observer's block report is delayed, getListing returns blocks without locations.
> HDFS-13924 was introduced to solve this problem, but it only considers getBlockLocations.
> On the observer node, every method that may return locations should check whether the locations are empty.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
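The stack trace ends in `FileInputFormat.identifyHosts`, which, per MAPREDUCE-7082, fails when a block comes back with no locations. The sketch below is a deliberately simplified, hypothetical model of that failure mode; `SplitHostSketch` is not Hadoop code and does not reproduce the actual logic of `identifyHosts`. It only shows that any code assuming every block has at least one host, and reading element 0, breaks as soon as a delayed block report leaves a block with an empty location array.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; NOT the real FileInputFormat code.
// It models code that assumes every block has at least one host.
final class SplitHostSketch {

  static String primaryHost(String[] blockHosts) {
    // Throws ArrayIndexOutOfBoundsException when blockHosts is empty, which is
    // what a block returned without locations looks like to the client.
    return blockHosts[0];
  }

  // Collects the first host of every block; fails on the first empty block.
  static List<String> primaryHosts(List<String[]> blocks) {
    List<String> hosts = new ArrayList<>();
    for (String[] blockHosts : blocks) {
      hosts.add(primaryHost(blockHosts));
    }
    return hosts;
  }
}
```

On JDK 8 the exception message is just the index, which matches the `ArrayIndexOutOfBoundsException: 0` in the log above; newer JDKs print `Index 0 out of bounds for length 0` instead.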
[jira] [Commented] (HDFS-16732) [SBN READ] Avoid get location from observer when the block report is delayed.
[ https://issues.apache.org/jira/browse/HDFS-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581365#comment-17581365 ]

ASF GitHub Bot commented on HDFS-16732:
---

hadoop-yetus commented on PR #4756:
URL: https://github.com/apache/hadoop/pull/4756#issuecomment-1219507930

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 39s |  | Docker mode activated. |
|  |  |  |  | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s |  | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s |  | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s |  | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s |  | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s |  | The patch appears to include 1 new or modified test files. |
|  |  |  |  | _ trunk Compile Tests _ |
| -1 :x: | mvninstall | 38m 5s | [/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4756/1/artifact/out/branch-mvninstall-root.txt) | root in trunk failed. |
| +1 :green_heart: | compile | 1m 41s |  | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | compile | 1m 38s |  | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | checkstyle | 1m 24s |  | trunk passed |
| +1 :green_heart: | mvnsite | 1m 43s |  | trunk passed |
| +1 :green_heart: | javadoc | 1m 23s |  | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javadoc | 1m 46s |  | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 3m 48s |  | trunk passed |
| +1 :green_heart: | shadedclient | 26m 0s |  | branch has no errors when building and testing our client artifacts. |
|  |  |  |  | _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 26s |  | the patch passed |
| +1 :green_heart: | compile | 1m 28s |  | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javac | 1m 28s |  | the patch passed |
| +1 :green_heart: | compile | 1m 22s |  | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | javac | 1m 22s |  | the patch passed |
| +1 :green_heart: | blanks | 0m 0s |  | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 1m 0s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4756/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 111 unchanged - 0 fixed = 112 total (was 111) |
| +1 :green_heart: | mvnsite | 1m 26s |  | the patch passed |
| +1 :green_heart: | javadoc | 0m 59s |  | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javadoc | 1m 30s |  | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 3m 32s |  | the patch passed |
| +1 :green_heart: | shadedclient | 22m 52s |  | patch has no errors when building and testing our client artifacts. |
|  |  |  |  | _ Other Tests _ |
| +1 :green_heart: | unit | 240m 24s |  | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 1m 13s |  | The patch does not generate ASF License warnings. |
|  |  | 353m 7s |  |  |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4756/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/4756 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 43d62b41e7d7 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 187fe453d034f81252f5829071521ed02725ce0d |
| Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4756/1/testReport/ |
| Max. process+thread count |
[jira] [Commented] (HDFS-16732) [SBN READ] Avoid get location from observer when the block report is delayed.
[ https://issues.apache.org/jira/browse/HDFS-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581210#comment-17581210 ] ASF GitHub Bot commented on HDFS-16732: --- zhengchenyu opened a new pull request, #4756: URL: https://github.com/apache/hadoop/pull/4756 https://issues.apache.org/jira/browse/HDFS-16732 > [SBN READ] Avoid get location from observer when the block report is delayed. > - > > Key: HDFS-16732 > URL: https://issues.apache.org/jira/browse/HDFS-16732 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.2.1 >Reporter: zhengchenyu >Assignee: zhengchenyu >Priority: Critical > > Hive on tez application fail occasionally after observer is enable, log show > below. > {code:java} > 2022-08-18 15:22:06,914 [ERROR] [Dispatcher thread {Central}] > |impl.VertexImpl|: Vertex Input: namenodeinfo_stg initializer failed, > vertex=vertex_1660618571916_4839_1_00 [Map 1] > org.apache.tez.dag.app.dag.impl.AMUserCodeException: > java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallback.onFailure(RootInputInitializerManager.java:329) > at > com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1056) > at > com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30) > at > com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1138) > at > com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958) > at > com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:748) > at > com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.afterRanInterruptibly(TrustedListenableFutureTask.java:133) > at > com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:80) > at > com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78) > 
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
>   at org.apache.hadoop.mapred.FileInputFormat.identifyHosts(FileInputFormat.java:748)
>   at org.apache.hadoop.mapred.FileInputFormat.getSplitHostsAndCachedHosts(FileInputFormat.java:714)
>   at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:378)
>   at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:306)
>   at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:408)
>   at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:159)
>   at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:279)
>   at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:270)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>   at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:270)
>   at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:254)
>   at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
>   at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
>   ... 4 more
> {code}
> As described in MAPREDUCE-7082, this exception is thrown when a block is missing, but my cluster had no missing blocks.
> In this example, I found that getListing returns location information. When the observer's block report is delayed, the observer returns the block without any locations.
> HDFS-13924 was introduced to solve this problem, but it only covers getBlockLocations.
> On the observer node, every method that may return locations should check whether the locations are empty.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
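The failure mode and the check proposed in the description above can be illustrated with a minimal, self-contained Java sketch. It does not use Hadoop at all: `BlockInfo`, `firstHostUnchecked`, `checkLocations` and `RetryOnActiveSignal` are hypothetical stand-ins chosen for illustration. `firstHostUnchecked` models client code that reads index 0 of a block's host array without a length check, which is presumably how an empty location list surfaces as `ArrayIndexOutOfBoundsException: 0` in the `FileInputFormat.identifyHosts` frame. `checkLocations` models an observer-side guard that refuses to answer when any returned block has no locations, so the client can retry on the active NameNode. The actual HDFS-13924 and HDFS-16732 patches may use a different exception, retry path, and set of RPC methods.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Simplified, hypothetical model of the HDFS-16732 failure and of an
 * observer-side location check. None of these types are Hadoop classes.
 */
public class ObserverLocationCheck {

    /** Hypothetical stand-in for a located block: an id plus the DataNode hosts reported for it. */
    static final class BlockInfo {
        final long blockId;
        final String[] hosts;

        BlockInfo(long blockId, String[] hosts) {
            this.blockId = blockId;
            this.hosts = hosts;
        }
    }

    /** Hypothetical signal asking the client to repeat the call on the active NameNode. */
    static final class RetryOnActiveSignal extends RuntimeException {
        RetryOnActiveSignal(String message) {
            super(message);
        }
    }

    /**
     * Client-side pattern that fails: it takes the first host without checking
     * the array length, so an empty location list throws ArrayIndexOutOfBoundsException.
     */
    static String firstHostUnchecked(BlockInfo block) {
        return block.hosts[0];
    }

    /**
     * Observer-side guard: if any block in a response has no locations (for example
     * because the observer has not yet processed the block report), refuse to answer
     * so the client can retry on the active NameNode; otherwise return the blocks unchanged.
     */
    static List<BlockInfo> checkLocations(List<BlockInfo> blocks) {
        for (BlockInfo block : blocks) {
            if (block.hosts.length == 0) {
                throw new RetryOnActiveSignal(
                        "block " + block.blockId + " has no locations on the observer");
            }
        }
        return blocks;
    }

    public static void main(String[] args) {
        BlockInfo reported = new BlockInfo(1L, new String[] {"dn1", "dn2"});
        BlockInfo delayed = new BlockInfo(2L, new String[0]); // block report not yet processed

        System.out.println(firstHostUnchecked(reported)); // prints dn1

        try {
            firstHostUnchecked(delayed);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("client fails: " + e.getClass().getSimpleName());
        }

        try {
            checkLocations(Arrays.asList(reported, delayed));
        } catch (RetryOnActiveSignal e) {
            System.out.println("observer refuses: " + e.getMessage());
        }
    }
}
```

The guard rejects the whole response rather than filtering out the block without locations: silently dropping a block would hand the client an incomplete listing, whereas redirecting to the active NameNode, which has current block locations, returns a complete answer at the cost of one extra RPC.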