[ https://issues.apache.org/jira/browse/TEZ-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230029#comment-15230029 ]
Alina Abramova commented on TEZ-3074: ------------------------------------- After my investigating work I see that cause of this issue is simultaneously writing to file which splits trying to get Tez. When files start being read before they complete writing and/or splits calculation finishes. Do splits are calculated separately from reading files in Tez? > Multithreading issue java.lang.ArrayIndexOutOfBoundsException: -1 while > working with Tez > ---------------------------------------------------------------------------------------- > > Key: TEZ-3074 > URL: https://issues.apache.org/jira/browse/TEZ-3074 > Project: Apache Tez > Issue Type: Bug > Affects Versions: 0.5.3 > Reporter: Oleksiy Sayankin > Fix For: 0.5.3 > > Attachments: tempsource.data > > > *STEP 1. Install and configure Tez on yarn* > *STEP 2. Configure hive for tez* > *STEP 3. Create test tables in Hive and fill it with data* > Enable dynamic partitioning in Hive. Add to {{hive-site.xml}} and restart > Hive. > {code:xml} > <!-- DYNAMIC PARTITION --> > <property> > <name>hive.exec.dynamic.partition</name> > <value>true</value> > </property> > <property> > <name>hive.exec.dynamic.partition.mode</name> > <value>nonstrict</value> > </property> > <property> > <name>hive.exec.max.dynamic.partitions.pernode</name> > <value>2000</value> > </property> > <property> > <name>hive.exec.max.dynamic.partitions</name> > <value>2000</value> > </property> > {code} > Execute in command line > {code} > hadoop fs -put tempsource.data / > {code} > Execute in command line. Use attached file {{tempsource.data}} > {code} > hive> CREATE TABLE test3 (x INT, y STRING) ROW FORMAT DELIMITED FIELDS > TERMINATED BY ','; > hive> CREATE TABLE ptest1 (x INT, y STRING) PARTITIONED BY (z STRING) ROW > FORMAT DELIMITED FIELDS TERMINATED BY ','; > hive> CREATE TABLE tempsource (x INT, y STRING, z STRING) ROW FORMAT > DELIMITED FIELDS TERMINATED BY ','; > hive> LOAD DATA INPATH '/tempsource.data' OVERWRITE INTO TABLE tempsource; > hive> INSERT OVERWRITE TABLE ptest1 PARTITION (z) SELECT x,y,z FROM > tempsource; > {code} > *STEP 4. Mount NFS on cluster* > *STEP 5. Run teragen test application* > Use separate console > {code} > /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.5.1.jar > teragen -Dmapred.map.tasks=7 -Dmapreduce.map.disk=0 > -Dmapreduce.map.cpu.vcores=0 1000000000 /user/hdfs/input > {code} > *STEP 6. Create many test files* > Use separate console > {code} > cd /hdfs/cluster/user/hive/warehouse/ptest1/z=66 > for i in `seq 1 10000`; do dd if=/dev/urandom of=tempfile$i bs=1M count=1; > done > {code} > *STEP 7. Run the following query repeatedly in other console* > Use separate console > {code} > hive> insert overwrite table test3 select x,y from ( select x,y,z from > (select x,y,z from ptest1 where x > 5 and x < 1000 union all select x,y,z > from ptest1 where x > 5 and x < 1000) a)b; > {code} > After some time of working it gives an exception. > {noformat} > Status: Failed > Vertex failed, vertexName=Map 3, vertexId=vertex_1443452487059_0426_1_01, > diagnostics=[Vertex vertex_1443452487059_0426_1_01 [Map 3] killed/failed due > to:ROOT_INPUT_INIT_FAILURE, Vertex Input: ptest1 initializer failed, > vertex=vertex_1443452487059_0426_1_01 [Map 3], > java.lang.ArrayIndexOutOfBoundsException: -1 > at > org.apache.hadoop.mapred.FileInputFormat.getBlockIndex(FileInputFormat.java:395) > at > org.apache.hadoop.mapred.FileInputFormat.getSplitHostsAndCachedHosts(FileInputFormat.java:579) > at > org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:359) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:300) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:402) > at > org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:132) > at > org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245) > at > org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566) > at > org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239) > at > org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > ] > Vertex killed, vertexName=Map 1, vertexId=vertex_1443452487059_0426_1_00, > diagnostics=[Vertex received Kill in INITED state., Vertex > vertex_1443452487059_0426_1_00 [Map 1] killed/failed due to:null] > DAG failed due to vertex failure. failedVertices:1 killedVertices:1 > FAILED: Execution Error, return code 2 from > org.apache.hadoop.hive.ql.exec.tez.TezTask > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)