[jira] [Commented] (TEZ-3074) Multithreading issue java.lang.ArrayIndexOutOfBoundsException: -1 while working with Tez

2016-04-07 Thread Alina Abramova (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230029#comment-15230029
 ] 

Alina Abramova commented on TEZ-3074:
-

After investigating, I see that the cause of this issue is simultaneous writing 
to the files from which Tez is trying to compute splits: the files start being 
read before their writing and/or the split calculation has completed.

Are splits calculated separately from reading the files in Tez?
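A self-contained sketch of how this could surface as {{ArrayIndexOutOfBoundsException: -1}}. This is a reconstruction for illustration, not the actual Hadoop source: if a file's block-location array comes back empty (plausible while the file is still being written), the fallback access to the last block indexes position -1.

```java
public class GetBlockIndexSketch {

    // Minimal stand-in for org.apache.hadoop.fs.BlockLocation (an assumption,
    // kept only so the sketch compiles on its own).
    static class BlockLocation {
        final long offset;
        final long length;
        BlockLocation(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    // Mirrors the shape of FileInputFormat.getBlockIndex: scan for the block
    // containing the offset; otherwise inspect the *last* block to build an
    // error message. With an empty array, blkLocations.length - 1 == -1.
    static int getBlockIndex(BlockLocation[] blkLocations, long offset) {
        for (int i = 0; i < blkLocations.length; i++) {
            if (blkLocations[i].offset <= offset
                    && offset < blkLocations[i].offset + blkLocations[i].length) {
                return i;
            }
        }
        BlockLocation last = blkLocations[blkLocations.length - 1]; // -1 if empty
        throw new IllegalArgumentException("Offset " + offset
                + " is outside of file (0.." + (last.offset + last.length) + ")");
    }

    public static void main(String[] args) {
        try {
            // Simulates a file whose block list was fetched mid-write: empty.
            getBlockIndex(new BlockLocation[0], 0L);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("Caught " + e.getClass().getSimpleName());
        }
    }
}
```

Under this reading, the race is not in the split arithmetic itself but in the file's metadata changing between listing and block-location lookup.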



> Multithreading issue java.lang.ArrayIndexOutOfBoundsException: -1 while 
> working with Tez
> 
>
> Key: TEZ-3074
> URL: https://issues.apache.org/jira/browse/TEZ-3074
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.3
>Reporter: Oleksiy Sayankin
> Fix For: 0.5.3
>
> Attachments: tempsource.data
>
>
> *STEP 1. Install and configure Tez on yarn*
> *STEP 2. Configure hive for tez*
> *STEP 3. Create test tables in Hive and fill it with data*
> Enable dynamic partitioning in Hive. Add to {{hive-site.xml}} and restart 
> Hive.
> {code:xml}
> <property>
>   <name>hive.exec.dynamic.partition</name>
>   <value>true</value>
> </property>
> <property>
>   <name>hive.exec.dynamic.partition.mode</name>
>   <value>nonstrict</value>
> </property>
> <property>
>   <name>hive.exec.max.dynamic.partitions.pernode</name>
>   <value>2000</value>
> </property>
> <property>
>   <name>hive.exec.max.dynamic.partitions</name>
>   <value>2000</value>
> </property>
> {code}
> Execute in command line
> {code}
> hadoop fs -put tempsource.data /
> {code}
> Execute in command line. Use attached file {{tempsource.data}}
> {code}
> hive> CREATE TABLE test3 (x INT, y STRING) ROW FORMAT DELIMITED FIELDS 
> TERMINATED BY ',';
> hive> CREATE TABLE ptest1 (x INT, y STRING) PARTITIONED BY (z STRING) ROW 
> FORMAT DELIMITED FIELDS TERMINATED BY ',';
> hive> CREATE TABLE tempsource (x INT, y STRING, z STRING) ROW FORMAT 
> DELIMITED FIELDS TERMINATED BY ',';
> hive> LOAD DATA INPATH '/tempsource.data' OVERWRITE INTO TABLE tempsource;
> hive> INSERT OVERWRITE TABLE ptest1 PARTITION (z) SELECT x,y,z FROM 
> tempsource;
> {code}
> *STEP 4. Mount NFS on cluster*
> *STEP 5. Run teragen test application*
> Use separate console
> {code}
> hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.5.1.jar \
>   teragen -Dmapred.map.tasks=7 -Dmapreduce.map.disk=0 \
>   -Dmapreduce.map.cpu.vcores=0 10 /user/hdfs/input
> {code}
> *STEP 6. Create many test files*
> Use separate console
> {code}
> cd /hdfs/cluster/user/hive/warehouse/ptest1/z=66
> for i in `seq 1 1`; do dd if=/dev/urandom of=tempfile$i bs=1M count=1;
> done
> {code}
> *STEP 7. Run the following query repeatedly in other console*
> Use separate console
> {code}
> hive> insert overwrite table test3 select x,y from ( select x,y,z from 
> (select x,y,z from ptest1 where x > 5 and x < 1000 union all select x,y,z 
> from ptest1 where x > 5 and x < 1000) a)b;
> {code}
> After running for some time, it fails with an exception.
> {noformat}
> Status: Failed
> Vertex failed, vertexName=Map 3, vertexId=vertex_1443452487059_0426_1_01,
> diagnostics=[Vertex vertex_1443452487059_0426_1_01 [Map 3] killed/failed due
> to:ROOT_INPUT_INIT_FAILURE, Vertex Input: ptest1 initializer failed,
> vertex=vertex_1443452487059_0426_1_01 [Map 3],
> java.lang.ArrayIndexOutOfBoundsException: -1
> at
> org.apache.hadoop.mapred.FileInputFormat.getBlockIndex(FileInputFormat.java:395)
> at
> org.apache.hadoop.mapred.FileInputFormat.getSplitHostsAndCachedHosts(FileInputFormat.java:579)
> at
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:359)
> at
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:300)
> at
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:402)
> at
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:132)
> at
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
> at
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
> at
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
> at
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> ]
> Vertex killed, vertexName=Map 1, vertexId=vertex_1443452487059_0426_1_00,
> diagnostics=[Vertex received Kill in INITED state., Vertex
> vertex_1443452487059_0426_1_00 [Map 1] killed/failed due to:null]
> DAG failed due to vertex failure. failedVertices:1
> {noformat}

[jira] [Commented] (TEZ-3074) Multithreading issue java.lang.ArrayIndexOutOfBoundsException: -1 while working with Tez

2016-02-10 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141602#comment-15141602
 ] 

Siddharth Seth commented on TEZ-3074:
-

Are these client nodes? Do you mean configuring Hive to run with MR when 
launched from Node3 and Node4, and running Hive with Tez when launching from 
Node1 and Node2?
It's simple enough to change the configuration on a single node - or within the 
Hive script - to run with either MR or Tez.


[jira] [Commented] (TEZ-3074) Multithreading issue java.lang.ArrayIndexOutOfBoundsException: -1 while working with Tez

2016-02-10 Thread Oleksiy Sayankin (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140692#comment-15140692
 ] 

Oleksiy Sayankin commented on TEZ-3074:
---

Siddharth, is it acceptable to configure a cluster with Tez in the following 
manner:

* Node 1. Configured for Tez
* Node 2. Configured for Tez
* Node 3. Configured for MapReduce
* Node 4. Configured for MapReduce

and then run the job with that configuration?

PS: I remember your questions; our test team is preparing the logs for you.


[jira] [Commented] (TEZ-3074) Multithreading issue java.lang.ArrayIndexOutOfBoundsException: -1 while working with Tez

2016-01-28 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122115#comment-15122115
 ] 

Siddharth Seth commented on TEZ-3074:
-

These methods aren't used by Hive or Tez when AM split generation is enabled. 
They're primarily for client-side split generation. The trace shows AM split 
generation being used.
What's happening here is that Hive submits a payload containing the input paths 
to the Tez AM, which then runs the HiveSplitGenerator - this is what actually 
invokes getSplits. The resulting splits are sent to tasks via RPC. localFiles 
are not used anywhere in this process.
The ideal place to log this would be FileInputFormat itself. Tez includes a 
version of the hadoop-mapreduce jars in its assembly, so this would involve 
recompiling Hadoop and rebuilding Tez with the new Hadoop bits.
You could also try fetching the list of files for which splits are being 
generated by logging conf.get("mapred.input.dir") in HiveInputFormat before it 
invokes getSplits. Alternately, invoke inputFormat.getInputPaths in 
HiveInputFormat.

Another option to try is setting 
"mapreduce.input.fileinputformat.list-status.num-threads" to 1, which causes 
the input listing in FileInputFormat to run in a single thread - though that is 
already the default behaviour.
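The suggested diagnostic can be sketched as follows. This is a hedged illustration: {{Configuration}} here is a minimal stand-in for {{org.apache.hadoop.conf.Configuration}} so the snippet is self-contained; in HiveInputFormat you would call {{get}} on the job's real configuration just before {{getSplits}} and route the string through {{LOG.info}}.

```java
import java.util.HashMap;
import java.util.Map;

public class LogInputDirsSketch {

    // Minimal stand-in for org.apache.hadoop.conf.Configuration (assumed
    // shape; only get/set are needed for this sketch).
    static class Configuration {
        private final Map<String, String> props = new HashMap<>();
        void set(String key, String value) { props.put(key, value); }
        String get(String key) { return props.get(key); }
    }

    // The one-line diagnostic: report the paths splits will be generated for.
    static String describeInputDirs(Configuration conf) {
        return "mapred.input.dir = " + conf.get("mapred.input.dir");
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("mapred.input.dir",
                "/user/hive/warehouse/ptest1/z=66,/user/hive/warehouse/ptest1/z=67");
        // In HiveInputFormat this would go through LOG.info(...) instead.
        System.out.println(describeInputDirs(conf));
    }
}
```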


[jira] [Commented] (TEZ-3074) Multithreading issue java.lang.ArrayIndexOutOfBoundsException: -1 while working with Tez

2016-01-28 Thread Oleksiy Sayankin (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121261#comment-15121261
 ] 

Oleksiy Sayankin commented on TEZ-3074:
---

Yes, turning off Tez and using plain MapReduce fixes the issue, but our customer 
wants to use Tez to speed up Hive queries.

Actually, these steps only simulate the production cluster's behavior; they are 
not exactly the same. They were found by our support team. To figure out what is 
going on with the block locations and why blkLocations.length == 0, we added 
logging statements to the Tez sources. Here they are:

{code:title=org.apache.tez.dag.api.DAG.java|borderStyle=solid}
  public synchronized DAG addTaskLocalFiles(Map<String, LocalResource> localFiles) {
    Preconditions.checkNotNull(localFiles);
    logLocalFiles(localFiles);
    logCommonTaskLocalFiles(commonTaskLocalFiles);
    TezCommonUtils.addAdditionalLocalResources(localFiles, commonTaskLocalFiles,
        "DAG " + getName());
    return this;
  }

  private static void logLocalFiles(Map<String, LocalResource> localFiles) {
    LOG.info("###@@@ localFiles:");
    for (Map.Entry<String, LocalResource> entry : localFiles.entrySet()) {
      String key = entry.getKey();
      LocalResource localResource = entry.getValue();
      LOG.info("###@@@001 key = " + key
          + ", localResource.getSize() = " + localResource.getSize()
          + ", localResource.getType() = " + localResource.getType()
          + ", localResource.getVisibility() = " + localResource.getVisibility());
    }
  }

  private static void logCommonTaskLocalFiles(Map<String, LocalResource> commonTaskLocalFiles) {
    LOG.info("###@@@ commonTaskLocalFiles:");
    for (Map.Entry<String, LocalResource> entry : commonTaskLocalFiles.entrySet()) {
      String key = entry.getKey();
      LocalResource localResource = entry.getValue();
      LOG.info("###@@@002 key = " + key
          + ", localResource.getSize() = " + localResource.getSize()
          + ", localResource.getType() = " + localResource.getType()
          + ", localResource.getVisibility() = " + localResource.getVisibility());
    }
  }
{code}

and 

{code:title=org.apache.tez.mapreduce.hadoop.MRInputHelpers.java|borderStyle=solid}
  private static void updateLocalResourcesForInputSplits(
      FileSystem fs,
      InputSplitInfo inputSplitInfo,
      Map<String, LocalResource> localResources) throws IOException {
    if (localResources.containsKey(JOB_SPLIT_RESOURCE_NAME)) {
      throw new RuntimeException("LocalResources already contains a"
          + " resource named " + JOB_SPLIT_RESOURCE_NAME);
    }
    if (localResources.containsKey(JOB_SPLIT_METAINFO_RESOURCE_NAME)) {
      throw new RuntimeException("LocalResources already contains a"
          + " resource named " + JOB_SPLIT_METAINFO_RESOURCE_NAME);
    }

    LOG.info("###@@@003 inputSplitInfo.getSplitsFile() = "
        + inputSplitInfo.getSplitsFile());
{code}

But it gave us nothing: the exception happened before any 
{noformat}###@@@{noformat} tag was printed out.


[jira] [Commented] (TEZ-3074) Multithreading issue java.lang.ArrayIndexOutOfBoundsException: -1 while working with Tez

2016-01-26 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15117590#comment-15117590
 ] 

Hitesh Shah commented on TEZ-3074:
--

\cc [~hagleitn] [~sseth]


[jira] [Commented] (TEZ-3074) Multithreading issue java.lang.ArrayIndexOutOfBoundsException: -1 while working with Tez

2016-01-26 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15117588#comment-15117588
 ] 

Hitesh Shah commented on TEZ-3074:
--

What version of Hive was this run against? 

> Multithreading issue java.lang.ArrayIndexOutOfBoundsException: -1 while 
> working with Tez
> 
>
> Key: TEZ-3074
> URL: https://issues.apache.org/jira/browse/TEZ-3074
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.3
>Reporter: Oleksiy Sayankin
> Fix For: 0.5.3
>
> Attachments: tempsource.data
>
>
> *STEP 1. Install and configure Tez on yarn*
> *STEP 2. Configure hive for tez*
> *STEP 3. Create test tables in Hive and fill it with data*
> Enable dynamic partitioning in Hive. Add to {{hive-site.xml}} and restart 
> Hive.
> {code:xml}
> 
> 
>   hive.exec.dynamic.partition
>   true
> 
> 
>   hive.exec.dynamic.partition.mode
>   nonstrict
> 
> 
>   hive.exec.max.dynamic.partitions.pernode
>   2000
> 
> 
>   hive.exec.max.dynamic.partitions
>   2000
> 
> {code}
> Execute in command line
> {code}
> hadoop fs -put tempsource.data /
> {code}
> Execute in command line. Use attached file {{tempsource.data}}
> {code}
> hive> CREATE TABLE test3 (x INT, y STRING) ROW FORMAT DELIMITED FIELDS 
> TERMINATED BY ',';
> hive> CREATE TABLE ptest1 (x INT, y STRING) PARTITIONED BY (z STRING) ROW 
> FORMAT DELIMITED FIELDS TERMINATED BY ',';
> hive> CREATE TABLE tempsource (x INT, y STRING, z STRING) ROW FORMAT 
> DELIMITED FIELDS TERMINATED BY ',';
> hive> LOAD DATA INPATH '/tempsource.data' OVERWRITE INTO TABLE tempsource;
> hive> INSERT OVERWRITE TABLE ptest1 PARTITION (z) SELECT x,y,z FROM 
> tempsource;
> {code}
> *STEP 4. Mount NFS on cluster*
> *STEP 5. Run teragen test application*
> Use separate console
> {code}
> /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.5.1.jar 
> teragen -Dmapred.map.tasks=7 -Dmapreduce.map.disk=0 
> -Dmapreduce.map.cpu.vcores=0 10 /user/hdfs/input
> {code}
> *STEP 6. Create many test files*
> Use separate console
> {code}
> cd /hdfs/cluster/user/hive/warehouse/ptest1/z=66
> for i in `seq 1 1`; do dd if=/dev/urandom of=tempfile$i bs=1M count=1;
> done
> {code}
> *STEP 7. Run the following query repeatedly in other console*
> Use separate console
> {code}
> hive> insert overwrite table test3 select x,y from ( select x,y,z from 
> (select x,y,z from ptest1 where x > 5 and x < 1000 union all select x,y,z 
> from ptest1 where x > 5 and x < 1000) a)b;
> {code}
> After some time of working it gives an exception.
> {noformat}
> Status: Failed
> Vertex failed, vertexName=Map 3, vertexId=vertex_1443452487059_0426_1_01,
> diagnostics=[Vertex vertex_1443452487059_0426_1_01 [Map 3] killed/failed due
> to:ROOT_INPUT_INIT_FAILURE, Vertex Input: ptest1 initializer failed,
> vertex=vertex_1443452487059_0426_1_01 [Map 3],
> java.lang.ArrayIndexOutOfBoundsException: -1
> at org.apache.hadoop.mapred.FileInputFormat.getBlockIndex(FileInputFormat.java:395)
> at org.apache.hadoop.mapred.FileInputFormat.getSplitHostsAndCachedHosts(FileInputFormat.java:579)
> at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:359)
> at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:300)
> at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:402)
> at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:132)
> at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
> at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
> at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
> at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> ]
> Vertex killed, vertexName=Map 1, vertexId=vertex_1443452487059_0426_1_00,
> diagnostics=[Vertex received Kill in INITED state., Vertex
> vertex_1443452487059_0426_1_00 [Map 1] killed/failed due to:null]
> DAG failed due to vertex failure. failedVertices:1 killedVertices:1
> {noformat}

[jira] [Commented] (TEZ-3074) Multithreading issue java.lang.ArrayIndexOutOfBoundsException: -1 while working with Tez

2016-01-26 Thread Oleksiy Sayankin (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15117653#comment-15117653
 ] 

Oleksiy Sayankin commented on TEZ-3074:
---

Hive-1.0


[jira] [Commented] (TEZ-3074) Multithreading issue java.lang.ArrayIndexOutOfBoundsException: -1 while working with Tez

2016-01-26 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15118393#comment-15118393
 ] 

Siddharth Seth commented on TEZ-3074:
-

[~osayankin] - this looks like an issue in FileInputFormat in MapReduce 
itself. An empty BlockLocation array would trigger this at 
{code}BlockLocation last = blkLocations[blkLocations.length - 1];{code}
Have you tried running this with MR?
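To make the failure mode concrete, here is a minimal, self-contained sketch (not the actual Hadoop source; the class name and stand-in {{BlockLocation}} type are illustrative) that mirrors the pattern at {{FileInputFormat.getBlockIndex}}: when the search loop finds no containing block, the code indexes {{blkLocations[blkLocations.length - 1]}} without checking for an empty array, so a file that momentarily reports zero block locations (e.g. one still being written) yields index -1.

{code}
// Sketch of the failing pattern in FileInputFormat.getBlockIndex.
// With an empty BlockLocation array, blkLocations.length - 1 == -1,
// producing the ArrayIndexOutOfBoundsException seen in the stack trace.
public class EmptyBlockLocationsSketch {

    // Stand-in for org.apache.hadoop.fs.BlockLocation (offset + length only).
    static class BlockLocation {
        final long offset;
        final long length;

        BlockLocation(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    // Mirrors getBlockIndex: find the block containing `offset`, otherwise
    // consult the last block to build an error message -- the unguarded
    // access that fails when the array is empty.
    static int getBlockIndex(BlockLocation[] blkLocations, long offset) {
        for (int i = 0; i < blkLocations.length; i++) {
            if (blkLocations[i].offset <= offset
                    && offset < blkLocations[i].offset + blkLocations[i].length) {
                return i;
            }
        }
        // Empty array: this indexes position -1 and throws.
        BlockLocation last = blkLocations[blkLocations.length - 1];
        long fileLength = last.offset + last.length - 1;
        throw new IllegalArgumentException(
                "Offset " + offset + " is outside of file (0.." + fileLength + ")");
    }

    public static void main(String[] args) {
        try {
            getBlockIndex(new BlockLocation[0], 0L);
        } catch (ArrayIndexOutOfBoundsException e) {
            // Same exception type as in the reported stack trace.
            System.out.println("threw " + e.getClass().getSimpleName());
        }
    }
}
{code}

A guard such as {{if (blkLocations.length == 0) throw new IOException(...)}} before the last-block access would turn the race into a clear error rather than an ArrayIndexOutOfBoundsException, but that belongs in MapReduce's FileInputFormat rather than in Tez.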
