[jira] [Commented] (TEZ-3074) Multithreading issue java.lang.ArrayIndexOutOfBoundsException: -1 while working with Tez

2016-01-28 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122115#comment-15122115
 ] 

Siddharth Seth commented on TEZ-3074:
-

These methods aren't used by Hive or Tez when AM split generation is enabled. 
They're primarily for client side split generation. The trace shows AM split 
generation being used.
What's happening here is that Hive submits a payload which contains the paths 
to the Tez AM. It then runs the Hive Split Generator - which actually invokes 
getSplits. These splits are sent to tasks via RPC. localFiles are not used 
anywhere in the process.
The ideal place to log this would be FileInputFormat itself. Tez includes a 
version of the hadoop-mapreduce jars in its assembly, so this would involve 
recompiling Hadoop and rebuilding Tez with the new Hadoop bits.
You could also try fetching the list of files for which splits are being 
generated by logging conf.get("mapred.input.dir") in HiveInputFormat before it 
invokes getSplits. Alternatively, invoke inputFormat.getInputPaths in 
HiveInputFormat.
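
For reference, a minimal sketch of the suggested logging, assuming it is placed in 
HiveInputFormat just before the wrapped inputFormat.getSplits() call (the helper 
method name is illustrative, and LOG is assumed to be the class's existing logger):
{code}
// Illustrative helper; call it just before inputFormat.getSplits(job, numSplits).
private static void logInputPaths(org.apache.hadoop.mapred.JobConf job) {
  // The raw comma-separated list of input directories Hive placed in the conf.
  LOG.info("mapred.input.dir = " + job.get("mapred.input.dir"));
  // The same information as parsed Path objects.
  for (org.apache.hadoop.fs.Path p :
      org.apache.hadoop.mapred.FileInputFormat.getInputPaths(job)) {
    LOG.info("split input path = " + p);
  }
}
{code}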

Another possible option to try is setting 
"mapreduce.input.fileinputformat.list-status.num-threads" to 1, which causes the 
split listing in FileInputFormat to run in a single thread. That is the default 
behaviour, though.
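
If experimenting with that, the property can be set per session from the Hive CLI 
(1 is the single-threaded default; shown here only as an illustration):
{code}
hive> set mapreduce.input.fileinputformat.list-status.num-threads=1;
{code}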

> Multithreading issue java.lang.ArrayIndexOutOfBoundsException: -1 while 
> working with Tez
> 
>
> Key: TEZ-3074
> URL: https://issues.apache.org/jira/browse/TEZ-3074
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.3
>Reporter: Oleksiy Sayankin
> Fix For: 0.5.3
>
> Attachments: tempsource.data
>
>
> *STEP 1. Install and configure Tez on yarn*
> *STEP 2. Configure hive for tez*
> *STEP 3. Create test tables in Hive and fill it with data*
> Enable dynamic partitioning in Hive. Add to {{hive-site.xml}} and restart 
> Hive.
> {code:xml}
> <property>
>   <name>hive.exec.dynamic.partition</name>
>   <value>true</value>
> </property>
> <property>
>   <name>hive.exec.dynamic.partition.mode</name>
>   <value>nonstrict</value>
> </property>
> <property>
>   <name>hive.exec.max.dynamic.partitions.pernode</name>
>   <value>2000</value>
> </property>
> <property>
>   <name>hive.exec.max.dynamic.partitions</name>
>   <value>2000</value>
> </property>
> {code}
> Execute in command line
> {code}
> hadoop fs -put tempsource.data /
> {code}
> Execute in command line. Use attached file {{tempsource.data}}
> {code}
> hive> CREATE TABLE test3 (x INT, y STRING) ROW FORMAT DELIMITED FIELDS 
> TERMINATED BY ',';
> hive> CREATE TABLE ptest1 (x INT, y STRING) PARTITIONED BY (z STRING) ROW 
> FORMAT DELIMITED FIELDS TERMINATED BY ',';
> hive> CREATE TABLE tempsource (x INT, y STRING, z STRING) ROW FORMAT 
> DELIMITED FIELDS TERMINATED BY ',';
> hive> LOAD DATA INPATH '/tempsource.data' OVERWRITE INTO TABLE tempsource;
> hive> INSERT OVERWRITE TABLE ptest1 PARTITION (z) SELECT x,y,z FROM 
> tempsource;
> {code}
> *STEP 4. Mount NFS on cluster*
> *STEP 5. Run teragen test application*
> Use separate console
> {code}
> hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.5.1.jar 
> teragen -Dmapred.map.tasks=7 -Dmapreduce.map.disk=0 
> -Dmapreduce.map.cpu.vcores=0 10 /user/hdfs/input
> {code}
> *STEP 6. Create many test files*
> Use separate console
> {code}
> cd /hdfs/cluster/user/hive/warehouse/ptest1/z=66
> for i in `seq 1 1`; do dd if=/dev/urandom of=tempfile$i bs=1M count=1;
> done
> {code}
> *STEP 7. Run the following query repeatedly in other console*
> Use separate console
> {code}
> hive> insert overwrite table test3 select x,y from ( select x,y,z from 
> (select x,y,z from ptest1 where x > 5 and x < 1000 union all select x,y,z 
> from ptest1 where x > 5 and x < 1000) a)b;
> {code}
> After running for some time, the query fails with the following exception.
> {noformat}
> Status: Failed
> Vertex failed, vertexName=Map 3, vertexId=vertex_1443452487059_0426_1_01,
> diagnostics=[Vertex vertex_1443452487059_0426_1_01 [Map 3] killed/failed due
> to:ROOT_INPUT_INIT_FAILURE, Vertex Input: ptest1 initializer failed,
> vertex=vertex_1443452487059_0426_1_01 [Map 3],
> java.lang.ArrayIndexOutOfBoundsException: -1
> at
> org.apache.hadoop.mapred.FileInputFormat.getBlockIndex(FileInputFormat.java:395)
> at
> org.apache.hadoop.mapred.FileInputFormat.getSplitHostsAndCachedHosts(FileInputFormat.java:579)
> at
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:359)
> at
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:300)
> at
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:402)
> at
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:132)
> at
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
> at
> 

[jira] [Updated] (TEZ-3076) Reduce merge memory overhead to support large number of in-memory mapoutputs

2016-01-28 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-3076:
-
Attachment: TEZ-3076.4.patch
TEZ-3076.4-branch-0.7.patch

> Reduce merge memory overhead to support large number of in-memory mapoutputs
> 
>
> Key: TEZ-3076
> URL: https://issues.apache.org/jira/browse/TEZ-3076
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: TEZ-3076.1.patch, TEZ-3076.2.patch, 
> TEZ-3076.3-branch-0.7.patch, TEZ-3076.3.patch, TEZ-3076.4-branch-0.7.patch, 
> TEZ-3076.4.patch
>
>
> Here is a typical stack trace, though sometimes it occurs with final merge 
> (since in-memory segment overhead > mapout overhead)
> Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
>   at org.apache.hadoop.io.DataInputBuffer.<init>(DataInputBuffer.java:68)
>   at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.InMemoryReader.<init>(InMemoryReader.java:42)
>   at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.createInMemorySegments(MergeManager.java:837)
>   at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.access$200(MergeManager.java:75)
>   at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager$InMemoryMerger.merge(MergeManager.java:642)
>   at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeThread.run(MergeThread.java:89)
> Details
>   around 1,000,000 spills were fetched committing around 100MB to the memory 
> budget (500,000 in memory). However, actual memory used for 500,000 segments 
> (50-350 bytes) is 480MB (expected 100-200MB)
> Mapout overhead is not budgeted
>   Each mapoutput needs around 50 bytes in addition to the data
> In Memory Segment overhead is not budgeted
>   Each In memory segment needs around 80 bytes in addition to the data
> Interaction with auto reduce parallelism
>   In this scenario, the upstream vertex was assuming 999 (pig's default hint 
> to use auto-reduce parallelism) downstream tasks. However, was reduced to 24 
> due to auto-reduce parallelism. This is putting 40 times more segments per 
> downstream task. Should auto-reduce parallelism consider merge overhead when 
> calculating parallelism?
> Legacy Default Sorter Empty Segment
>   Default sorter does not optimize empty segments like pipeline sorter does 
> and shows this symptom more.
> 2016-01-10 11:46:01,208 [INFO] [fetcher {scope_601} #7] 
> |orderedgrouped.MergeManager|: closeInMemoryFile - map-output of size: 
> 116, inMemoryMapOutputs.size() - 571831, commitMemory - 91503730, 
> usedMemory -91503846, mapOutput=MapOutput( AttemptIdentifier: 
> InputAttemptIdentifier [inputIdentifier=Input
> Identifier [inputIndex=763962], attemptNumber=0, 
> pathComponent=attempt_1444791925832_10460712_1_00_017766_0_10003, 
> spillType=0, spillId=-1], Type: MEMORY)
> 2016-01-10 11:46:01,208 [INFO] [fetcher {scope_601} #7] 
> |orderedgrouped.ShuffleScheduler|: Completed fetch for attempt: {763962, 0, 
> attempt_1444791925832_10460712_1_00_017766_0_10003} to MEMORY, csize=128, 
> dsize=116, EndTime=1452426361208, TimeTaken=0, Rate=0.00 MB/s
> 2016-01-10 11:46:01,209 [INFO] [fetcher {scope_601} #7] 
> |orderedgrouped.ShuffleScheduler|: scope_601: All inputs fetched for input 
> vertex : scope-601
> 2016-01-10 11:46:01,209 [INFO] [fetcher {scope_601} #7] 
> |orderedgrouped.ShuffleScheduler|: copy(1091856 (spillsFetched=1091856) of 
> 1091856. Transfer rate (CumulativeDataFetched/TimeSinceInputStarted)) 0.68 
> MB/s)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-3078) Provide a mechanism for AM to let Client know about the reason for failure

2016-01-28 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created TEZ-3078:
--

 Summary: Provide a mechanism for AM to let Client know about the 
reason for failure
 Key: TEZ-3078
 URL: https://issues.apache.org/jira/browse/TEZ-3078
 Project: Apache Tez
  Issue Type: Improvement
Affects Versions: 0.8.3
Reporter: Prasanth Jayachandran


When working on HIVE-12959 for LLAP, the requirement is that when we submit a query 
to the LLAP task scheduler and there are no LLAP daemons, we should fail the 
query instead of waiting indefinitely for daemons to show up. For this to work, 
the task scheduler has to provide a mechanism to let the AM know that the 
scheduler service cannot proceed further because no daemons are running. 
Currently there is no way for the task scheduler to pass this information to the 
AM. The only option right now is to send back an exception using the 
TaskSchedulerContext.onError() API. This will kill the AM, but the AM will restart 
to recover the DAG. It would be better if there were a way to let the AM know about 
daemon status via some status response, based on which the AM could avoid 
restarting. It would be even better if we could provide a way for the AM to 
communicate this information back to the client (Hive CLI or HiveServer2).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-3078) Provide a mechanism for AM to let Client know about the reason for failure

2016-01-28 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122449#comment-15122449
 ] 

Prasanth Jayachandran commented on TEZ-3078:


[~sseth] fyi..

> Provide a mechanism for AM to let Client know about the reason for failure
> --
>
> Key: TEZ-3078
> URL: https://issues.apache.org/jira/browse/TEZ-3078
> Project: Apache Tez
>  Issue Type: Improvement
>Affects Versions: 0.8.3
>Reporter: Prasanth Jayachandran
>
> When working on HIVE-12959 for LLAP, the requirement is that when we submit a 
> query to the LLAP task scheduler and there are no LLAP daemons, we should fail 
> the query instead of waiting indefinitely for daemons to show up. For this to 
> work, the task scheduler has to provide a mechanism to let the AM know that 
> the scheduler service cannot proceed further because no daemons are running. 
> Currently there is no way for the task scheduler to pass this information to 
> the AM. The only option right now is to send back an exception using the 
> TaskSchedulerContext.onError() API. This will kill the AM, but the AM will 
> restart to recover the DAG. It would be better if there were a way to let the 
> AM know about daemon status via some status response, based on which the AM 
> could avoid restarting. It would be even better if we could provide a way for 
> the AM to communicate this information back to the client (Hive CLI or 
> HiveServer2).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-3076) Reduce merge memory overhead to support large number of in-memory mapoutputs

2016-01-28 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122231#comment-15122231
 ] 

Siddharth Seth commented on TEZ-3076:
-

I think it's better to move the fastutil conversion into a separate patch. 
Any information on the stability, correctness, and speed of this library?

- {code} completedInputSet = IntSets.synchronize(new 
IntOpenHashSet(numInputs)); {code}
I think this can be replaced with a BitSet - which would be more efficient than 
a map. Since this is Shuffle - it will receive 0/1 through maxInputs - a very 
limited set of values.
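
A minimal sketch of the BitSet idea, under the assumption that the explicit 
synchronization below replaces the IntSets.synchronize wrapper (names are illustrative):
{code}
import java.util.BitSet;

// One bit per input instead of a boxed entry in a hash set.
final class CompletedInputTracker {
  private final BitSet completed;
  private final int numInputs;

  CompletedInputTracker(int numInputs) {
    this.numInputs = numInputs;
    this.completed = new BitSet(numInputs);
  }

  // BitSet itself is not thread-safe, so callers go through these
  // synchronized methods.
  synchronized void markCompleted(int inputIndex) {
    completed.set(inputIndex);
  }

  synchronized boolean isCompleted(int inputIndex) {
    return completed.get(inputIndex);
  }

  synchronized boolean allCompleted() {
    return completed.cardinality() == numInputs;
  }
}
{code}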

- The new DataInput implementation in InMemoryReader. This should throw an 
UnsupportedOperationException on all the methods which are not used - that'll 
help catch errors in case these are used in the future.
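
A sketch of that pattern, with an illustrative class name; which methods are 
actually implemented would depend on what InMemoryReader really needs:
{code}
import java.io.DataInput;

// Back a DataInput by a byte[] slice and fail loudly on any method the reader
// is not expected to call, instead of silently misbehaving if one is used later.
final class ByteArrayDataInput implements DataInput {
  private final byte[] data;
  private int pos;

  ByteArrayDataInput(byte[] data, int start) {
    this.data = data;
    this.pos = start;
  }

  @Override
  public void readFully(byte[] b, int off, int len) {
    System.arraycopy(data, pos, b, off, len);
    pos += len;
  }

  @Override
  public void readFully(byte[] b) {
    readFully(b, 0, b.length);
  }

  @Override
  public int skipBytes(int n) {
    pos += n;
    return n;
  }

  @Override
  public byte readByte() {
    return data[pos++];
  }

  // Everything below is unused in this sketch, so it fails fast.
  @Override public boolean readBoolean() { throw unsupported(); }
  @Override public int readUnsignedByte() { throw unsupported(); }
  @Override public short readShort() { throw unsupported(); }
  @Override public int readUnsignedShort() { throw unsupported(); }
  @Override public char readChar() { throw unsupported(); }
  @Override public int readInt() { throw unsupported(); }
  @Override public long readLong() { throw unsupported(); }
  @Override public float readFloat() { throw unsupported(); }
  @Override public double readDouble() { throw unsupported(); }
  @Override public String readLine() { throw unsupported(); }
  @Override public String readUTF() { throw unsupported(); }

  private static UnsupportedOperationException unsupported() {
    return new UnsupportedOperationException("not used by InMemoryReader");
  }
}
{code}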

- MapOutput  - can we retain Type ? The overhead is a single reference - and 
this is used in almost all operations on this object.
- MapOutput - getDisk() can only be invoked once with this change. Javadocs 
need to call this out, and a check. Once again - should throw an Exception if 
it is invoked twice. Ideally, it'd be better to avoid such a change. Would 
setting up sub-classes as is done with FetchedInput help with saving memory and 
avoiding such special cases ? That would take care of the Type field as well.
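
A sketch of the single-use guard being asked for (field names and return type 
are illustrative, not the actual MapOutput API):
{code}
// Illustrative only: hands out the on-disk output exactly once.
final class DiskOutputHolder {
  private Object disk;        // placeholder for the on-disk output handle
  private boolean consumed;

  DiskOutputHolder(Object disk) {
    this.disk = disk;
  }

  /** Returns the disk output on the first call; any later call fails fast. */
  synchronized Object getDisk() {
    if (consumed) {
      throw new IllegalStateException("getDisk() may only be called once");
    }
    consumed = true;
    Object result = disk;
    disk = null;              // drop the reference so it can be collected
    return result;
  }
}
{code}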

The rest of the patch mostly looks good to me. Are you already running this 
patch on clusters ?

Not sure why DefaultSorter makes this worse ? It writes out a file to disk, but 
also sets the emptyPartitions flag - so the behaviour on the Shuffle side 
should be the same.

On the overall memory accounting problem (not this jira)
1. Should we start accounting for these overheads in the memory calculations ?
2. Put a hard limit on how many in-mem segments will be maintained, before 
they're aggregated to disk.
3. Eventually, get the In-Memory merge functional so that we don't maintain 
500K tiny buffers.


> Reduce merge memory overhead to support large number of in-memory mapoutputs
> 
>
> Key: TEZ-3076
> URL: https://issues.apache.org/jira/browse/TEZ-3076
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
> Attachments: TEZ-3076.1.patch, TEZ-3076.2.patch, 
> TEZ-3076.3-branch-0.7.patch, TEZ-3076.3.patch
>
>
> Here is a typical stack trace, though sometimes it occurs with final merge 
> (since in-memory segment overhead > mapout overhead)
> Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
>   at org.apache.hadoop.io.DataInputBuffer.<init>(DataInputBuffer.java:68)
>   at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.InMemoryReader.<init>(InMemoryReader.java:42)
>   at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.createInMemorySegments(MergeManager.java:837)
>   at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.access$200(MergeManager.java:75)
>   at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager$InMemoryMerger.merge(MergeManager.java:642)
>   at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeThread.run(MergeThread.java:89)
> Details
>   around 1,000,000 spills were fetched committing around 100MB to the memory 
> budget (500,000 in memory). However, actual memory used for 500,000 segments 
> (50-350 bytes) is 480MB (expected 100-200MB)
> Mapout overhead is not budgeted
>   Each mapoutput needs around 50 bytes in addition to the data
> In Memory Segment overhead is not budgeted
>   Each In memory segment needs around 80 bytes in addition to the data
> Interaction with auto reduce parallelism
>   In this scenario, the upstream vertex was assuming 999 (pig's default hint 
> to use auto-reduce parallelism) downstream tasks. However, was reduced to 24 
> due to auto-reduce parallelism. This is putting 40 times more segments per 
> downstream task. Should auto-reduce parallelism consider merge overhead when 
> calculating parallelism?
> Legacy Default Sorter Empty Segment
>   Default sorter does not optimize empty segments like pipeline sorter does 
> and shows this symptom more.
> 2016-01-10 11:46:01,208 [INFO] [fetcher {scope_601} #7] 
> |orderedgrouped.MergeManager|: closeInMemoryFile - map-output of size: 
> 116, inMemoryMapOutputs.size() - 571831, commitMemory - 91503730, 
> usedMemory -91503846, mapOutput=MapOutput( AttemptIdentifier: 
> InputAttemptIdentifier [inputIdentifier=Input
> Identifier [inputIndex=763962], attemptNumber=0, 
> pathComponent=attempt_1444791925832_10460712_1_00_017766_0_10003, 
> spillType=0, spillId=-1], Type: MEMORY)
> 2016-01-10 11:46:01,208 [INFO] [fetcher {scope_601} #7] 
> |orderedgrouped.ShuffleScheduler|: Completed fetch for attempt: {763962, 0, 
> 

[jira] [Commented] (TEZ-3076) Reduce merge memory overhead to support large number of in-memory mapoutputs

2016-01-28 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122550#comment-15122550
 ] 

Jonathan Eagles commented on TEZ-3076:
--

[~sseth], thanks for the review. Addressed your comments with version 4 of the 
patch. I scaled the changes way down so that the main thrust (removal of 
DataInputStream and its derivative DataInputBuffer) is the focus.

You are right, my comment regarding the default sorter seems misplaced here. 
Please disregard.

Link to the performance of the Apache-licensed fastutil:
http://java-performance.info/hashmap-overview-jdk-fastutil-goldman-sachs-hppc-koloboke-trove-january-2015/

Please have another look and we can get started filing some follow-ons to 
this one. I have been running this on the problem job above successfully with 
no issues.

> Reduce merge memory overhead to support large number of in-memory mapoutputs
> 
>
> Key: TEZ-3076
> URL: https://issues.apache.org/jira/browse/TEZ-3076
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: TEZ-3076.1.patch, TEZ-3076.2.patch, 
> TEZ-3076.3-branch-0.7.patch, TEZ-3076.3.patch, TEZ-3076.4-branch-0.7.patch, 
> TEZ-3076.4.patch
>
>
> Here is a typical stack trace, though sometimes it occurs with final merge 
> (since in-memory segment overhead > mapout overhead)
> Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
>   at org.apache.hadoop.io.DataInputBuffer.<init>(DataInputBuffer.java:68)
>   at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.InMemoryReader.<init>(InMemoryReader.java:42)
>   at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.createInMemorySegments(MergeManager.java:837)
>   at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.access$200(MergeManager.java:75)
>   at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager$InMemoryMerger.merge(MergeManager.java:642)
>   at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeThread.run(MergeThread.java:89)
> Details
>   around 1,000,000 spills were fetched committing around 100MB to the memory 
> budget (500,000 in memory). However, actual memory used for 500,000 segments 
> (50-350 bytes) is 480MB (expected 100-200MB)
> Mapout overhead is not budgeted
>   Each mapoutput needs around 50 bytes in addition to the data
> In Memory Segment overhead is not budgeted
>   Each In memory segment needs around 80 bytes in addition to the data
> Interaction with auto reduce parallelism
>   In this scenario, the upstream vertex was assuming 999 (pig's default hint 
> to use auto-reduce parallelism) downstream tasks. However, was reduced to 24 
> due to auto-reduce parallelism. This is putting 40 times more segments per 
> downstream task. Should auto-reduce parallelism consider merge overhead when 
> calculating parallelism?
> Legacy Default Sorter Empty Segment
>   Default sorter does not optimize empty segments like pipeline sorter does 
> and shows this symptom more.
> 2016-01-10 11:46:01,208 [INFO] [fetcher {scope_601} #7] 
> |orderedgrouped.MergeManager|: closeInMemoryFile - map-output of size: 
> 116, inMemoryMapOutputs.size() - 571831, commitMemory - 91503730, 
> usedMemory -91503846, mapOutput=MapOutput( AttemptIdentifier: 
> InputAttemptIdentifier [inputIdentifier=Input
> Identifier [inputIndex=763962], attemptNumber=0, 
> pathComponent=attempt_1444791925832_10460712_1_00_017766_0_10003, 
> spillType=0, spillId=-1], Type: MEMORY)
> 2016-01-10 11:46:01,208 [INFO] [fetcher {scope_601} #7] 
> |orderedgrouped.ShuffleScheduler|: Completed fetch for attempt: {763962, 0, 
> attempt_1444791925832_10460712_1_00_017766_0_10003} to MEMORY, csize=128, 
> dsize=116, EndTime=1452426361208, TimeTaken=0, Rate=0.00 MB/s
> 2016-01-10 11:46:01,209 [INFO] [fetcher {scope_601} #7] 
> |orderedgrouped.ShuffleScheduler|: scope_601: All inputs fetched for input 
> vertex : scope-601
> 2016-01-10 11:46:01,209 [INFO] [fetcher {scope_601} #7] 
> |orderedgrouped.ShuffleScheduler|: copy(1091856 (spillsFetched=1091856) of 
> 1091856. Transfer rate (CumulativeDataFetched/TimeSinceInputStarted)) 0.68 
> MB/s)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-3076) Reduce merge memory overhead to support large number of in-memory mapoutputs

2016-01-28 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122621#comment-15122621
 ] 

TezQA commented on TEZ-3076:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12785040/TEZ-3076.4.patch
  against master revision 2bf27de.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.test.TestFaultTolerance

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1439//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1439//console

This message is automatically generated.

> Reduce merge memory overhead to support large number of in-memory mapoutputs
> 
>
> Key: TEZ-3076
> URL: https://issues.apache.org/jira/browse/TEZ-3076
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: TEZ-3076.1.patch, TEZ-3076.2.patch, 
> TEZ-3076.3-branch-0.7.patch, TEZ-3076.3.patch, TEZ-3076.4-branch-0.7.patch, 
> TEZ-3076.4.patch
>
>
> Here is a typical stack trace, though sometimes it occurs with final merge 
> (since in-memory segment overhead > mapout overhead)
> Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
>   at org.apache.hadoop.io.DataInputBuffer.<init>(DataInputBuffer.java:68)
>   at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.InMemoryReader.<init>(InMemoryReader.java:42)
>   at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.createInMemorySegments(MergeManager.java:837)
>   at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.access$200(MergeManager.java:75)
>   at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager$InMemoryMerger.merge(MergeManager.java:642)
>   at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeThread.run(MergeThread.java:89)
> Details
>   around 1,000,000 spills were fetched committing around 100MB to the memory 
> budget (500,000 in memory). However, actual memory used for 500,000 segments 
> (50-350 bytes) is 480MB (expected 100-200MB)
> Mapout overhead is not budgeted
>   Each mapoutput needs around 50 bytes in addition to the data
> In Memory Segment overhead is not budgeted
>   Each In memory segment needs around 80 bytes in addition to the data
> Interaction with auto reduce parallelism
>   In this scenario, the upstream vertex was assuming 999 (pig's default hint 
> to use auto-reduce parallelism) downstream tasks. However, was reduced to 24 
> due to auto-reduce parallelism. This is putting 40 times more segments per 
> downstream task. Should auto-reduce parallelism consider merge overhead when 
> calculating parallelism?
> Legacy Default Sorter Empty Segment
>   Default sorter does not optimize empty segments like pipeline sorter does 
> and shows this symptom more.
> 2016-01-10 11:46:01,208 [INFO] [fetcher {scope_601} #7] 
> |orderedgrouped.MergeManager|: closeInMemoryFile - map-output of size: 
> 116, inMemoryMapOutputs.size() - 571831, commitMemory - 91503730, 
> usedMemory -91503846, mapOutput=MapOutput( AttemptIdentifier: 
> InputAttemptIdentifier [inputIdentifier=Input
> Identifier [inputIndex=763962], attemptNumber=0, 
> pathComponent=attempt_1444791925832_10460712_1_00_017766_0_10003, 
> spillType=0, spillId=-1], Type: MEMORY)
> 2016-01-10 11:46:01,208 [INFO] [fetcher {scope_601} #7] 
> |orderedgrouped.ShuffleScheduler|: Completed fetch for attempt: {763962, 0, 
> attempt_1444791925832_10460712_1_00_017766_0_10003} to MEMORY, csize=128, 
> dsize=116, EndTime=1452426361208, TimeTaken=0, Rate=0.00 MB/s
> 2016-01-10 11:46:01,209 [INFO] [fetcher {scope_601} #7] 
> |orderedgrouped.ShuffleScheduler|: scope_601: All inputs fetched for input 
> vertex : scope-601
> 2016-01-10 11:46:01,209 [INFO] [fetcher {scope_601} #7] 
> |orderedgrouped.ShuffleScheduler|: copy(1091856 (spillsFetched=1091856) of 
> 1091856. Transfer rate (CumulativeDataFetched/TimeSinceInputStarted)) 0.68 
> MB/s)



--
This message was sent by Atlassian JIRA

Failed: TEZ-3076 PreCommit Build #1439

2016-01-28 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3076
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1439/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 3633 lines...]
[ERROR] [Help 2] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :tez-tests
[INFO] Build failures were ignored.




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12785040/TEZ-3076.4.patch
  against master revision 2bf27de.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.test.TestFaultTolerance

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1439//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1439//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
5b89bd3803e2e80636c82d721b32e6af1dacf880 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Compressed 3.17 MB of artifacts by 31.6% relative to #1435
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
7 tests failed.
FAILED:  org.apache.tez.test.TestFaultTolerance.testRandomFailingInputs

Error Message:
expected: but was:

Stack Trace:
java.lang.AssertionError: expected: but was:
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:141)
at 
org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:124)
at 
org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:120)
at 
org.apache.tez.test.TestFaultTolerance.testRandomFailingInputs(TestFaultTolerance.java:763)


FAILED:  org.apache.tez.test.TestFaultTolerance.testBasicInputFailureWithExit

Error Message:
TezSession has already shutdown. No cluster diagnostics found.

Stack Trace:
org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. No 
cluster diagnostics found.
at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:784)
at 
org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:129)
at 
org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:124)
at 
org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:120)
at 
org.apache.tez.test.TestFaultTolerance.testBasicInputFailureWithExit(TestFaultTolerance.java:261)


FAILED:  
org.apache.tez.test.TestFaultTolerance.testInputFailureRerunCanSendOutputToTwoDownstreamVertices

Error Message:
TezSession has already shutdown. No cluster diagnostics found.

Stack Trace:
org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. No 
cluster diagnostics found.
at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:784)
at 
org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:129)
at 

[jira] [Commented] (TEZ-3071) run tez task on local mode is success,but run on cluster occurred this problem,java.lang.NullPointerException at com.teradata.sddg.sgprocessor.TezDataGeneratorTes

2016-01-28 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15120985#comment-15120985
 ] 

Jeff Zhang commented on TEZ-3071:
-

If you want to remote debug Tez in cluster mode, you need to do some 
configuration. For example, to debug the Tez AM, configure 
tez.am.launch.cmd-opts in tez-site.xml as -Djava.net.preferIPv4Stack=true 
-Dhadoop.metrics.log.level=WARN -Xdebug 
-Xrunjdwp:transport=dt_socket,address=9009,server=y,suspend=y so that I can 
remote debug your AM using whatever your favorite IDE is. Debugging a task works 
the same way; just configure tez.task.launch.cmd-opts. But Tez also supports 
local mode, so you can debug in a local environment if that is fine with you. 
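
For convenience, the same AM setting written as a tez-site.xml entry (port 9009 
as above; the task-side equivalent would go under tez.task.launch.cmd-opts):
{code:xml}
<property>
  <name>tez.am.launch.cmd-opts</name>
  <value>-Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xdebug -Xrunjdwp:transport=dt_socket,address=9009,server=y,suspend=y</value>
</property>
{code}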

> run tez task on local mode is success,but run on cluster occurred this 
> problem,java.lang.NullPointerException at 
> com.teradata.sddg.sgprocessor.TezDataGeneratorTest$DataGenerationProcessor.run(TezDataGeneratorTest.java:78)
> -
>
> Key: TEZ-3071
> URL: https://issues.apache.org/jira/browse/TEZ-3071
> Project: Apache Tez
>  Issue Type: Task
> Environment: hadoop2.6.0+tez0.7.0
>Reporter: Lvpenglin
>
> 16/01/24 21:29:13 INFO client.DAGClientImpl: Waiting for DAG to start running
> 16/01/24 21:29:25 INFO client.DAGClientImpl: DAG initialized: 
> CurrentState=Running
> 16/01/24 21:29:26 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% 
> TotalTasks: 2 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
> 16/01/24 21:29:26 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> ASSOCIATE Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 
> Killed: 0
> 16/01/24 21:29:26 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> SALES_TRANSACTION Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 
> 0 Killed: 0
> 16/01/24 21:29:32 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% 
> TotalTasks: 2 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
> 16/01/24 21:29:32 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> ASSOCIATE Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 
> Killed: 0
> 16/01/24 21:29:32 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> SALES_TRANSACTION Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 
> 0 Killed: 0
> 16/01/24 21:29:37 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% 
> TotalTasks: 2 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
> 16/01/24 21:29:37 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> ASSOCIATE Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 
> Killed: 0
> 16/01/24 21:29:37 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> SALES_TRANSACTION Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 
> 0 Killed: 0
> 16/01/24 21:29:42 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% 
> TotalTasks: 2 Succeeded: 0 Running: 1 Failed: 0 Killed: 0 FailedTaskAttempts: 
> 1
> 16/01/24 21:29:42 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> ASSOCIATE Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 
> Killed: 0 FailedTaskAttempts: 1
> 16/01/24 21:29:42 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> SALES_TRANSACTION Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 
> 0 Killed: 0
> 16/01/24 21:29:47 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% 
> TotalTasks: 2 Succeeded: 0 Running: 1 Failed: 0 Killed: 0 FailedTaskAttempts: 
> 1
> 16/01/24 21:29:47 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> ASSOCIATE Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 
> Killed: 0 FailedTaskAttempts: 1
> 16/01/24 21:29:47 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> SALES_TRANSACTION Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 
> 0 Killed: 0
> 16/01/24 21:29:52 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% 
> TotalTasks: 2 Succeeded: 0 Running: 1 Failed: 0 Killed: 0 FailedTaskAttempts: 
> 2
> 16/01/24 21:29:52 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> ASSOCIATE Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 
> Killed: 0 FailedTaskAttempts: 2
> 16/01/24 21:29:52 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> SALES_TRANSACTION Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 
> 0 Killed: 0
> 16/01/24 21:29:57 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% 
> TotalTasks: 2 Succeeded: 0 Running: 1 Failed: 0 Killed: 0 FailedTaskAttempts: 
> 2
> 16/01/24 21:29:57 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> ASSOCIATE Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 
> Killed: 0 

[jira] [Comment Edited] (TEZ-3071) run tez task on local mode is success,but run on cluster occurred this problem,java.lang.NullPointerException at com.teradata.sddg.sgprocessor.TezDataGenerat

2016-01-28 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15120985#comment-15120985
 ] 

Jeff Zhang edited comment on TEZ-3071 at 1/28/16 8:09 AM:
--

If you want to remote debug Tez in cluster mode, you need to do some 
configuration. For example, to debug the Tez AM, configure 
tez.am.launch.cmd-opts in tez-site.xml as -Djava.net.preferIPv4Stack=true 
-Dhadoop.metrics.log.level=WARN -Xdebug 
-Xrunjdwp:transport=dt_socket,address=9009,server=y,suspend=y so that you can 
remote debug your AM using whatever your favorite IDE is. Debugging a task works 
the same way; just configure tez.task.launch.cmd-opts. But Tez also supports 
local mode, so you can debug in a local environment if that is fine with you. 


was (Author: zjffdu):
If you want to remote debug tez in cluster mode, you need to do some 
configuration, e.g. if you want to debug TEZ AM, configure 
tez.am.launch.cmd-opts in tez-site.xml as -Djava.net.preferIPv4Stack=true 
-Dhadoop.metrics.log.level=WARN -Xdebug 
-Xrunjdwp:transport=dt_socket,address=9009,server=y,suspend=y so that I can 
remote debug your AM using whatever your favorite IDE. for debugging task is 
the same, just configure tez.task.launch.cmd-opts,  But actually tez support 
local mode, so you can debug it local environment if that is fine with you. 

> run tez task on local mode is success,but run on cluster occurred this 
> problem,java.lang.NullPointerException at 
> com.teradata.sddg.sgprocessor.TezDataGeneratorTest$DataGenerationProcessor.run(TezDataGeneratorTest.java:78)
> -
>
> Key: TEZ-3071
> URL: https://issues.apache.org/jira/browse/TEZ-3071
> Project: Apache Tez
>  Issue Type: Task
> Environment: hadoop2.6.0+tez0.7.0
>Reporter: Lvpenglin
>
> 16/01/24 21:29:13 INFO client.DAGClientImpl: Waiting for DAG to start running
> 16/01/24 21:29:25 INFO client.DAGClientImpl: DAG initialized: 
> CurrentState=Running
> 16/01/24 21:29:26 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% 
> TotalTasks: 2 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
> 16/01/24 21:29:26 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> ASSOCIATE Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 
> Killed: 0
> 16/01/24 21:29:26 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> SALES_TRANSACTION Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 
> 0 Killed: 0
> 16/01/24 21:29:32 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% 
> TotalTasks: 2 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
> 16/01/24 21:29:32 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> ASSOCIATE Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 
> Killed: 0
> 16/01/24 21:29:32 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> SALES_TRANSACTION Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 
> 0 Killed: 0
> 16/01/24 21:29:37 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% 
> TotalTasks: 2 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
> 16/01/24 21:29:37 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> ASSOCIATE Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 
> Killed: 0
> 16/01/24 21:29:37 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> SALES_TRANSACTION Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 
> 0 Killed: 0
> 16/01/24 21:29:42 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% 
> TotalTasks: 2 Succeeded: 0 Running: 1 Failed: 0 Killed: 0 FailedTaskAttempts: 
> 1
> 16/01/24 21:29:42 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> ASSOCIATE Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 
> Killed: 0 FailedTaskAttempts: 1
> 16/01/24 21:29:42 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> SALES_TRANSACTION Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 
> 0 Killed: 0
> 16/01/24 21:29:47 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% 
> TotalTasks: 2 Succeeded: 0 Running: 1 Failed: 0 Killed: 0 FailedTaskAttempts: 
> 1
> 16/01/24 21:29:47 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> ASSOCIATE Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 
> Killed: 0 FailedTaskAttempts: 1
> 16/01/24 21:29:47 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> SALES_TRANSACTION Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 
> 0 Killed: 0
> 16/01/24 21:29:52 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% 
> TotalTasks: 2 Succeeded: 0 Running: 1 Failed: 0 Killed: 0 FailedTaskAttempts: 
> 2
> 16/01/24 21:29:52 INFO client.DAGClientImpl:VertexStatus: 

[jira] [Commented] (TEZ-3071) run tez task on local mode is success,but run on cluster occurred this problem,java.lang.NullPointerException at com.teradata.sddg.sgprocessor.TezDataGeneratorTes

2016-01-28 Thread Lvpenglin (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121295#comment-15121295
 ] 

Lvpenglin commented on TEZ-3071:


OK, thank you very much. I have a question I would like to ask you: my Tez program 
needs to parse a config file with dom4j, just the same as parsing a local file. 
I only set the file path to a local path. When the program runs on the cluster, 
I wonder whether Tez can upload my local file to HDFS and share it with the 
slave nodes for use.

> run tez task on local mode is success,but run on cluster occurred this 
> problem,java.lang.NullPointerException at 
> com.teradata.sddg.sgprocessor.TezDataGeneratorTest$DataGenerationProcessor.run(TezDataGeneratorTest.java:78)
> -
>
> Key: TEZ-3071
> URL: https://issues.apache.org/jira/browse/TEZ-3071
> Project: Apache Tez
>  Issue Type: Task
> Environment: hadoop2.6.0+tez0.7.0
>Reporter: Lvpenglin
>
> 16/01/24 21:29:13 INFO client.DAGClientImpl: Waiting for DAG to start running
> 16/01/24 21:29:25 INFO client.DAGClientImpl: DAG initialized: 
> CurrentState=Running
> 16/01/24 21:29:26 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% 
> TotalTasks: 2 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
> 16/01/24 21:29:26 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> ASSOCIATE Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 
> Killed: 0
> 16/01/24 21:29:26 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> SALES_TRANSACTION Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 
> 0 Killed: 0
> 16/01/24 21:29:32 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% 
> TotalTasks: 2 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
> 16/01/24 21:29:32 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> ASSOCIATE Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 
> Killed: 0
> 16/01/24 21:29:32 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> SALES_TRANSACTION Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 
> 0 Killed: 0
> 16/01/24 21:29:37 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% 
> TotalTasks: 2 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
> 16/01/24 21:29:37 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> ASSOCIATE Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 
> Killed: 0
> 16/01/24 21:29:37 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> SALES_TRANSACTION Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 
> 0 Killed: 0
> 16/01/24 21:29:42 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% 
> TotalTasks: 2 Succeeded: 0 Running: 1 Failed: 0 Killed: 0 FailedTaskAttempts: 
> 1
> 16/01/24 21:29:42 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> ASSOCIATE Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 
> Killed: 0 FailedTaskAttempts: 1
> 16/01/24 21:29:42 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> SALES_TRANSACTION Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 
> 0 Killed: 0
> 16/01/24 21:29:47 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% 
> TotalTasks: 2 Succeeded: 0 Running: 1 Failed: 0 Killed: 0 FailedTaskAttempts: 
> 1
> 16/01/24 21:29:47 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> ASSOCIATE Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 
> Killed: 0 FailedTaskAttempts: 1
> 16/01/24 21:29:47 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> SALES_TRANSACTION Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 
> 0 Killed: 0
> 16/01/24 21:29:52 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% 
> TotalTasks: 2 Succeeded: 0 Running: 1 Failed: 0 Killed: 0 FailedTaskAttempts: 
> 2
> 16/01/24 21:29:52 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> ASSOCIATE Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 
> Killed: 0 FailedTaskAttempts: 2
> 16/01/24 21:29:52 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> SALES_TRANSACTION Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 
> 0 Killed: 0
> 16/01/24 21:29:57 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% 
> TotalTasks: 2 Succeeded: 0 Running: 1 Failed: 0 Killed: 0 FailedTaskAttempts: 
> 2
> 16/01/24 21:29:57 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> ASSOCIATE Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 
> Killed: 0 FailedTaskAttempts: 2
> 16/01/24 21:29:57 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> SALES_TRANSACTION Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 
> 0 Killed: 0
> 16/01/24 21:30:02 INFO 

[jira] [Commented] (TEZ-3071) run tez task on local mode is success,but run on cluster occurred this problem,java.lang.NullPointerException at com.teradata.sddg.sgprocessor.TezDataGeneratorTes

2016-01-28 Thread Lvpenglin (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121302#comment-15121302
 ] 

Lvpenglin commented on TEZ-3071:


My program imitates Tez's examples and runs successfully in local mode. But when 
I change to cluster mode, some objects do not get the correct values and are 
always null, so the NullPointerException occurs.

> run tez task on local mode is success,but run on cluster occurred this 
> problem,java.lang.NullPointerException at 
> com.teradata.sddg.sgprocessor.TezDataGeneratorTest$DataGenerationProcessor.run(TezDataGeneratorTest.java:78)
> -
>
> Key: TEZ-3071
> URL: https://issues.apache.org/jira/browse/TEZ-3071
> Project: Apache Tez
>  Issue Type: Task
> Environment: hadoop2.6.0+tez0.7.0
>Reporter: Lvpenglin
>
> 16/01/24 21:29:13 INFO client.DAGClientImpl: Waiting for DAG to start running
> 16/01/24 21:29:25 INFO client.DAGClientImpl: DAG initialized: 
> CurrentState=Running
> 16/01/24 21:29:26 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% 
> TotalTasks: 2 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
> 16/01/24 21:29:26 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> ASSOCIATE Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 
> Killed: 0
> 16/01/24 21:29:26 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> SALES_TRANSACTION Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 
> 0 Killed: 0
> 16/01/24 21:29:32 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% 
> TotalTasks: 2 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
> 16/01/24 21:29:32 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> ASSOCIATE Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 
> Killed: 0
> 16/01/24 21:29:32 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> SALES_TRANSACTION Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 
> 0 Killed: 0
> 16/01/24 21:29:37 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% 
> TotalTasks: 2 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
> 16/01/24 21:29:37 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> ASSOCIATE Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 
> Killed: 0
> 16/01/24 21:29:37 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> SALES_TRANSACTION Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 
> 0 Killed: 0
> 16/01/24 21:29:42 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% 
> TotalTasks: 2 Succeeded: 0 Running: 1 Failed: 0 Killed: 0 FailedTaskAttempts: 
> 1
> 16/01/24 21:29:42 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> ASSOCIATE Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 
> Killed: 0 FailedTaskAttempts: 1
> 16/01/24 21:29:42 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> SALES_TRANSACTION Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 
> 0 Killed: 0
> 16/01/24 21:29:47 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% 
> TotalTasks: 2 Succeeded: 0 Running: 1 Failed: 0 Killed: 0 FailedTaskAttempts: 
> 1
> 16/01/24 21:29:47 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> ASSOCIATE Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 
> Killed: 0 FailedTaskAttempts: 1
> 16/01/24 21:29:47 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> SALES_TRANSACTION Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 
> 0 Killed: 0
> 16/01/24 21:29:52 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% 
> TotalTasks: 2 Succeeded: 0 Running: 1 Failed: 0 Killed: 0 FailedTaskAttempts: 
> 2
> 16/01/24 21:29:52 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> ASSOCIATE Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 
> Killed: 0 FailedTaskAttempts: 2
> 16/01/24 21:29:52 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> SALES_TRANSACTION Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 
> 0 Killed: 0
> 16/01/24 21:29:57 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% 
> TotalTasks: 2 Succeeded: 0 Running: 1 Failed: 0 Killed: 0 FailedTaskAttempts: 
> 2
> 16/01/24 21:29:57 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> ASSOCIATE Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 
> Killed: 0 FailedTaskAttempts: 2
> 16/01/24 21:29:57 INFO client.DAGClientImpl:VertexStatus: VertexName: 
> SALES_TRANSACTION Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 
> 0 Killed: 0
> 16/01/24 21:30:02 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% 
> TotalTasks: 2 Succeeded: 0 Running: 1 Failed: 0 Killed: 0 FailedTaskAttempts: 

[jira] [Commented] (TEZ-2307) Possible wrong error message when submitting new dag

2016-01-28 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121370#comment-15121370
 ] 

TezQA commented on TEZ-2307:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12784898/TEZ-2307-3.patch
  against master revision 2bf27de.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1437//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1437//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1437//console

This message is automatically generated.

> Possible wrong error message when submitting new dag
> 
>
> Key: TEZ-2307
> URL: https://issues.apache.org/jira/browse/TEZ-2307
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-2307-1.patch, TEZ-2307-2.patch, TEZ-2307-3.patch
>
>
> In the following 2 cases, the AM would propagate a wrong error message to the 
> client ("App master already running a DAG"):
> * The last dag is completed but the AM is still in RUNNING state
> * The AM is shutting down. 
> {code}
> 2015-04-10 06:01:50,369 INFO  [IPC Server handler 0 on 46821] ipc.Server 
> (Server.java:run(2070)) - IPC Server handler 0 on 46821, call 
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.submitDAG 
> from 10.0.0.223:48581 Call#411 Retry#0
> org.apache.tez.dag.api.TezException: App master already running a DAG
>   at 
> org.apache.tez.dag.app.DAGAppMaster.submitDAGToAppMaster(DAGAppMaster.java:1131)
>   at 
> org.apache.tez.dag.api.client.DAGClientHandler.submitDAG(DAGClientHandler.java:118)
>   at 
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.submitDAG(DAGClientAMProtocolBlockingPBServerImpl.java:163)
>   at 
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7471)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Failed: TEZ-2307 PreCommit Build #1437

2016-01-28 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2307
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1437/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 3769 lines...]
[INFO] Finished at: 2016-01-28T12:55:00+00:00
[INFO] Final Memory: 84M/1049M
[INFO] 




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12784898/TEZ-2307-3.patch
  against master revision 2bf27de.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1437//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1437//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1437//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
2a3630e840c96cfb23c391caf9543caa3a92203f logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Compressed 3.17 MB of artifacts by 29.5% relative to #1435
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-2307) Possible wrong error message when submitting new dag

2016-01-28 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121148#comment-15121148
 ] 

TezQA commented on TEZ-2307:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12784857/TEZ-2307-2.patch
  against master revision 2bf27de.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 3 new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1436//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1436//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1436//console

This message is automatically generated.

> Possible wrong error message when submitting new dag
> 
>
> Key: TEZ-2307
> URL: https://issues.apache.org/jira/browse/TEZ-2307
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-2307-1.patch, TEZ-2307-2.patch
>
>
> In the following 2 cases, the AM would propagate a wrong error message to the 
> client ("App master already running a DAG"):
> * The last dag is completed but the AM is still in RUNNING state
> * The AM is shutting down. 
> {code}
> 2015-04-10 06:01:50,369 INFO  [IPC Server handler 0 on 46821] ipc.Server 
> (Server.java:run(2070)) - IPC Server handler 0 on 46821, call 
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.submitDAG 
> from 10.0.0.223:48581 Call#411 Retry#0
> org.apache.tez.dag.api.TezException: App master already running a DAG
>   at 
> org.apache.tez.dag.app.DAGAppMaster.submitDAGToAppMaster(DAGAppMaster.java:1131)
>   at 
> org.apache.tez.dag.api.client.DAGClientHandler.submitDAG(DAGClientHandler.java:118)
>   at 
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.submitDAG(DAGClientAMProtocolBlockingPBServerImpl.java:163)
>   at 
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7471)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-3074) Multithreading issue java.lang.ArrayIndexOutOfBoundsException: -1 while working with Tez

2016-01-28 Thread Oleksiy Sayankin (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121261#comment-15121261
 ] 

Oleksiy Sayankin commented on TEZ-3074:
---

Yes, turning off Tez and using just MapReduce fixes the issue. But our customer 
wants to use Tez to speed up Hive queries. 

Actually, these steps only simulate the production cluster behavior; they are not 
exactly the same. They were found by our support team. To figure out what is 
going on with block locations and why blkLocations.length = 0, we have added 
logging statements to the Tez sources. Here they are:

{code:title=org.apache.tez.dag.api.DAG.java|borderStyle=solid}
  public synchronized DAG addTaskLocalFiles(Map<String, LocalResource> localFiles) {
    Preconditions.checkNotNull(localFiles);
    logLocalFiles(localFiles);
    logCommonTaskLocalFiles(commonTaskLocalFiles);
    TezCommonUtils.addAdditionalLocalResources(localFiles, commonTaskLocalFiles,
        "DAG " + getName());
    return this;
  }

  private static void logLocalFiles(Map<String, LocalResource> localFiles) {
    LOG.info("###@@@ localFiles:");
    for (Map.Entry<String, LocalResource> entry : localFiles.entrySet()) {
      String key = entry.getKey();
      LocalResource localResource = entry.getValue();
      LOG.info("###@@@001 key = " + key
          + ", localResource.getSize() = " + localResource.getSize()
          + ", localResource.getType() = " + localResource.getType()
          + ", localResource.getVisibility() = " + localResource.getVisibility());
    }
  }

  private static void logCommonTaskLocalFiles(Map<String, LocalResource> commonTaskLocalFiles) {
    LOG.info("###@@@ commonTaskLocalFiles:");
    for (Map.Entry<String, LocalResource> entry : commonTaskLocalFiles.entrySet()) {
      String key = entry.getKey();
      LocalResource localResource = entry.getValue();
      LOG.info("###@@@002 key = " + key
          + ", localResource.getSize() = " + localResource.getSize()
          + ", localResource.getType() = " + localResource.getType()
          + ", localResource.getVisibility() = " + localResource.getVisibility());
    }
  }
{code}

and 

{code:title=org.apache.tez.mapreduce.hadoop.MRInputHelpers.java|borderStyle=solid}
  private static void updateLocalResourcesForInputSplits(
      FileSystem fs,
      InputSplitInfo inputSplitInfo,
      Map<String, LocalResource> localResources) throws IOException {
    if (localResources.containsKey(JOB_SPLIT_RESOURCE_NAME)) {
      throw new RuntimeException("LocalResources already contains a"
          + " resource named " + JOB_SPLIT_RESOURCE_NAME);
    }
    if (localResources.containsKey(JOB_SPLIT_METAINFO_RESOURCE_NAME)) {
      throw new RuntimeException("LocalResources already contains a"
          + " resource named " + JOB_SPLIT_METAINFO_RESOURCE_NAME);
    }

    LOG.info("###@@@003 inputSplitInfo.getSplitsFile() = "
        + inputSplitInfo.getSplitsFile());
{code}

But it gave nothing. The exception happened before any 
{noformat}###@@@{noformat} tag was printed out.
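
A minimal sketch of the kind of logging that could be placed where the block 
locations are actually obtained and indexed (FileInputFormat in the Hadoop 
input-format code), since the Tez-side logging above fires too early to catch it. 
This is purely illustrative: the surrounding variables are assumptions, not the 
actual Hadoop source.

{code:title=Hypothetical logging near the blkLocations lookup in FileInputFormat|borderStyle=solid}
// Illustrative sketch only -- placement and surrounding variables (file, fs, LOG)
// are assumptions. The idea is to record the block locations seen for every file
// before they are indexed, so an empty blkLocations array can be tied to a path.
Path path = file.getPath();
long length = file.getLen();
BlockLocation[] blkLocations = fs.getFileBlockLocations(file, 0, length);
LOG.info("###@@@004 path = " + path + ", length = " + length
    + ", blkLocations.length = " + (blkLocations == null ? -1 : blkLocations.length));
{code}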

> Multithreading issue java.lang.ArrayIndexOutOfBoundsException: -1 while 
> working with Tez
> 
>
> Key: TEZ-3074
> URL: https://issues.apache.org/jira/browse/TEZ-3074
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.3
>Reporter: Oleksiy Sayankin
> Fix For: 0.5.3
>
> Attachments: tempsource.data
>
>
> *STEP 1. Install and configure Tez on yarn*
> *STEP 2. Configure hive for tez*
> *STEP 3. Create test tables in Hive and fill it with data*
> Enable dynamic partitioning in Hive. Add to {{hive-site.xml}} and restart 
> Hive.
> {code:xml}
> 
> 
>   hive.exec.dynamic.partition
>   true
> 
> 
>   hive.exec.dynamic.partition.mode
>   nonstrict
> 
> 
>   hive.exec.max.dynamic.partitions.pernode
>   2000
> 
> 
>   hive.exec.max.dynamic.partitions
>   2000
> 
> {code}
> Execute in command line
> {code}
> hadoop fs -put tempsource.data /
> {code}
> Execute in command line. Use attached file {{tempsource.data}}
> {code}
> hive> CREATE TABLE test3 (x INT, y STRING) ROW FORMAT DELIMITED FIELDS 
> TERMINATED BY ',';
> hive> CREATE TABLE ptest1 (x INT, y STRING) PARTITIONED BY (z STRING) ROW 
> FORMAT DELIMITED FIELDS TERMINATED BY ',';
> hive> CREATE TABLE tempsource (x INT, y STRING, z STRING) ROW FORMAT 
> DELIMITED FIELDS TERMINATED BY ',';
> hive> LOAD DATA INPATH '/tempsource.data' OVERWRITE INTO TABLE tempsource;
> hive> INSERT OVERWRITE TABLE ptest1 PARTITION (z) SELECT x,y,z FROM 
> tempsource;
> {code}
> *STEP 4. Mount NFS on cluster*
> *STEP 5. Run teragen test application*
> Use separate console
> {code}
> /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.5.1.jar 
> teragen -Dmapred.map.tasks=7 -Dmapreduce.map.disk=0 
> -Dmapreduce.map.cpu.vcores=0 10 /user/hdfs/input
> {code}
> *STEP 6. Create many test files*
> Use separate 

Failed: TEZ-2307 PreCommit Build #1436

2016-01-28 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2307
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1436/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 3768 lines...]
[INFO] Finished at: 2016-01-28T10:07:38+00:00
[INFO] Final Memory: 83M/1046M
[INFO] 




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12784857/TEZ-2307-2.patch
  against master revision 2bf27de.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 3 new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1436//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1436//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1436//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
ad15bf3e4749d149f45daf6a414c20c6c784d20a logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Compressed 3.21 MB of artifacts by 29.2% relative to #1435
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Comment Edited] (TEZ-3074) Multithreading issue java.lang.ArrayIndexOutOfBoundsException: -1 while working with Tez

2016-01-28 Thread Oleksiy Sayankin (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121261#comment-15121261
 ] 

Oleksiy Sayankin edited comment on TEZ-3074 at 1/28/16 11:45 AM:
-

Yes, turning off Tez and using just MapReduce fixes the issue. But our customer 
wants to use Tez to speed up Hive queries. 

Actually, these steps only simulate the production cluster behavior; they are not 
exactly the same. They were found by our support team. To figure out what is 
going on with block locations and why blkLocations.length = 0, we have added 
logging statements to the Tez sources. Here they are:

{code:title=org.apache.tez.dag.api.DAG.java|borderStyle=solid}
  public synchronized DAG addTaskLocalFiles(Map<String, LocalResource> localFiles) {
    Preconditions.checkNotNull(localFiles);
    logLocalFiles(localFiles);
    logCommonTaskLocalFiles(commonTaskLocalFiles);
    TezCommonUtils.addAdditionalLocalResources(localFiles, commonTaskLocalFiles,
        "DAG " + getName());
    return this;
  }

  private static void logLocalFiles(Map<String, LocalResource> localFiles) {
    LOG.info("###@@@ localFiles:");
    for (Map.Entry<String, LocalResource> entry : localFiles.entrySet()) {
      String key = entry.getKey();
      LocalResource localResource = entry.getValue();
      LOG.info("###@@@001 key = " + key
          + ", localResource.getSize() = " + localResource.getSize()
          + ", localResource.getType() = " + localResource.getType()
          + ", localResource.getVisibility() = " + localResource.getVisibility());
    }
  }

  private static void logCommonTaskLocalFiles(Map<String, LocalResource> commonTaskLocalFiles) {
    LOG.info("###@@@ commonTaskLocalFiles:");
    for (Map.Entry<String, LocalResource> entry : commonTaskLocalFiles.entrySet()) {
      String key = entry.getKey();
      LocalResource localResource = entry.getValue();
      LOG.info("###@@@002 key = " + key
          + ", localResource.getSize() = " + localResource.getSize()
          + ", localResource.getType() = " + localResource.getType()
          + ", localResource.getVisibility() = " + localResource.getVisibility());
    }
  }
{code}

and 

{code:title=org.apache.tez.mapreduce.hadoop.MRInputHelpers.java|borderStyle=solid}
  private static void updateLocalResourcesForInputSplits(
      FileSystem fs,
      InputSplitInfo inputSplitInfo,
      Map<String, LocalResource> localResources) throws IOException {
    if (localResources.containsKey(JOB_SPLIT_RESOURCE_NAME)) {
      throw new RuntimeException("LocalResources already contains a"
          + " resource named " + JOB_SPLIT_RESOURCE_NAME);
    }
    if (localResources.containsKey(JOB_SPLIT_METAINFO_RESOURCE_NAME)) {
      throw new RuntimeException("LocalResources already contains a"
          + " resource named " + JOB_SPLIT_METAINFO_RESOURCE_NAME);
    }

    LOG.info("###@@@003 inputSplitInfo.getSplitsFile() = "
        + inputSplitInfo.getSplitsFile());
{code}

But it gave nothing. The exception happened before any 
{noformat}###@@@{noformat} tag was printed out.


was (Author: osayankin):
Yes, turning off Tez and using just MapReduce fixes the issue. But our customer 
wants to use Tez to speed up Hive queries. 

Actually, these steps only simulate the production cluster behavior; they are not 
exactly the same. They were found by our support team. To figure out what is 
going on with block locations and why blkLocations.length = 0, we have added 
logging statements to the Tez sources. Here they are:

{code:title=org.apache.tez.dag.api.DAG.java|borderStyle=solid}
  public synchronized DAG addTaskLocalFiles(Map<String, LocalResource> localFiles) {
    Preconditions.checkNotNull(localFiles);
    logLocalFiles(localFiles);
    logCommonTaskLocalFiles(commonTaskLocalFiles);
    TezCommonUtils.addAdditionalLocalResources(localFiles, commonTaskLocalFiles,
        "DAG " + getName());
    return this;
  }

  private static void logLocalFiles(Map<String, LocalResource> localFiles) {
    LOG.info("###@@@ localFiles:");
    for (Map.Entry<String, LocalResource> entry : localFiles.entrySet()) {
      String key = entry.getKey();
      LocalResource localResource = entry.getValue();
      LOG.info("###@@@001 key = " + key
          + ", localResource.getSize() = " + localResource.getSize()
          + ", localResource.getType() = " + localResource.getType()
          + ", localResource.getVisibility() = " + localResource.getVisibility());
    }
  }

  private static void logCommonTaskLocalFiles(Map<String, LocalResource> commonTaskLocalFiles) {
    LOG.info("###@@@ commonTaskLocalFiles:");
    for (Map.Entry<String, LocalResource> entry : commonTaskLocalFiles.entrySet()) {
      String key = entry.getKey();
      LocalResource localResource = entry.getValue();
      LOG.info("###@@@002 key = " + key
          + ", localResource.getSize() = " + localResource.getSize()
          + ", localResource.getType() = " + localResource.getType()
          + ", localResource.getVisibility() = " + localResource.getVisibility());

[jira] [Updated] (TEZ-2307) Possible wrong error message when submitting new dag

2016-01-28 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-2307:

Attachment: TEZ-2307-3.patch

> Possible wrong error message when submitting new dag
> 
>
> Key: TEZ-2307
> URL: https://issues.apache.org/jira/browse/TEZ-2307
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-2307-1.patch, TEZ-2307-2.patch, TEZ-2307-3.patch
>
>
> In the following 2 cases, AM would propagate wrong error message to client 
> ("App master already running a DAG")
> * The last dag is completed but AM is still in RUNNING state
> * AM is in shutting down. 
> {code}
> 2015-04-10 06:01:50,369 INFO  [IPC Server handler 0 on 46821] ipc.Server 
> (Server.java:run(2070)) - IPC Server handler 0 on 46821, call 
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.submitDAG 
> from 10.0.0.223:48581 Call#411 Retry#0
> org.apache.tez.dag.api.TezException: App master already running a DAG
>   at 
> org.apache.tez.dag.app.DAGAppMaster.submitDAGToAppMaster(DAGAppMaster.java:1131)
>   at 
> org.apache.tez.dag.api.client.DAGClientHandler.submitDAG(DAGClientHandler.java:118)
>   at 
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.submitDAG(DAGClientAMProtocolBlockingPBServerImpl.java:163)
>   at 
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7471)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TEZ-3060) Tez UI 2: Activate auto-refresh

2016-01-28 Thread Sreenath Somarajapuram (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sreenath Somarajapuram reassigned TEZ-3060:
---

Assignee: Sreenath Somarajapuram

> Tez UI 2: Activate auto-refresh
> ---
>
> Key: TEZ-3060
> URL: https://issues.apache.org/jira/browse/TEZ-3060
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Sreenath Somarajapuram
>Assignee: Sreenath Somarajapuram
>
> - When auto refresh is selected the following fields must update
> -- Status
> -- Progress
> -- Task counts
> -- Counters



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-3060) Tez UI 2: Activate auto-refresh

2016-01-28 Thread Sreenath Somarajapuram (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sreenath Somarajapuram updated TEZ-3060:

Attachment: TEZ-3060.2.patch

Correcting a jshint failure.

> Tez UI 2: Activate auto-refresh
> ---
>
> Key: TEZ-3060
> URL: https://issues.apache.org/jira/browse/TEZ-3060
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Sreenath Somarajapuram
>Assignee: Sreenath Somarajapuram
> Attachments: TEZ-3060.1.patch, TEZ-3060.2.patch
>
>
> - When auto refresh is selected the following fields must update
> -- Status
> -- Progress
> -- Task counts
> -- Counters



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TEZ-3060) Tez UI 2: Activate auto-refresh

2016-01-28 Thread Sreenath Somarajapuram (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sreenath Somarajapuram resolved TEZ-3060.
-
Resolution: Fixed

Committed to TEZ-2980

> Tez UI 2: Activate auto-refresh
> ---
>
> Key: TEZ-3060
> URL: https://issues.apache.org/jira/browse/TEZ-3060
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Sreenath Somarajapuram
>Assignee: Sreenath Somarajapuram
> Attachments: TEZ-3060.1.patch, TEZ-3060.2.patch
>
>
> - When auto refresh is selected the following fields must update
> -- Status
> -- Progress
> -- Task counts
> -- Counters



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TEZ-2307) Possible wrong error message when submitting new dag

2016-01-28 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122063#comment-15122063
 ] 

Siddharth Seth edited comment on TEZ-2307 at 1/28/16 6:36 PM:
--

bq. I think making the submit RPC call wait might not be a good option because it 
is confusing that the user cannot submit a new dag even after the previous dag is 
completed. So I suggest that the user can still submit a new dag, but keep the dag 
in NEW state until the cleanup of the previous dag is done.
This is an option. A couple of things will need to be considered though. 
The user will consider submitDag successful. What happens if there's an 
error during the cleanup of the previous DAG? That would have to be sent back 
as part of dag status monitoring. This can get fairly confusing for users: the DAG 
is accepted, but the user is then notified about a failure caused by a cleanup 
error from the previous DAG.
Also, in case of an error during the previous DAG's cleanup, we should send back a 
specific error which the user can act on: SessionNotRunning itself, or a new 
exception that users can use to launch a new application.

On the patch itself:
Instead of using a dagCleanupDone field, I think it'll be better to move the 
DAGAppMaster into the IDLE state only after the cleanup is done. My bad here, I 
should have fixed this in the patch which added the cleanup state. submitDag 
can wait for the IDLE state instead of waiting on dagCleanup. A 
notification can be sent out once the DAG enters the cleanup state. This also gets 
rid of the call from DAGImpl to set the dagCleanupedFlag to false.
- In the current patch, calling setDagCleanupDone races with the handling of the 
DAGCleanupEvent if concurrent dispatchers are used. It'd be better to avoid 
this for when concurrent dispatchers become the default.
- A boolean field (maybe volatile) is sufficient instead of an AtomicBoolean, 
since we're synchronizing on it.
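
A minimal sketch of what the client-side handling of such a specific error could 
look like, purely for illustration; the retry structure and session name are 
assumptions, not part of the patch under review:

{code:title=Hypothetical client-side reaction to a cleanup failure|borderStyle=solid}
// Illustrative sketch only. If the AM surfaced a previous-DAG cleanup failure as
// SessionNotRunning (or a similar dedicated exception), a client could fall back
// to a fresh session instead of retrying against the unusable one.
try {
  dagClient = tezClient.submitDAG(dag);
} catch (SessionNotRunning e) {
  tezClient.stop();                                   // abandon the old application
  tezClient = TezClient.create("retry-session", tezConf);
  tezClient.start();                                  // launch a new application
  dagClient = tezClient.submitDAG(dag);               // resubmit the same DAG
}
{code}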


was (Author: sseth):
bq. I think making the submit RPC call wait might not be a good option because it 
is confusing that the user cannot submit a new dag even after the previous dag is 
completed. So I suggest that the user can still submit a new dag, but keep the dag 
in NEW state until the cleanup of the previous dag is done.
This is an option. A couple of things will need to be considered though. 
The user will consider submitDag successful. What happens if there's an 
error during the cleanup of the previous DAG? That would have to be sent back 
as part of dag status monitoring. This can get fairly confusing for users: the DAG 
is accepted, but the user is then notified about a failure caused by a cleanup 
error from the previous DAG.

On the patch itself:
Instead of using a dagCleanupDone field, I think it'll be better to move the 
DAGAppMaster into the IDLE state only after the cleanup is done. My bad here, I 
should have fixed this in the patch which added the cleanup state. submitDag 
can wait for the IDLE state instead of waiting on dagCleanup. A 
notification can be sent out once the DAG enters the cleanup state. This also gets 
rid of the call from DAGImpl to set the dagCleanupedFlag to false.
- In the current patch, calling setDagCleanupDone races with the handling of the 
DAGCleanupEvent if concurrent dispatchers are used. It'd be better to avoid 
this for when concurrent dispatchers become the default.
- A boolean field (maybe volatile) is sufficient instead of an AtomicBoolean, 
since we're synchronizing on it.

> Possible wrong error message when submitting new dag
> 
>
> Key: TEZ-2307
> URL: https://issues.apache.org/jira/browse/TEZ-2307
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-2307-1.patch, TEZ-2307-2.patch, TEZ-2307-3.patch, 
> TEZ-2307-4.patch
>
>
> In the following 2 cases, AM would propagate wrong error message to client 
> ("App master already running a DAG")
> * The last dag is completed but AM is still in RUNNING state
> * AM is in shutting down. 
> {code}
> 2015-04-10 06:01:50,369 INFO  [IPC Server handler 0 on 46821] ipc.Server 
> (Server.java:run(2070)) - IPC Server handler 0 on 46821, call 
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.submitDAG 
> from 10.0.0.223:48581 Call#411 Retry#0
> org.apache.tez.dag.api.TezException: App master already running a DAG
>   at 
> org.apache.tez.dag.app.DAGAppMaster.submitDAGToAppMaster(DAGAppMaster.java:1131)
>   at 
> org.apache.tez.dag.api.client.DAGClientHandler.submitDAG(DAGClientHandler.java:118)
>   at 
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.submitDAG(DAGClientAMProtocolBlockingPBServerImpl.java:163)
>   at 
> 

[jira] [Updated] (TEZ-3060) Tez UI 2: Activate auto-refresh

2016-01-28 Thread Sreenath Somarajapuram (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sreenath Somarajapuram updated TEZ-3060:

Attachment: TEZ-3060.1.patch

The patch auto-refreshes all pages: Details, counters & tables.

> Tez UI 2: Activate auto-refresh
> ---
>
> Key: TEZ-3060
> URL: https://issues.apache.org/jira/browse/TEZ-3060
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Sreenath Somarajapuram
>Assignee: Sreenath Somarajapuram
> Attachments: TEZ-3060.1.patch
>
>
> - When auto refresh is selected the following fields must update
> -- Status
> -- Progress
> -- Task counts
> -- Counters



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-2307) Possible wrong error message when submitting new dag

2016-01-28 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122063#comment-15122063
 ] 

Siddharth Seth commented on TEZ-2307:
-

bq. I think making the submit RPC call wait might not be a good option because it 
is confusing that the user cannot submit a new dag even after the previous dag is 
completed. So I suggest that the user can still submit a new dag, but keep the dag 
in NEW state until the cleanup of the previous dag is done.
This is an option. A couple of things will need to be considered though. 
The user will consider submitDag successful. What happens if there's an 
error during the cleanup of the previous DAG? That would have to be sent back 
as part of dag status monitoring. This can get fairly confusing for users: the DAG 
is accepted, but the user is then notified about a failure caused by a cleanup 
error from the previous DAG.

On the patch itself:
Instead of using a dagCleanupDone field, I think it'll be better to move the 
DAGAppMaster into the IDLE state only after the cleanup is done. My bad here, I 
should have fixed this in the patch which added the cleanup state. submitDag 
can wait for the IDLE state instead of waiting on dagCleanup. A 
notification can be sent out once the DAG enters the cleanup state. This also gets 
rid of the call from DAGImpl to set the dagCleanupedFlag to false.
- In the current patch, calling setDagCleanupDone races with the handling of the 
DAGCleanupEvent if concurrent dispatchers are used. It'd be better to avoid 
this for when concurrent dispatchers become the default.
- A boolean field (maybe volatile) is sufficient instead of an AtomicBoolean, 
since we're synchronizing on it.
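
A rough sketch of the waiting pattern being suggested, purely illustrative; the 
field and helper names below are assumptions and do not correspond to the actual 
DAGAppMaster code:

{code:title=Hypothetical sketch: submitDag waiting for the IDLE state|borderStyle=solid}
// Illustrative sketch only -- stateLock, currentState and the helper methods are
// assumptions, not the real DAGAppMaster implementation.
private final Object stateLock = new Object();
private volatile DAGAppMasterState currentState;

void onStateChanged(DAGAppMasterState newState) {
  synchronized (stateLock) {
    currentState = newState;
    stateLock.notifyAll();      // wake up submitDag waiters once cleanup moves the AM to IDLE
  }
}

String submitDag(DAGPlan dagPlan) throws InterruptedException {
  synchronized (stateLock) {
    while (currentState != DAGAppMasterState.IDLE) {
      stateLock.wait();         // wait for the previous DAG's cleanup to finish
    }
  }
  return startDag(dagPlan);     // hypothetical helper that actually starts the DAG
}
{code}

A plain volatile field guarded by the same monitor is enough in a sketch like this, 
which matches the AtomicBoolean remark above.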

> Possible wrong error message when submitting new dag
> 
>
> Key: TEZ-2307
> URL: https://issues.apache.org/jira/browse/TEZ-2307
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-2307-1.patch, TEZ-2307-2.patch, TEZ-2307-3.patch, 
> TEZ-2307-4.patch
>
>
> In the following 2 cases, AM would propagate wrong error message to client 
> ("App master already running a DAG")
> * The last dag is completed but AM is still in RUNNING state
> * AM is in shutting down. 
> {code}
> 2015-04-10 06:01:50,369 INFO  [IPC Server handler 0 on 46821] ipc.Server 
> (Server.java:run(2070)) - IPC Server handler 0 on 46821, call 
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.submitDAG 
> from 10.0.0.223:48581 Call#411 Retry#0
> org.apache.tez.dag.api.TezException: App master already running a DAG
>   at 
> org.apache.tez.dag.app.DAGAppMaster.submitDAGToAppMaster(DAGAppMaster.java:1131)
>   at 
> org.apache.tez.dag.api.client.DAGClientHandler.submitDAG(DAGClientHandler.java:118)
>   at 
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.submitDAG(DAGClientAMProtocolBlockingPBServerImpl.java:163)
>   at 
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7471)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TEZ-3061) Tez UI 2: Display in-progress vertex table in DAG details

2016-01-28 Thread Sreenath Somarajapuram (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sreenath Somarajapuram reassigned TEZ-3061:
---

Assignee: Sreenath Somarajapuram

> Tez UI 2: Display in-progress vertex table in DAG details
> -
>
> Key: TEZ-3061
> URL: https://issues.apache.org/jira/browse/TEZ-3061
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Sreenath Somarajapuram
>Assignee: Sreenath Somarajapuram
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TEZ-2307) Possible wrong error message when submitting new dag

2016-01-28 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122733#comment-15122733
 ] 

Jeff Zhang edited comment on TEZ-2307 at 1/29/16 1:56 AM:
--

bq.  I think it'll be better to move the DAGAppMaster into IDLE state only 
after the cleanup is done. 
I thought about that, but it would confuse the user that the last dag is 
completed yet he still cannot submit another dag because the AM is still in 
RUNNING. For now it seems dag cleanup won't take too much time; have you thought 
about putting it in DAGImpl.finish? I think the root cause is that the dag state 
view on the client side is not consistent with that on the AM side. So if we put 
dag cleanup in DAGImpl.finish, then the two sides are consistent. 


was (Author: zjffdu):
bq.  I think it'll be better to move the DAGAppMaster into IDLE state only 
after the cleanup is done. 
I thought about that, but it would confuse the user that the last dag is 
completed yet he still cannot submit another dag because the AM is still in 
RUNNING. For now it seems dag cleanup won't take too much time; have you thought 
about putting it in DAGImpl.finish? I think the root cause is that the dag view on 
the client side is not consistent with that on the AM side. So if we put dag 
cleanup in DAGImpl.finish, then the two sides are consistent. 

> Possible wrong error message when submitting new dag
> 
>
> Key: TEZ-2307
> URL: https://issues.apache.org/jira/browse/TEZ-2307
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-2307-1.patch, TEZ-2307-2.patch, TEZ-2307-3.patch, 
> TEZ-2307-4.patch
>
>
> In the following 2 cases, AM would propagate wrong error message to client 
> ("App master already running a DAG")
> * The last dag is completed but AM is still in RUNNING state
> * AM is in shutting down. 
> {code}
> 2015-04-10 06:01:50,369 INFO  [IPC Server handler 0 on 46821] ipc.Server 
> (Server.java:run(2070)) - IPC Server handler 0 on 46821, call 
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.submitDAG 
> from 10.0.0.223:48581 Call#411 Retry#0
> org.apache.tez.dag.api.TezException: App master already running a DAG
>   at 
> org.apache.tez.dag.app.DAGAppMaster.submitDAGToAppMaster(DAGAppMaster.java:1131)
>   at 
> org.apache.tez.dag.api.client.DAGClientHandler.submitDAG(DAGClientHandler.java:118)
>   at 
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.submitDAG(DAGClientAMProtocolBlockingPBServerImpl.java:163)
>   at 
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7471)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-3076) Reduce merge memory overhead to support large number of in-memory mapoutputs

2016-01-28 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122921#comment-15122921
 ] 

TezQA commented on TEZ-3076:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12785040/TEZ-3076.4.patch
  against master revision 2bf27de.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.dag.app.rm.TestContainerReuse

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1440//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1440//console

This message is automatically generated.

> Reduce merge memory overhead to support large number of in-memory mapoutputs
> 
>
> Key: TEZ-3076
> URL: https://issues.apache.org/jira/browse/TEZ-3076
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: TEZ-3076.1.patch, TEZ-3076.2.patch, 
> TEZ-3076.3-branch-0.7.patch, TEZ-3076.3.patch, TEZ-3076.4-branch-0.7.patch, 
> TEZ-3076.4.patch
>
>
> Here is a typical stack trace, though sometimes it occurs with final merge 
> (since in-memory segment overhead > mapout overhead)
> Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
>   at org.apache.hadoop.io.DataInputBuffer.(DataInputBuffer.java:68)
>   at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.InMemoryReader.(InMemoryReader.java:42)
>   at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.createInMemorySegments(MergeManager.java:837)
>   at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.access$200(MergeManager.java:75)
>   at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager$InMemoryMerger.merge(MergeManager.java:642)
>   at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeThread.run(MergeThread.java:89)
> Details
>   around 1,000,000 spills were fetched committing around 100MB to the memory 
> budget (500,000 in memory). However, actual memory used for 500,000 segments 
> (50-350 bytes) is 480MB (expected 100-200MB)
> Mapout overhead is not budgeted
>   Each mapoutput needs around 50 bytes in addition to the data
> In Memory Segment overhead is not budgeted
>   Each In memory segment needs around 80 bytes in addition to the data
> Interaction with auto reduce parallelism
>   In this scenario, the upstream vertex was assuming 999 (pig's default hint 
> to use auto-reduce parallelism) downstream tasks. However, was reduced to 24 
> due to auto-reduce parallelism. This is putting 40 times more segments per 
> downstream task. Should auto-reduce parallelism consider merge overhead when 
> calculating parallelism?
> Legacy Default Sorter Empty Segment
>   Default sorter does not optimize empty segments like pipeline sorter does 
> and shows this symptom more.
> 2016-01-10 11:46:01,208 [INFO] [fetcher {scope_601} #7] 
> |orderedgrouped.MergeManager|: closeInMemoryFile - map-output of size: 
> 116, inMemoryMapOutputs.size() - 571831, commitMemory - 91503730, 
> usedMemory -91503846, mapOutput=MapOutput( AttemptIdentifier: 
> InputAttemptIdentifier [inputIdentifier=Input
> Identifier [inputIndex=763962], attemptNumber=0, 
> pathComponent=attempt_1444791925832_10460712_1_00_017766_0_10003, 
> spillType=0, spillId=-1], Type: MEMORY)
> 2016-01-10 11:46:01,208 [INFO] [fetcher {scope_601} #7] 
> |orderedgrouped.ShuffleScheduler|: Completed fetch for attempt: {763962, 0, 
> attempt_1444791925832_10460712_1_00_017766_0_10003} to MEMORY, csize=128, 
> dsize=116, EndTime=1452426361208, TimeTaken=0, Rate=0.00 MB/s
> 2016-01-10 11:46:01,209 [INFO] [fetcher {scope_601} #7] 
> |orderedgrouped.ShuffleScheduler|: scope_601: All inputs fetched for input 
> vertex : scope-601
> 2016-01-10 11:46:01,209 [INFO] [fetcher {scope_601} #7] 
> |orderedgrouped.ShuffleScheduler|: copy(1091856 (spillsFetched=1091856) of 
> 1091856. Transfer rate (CumulativeDataFetched/TimeSinceInputStarted)) 0.68 
> MB/s)
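
As an illustration, a back-of-the-envelope calculation using the approximate 
per-object overheads quoted in the description above; the payload size is an 
assumed midpoint of the 50-350 byte range, and JVM object overhead is not included:

{code:title=Back-of-the-envelope memory estimate|borderStyle=solid}
// Rough illustration only, built from the figures quoted in the description.
long segments = 500_000L;           // in-memory segments held at once
long avgPayload = 200L;             // assumed midpoint of the 50-350 byte range
long mapOutputOverhead = 50L;       // per-mapoutput overhead, not budgeted
long segmentOverhead = 80L;         // per in-memory segment overhead, not budgeted

long budgeted = segments * avgPayload;                          // ~100 MB, close to the commitMemory figure
long withOverheads = segments
    * (avgPayload + mapOutputOverhead + segmentOverhead);       // ~165 MB
// The observed ~480 MB is higher still, suggesting additional per-object cost
// (object headers, references, buffers) beyond the two overheads listed above.
{code}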



--
This message was sent by Atlassian JIRA

Failed: TEZ-3076 PreCommit Build #1440

2016-01-28 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3076
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1440/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 3820 lines...]
[ERROR] [Help 2] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :tez-dag
[INFO] Build failures were ignored.




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12785040/TEZ-3076.4.patch
  against master revision 2bf27de.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.dag.app.rm.TestContainerReuse

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1440//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1440//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
c6d8c6a5b8c9a7eafcf8021e850e68f91f990d39 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Compressed 3.19 MB of artifacts by 10.8% relative to #1435
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
1 tests failed.
FAILED:  
org.apache.tez.dag.app.rm.TestContainerReuse.testReuseWithTaskSpecificLaunchCmdOption

Error Message:

Wanted but not invoked:
taskSchedulerManagerForTest.taskAllocated(
0,
Mock for TaskAttempt, hashCode: 429644181,
,
Container: [ContainerId: container_1_0001_01_02, NodeId: host3:0, 
NodeHttpAddress: host3:0, Resource: , Priority: 1, 
Token: null, ]
);
-> at 
org.apache.tez.dag.app.rm.TestContainerReuse.testReuseWithTaskSpecificLaunchCmdOption(TestContainerReuse.java:686)

However, there were other interactions with this mock:
taskSchedulerManagerForTest.init(
Configuration: core-default.xml, core-site.xml, yarn-default.xml, 
yarn-site.xml
);
-> at 
org.apache.tez.dag.app.rm.TestContainerReuse.testReuseWithTaskSpecificLaunchCmdOption(TestContainerReuse.java:537)

taskSchedulerManagerForTest.setConfig(
Configuration: core-default.xml, core-site.xml, yarn-default.xml, 
yarn-site.xml
);
-> at 
org.apache.tez.dag.app.rm.TestContainerReuse.testReuseWithTaskSpecificLaunchCmdOption(TestContainerReuse.java:537)

taskSchedulerManagerForTest.serviceInit(
Configuration: core-default.xml, core-site.xml, yarn-default.xml, 
yarn-site.xml
);
-> at 
org.apache.tez.dag.app.rm.TestContainerReuse.testReuseWithTaskSpecificLaunchCmdOption(TestContainerReuse.java:537)

taskSchedulerManagerForTest.start();
-> at 
org.apache.tez.dag.app.rm.TestContainerReuse.testReuseWithTaskSpecificLaunchCmdOption(TestContainerReuse.java:538)

taskSchedulerManagerForTest.serviceStart();
-> at 
org.apache.tez.dag.app.rm.TestContainerReuse.testReuseWithTaskSpecificLaunchCmdOption(TestContainerReuse.java:538)

taskSchedulerManagerForTest.instantiateSchedulers(
"host",
0,
"",
Mock for AppContext, hashCode: 268900926
);
-> at 
org.apache.tez.dag.app.rm.TestContainerReuse.testReuseWithTaskSpecificLaunchCmdOption(TestContainerReuse.java:538)

taskSchedulerManagerForTest.getContainerSignatureMatcher();
-> at 
org.apache.tez.dag.app.rm.TestContainerReuse.testReuseWithTaskSpecificLaunchCmdOption(TestContainerReuse.java:538)


Failed: TEZ-2307 PreCommit Build #1438

2016-01-28 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2307
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1438/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 3768 lines...]
[INFO] Total time: 52:38 min
[INFO] Finished at: 2016-01-28T14:38:43+00:00
[INFO] Final Memory: 74M/905M
[INFO] 




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12784914/TEZ-2307-4.patch
  against master revision 2bf27de.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1438//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1438//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
90ead25e0adc03afe73e08937676367622759d41 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Compressed 3.17 MB of artifacts by 29.6% relative to #1435
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-2307) Possible wrong error message when submitting new dag

2016-01-28 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121595#comment-15121595
 ] 

TezQA commented on TEZ-2307:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12784914/TEZ-2307-4.patch
  against master revision 2bf27de.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1438//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1438//console

This message is automatically generated.

> Possible wrong error message when submitting new dag
> 
>
> Key: TEZ-2307
> URL: https://issues.apache.org/jira/browse/TEZ-2307
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-2307-1.patch, TEZ-2307-2.patch, TEZ-2307-3.patch, 
> TEZ-2307-4.patch
>
>
> In the following 2 cases, AM would propagate wrong error message to client 
> ("App master already running a DAG")
> * The last dag is completed but AM is still in RUNNING state
> * AM is in shutting down. 
> {code}
> 2015-04-10 06:01:50,369 INFO  [IPC Server handler 0 on 46821] ipc.Server 
> (Server.java:run(2070)) - IPC Server handler 0 on 46821, call 
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.submitDAG 
> from 10.0.0.223:48581 Call#411 Retry#0
> org.apache.tez.dag.api.TezException: App master already running a DAG
>   at 
> org.apache.tez.dag.app.DAGAppMaster.submitDAGToAppMaster(DAGAppMaster.java:1131)
>   at 
> org.apache.tez.dag.api.client.DAGClientHandler.submitDAG(DAGClientHandler.java:118)
>   at 
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.submitDAG(DAGClientAMProtocolBlockingPBServerImpl.java:163)
>   at 
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7471)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TEZ-2307) Possible wrong error message when submitting new dag

2016-01-28 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122733#comment-15122733
 ] 

Jeff Zhang edited comment on TEZ-2307 at 1/29/16 1:56 AM:
--

bq.  I think it'll be better to move the DAGAppMaster into IDLE state only 
after the cleanup is done. 
I thought about that, but it would confuse the user that the last dag is 
completed yet he still cannot submit another dag because the AM is still in 
RUNNING. For now it seems dag cleanup won't take too much time; have you thought 
about putting it in DAGImpl.finish? I think the root cause is that the dag view on 
the client side is not consistent with that on the AM side. So if we put dag 
cleanup in DAGImpl.finish, then the two sides are consistent. 


was (Author: zjffdu):
bq.  I think it'll be better to move the DAGAppMaster into IDLE state only 
after the cleanup is done. 
I thought about that, but it would confuse the user that the last dag is 
completed yet he still cannot submit another dag because the AM is still in 
RUNNING. For now it seems dag cleanup won't take too much time; have you thought 
about putting it in DAGImpl.finish?

> Possible wrong error message when submitting new dag
> 
>
> Key: TEZ-2307
> URL: https://issues.apache.org/jira/browse/TEZ-2307
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-2307-1.patch, TEZ-2307-2.patch, TEZ-2307-3.patch, 
> TEZ-2307-4.patch
>
>
> In the following 2 cases, AM would propagate wrong error message to client 
> ("App master already running a DAG")
> * The last dag is completed but AM is still in RUNNING state
> * AM is in shutting down. 
> {code}
> 2015-04-10 06:01:50,369 INFO  [IPC Server handler 0 on 46821] ipc.Server 
> (Server.java:run(2070)) - IPC Server handler 0 on 46821, call 
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.submitDAG 
> from 10.0.0.223:48581 Call#411 Retry#0
> org.apache.tez.dag.api.TezException: App master already running a DAG
>   at 
> org.apache.tez.dag.app.DAGAppMaster.submitDAGToAppMaster(DAGAppMaster.java:1131)
>   at 
> org.apache.tez.dag.api.client.DAGClientHandler.submitDAG(DAGClientHandler.java:118)
>   at 
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.submitDAG(DAGClientAMProtocolBlockingPBServerImpl.java:163)
>   at 
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7471)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-2307) Possible wrong error message when submitting new dag

2016-01-28 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122733#comment-15122733
 ] 

Jeff Zhang commented on TEZ-2307:
-

bq.  I think it'll be better to move the DAGAppMaster into IDLE state only 
after the cleanup is done. 
I thought about that, but it would confuse the user that the last dag is 
completed yet he still cannot submit another dag because the AM is still in 
RUNNING. For now it seems dag cleanup won't take too much time; have you thought 
about putting it in DAGImpl.finish?
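
A minimal sketch of the alternative being floated here, purely illustrative; the 
method and helper names are assumptions, not the actual DAGImpl code:

{code:title=Hypothetical sketch: cleanup inside DAGImpl.finish|borderStyle=solid}
// Illustrative sketch only -- names are assumptions, not the real DAGImpl.
DAGState finish(DAGState finalState) {
  // ... existing completion bookkeeping (counters, history events) would run here ...

  cleanupDagResources();   // hypothetical helper: release per-DAG resources before the
                           // terminal state is reported, so the client-side view and the
                           // AM-side view of "this DAG is done" stay consistent
  return finalState;
}
{code}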

> Possible wrong error message when submitting new dag
> 
>
> Key: TEZ-2307
> URL: https://issues.apache.org/jira/browse/TEZ-2307
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-2307-1.patch, TEZ-2307-2.patch, TEZ-2307-3.patch, 
> TEZ-2307-4.patch
>
>
> In the following 2 cases, AM would propagate wrong error message to client 
> ("App master already running a DAG")
> * The last dag is completed but AM is still in RUNNING state
> * AM is in shutting down. 
> {code}
> 2015-04-10 06:01:50,369 INFO  [IPC Server handler 0 on 46821] ipc.Server 
> (Server.java:run(2070)) - IPC Server handler 0 on 46821, call 
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.submitDAG 
> from 10.0.0.223:48581 Call#411 Retry#0
> org.apache.tez.dag.api.TezException: App master already running a DAG
>   at 
> org.apache.tez.dag.app.DAGAppMaster.submitDAGToAppMaster(DAGAppMaster.java:1131)
>   at 
> org.apache.tez.dag.api.client.DAGClientHandler.submitDAG(DAGClientHandler.java:118)
>   at 
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.submitDAG(DAGClientAMProtocolBlockingPBServerImpl.java:163)
>   at 
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7471)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)