[jira] [Commented] (TEZ-2298) Ignore sending failure message, when TaskReporter$HeartbeatCallable is shutdown

2015-04-09 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487586#comment-14487586
 ] 

Hitesh Shah commented on TEZ-2298:
--

I believe [~sseth] fixed this recently to remove the NPE. 

> Ignore sending failure message, when TaskReporter$HeartbeatCallable is 
> shutdown
> ---
>
> Key: TEZ-2298
> URL: https://issues.apache.org/jira/browse/TEZ-2298
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>
> {noformat}
> 2015-04-09 03:18:37,002 INFO [TezChild] task.TezTaskRunner: Ignoring the 
> following exception since a previous exception is already registered
> java.lang.NullPointerException
> at 
> org.apache.tez.runtime.task.TaskReporter$HeartbeatCallable.access$300(TaskReporter.java:121)
> at 
> org.apache.tez.runtime.task.TaskReporter.taskFailed(TaskReporter.java:383)
> at 
> org.apache.tez.runtime.task.TezTaskRunner.sendFailure(TezTaskRunner.java:265)
> at 
> org.apache.tez.runtime.task.TezTaskRunner.access$600(TezTaskRunner.java:51)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:227)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> A lot of these messages are seen in the logs.  Even though they are harmless,
> they can be very misleading when trying to debug the reason for some other
> exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-2233) Allow EdgeProperty of an edge to be changed by VertexManager

2015-04-09 Thread Giridharan Kesavan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486871#comment-14486871
 ] 

Giridharan Kesavan commented on TEZ-2233:
-

It looks like someone tried logging into JIRA as the hadoopqa user multiple
times with a random password, and the hadoopqa JIRA account got locked.
Logging into JIRA and answering the CAPTCHA manually once seems to have
solved this issue.

> Allow EdgeProperty of an edge to be changed by VertexManager
> 
>
> Key: TEZ-2233
> URL: https://issues.apache.org/jira/browse/TEZ-2233
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Bikas Saha
>Assignee: Bikas Saha
> Fix For: 0.7.0
>
> Attachments: TEZ-2233.1.patch, TEZ-2233.2.patch
>
>
> Currently, all edge managers set during setParallelism end up becoming custom 
> edges. However, just like during dag creation, it should be possible to 
> specify standard edge types like scatter_gather if that is what the final 
> user decision is. More broadly, allowing the complete EdgeProperty to be 
> specified at runtime would make that action at par with compile time.





[jira] [Created] (TEZ-2300) TezClient.stop() takes a lot of time

2015-04-09 Thread Rohini Palaniswamy (JIRA)
Rohini Palaniswamy created TEZ-2300:
---

 Summary: TezClient.stop() takes a lot of time
 Key: TEZ-2300
 URL: https://issues.apache.org/jira/browse/TEZ-2300
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rohini Palaniswamy


  Noticed this with a couple of Pig scripts which were not behaving well (AM
close to OOM, etc.) and even with some that were running fine. Pig calls
TezClient.stop() in a shutdown hook. On Ctrl+C, the Pig script either exits
immediately or hangs. In both cases it takes a long time for the YARN
application to go to the KILLED state. Many times I just end up calling yarn
application -kill separately after waiting for 5 mins or more for it to get
killed.





[jira] [Commented] (TEZ-2225) Remove instances of LOG.isDebugEnabled

2015-04-09 Thread Vasanth kumar RJ (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488052#comment-14488052
 ] 

Vasanth kumar RJ commented on TEZ-2225:
---

[~sseth], the String will still be created, and the logging framework (log4j)
has its own way of handling strings. I am not sure how the logging framework
takes care of it.

> Remove instances of LOG.isDebugEnabled
> --
>
> Key: TEZ-2225
> URL: https://issues.apache.org/jira/browse/TEZ-2225
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Vasanth kumar RJ
>Assignee: Vasanth kumar RJ
>Priority: Minor
>  Labels: performance
> Attachments: TEZ-2225.1.patch, TEZ-2225.2.patch, TEZ-2225.3.patch
>
>
> Remove LOG.isDebugEnabled() and use parameterized debug logging
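
The description above contrasts guarded logging with parameterized logging. A
minimal sketch of the difference follows, assuming java.util.logging from the
JDK as a stand-in so that the example is self-contained; Tez itself logs
through a different framework (log4j is mentioned in the comment above), and
the names ParamLoggingSketch, ExpensiveState and countToStringCalls are
hypothetical. It counts how often an argument's toString() runs, since
building the message string is the cost the guard is meant to avoid.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class ParamLoggingSketch {
    // Number of times the costly toString() below has been invoked.
    static int toStringCalls = 0;

    // Hypothetical object whose text form is expensive to build.
    static final class ExpensiveState {
        @Override
        public String toString() {
            toStringCalls++;
            return "ExpensiveState{...}";
        }
    }

    // Logs one debug-level (FINE) message in each style at the given
    // threshold and returns how many times toString() ran.
    static int countToStringCalls(Level threshold) {
        Logger log = Logger.getLogger("param-logging-sketch");
        log.setUseParentHandlers(false); // no handlers, so nothing is printed
        log.setLevel(threshold);
        toStringCalls = 0;
        ExpensiveState state = new ExpensiveState();

        // Guarded style: the concatenation only happens when FINE is enabled.
        if (log.isLoggable(Level.FINE)) {
            log.fine("state=" + state);
        }

        // Parameterized style: the argument is passed unrendered, and the
        // logger checks the level itself before doing any work with it.
        log.log(Level.FINE, "state={0}", state);

        return toStringCalls;
    }

    public static void main(String[] args) {
        System.out.println("toString() calls with debug off: "
            + countToStringCalls(Level.INFO));
        System.out.println("toString() calls with debug on:  "
            + countToStringCalls(Level.FINE));
    }
}
```

With the threshold at INFO, neither style renders the argument. An unguarded
call that concatenates the message itself would render it regardless of the
level, which is the concern raised in the comment above.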





[jira] [Commented] (TEZ-2119) Counter for launched containers

2015-04-09 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488082#comment-14488082
 ] 

Hitesh Shah commented on TEZ-2119:
--

Yes. That sounds right.

As an initial step, given that we end up surfacing most counters only at a dag 
level and not really at the AM level, we could try the following: 
- a counter to track the initial no. of held containers before the dag 
starts. This may contain both allocated and launched containers. We should 
probably separate out the two.
- Over the dag lifetime, how many containers were allocated, launched and 
released. And potentially a counter for each time a container is re-used.

\cc [~bikassaha] [~rajesh.balamohan] [~sseth] any comments?




> Counter for launched containers
> ---
>
> Key: TEZ-2119
> URL: https://issues.apache.org/jira/browse/TEZ-2119
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Jeff Zhang
>
> org.apache.tez.common.counters.DAGCounter
> NUM_SUCCEEDED_TASKS=32976
> TOTAL_LAUNCHED_TASKS=32976
> OTHER_LOCAL_TASKS=2
> DATA_LOCAL_TASKS=9147
> RACK_LOCAL_TASKS=23761
> It would be very nice to have TOTAL_LAUNCHED_CONTAINERS counter added to 
> this. The difference between TOTAL_LAUNCHED_CONTAINERS and 
> TOTAL_LAUNCHED_TASKS should make it easy to see how much container reuse is 
> happening. It is very hard to find out now.
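
The arithmetic suggested in the description can be sketched as follows. This
is only an illustration: TOTAL_LAUNCHED_CONTAINERS does not exist yet, the
container count of 4122 is a made-up value, and ContainerReuseSketch and its
methods are hypothetical names. It also assumes every launched container ran
at least one task, which may not hold for containers that were held but
never used.

```java
final class ContainerReuseSketch {
    // Task launches that ran in a container which had already run a task,
    // assuming each launched container ran at least one task.
    static long reusedLaunches(long totalLaunchedTasks, long totalLaunchedContainers) {
        return Math.max(0L, totalLaunchedTasks - totalLaunchedContainers);
    }

    // Average number of tasks run per launched container.
    static double tasksPerContainer(long totalLaunchedTasks, long totalLaunchedContainers) {
        if (totalLaunchedContainers <= 0) {
            throw new IllegalArgumentException("container count must be positive");
        }
        return (double) totalLaunchedTasks / totalLaunchedContainers;
    }

    public static void main(String[] args) {
        long tasks = 32976L;      // TOTAL_LAUNCHED_TASKS from the report above
        long containers = 4122L;  // hypothetical TOTAL_LAUNCHED_CONTAINERS value

        System.out.println("reused launches: " + reusedLaunches(tasks, containers));
        System.out.println("tasks per container: " + tasksPerContainer(tasks, containers));
    }
}
```

A difference close to zero would indicate almost no reuse, while a difference
close to the task count would indicate that most tasks ran in reused
containers.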





[jira] [Created] (TEZ-2302) Allow TaskCommunicators to subscribe for Vertex updates

2015-04-09 Thread Siddharth Seth (JIRA)
Siddharth Seth created TEZ-2302:
---

 Summary: Allow TaskCommunicators to subscribe for Vertex updates
 Key: TEZ-2302
 URL: https://issues.apache.org/jira/browse/TEZ-2302
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Siddharth Seth








[jira] [Commented] (TEZ-2300) TezClient.stop() takes a lot of time or does not work sometimes

2015-04-09 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488143#comment-14488143
 ] 

Hitesh Shah commented on TEZ-2300:
--

[~rohini] Do you have the client and AM logs for this?

> TezClient.stop() takes a lot of time or does not work sometimes
> ---
>
> Key: TEZ-2300
> URL: https://issues.apache.org/jira/browse/TEZ-2300
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rohini Palaniswamy
>
>   Noticed this with a couple of Pig scripts which were not behaving well (AM 
> close to OOM, etc.) and even with some that were running fine. Pig calls 
> TezClient.stop() in a shutdown hook. On Ctrl+C, the Pig script either exits 
> immediately or hangs. In both cases it takes a long time for the YARN 
> application to go to the KILLED state. Many times I just end up calling yarn 
> application -kill separately after waiting for 5 mins or more for it to get 
> killed.





Failed: TEZ-2299 PreCommit Build #428

2015-04-09 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2299
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/428/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2766 lines...]



{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12724288/TEZ-2299.1.patch
  against master revision 936ff8d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/428//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/428//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
2c9a0816f9c6f2e39ffced32b5554354b395795e logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #425
Archived 44 artifacts
Archive block size is 32768
Received 4 blocks and 2605350 bytes
Compression is 4.8%
Took 1.6 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-2274) TEZ UI: Make TEZ-2236 & TEZ-2273 available for all pages, except 'All Dags'.

2015-04-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486926#comment-14486926
 ] 

Hadoop QA commented on TEZ-2274:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12724166/TEZ-2274.1.patch
  against master revision 936ff8d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/426//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/426//console

This message is automatically generated.

> TEZ UI: Make TEZ-2236 & TEZ-2273 available for all pages, except 'All Dags'.
> 
>
> Key: TEZ-2274
> URL: https://issues.apache.org/jira/browse/TEZ-2274
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Sreenath Somarajapuram
>Assignee: Sreenath Somarajapuram
> Attachments: TEZ-2274.1.patch
>
>
> 1. Make all tables use ember-table component
> 2. Support loading of all rows with caching
> 3. Support searching & sorting





[jira] [Commented] (TEZ-2237) Complex DAG freezes and fails (was BufferTooSmallException raised in UnorderedPartitionedKVWriter then DAG lingers)

2015-04-09 Thread Chris K Wensel (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487691#comment-14487691
 ] 

Chris K Wensel commented on TEZ-2237:
-

If I understand, the inputs to those two vertices are not being started 
upstream as outputs.

Here are the inputs from those two vertices.

{noformat}
2015-03-31 17:36:07,457 INFO [TezChild] tez.FlowProcessor: sourcing from: 
GroupBy(_pipe_346+_pipe_347)[by:[{1}:'key']] streamed: true
2015-03-31 17:36:07,457 INFO [TezChild] tez.FlowProcessor: sourcing from: 
Boundary(79A2D805A47147D087972D1BBB528A97) streamed: false
2015-04-01 02:49:10,119 INFO [TezChild] tez.FlowProcessor: sourcing from: 
GroupBy(_pipe_332+_pipe_333)[by:[{1}:'key']] streamed: true
2015-04-01 02:49:10,119 INFO [TezChild] tez.FlowProcessor: sourcing from: 
Boundary(ECCC5DB0C5C04B2EBED0FC3187C8487A) streamed: false
{noformat}

This grep shows all four outputs being started:

{noformat}
cat application_142732418_1908.red.txt | grep -e 
"DEF94DA9BECF4A5BA6C85388B1EAAD41" -e "79A2D805A47147D087972D1BBB528A97" -e 
"ECCC5DB0C5C04B2EBED0FC3187C8487A" -e "7502F02C33714606AB4B03B7614469D0" | grep 
"Output#start()"
{noformat}

See the output-starts.txt attachment.


> Complex DAG freezes and fails (was BufferTooSmallException raised in 
> UnorderedPartitionedKVWriter then DAG lingers)
> ---
>
> Key: TEZ-2237
> URL: https://issues.apache.org/jira/browse/TEZ-2237
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.6.0
> Environment: Debian Linux "jessie"
> OpenJDK Runtime Environment (build 1.8.0_40-internal-b27)
> OpenJDK 64-Bit Server VM (build 25.40-b25, mixed mode)
> 7 * Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz, 16/24 GB RAM per node, 1*system 
> disk + 4*1 or 2 TiB HDD for HDFS & local  (on-prem, dedicated hardware)
> Scalding 0.13.1 modified with https://github.com/twitter/scalding/pull/1220 
> to run Cascading 3.0.0-wip-90 with TEZ 0.6.0
>Reporter: Cyrille Chépélov
> Attachments: TEZ-2237-hack.branch6.txt, TEZ-2237-hack.master.txt, 
> TEZ-2237.test.2_branch0.6.txt, all_stacks.lst, alloc_mem.png, 
> alloc_vcores.png, application_142732418_1444.yarn-logs.red.txt.gz, 
> application_142732418_1908.red.txt.bz2, 
> application_1427964335235_2070.txt.red.txt.bz2, 
> appmastersyslog_dag_1427282048097_0215_1.red.txt.gz, 
> appmastersyslog_dag_1427282048097_0237_1.red.txt.gz, 
> gc_count_MRAppMaster.png, mem_free.png, noopexample_2237.txt, 
> oneOutOfTwoOutputsStarted.txt, ordered-grouped-kv-input-traces.diff, 
> output-starts.txt, start_containers.png, stop_containers.png, 
> syslog_attempt_1427282048097_0215_1_21_14_0.red.txt.gz, 
> syslog_attempt_1427282048097_0237_1_70_28_0.red.txt.gz, yarn_rm_flips.png
>
>
> On a specific DAG with many vertices (actually part of a larger meta-DAG), 
> after about an hour of processing, several BufferTooSmallExceptions are 
> raised in UnorderedPartitionedKVWriter (about one every two or three spills).
> Once these exceptions are raised, the DAG remains indefinitely "active", 
> tying up memory and CPU resources as far as YARN is concerned, while little 
> if any actual processing takes place. 
> It seems two separate issues are at hand:
>   1. BufferTooSmallExceptions are raised even though, small as the actually 
> allocated buffers seem to be (around a couple of megabytes were allotted 
> whereas 100MiB were requested), the actual keys and values are never bigger 
> than 24 and 1024 bytes respectively.
>   2. In the event BufferTooSmallExceptions are raised, the DAG fails to stop 
> (stop requests appear to be sent 7 hours after the BTSE exceptions are 
> raised, but 9 hours after these stop requests, the DAG was still lingering on 
> with all containers present tying up memory and CPU allocations)
> The emergence of the BTSE prevents the Cascade from completing, which in 
> turn prevents validating the results against traditional MR1-based results. 
> The lack of conclusion renders the cluster queue unavailable.





[jira] [Commented] (TEZ-2305) MR compatibility sleep job fails with IOException: Undefined job output-path

2015-04-09 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488327#comment-14488327
 ] 

Hitesh Shah commented on TEZ-2305:
--

Can reproduce the issue with 

{code}
${HADOOP_COMMON_HOME}/bin/hadoop jar 
hadoop-mapreduce-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar
 sleep -Dmapreduce.framework.name=yarn-tez  -m 1 -mt 1 -r 0 -rt 1
{code}

> MR compatibility sleep job fails with IOException: Undefined job output-path
> 
>
> Key: TEZ-2305
> URL: https://issues.apache.org/jira/browse/TEZ-2305
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Tassapol Athiapinya
>Priority: Critical
>
> Running MR sleep job has an IOException.
> {code}
> 15/04/09 20:52:25 INFO mapreduce.Job: Job job_1428612196442_0002 failed with 
> state FAILED due to: Vertex failed, vertexName=initialmap, 
> vertexId=vertex_1428612196442_0002_1_00, diagnostics=[Task failed, 
> taskId=task_1428612196442_0002_1_00_01, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Failure while running task:java.io.IOException: 
> Undefined job output-path
>   at 
> org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248)
>   at 
> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121)
>   at 
> org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> ], TaskAttempt 1 failed, info=[Error: Failure while running 
> task:java.io.IOException: Undefined job output-path
>   at 
> org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248)
>   at 
> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121)
>   at 
> org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> ], TaskAttempt 2 failed, info=[Error: Failure while running 
> task:java.io.IOException: Undefined job output-path
>   at 
> org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248)
>   at 
> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121)
>   at 
> org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> ], TaskAttempt 3 failed, info=[Error: Failure while running 
> task

[jira] [Commented] (TEZ-2300) TezClient.stop() takes a lot of time or does not work sometimes

2015-04-09 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488203#comment-14488203
 ] 

Rohini Palaniswamy commented on TEZ-2300:
-

Recently, it was from too many job runs for issues addressed in TEZ-776. I 
haven't kept track, and the job logs are too large to dig through to find out 
what I did to which job.  Will get some with future runs. 

> TezClient.stop() takes a lot of time or does not work sometimes
> ---
>
> Key: TEZ-2300
> URL: https://issues.apache.org/jira/browse/TEZ-2300
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rohini Palaniswamy
>
>   Noticed this with a couple of Pig scripts which were not behaving well (AM 
> close to OOM, etc.) and even with some that were running fine. Pig calls 
> TezClient.stop() in a shutdown hook. On Ctrl+C, the Pig script either exits 
> immediately or hangs. In both cases it takes a long time for the YARN 
> application to go to the KILLED state. Many times I just end up calling yarn 
> application -kill separately after waiting for 5 mins or more for it to get 
> killed.





[jira] [Commented] (TEZ-1482) Fix memory issues for Local Mode running concurrent tasks

2015-04-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486952#comment-14486952
 ] 

Hadoop QA commented on TEZ-1482:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12723979/TEZ-1482.1.patch
  against master revision 936ff8d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/427//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/427//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/427//console

This message is automatically generated.

> Fix memory issues for Local Mode running concurrent tasks
> -
>
> Key: TEZ-1482
> URL: https://issues.apache.org/jira/browse/TEZ-1482
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Chen He
>Assignee: Prakash Ramachandran
> Attachments: TEZ-1482.1.patch
>
>






[jira] [Commented] (TEZ-2119) Counter for launched containers

2015-04-09 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488222#comment-14488222
 ] 

Bikas Saha commented on TEZ-2119:
-

Held containers would be good to have. Idle containers? 

> Counter for launched containers
> ---
>
> Key: TEZ-2119
> URL: https://issues.apache.org/jira/browse/TEZ-2119
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Jeff Zhang
>
> org.apache.tez.common.counters.DAGCounter
> NUM_SUCCEEDED_TASKS=32976
> TOTAL_LAUNCHED_TASKS=32976
> OTHER_LOCAL_TASKS=2
> DATA_LOCAL_TASKS=9147
> RACK_LOCAL_TASKS=23761
> It would be very nice to have TOTAL_LAUNCHED_CONTAINERS counter added to 
> this. The difference between TOTAL_LAUNCHED_CONTAINERS and 
> TOTAL_LAUNCHED_TASKS should make it easy to see how much container reuse is 
> happening. It is very hard to find out now.





[jira] [Updated] (TEZ-2302) Allow TaskCommunicators to subscribe for Vertex updates

2015-04-09 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-2302:

Attachment: TEZ-2302.1.txt

> Allow TaskCommunicators to subscribe for Vertex updates
> ---
>
> Key: TEZ-2302
> URL: https://issues.apache.org/jira/browse/TEZ-2302
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: TEZ-2302.1.txt
>
>






[jira] [Commented] (TEZ-2303) ConcurrentModificationException while processing recovery

2015-04-09 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488243#comment-14488243
 ] 

Hitesh Shah commented on TEZ-2303:
--

\cc [~zjffdu]

> ConcurrentModificationException while processing recovery
> -
>
> Key: TEZ-2303
> URL: https://issues.apache.org/jira/browse/TEZ-2303
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Jason Lowe
>
> Saw a Tez AM log a few ConcurrentModificationException messages while trying 
> to recover from a previous attempt that crashed.  Exception details to follow.





[jira] [Updated] (TEZ-2299) Invalid dag creation in MRRSleepJob post TEZ-2293

2015-04-09 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2299:
-
Attachment: TEZ-2299.1.patch

[~sseth] review please. 

> Invalid dag creation in MRRSleepJob post TEZ-2293
> -
>
> Key: TEZ-2299
> URL: https://issues.apache.org/jira/browse/TEZ-2299
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
> Attachments: TEZ-2299.1.patch
>
>
> When running: "mrrsleep -m 10 -mt 5000 -r 10 -irs 3 -ir 10 -irt 3000 -rt 5000"
> java.lang.ArrayIndexOutOfBoundsException: 3
>   at 
> org.apache.tez.mapreduce.examples.MRRSleepJob.createDAG(MRRSleepJob.java:584)
>   at 
> org.apache.tez.mapreduce.examples.MRRSleepJob.run(MRRSleepJob.java:748)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at 
> org.apache.tez.mapreduce.examples.MRRSleepJob.main(MRRSleepJob.java:399)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>   at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)





[jira] [Updated] (TEZ-2275) TEZ UI: Make data loading faster and caching better

2015-04-09 Thread Sreenath Somarajapuram (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sreenath Somarajapuram updated TEZ-2275:

Description: 
# Remove counter serialization for all entities to make loading faster.
# Make caching better: As records are shared, refreshing/reloading a record 
will reflect the changes everywhere. Hence refreshing task details will update 
the respective task row in dag/vertex -> tasks tables. 

  was:
# Remove counter serialization for all entities to make loading faster.
# Make caching better.
- 


> TEZ UI: Make data loading faster and caching better
> --
>
> Key: TEZ-2275
> URL: https://issues.apache.org/jira/browse/TEZ-2275
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Sreenath Somarajapuram
>
> # Remove counter serialization for all entities to make loading faster.
> # Make caching better: As records are shared, refreshing/reloading a record 
> will reflect the changes everywhere. Hence refreshing task details will 
> update the respective task row in dag/vertex -> tasks tables. 
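
The shared-record behaviour described above can be illustrated with a small
sketch. It is written in Java purely for illustration and is not the Tez UI
code (which is Ember-based); SharedRecordStoreSketch, RecordStore and
TaskRecord are hypothetical names. The idea shown is that the store hands out
one instance per record id, so a refresh made through one view is visible
through every other view holding that record.

```java
import java.util.HashMap;
import java.util.Map;

final class SharedRecordStoreSketch {
    // Hypothetical task record; only the status is mutable.
    static final class TaskRecord {
        final String id;
        String status;

        TaskRecord(String id, String status) {
            this.id = id;
            this.status = status;
        }
    }

    // Keeps exactly one TaskRecord instance per id.
    static final class RecordStore {
        private final Map<String, TaskRecord> byId = new HashMap<>();

        // Returns the shared instance for the id, creating it on first load
        // and refreshing its status in place on later loads.
        TaskRecord load(String id, String latestStatus) {
            TaskRecord record = byId.get(id);
            if (record == null) {
                record = new TaskRecord(id, latestStatus);
                byId.put(id, record);
            } else {
                record.status = latestStatus;
            }
            return record;
        }
    }

    public static void main(String[] args) {
        RecordStore store = new RecordStore();

        // A row in a tasks table loads the record first.
        TaskRecord tableRow = store.load("task_1", "RUNNING");
        // The task details page later reloads the same record.
        TaskRecord details = store.load("task_1", "SUCCEEDED");

        System.out.println("same instance: " + (tableRow == details));
        System.out.println("status seen by table row: " + tableRow.status);
    }
}
```

Because both views hold the same object, no separate step is needed to
propagate the refreshed status to the table row.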





[jira] [Comment Edited] (TEZ-2305) MR compatibility sleep job fails with IOException: Undefined job output-path

2015-04-09 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488323#comment-14488323
 ] 

Hitesh Shah edited comment on TEZ-2305 at 4/9/15 9:33 PM:
--

[~tassapola] Can you provide the parameters to the sleep job? 

{code}
${HADOOP_COMMON_HOME}/bin/hadoop jar 
hadoop-mapreduce-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar
 sleep -Dmapreduce.framework.name=yarn-tez  -m 1 -mt 1 -r 1 -rt 1
{code}

The above command works for me when running against master branch. 


was (Author: hitesh):
[~tassapola] Can you provide the parameters to the sleep job? 

{code}
${HADOOP_COMMON_HOME}/bin/hadoop jar 
hadoop-mapreduce-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar
 sleep -Dmapreduce.framework.name=yarn-tez  -m 1 -mt 1 -r 1 -rt 1
{code}

The above command works for me. 

> MR compatibility sleep job fails with IOException: Undefined job output-path
> 
>
> Key: TEZ-2305
> URL: https://issues.apache.org/jira/browse/TEZ-2305
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Tassapol Athiapinya
>Priority: Critical
>
> Running MR sleep job has an IOException.
> {code}
> 15/04/09 20:52:25 INFO mapreduce.Job: Job job_1428612196442_0002 failed with 
> state FAILED due to: Vertex failed, vertexName=initialmap, 
> vertexId=vertex_1428612196442_0002_1_00, diagnostics=[Task failed, 
> taskId=task_1428612196442_0002_1_00_01, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Failure while running task:java.io.IOException: 
> Undefined job output-path
>   at 
> org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248)
>   at 
> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121)
>   at 
> org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> ], TaskAttempt 1 failed, info=[Error: Failure while running 
> task:java.io.IOException: Undefined job output-path
>   at 
> org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248)
>   at 
> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121)
>   at 
> org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> ], TaskAttempt 2 failed, info=[Error: Failure while running 
> task:java.io.IOException: Undefined job output-path
>   at 
> org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248)
>   at 
> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121)
>   at 
> org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.

[jira] [Commented] (TEZ-2305) MR compatibility sleep job fails with IOException: Undefined job output-path

2015-04-09 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488323#comment-14488323
 ] 

Hitesh Shah commented on TEZ-2305:
--

[~tassapola] Can you provide the parameters to the sleep job? 

{code}
${HADOOP_COMMON_HOME}/bin/hadoop jar \
  hadoop-mapreduce-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar \
  sleep -Dmapreduce.framework.name=yarn-tez -m 1 -mt 1 -r 1 -rt 1
{code}

The above command works for me. 

> MR compatibility sleep job fails with IOException: Undefined job output-path
> 
>
> Key: TEZ-2305
> URL: https://issues.apache.org/jira/browse/TEZ-2305
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Tassapol Athiapinya
>Priority: Critical
>
> Running the MR sleep job fails with an IOException.
> {code}
> 15/04/09 20:52:25 INFO mapreduce.Job: Job job_1428612196442_0002 failed with 
> state FAILED due to: Vertex failed, vertexName=initialmap, 
> vertexId=vertex_1428612196442_0002_1_00, diagnostics=[Task failed, 
> taskId=task_1428612196442_0002_1_00_01, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Failure while running task:java.io.IOException: 
> Undefined job output-path
>   at 
> org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248)
>   at 
> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121)
>   at 
> org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> ], TaskAttempt 1 failed, info=[Error: Failure while running 
> task:java.io.IOException: Undefined job output-path
>   at 
> org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248)
>   at 
> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121)
>   at 
> org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> ], TaskAttempt 2 failed, info=[Error: Failure while running 
> task:java.io.IOException: Undefined job output-path
>   at 
> org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248)
>   at 
> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121)
>   at 
> org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> 

[jira] [Commented] (TEZ-2301) Switch Tez Pre-commit builds to use tezqa user

2015-04-09 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488050#comment-14488050
 ] 

Hitesh Shah commented on TEZ-2301:
--

cc [~gkesavan] Any suggestions on how to go about this? 

> Switch Tez Pre-commit builds to use tezqa user 
> ---
>
> Key: TEZ-2301
> URL: https://issues.apache.org/jira/browse/TEZ-2301
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>
> There are potential experiments happening in hadoop land that might cause 
> repercussions on tez pre-commit builds. 





[jira] [Assigned] (TEZ-2275) TEZ UI: Make dta loading faster and caching better

2015-04-09 Thread Sreenath Somarajapuram (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sreenath Somarajapuram reassigned TEZ-2275:
---

Assignee: Sreenath Somarajapuram

> TEZ UI: Make dta loading faster and caching better
> --
>
> Key: TEZ-2275
> URL: https://issues.apache.org/jira/browse/TEZ-2275
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Sreenath Somarajapuram
>Assignee: Sreenath Somarajapuram
>
> # Remove counter serialization for all entities to make loading faster.
> # Make caching better: As records are shared, refreshing/reloading a record 
> will reflect the changes everywhere. Hence refreshing task details will 
> update the respective task row in dag/vertex -> tasks tables. 





[jira] [Commented] (TEZ-714) OutputCommitters should not run in the main AM dispatcher thread

2015-04-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486907#comment-14486907
 ] 

Hadoop QA commented on TEZ-714:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12724164/TEZ-714-11.patch
  against master revision 936ff8d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/425//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/425//console

This message is automatically generated.

> OutputCommitters should not run in the main AM dispatcher thread
> 
>
> Key: TEZ-714
> URL: https://issues.apache.org/jira/browse/TEZ-714
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Jeff Zhang
>Priority: Critical
> Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-10.patch, 
> TEZ-714-11.patch, TEZ-714-2.patch, TEZ-714-3.patch, TEZ-714-4.patch, 
> TEZ-714-5.patch, TEZ-714-6.patch, TEZ-714-7.patch, TEZ-714-8.patch, 
> TEZ-714-9.patch, Vertex_2.pdf
>
>
> Follow up jira from TEZ-41.
> 1) If there are multiple OutputCommitters on a Vertex, they can be run in 
> parallel.
> 2) Running an OutputCommitter in the main thread blocks all other event 
> handling for the DAG and causes the event queue to back up.
> 3) This should also cover shared commits that happen in the DAG.
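
To illustrate points 1 and 2 above, here is a minimal sketch of running several commit actions on a worker pool instead of the calling thread. It uses only java.util.concurrent; the class name ParallelCommit, the method commitAll, and the printed messages are hypothetical and are not taken from the Tez patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelCommit {
    /** Runs every commit action on a worker pool and returns how many succeeded. */
    static int commitAll(List<Callable<Void>> committers, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int succeeded = 0;
            // invokeAll blocks the caller until all commits finish. A real
            // dispatcher would submit the tasks and react to a completion
            // event instead of waiting here.
            for (Future<Void> result : pool.invokeAll(committers)) {
                try {
                    result.get();
                    succeeded++;
                } catch (ExecutionException e) {
                    System.err.println("commit failed: " + e.getCause());
                }
            }
            return succeeded;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<Callable<Void>> committers = new ArrayList<>();
        committers.add(() -> { System.out.println("commit output A"); return null; });
        committers.add(() -> { System.out.println("commit output B"); return null; });
        System.out.println("succeeded: " + commitAll(committers, 2));
    }
}
```

The sketch only shows the parallelism. How the AM is notified of completion, and how shared DAG-level commits are handled, is left to the actual patch.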





Failed: TEZ-2274 PreCommit Build #426

2015-04-09 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2274
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/426/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2779 lines...]



{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12724166/TEZ-2274.1.patch
  against master revision 936ff8d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/426//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/426//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
73554b37891a94f786929794a778ca570ac1076e logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #425
Archived 44 artifacts
Archive block size is 32768
Received 4 blocks and 2606062 bytes
Compression is 4.8%
Took 0.61 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-2234) Allow vertex managers to get output size per source vertex

2015-04-09 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488293#comment-14488293
 ] 

Rohini Palaniswamy commented on TEZ-2234:
-

How about adding output records to the statistics as well? I can see that 
coming in handy in the future. At the least, we could have the APIs in place. 

> Allow vertex managers to get output size per source vertex
> --
>
> Key: TEZ-2234
> URL: https://issues.apache.org/jira/browse/TEZ-2234
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Bikas Saha
>Assignee: Bikas Saha
> Attachments: TEZ-2234.1.patch, TEZ-2234.2.patch
>
>
> Vertex managers may need per source vertex output stats to make 
> reconfiguration decisions.





[jira] [Commented] (TEZ-2304) InvalidStateTransitonException TA_SCHEDULE at START_WAIT during recovery

2015-04-09 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488258#comment-14488258
 ] 

Jason Lowe commented on TEZ-2304:
-

Log snippets showing state transitions and eventual invalid transition error 
for one of the task attempts:

{noformat}
2015-04-09 19:36:17,939 INFO [main] app.RecoveryParser: Recovering from event, 
eventType=TASK_ATTEMPT_STARTED, event=vertexName=null, 
taskAttemptId=attempt_1428329756093_168563_1_00_006728_1, 
startTime=1428606125475, containerId=container_1428329756093_168563_01_015423, 
nodeId=x, inProgressLogs=null, completedLogs=null
[...]
2015-04-09 19:36:27,464 INFO [main] app.RecoveryParser: Recovering from event, 
eventType=TASK_ATTEMPT_FINISHED, event=vertexName=null, 
taskAttemptId=attempt_1428329756093_168563_1_00_006728_1, startTime=0, 
finishTime=1428606858102, timeTaken=1428606858102, status=FAILED, 
errorEnum=TASK_HEARTBEAT_ERROR, 
diagnostics=AttemptID:attempt_1428329756093_168563_1_00_006728_1 Timed out 
after 300 secs, counters=Counters: 1, 
org.apache.tez.common.counters.DAGCounter, OTHER_LOCAL_TASKS=1
[...]
2015-04-09 20:05:42,055 INFO [AsyncDispatcher event handler] 
history.HistoryEventHandler: 
[HISTORY][DAG:dag_1428329756093_168563_1][Event:TASK_ATTEMPT_FINISHED]: 
vertexName=scope-1741, 
taskAttemptId=attempt_1428329756093_168563_1_00_006728_1, startTime=0, 
finishTime=1428609942055, timeTaken=1428609942055, status=KILLED, 
errorEnum=UNKNOWN_ERROR, diagnostics=, counters=Counters: 0
[...]
2015-04-09 20:05:42,055 INFO [AsyncDispatcher event handler] 
impl.TaskAttemptImpl: attempt_1428329756093_168563_1_00_006728_1 TaskAttempt 
Transitioned from NEW to KILLED due to event TA_RECOVER
[...]
2015-04-09 20:05:45,748 INFO [AsyncDispatcher event handler] 
impl.TaskAttemptImpl: remoteTaskSpec:DAGName : x, VertexName: scope-1741, 
VertexParallelism: 9072, 
TaskAttemptID:attempt_1428329756093_168563_1_00_006728_1, 
processorName=org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor,
 inputSpecListSize=1, outputSpecListSize=21, inputSpecList=[{{ 
sourceVertexName=scope-0, physicalEdgeCount=1, 
inputClassName=org.apache.tez.mapreduce.input.MRInput }}, ], 
outputSpecList=[xxx]
[...]
2015-04-09 20:05:45,748 INFO [AsyncDispatcher event handler] 
impl.TaskAttemptImpl: attempt_1428329756093_168563_1_00_006728_1 TaskAttempt 
Transitioned from NEW to START_WAIT due to event TA_SCHEDULE
[...]
2015-04-09 20:05:47,026 INFO [TaskSchedulerEventHandlerThread] 
rm.YarnTaskSchedulerService: Allocation request for task: 
attempt_1428329756093_168563_1_00_006728_1 with request: 
Capability[]Priority[1] host: null rack: null
[...]
2015-04-09 20:05:47,198 INFO [AsyncDispatcher event handler] impl.VertexImpl: 
Source task attempt completed for vertex: vertex_1428329756093_168563_1_14 
[scope-2020] attempt: attempt_1428329756093_168563_1_00_006728_1 with state: 
KILLED vertexState: RUNNING
[...]
2015-04-09 20:05:47,198 INFO [AsyncDispatcher event handler] impl.VertexImpl: 
Source task attempt completed for vertex: vertex_1428329756093_168563_1_16 
[scope-1924] attempt: attempt_1428329756093_168563_1_00_006728_1 with state: 
KILLED vertexState: RUNNING
[...Source task attempt completed logs removed for brevity...]
2015-04-09 20:05:47,199 ERROR [AsyncDispatcher event handler] 
impl.TaskAttemptImpl: Can't handle this event at current state for 
attempt_1428329756093_168563_1_00_006728_1
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
TA_SCHEDULE at START_WAIT
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.tez.dag.app.dag.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:670)
at 
org.apache.tez.dag.app.dag.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:112)
at 
org.apache.tez.dag.app.DAGAppMaster$TaskAttemptEventDispatcher.handle(DAGAppMaster.java:1779)
at 
org.apache.tez.dag.app.DAGAppMaster$TaskAttemptEventDispatcher.handle(DAGAppMaster.java:1764)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:722)
{noformat}
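
The last log entry shows a second TA_SCHEDULE arriving while the attempt is already in START_WAIT. As a rough illustration of why that is rejected, here is a simplified table-driven state machine. It is a stand-in written for this comment, not YARN's StateMachineFactory; the state and event names come from the log above, while the class AttemptStateMachine and its transition table are assumptions.

```java
import java.util.EnumMap;
import java.util.Map;

class AttemptStateMachine {
    enum State { NEW, START_WAIT, KILLED }
    enum Event { TA_SCHEDULE, TA_RECOVER }

    // Transition table: current state -> (event -> next state).
    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State current = State.NEW;

    AttemptStateMachine() {
        Map<Event, State> fromNew = new EnumMap<>(Event.class);
        fromNew.put(Event.TA_SCHEDULE, State.START_WAIT);
        fromNew.put(Event.TA_RECOVER, State.KILLED);
        table.put(State.NEW, fromNew);
        // No transitions are registered for START_WAIT or KILLED in this sketch.
    }

    /** Applies the event, or throws if the current state has no transition for it. */
    State handle(Event event) {
        Map<Event, State> transitions = table.get(current);
        State next = (transitions == null) ? null : transitions.get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        current = next;
        return current;
    }

    public static void main(String[] args) {
        AttemptStateMachine attempt = new AttemptStateMachine();
        attempt.handle(Event.TA_SCHEDULE);      // NEW -> START_WAIT
        try {
            attempt.handle(Event.TA_SCHEDULE);  // second TA_SCHEDULE is rejected
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the same attempt receives TA_SCHEDULE twice, which mirrors the sequence in the log: the attempt goes from NEW to START_WAIT, and the repeated event has no transition to follow.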

> InvalidStateTransitonException TA_SCHEDULE at START_WAIT during recovery
> 
>
> Key: TEZ-2304
> URL: https://issues.apache.org/jira/browse/TEZ-2304
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Jason Lowe
>
> I saw a Tez AM throw a few InvalidStateTransiton

[jira] [Comment Edited] (TEZ-2300) TezClient.stop() takes a lot of time or does not work sometimes

2015-04-09 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488203#comment-14488203
 ] 

Rohini Palaniswamy edited comment on TEZ-2300 at 4/9/15 8:26 PM:
-

Recently, it came from a large number of job runs for the issues that TEZ-776 
is trying to address. I haven't kept track, and the job logs are too large to 
dig through to find out what I did to which job.  Will get some with future runs. 


was (Author: rohini):
Recently, it was from too many job runs for issues addressed TEZ-776. I haven't 
kept track and the job logs are huge to dig and find out what I did to which 
job.  Will get some with future runs. 

> TezClient.stop() takes a lot of time or does not work sometimes
> ---
>
> Key: TEZ-2300
> URL: https://issues.apache.org/jira/browse/TEZ-2300
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rohini Palaniswamy
>
>   Noticed this with a couple of pig scripts which were not behaving well (AM 
> close to OOM, etc) and even with some that were running fine. Pig calls 
> TezClient.stop() in a shutdown hook. Ctrl+C to the pig script either exits 
> immediately or hangs. In both cases it takes a long time for the 
> yarn application to go to KILLED state. Many times I just end up calling yarn 
> application -kill separately after waiting for 5 mins or more for it to get 
> killed.
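
For the client side of the hang described above, one possible mitigation is to bound how long a shutdown hook waits for the stop call. The following is only a sketch under that assumption: BoundedStop, stopWithin, and the 5000 ms timeout are hypothetical, the stop action is a placeholder rather than a real TezClient call, and it does not address why the application is slow to reach KILLED.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedStop {
    /** Runs stopAction on a daemon thread; returns false if it did not finish in time. */
    static boolean stopWithin(Runnable stopAction, long timeoutMillis) {
        ExecutorService exec = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-stop");
            t.setDaemon(true); // do not keep the JVM alive if the stop call hangs
            return t;
        });
        Future<?> pending = exec.submit(stopAction);
        try {
            pending.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            pending.cancel(true); // interrupt the hung stop call
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return false; // the stop action itself threw
        } finally {
            exec.shutdownNow();
        }
    }

    public static void main(String[] args) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            // Placeholder for a client stop call such as TezClient.stop().
            boolean stopped = stopWithin(() -> System.out.println("stopping client"), 5000);
            if (!stopped) {
                System.err.println("stop timed out; the application may need yarn application -kill");
            }
        }));
    }
}
```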





[jira] [Created] (TEZ-2303) ConcurrentModificationException while processing recovery

2015-04-09 Thread Jason Lowe (JIRA)
Jason Lowe created TEZ-2303:
---

 Summary: ConcurrentModificationException while processing recovery
 Key: TEZ-2303
 URL: https://issues.apache.org/jira/browse/TEZ-2303
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jason Lowe


Saw a Tez AM log a few ConcurrentModificationException messages while trying to 
recover from a previous attempt that crashed.  Exception details to follow.





[jira] [Commented] (TEZ-2119) Counter for launched containers

2015-04-09 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487925#comment-14487925
 ] 

Rohini Palaniswamy commented on TEZ-2119:
-

Still need the LAUNCHED_CONTAINERS counter. If containers can be launched but 
not used (no tasks submitted to them), then there should be both 
TOTAL_LAUNCHED_CONTAINERS and TOTAL_USED_CONTAINERS. Also, is it possible to 
add something like AVG_CONTAINER_REUSE, or some better statistic, to show the 
amount of reuse?

> Counter for launched containers
> ---
>
> Key: TEZ-2119
> URL: https://issues.apache.org/jira/browse/TEZ-2119
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Jeff Zhang
>
> org.apache.tez.common.counters.DAGCounter
> NUM_SUCCEEDED_TASKS=32976
> TOTAL_LAUNCHED_TASKS=32976
> OTHER_LOCAL_TASKS=2
> DATA_LOCAL_TASKS=9147
> RACK_LOCAL_TASKS=23761
> It would be very nice to have TOTAL_LAUNCHED_CONTAINERS counter added to 
> this. The difference between TOTAL_LAUNCHED_CONTAINERS and 
> TOTAL_LAUNCHED_TASKS should make it easy to see how much container reuse is 
> happening. It is very hard to find out now.
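
As a rough illustration of the comparison described above, the sketch below derives an average tasks-per-container figure from the two counters. It assumes a TOTAL_LAUNCHED_CONTAINERS counter exists as proposed; the class ContainerReuse, the method avgTasksPerContainer, and the container count of 1200 are hypothetical.

```java
class ContainerReuse {
    /**
     * Average number of tasks run per launched container.
     * A value near 1.0 means almost no reuse; larger values mean more reuse.
     */
    static double avgTasksPerContainer(long launchedTasks, long launchedContainers) {
        if (launchedContainers <= 0) {
            throw new IllegalArgumentException("launchedContainers must be positive");
        }
        return (double) launchedTasks / launchedContainers;
    }

    public static void main(String[] args) {
        long launchedTasks = 32976;      // TOTAL_LAUNCHED_TASKS from the counters above
        long launchedContainers = 1200;  // hypothetical TOTAL_LAUNCHED_CONTAINERS value
        System.out.printf("avg tasks per container: %.2f%n",
                avgTasksPerContainer(launchedTasks, launchedContainers));
    }
}
```

If containers can be launched without ever running a task, this ratio would understate reuse among the containers that were actually used, which is one reason a separate used-containers counter may be worth having.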





[jira] [Updated] (TEZ-2275) TEZ UI: Make data loading faster and caching better

2015-04-09 Thread Sreenath Somarajapuram (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sreenath Somarajapuram updated TEZ-2275:

Summary: TEZ UI: Make data loading faster and caching better  (was: TEZ UI: 
Make dta loading faster and caching better)

> TEZ UI: Make data loading faster and caching better
> ---
>
> Key: TEZ-2275
> URL: https://issues.apache.org/jira/browse/TEZ-2275
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Sreenath Somarajapuram
>Assignee: Sreenath Somarajapuram
>
> # Remove counter serialization for all entities to make loading faster.
> # Make caching better: As records are shared, refreshing/reloading a record 
> will reflect the changes everywhere. Hence refreshing task details will 
> update the respective task row in dag/vertex -> tasks tables. 





[jira] [Commented] (TEZ-2261) Should add diagnostics in DAGAppMaster when recovery error happens

2015-04-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486878#comment-14486878
 ] 

Hadoop QA commented on TEZ-2261:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12724156/TEZ-2261-3.patch
  against master revision 936ff8d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/424//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/424//console

This message is automatically generated.

> Should add diagnostics in DAGAppMaster when recovery error happens
> --
>
> Key: TEZ-2261
> URL: https://issues.apache.org/jira/browse/TEZ-2261
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-2261-1.patch, TEZ-2261-2.patch, TEZ-2261-3.patch
>
>
> Should add diagnostics in DAGAppMaster when a recovery error happens; otherwise 
> the AM is shut down and the next DAG submission will just throw 
> SessionNotRunningException, which would confuse users.
> {code}
> if (this.historyEventHandler.hasRecoveryFailed()) {
>   LOG.warn("Recovery had a fatal error, shutting down session after" +
>   " DAG completion");
>   sessionStopped.set(true);
> }
> {code}





Success: TEZ-2261 PreCommit Build #424

2015-04-09 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2261
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/424/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2767 lines...]
[INFO] Final Memory: 77M/1191M
[INFO] 




{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12724156/TEZ-2261-3.patch
  against master revision 936ff8d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/424//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/424//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
764d06c5b591afd011188b75e67dc6ebb787da10 logged out


==
==
Finished build.
==
==


Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #420
Archived 44 artifacts
Archive block size is 32768
Received 8 blocks and 2473347 bytes
Compression is 9.6%
Took 0.94 sec
Description set: TEZ-2261
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Created] (TEZ-2298) Ignore sending failure message, when TaskReporter$HeartbeatCallable is shutdown

2015-04-09 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created TEZ-2298:
-

 Summary: Ignore sending failure message, when 
TaskReporter$HeartbeatCallable is shutdown
 Key: TEZ-2298
 URL: https://issues.apache.org/jira/browse/TEZ-2298
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan


{noformat}

2015-04-09 03:18:37,002 INFO [TezChild] task.TezTaskRunner: Ignoring the 
following exception since a previous exception is already registered
java.lang.NullPointerException
at 
org.apache.tez.runtime.task.TaskReporter$HeartbeatCallable.access$300(TaskReporter.java:121)
at 
org.apache.tez.runtime.task.TaskReporter.taskFailed(TaskReporter.java:383)
at 
org.apache.tez.runtime.task.TezTaskRunner.sendFailure(TezTaskRunner.java:265)
at 
org.apache.tez.runtime.task.TezTaskRunner.access$600(TezTaskRunner.java:51)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:227)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{noformat}

A lot of these messages are seen in the logs.  Even though they are harmless, 
they can be very misleading when trying to debug the reason for some other 
exception.





[jira] [Commented] (TEZ-2298) Ignore sending failure message, when TaskReporter$HeartbeatCallable is shutdown

2015-04-09 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487741#comment-14487741
 ] 

Siddharth Seth commented on TEZ-2298:
-

This is a spurious log line that causes confusion. There was a fix to an 
actual issue that got rid of an NPE. We should probably just remove this log 
statement or log only the message.
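
A minimal sketch of the "log only the message" option, assuming the first failure is the one worth a full stack trace and later ones can be reduced to a single line. FirstFailureTracker and its methods are hypothetical and do not reflect the actual TezTaskRunner code.

```java
import java.util.concurrent.atomic.AtomicReference;

class FirstFailureTracker {
    private final AtomicReference<Throwable> first = new AtomicReference<>();

    /**
     * Registers a failure. Returns true if this was the first one; later
     * failures are summarized in one line instead of a full stack trace.
     */
    boolean register(Throwable t) {
        if (first.compareAndSet(null, t)) {
            return true;
        }
        System.out.println("Ignoring subsequent failure: " + describe(t));
        return false;
    }

    /** One-line summary: class name plus message, without the stack trace. */
    static String describe(Throwable t) {
        String message = t.getMessage();
        return t.getClass().getName() + (message == null ? "" : ": " + message);
    }

    public static void main(String[] args) {
        FirstFailureTracker tracker = new FirstFailureTracker();
        tracker.register(new java.io.IOException("task failed"));
        tracker.register(new NullPointerException()); // logged as a single line
    }
}
```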

> Ignore sending failure message, when TaskReporter$HeartbeatCallable is 
> shutdown
> ---
>
> Key: TEZ-2298
> URL: https://issues.apache.org/jira/browse/TEZ-2298
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Rajesh Balamohan
>
> {noformat}
> 2015-04-09 03:18:37,002 INFO [TezChild] task.TezTaskRunner: Ignoring the 
> following exception since a previous exception is already registered
> java.lang.NullPointerException
> at org.apache.tez.runtime.task.TaskReporter$HeartbeatCallable.access$300(TaskReporter.java:121)
> at org.apache.tez.runtime.task.TaskReporter.taskFailed(TaskReporter.java:383)
> at org.apache.tez.runtime.task.TezTaskRunner.sendFailure(TezTaskRunner.java:265)
> at org.apache.tez.runtime.task.TezTaskRunner.access$600(TezTaskRunner.java:51)
> at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:227)
> at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
> at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> A lot of these messages are seen in the logs. Even though they are harmless, 
> they can be very misleading when trying to debug the cause of some other exception.





[jira] [Comment Edited] (TEZ-2305) MR compatibility sleep job fails with IOException: Undefined job output-path

2015-04-09 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488327#comment-14488327
 ] 

Hitesh Shah edited comment on TEZ-2305 at 4/9/15 9:35 PM:
--

Can reproduce the issue with a map-only sleep job

{code}
${HADOOP_COMMON_HOME}/bin/hadoop jar \
  hadoop-mapreduce-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar \
  sleep -Dmapreduce.framework.name=yarn-tez -m 1 -mt 1 -r 0 -rt 1
{code}


was (Author: hitesh):
Can reproduce the issue with 

{code}
${HADOOP_COMMON_HOME}/bin/hadoop jar \
  hadoop-mapreduce-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar \
  sleep -Dmapreduce.framework.name=yarn-tez -m 1 -mt 1 -r 0 -rt 1
{code}

> MR compatibility sleep job fails with IOException: Undefined job output-path
> 
>
> Key: TEZ-2305
> URL: https://issues.apache.org/jira/browse/TEZ-2305
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Tassapol Athiapinya
> Priority: Critical
>
> Running the MR sleep job fails with an IOException.
> {code}
> 15/04/09 20:52:25 INFO mapreduce.Job: Job job_1428612196442_0002 failed with 
> state FAILED due to: Vertex failed, vertexName=initialmap, 
> vertexId=vertex_1428612196442_0002_1_00, diagnostics=[Task failed, 
> taskId=task_1428612196442_0002_1_00_01, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Failure while running task:java.io.IOException: 
> Undefined job output-path
>   at 
> org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248)
>   at 
> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121)
>   at 
> org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> ], TaskAttempt 1 failed, info=[Error: Failure while running 
> task:java.io.IOException: Undefined job output-path
>   at 
> org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248)
>   at 
> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121)
>   at 
> org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> ], TaskAttempt 2 failed, info=[Error: Failure while running 
> task:java.io.IOException: Undefined job output-path
>   at 
> org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248)
>   at 
> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121)
>   at 
> org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.Future

[jira] [Commented] (TEZ-2234) Allow vertex managers to get output size per source vertex

2015-04-09 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488295#comment-14488295
 ] 

Bikas Saha commented on TEZ-2234:
-

Refreshed patch and updated based on comments.

Could you please confirm, based on the above explanation, that the OUTPUT_BYTES 
and SHUFFLE_BYTES_DECOMPRESSED counters are the correct ones to use for 
logical data written/read? Any clues on why they would not match up in a job?

Please let me know if there are further comments. Thanks!

> Allow vertex managers to get output size per source vertex
> --
>
> Key: TEZ-2234
> URL: https://issues.apache.org/jira/browse/TEZ-2234
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Bikas Saha
> Assignee: Bikas Saha
> Attachments: TEZ-2234.1.patch, TEZ-2234.2.patch, TEZ-2234.3.patch
>
>
> Vertex managers may need per source vertex output stats to make 
> reconfiguration decisions.





[jira] [Commented] (TEZ-2299) Invalid dag creation in MRRSleepJob post TEZ-2293

2015-04-09 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487938#comment-14487938
 ] 

Siddharth Seth commented on TEZ-2299:
-

+1.

> Invalid dag creation in MRRSleepJob post TEZ-2293
> -
>
> Key: TEZ-2299
> URL: https://issues.apache.org/jira/browse/TEZ-2299
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Hitesh Shah
> Assignee: Hitesh Shah
> Attachments: TEZ-2299.1.patch
>
>
> When running: "mrrsleep -m 10 -mt 5000 -r 10 -irs 3 -ir 10 -irt 3000 -rt 5000"
> java.lang.ArrayIndexOutOfBoundsException: 3
>   at org.apache.tez.mapreduce.examples.MRRSleepJob.createDAG(MRRSleepJob.java:584)
>   at org.apache.tez.mapreduce.examples.MRRSleepJob.run(MRRSleepJob.java:748)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.tez.mapreduce.examples.MRRSleepJob.main(MRRSleepJob.java:399)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>   at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)





Failed: TEZ-1482 PreCommit Build #427

2015-04-09 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-1482
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/427/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2766 lines...]


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12723979/TEZ-1482.1.patch
  against master revision 936ff8d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/427//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/427//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/427//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
8832851c10c90c19dbb5c03c394d9f3fc6b9e43b logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #425
Archived 44 artifacts
Archive block size is 32768
Received 4 blocks and 2608991 bytes
Compression is 4.8%
Took 0.64 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-2304) InvalidStateTransitonException TA_SCHEDULE at START_WAIT during recovery

2015-04-09 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488242#comment-14488242
 ] 

Hitesh Shah commented on TEZ-2304:
--

\cc [~zjffdu]

> InvalidStateTransitonException TA_SCHEDULE at START_WAIT during recovery
> 
>
> Key: TEZ-2304
> URL: https://issues.apache.org/jira/browse/TEZ-2304
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.6.0
> Reporter: Jason Lowe
>
> I saw a Tez AM throw a few InvalidStateTransitonException (sic) instances 
> during recovery complaining about TA_SCHEDULE arriving at the START_WAIT 
> state.





[jira] [Commented] (TEZ-776) Reduce AM mem usage caused by storing TezEvents

2015-04-09 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487990#comment-14487990
 ] 

Rohini Palaniswamy commented on TEZ-776:


jmap -histo:live output of the AM from one of the scripts I am desperately 
trying to get to run:

 num     #instances         #bytes  class name
----------------------------------------------
   1:      95865187     3067685984  org.apache.tez.runtime.api.impl.TezEvent
   2:      95346787     3051097184  org.apache.tez.runtime.api.events.DataMovementEvent

95 million events in memory is crazy.
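
For scale, the histogram numbers can be worked through directly. The sketch below only does arithmetic on the two rows quoted above; the class and method names are made up for illustration. Note that jmap reports shallow sizes, so payloads referenced by these objects are not included in the totals.

```java
// Back-of-the-envelope arithmetic on the two jmap histogram rows quoted above.
// HistogramMath is a hypothetical name used only for this illustration.
final class HistogramMath {
    // Shallow bytes per instance, from a jmap "#bytes" / "#instances" pair.
    static long bytesPerInstance(long totalBytes, long instances) {
        return totalBytes / instances;
    }

    public static void main(String[] args) {
        long tezEventInstances = 95_865_187L;
        long tezEventBytes = 3_067_685_984L;
        long dmeInstances = 95_346_787L;
        long dmeBytes = 3_051_097_184L;

        System.out.println(bytesPerInstance(tezEventBytes, tezEventInstances)); // 32
        System.out.println(bytesPerInstance(dmeBytes, dmeInstances));           // 32
        System.out.println(tezEventBytes + dmeBytes);                           // 6118783168
    }
}
```

That is 32 shallow bytes per object in each row, and about 6.1 GB (roughly 5.7 GiB) for the two classes together. Counting a TezEvent and the DataMovementEvent it wraps as one unit gives 64 bytes, which lines up with the "64 bytes per event" figure in the issue description, although that pairing is an inference rather than something the histogram states.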

> Reduce AM mem usage caused by storing TezEvents
> ---
>
> Key: TEZ-776
> URL: https://issues.apache.org/jira/browse/TEZ-776
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Siddharth Seth
> Assignee: Bikas Saha
> Attachments: TEZ-776.1.patch, TEZ-776.ondemand.1.patch, 
> TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, 
> TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, 
> TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, 
> With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, 
> events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, 
> without_patch_jmc_output_of_AM.png
>
>
> This is open ended at the moment.
> A fair chunk of the AM heap is taken up by TezEvents (specifically 
> DataMovementEvents - 64 bytes per event).
> Depending on the connection pattern - this puts limits on the number of tasks 
> that can be processed.





[jira] [Commented] (TEZ-776) Reduce AM mem usage caused by storing TezEvents

2015-04-09 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487776#comment-14487776
 ] 

Rohini Palaniswamy commented on TEZ-776:


[~bikassaha]/[~sseth],
   Any further progress on this? Looks like we badly need this for some jobs.

> Reduce AM mem usage caused by storing TezEvents
> ---
>
> Key: TEZ-776
> URL: https://issues.apache.org/jira/browse/TEZ-776
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Siddharth Seth
> Assignee: Bikas Saha
> Attachments: TEZ-776.1.patch, TEZ-776.ondemand.1.patch, 
> TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, 
> TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, 
> TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, 
> With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, 
> events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, 
> without_patch_jmc_output_of_AM.png
>
>
> This is open ended at the moment.
> A fair chunk of the AM heap is taken up by TezEvents (specifically 
> DataMovementEvents - 64 bytes per event).
> Depending on the connection pattern - this puts limits on the number of tasks 
> that can be processed.





[jira] [Updated] (TEZ-2237) Complex DAG freezes and fails (was BufferTooSmallException raised in UnorderedPartitionedKVWriter then DAG lingers)

2015-04-09 Thread Chris K Wensel (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris K Wensel updated TEZ-2237:

Attachment: output-starts.txt

> Complex DAG freezes and fails (was BufferTooSmallException raised in 
> UnorderedPartitionedKVWriter then DAG lingers)
> ---
>
> Key: TEZ-2237
> URL: https://issues.apache.org/jira/browse/TEZ-2237
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.6.0
> Environment: Debian Linux "jessie"
> OpenJDK Runtime Environment (build 1.8.0_40-internal-b27)
> OpenJDK 64-Bit Server VM (build 25.40-b25, mixed mode)
> 7 * Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz, 16/24 GB RAM per node, 1*system 
> disk + 4*1 or 2 TiB HDD for HDFS & local  (on-prem, dedicated hardware)
> Scalding 0.13.1 modified with https://github.com/twitter/scalding/pull/1220 
> to run Cascading 3.0.0-wip-90 with TEZ 0.6.0
> Reporter: Cyrille Chépélov
> Attachments: TEZ-2237-hack.branch6.txt, TEZ-2237-hack.master.txt, 
> TEZ-2237.test.2_branch0.6.txt, all_stacks.lst, alloc_mem.png, 
> alloc_vcores.png, application_142732418_1444.yarn-logs.red.txt.gz, 
> application_142732418_1908.red.txt.bz2, 
> application_1427964335235_2070.txt.red.txt.bz2, 
> appmastersyslog_dag_1427282048097_0215_1.red.txt.gz, 
> appmastersyslog_dag_1427282048097_0237_1.red.txt.gz, 
> gc_count_MRAppMaster.png, mem_free.png, noopexample_2237.txt, 
> oneOutOfTwoOutputsStarted.txt, ordered-grouped-kv-input-traces.diff, 
> output-starts.txt, start_containers.png, stop_containers.png, 
> syslog_attempt_1427282048097_0215_1_21_14_0.red.txt.gz, 
> syslog_attempt_1427282048097_0237_1_70_28_0.red.txt.gz, yarn_rm_flips.png
>
>
> On a specific DAG with many vertices (actually part of a larger meta-DAG), 
> after about an hour of processing, several BufferTooSmallExceptions are raised 
> in UnorderedPartitionedKVWriter (about one every two or three spills).
> Once these exceptions are raised, the DAG remains indefinitely "active", 
> tying up memory and CPU resources as far as YARN is concerned, while little 
> if any actual processing takes place. 
> It seems two separate issues are at hand:
>   1. BufferTooSmallExceptions are raised even though, small as the actually 
> allocated buffers seem to be (around a couple of megabytes were allotted whereas 
> 100 MiB were requested), the actual keys and values are never bigger than 24 
> and 1024 bytes respectively.
>   2. Once BufferTooSmallExceptions are raised, the DAG fails to stop 
> (stop requests appear to be sent 7 hours after the BTSE exceptions are 
> raised, but 9 hours after these stop requests, the DAG was still lingering 
> with all containers present, tying up memory and CPU allocations).
> The BTSEs prevent the Cascade from completing, so the results cannot be 
> validated against traditional MR1-based results. Because the DAG never 
> terminates, the cluster queue becomes unavailable.





[jira] [Commented] (TEZ-145) Support a combiner processor that can run non-local to map/reduce nodes

2015-04-09 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487627#comment-14487627
 ] 

Tsuyoshi Ozawa commented on TEZ-145:


[~gopalv] thanks for taking a look at my patch and for your comment!

{quote}
but the edge connectivity is still shuffle + total-order merged for both edges
{quote}

You're right. We can use UnorderedPartitionedKVEdge as an optimization, since 
aggregation tasks don't need sorting, as you know.

{quote}
I will write a more detailed design document tomorrow and upload it here which 
will expand on Bikas's earlier comment and I will draw out the runtime 
expansion graphs to indicate the sort-preserving combiner instead of re-sorting 
data along the way (since the combiner never mutates the keys or output 
ordering).
{quote}

OK, looking forward to it.

> Support a combiner processor that can run non-local to map/reduce nodes
> ---
>
> Key: TEZ-145
> URL: https://issues.apache.org/jira/browse/TEZ-145
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Hitesh Shah
> Assignee: Tsuyoshi Ozawa
> Attachments: TEZ-145.2.patch, WIP-TEZ-145-001.patch
>
>
> For aggregate operators that can benefit from running in multi-level trees, 
> support for running a combiner in a non-local mode would allow 
> performance gains from running a combiner at the rack level. 





[jira] [Created] (TEZ-2304) InvalidStateTransitonException TA_SCHEDULE at START_WAIT during recovery

2015-04-09 Thread Jason Lowe (JIRA)
Jason Lowe created TEZ-2304:
---

 Summary: InvalidStateTransitonException TA_SCHEDULE at START_WAIT 
during recovery
 Key: TEZ-2304
 URL: https://issues.apache.org/jira/browse/TEZ-2304
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jason Lowe


I saw a Tez AM throw a few InvalidStateTransitonException (sic) instances 
during recovery complaining about TA_SCHEDULE arriving at the START_WAIT state.





[jira] [Created] (TEZ-2301) Switch Tez Pre-commit builds to use tezqa user

2015-04-09 Thread Hitesh Shah (JIRA)
Hitesh Shah created TEZ-2301:


 Summary: Switch Tez Pre-commit builds to use tezqa user 
 Key: TEZ-2301
 URL: https://issues.apache.org/jira/browse/TEZ-2301
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah


There are potential experiments happening in Hadoop land that might have 
repercussions for Tez pre-commit builds. 





[jira] [Created] (TEZ-2297) TEZ UI: On clicking a tab with table, notify the user if updated records are available in ATS.

2015-04-09 Thread Sreenath Somarajapuram (JIRA)
Sreenath Somarajapuram created TEZ-2297:
---

 Summary: TEZ UI: On clicking a tab with table, notify the user if 
updated records are available in ATS.
 Key: TEZ-2297
 URL: https://issues.apache.org/jira/browse/TEZ-2297
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Sreenath Somarajapuram








[jira] [Commented] (TEZ-2119) Counter for launched containers

2015-04-09 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488055#comment-14488055
 ] 

Hitesh Shah commented on TEZ-2119:
--

I think there probably needs to be an allocated-containers counter as well. Without 
pre-warm, a container is only launched when a task is assigned to it. 


> Counter for launched containers
> ---
>
> Key: TEZ-2119
> URL: https://issues.apache.org/jira/browse/TEZ-2119
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Rohini Palaniswamy
> Assignee: Jeff Zhang
>
> org.apache.tez.common.counters.DAGCounter
> NUM_SUCCEEDED_TASKS=32976
> TOTAL_LAUNCHED_TASKS=32976
> OTHER_LOCAL_TASKS=2
> DATA_LOCAL_TASKS=9147
> RACK_LOCAL_TASKS=23761
> It would be very nice to have a TOTAL_LAUNCHED_CONTAINERS counter added to 
> this. The difference between TOTAL_LAUNCHED_CONTAINERS and 
> TOTAL_LAUNCHED_TASKS should make it easy to see how much container reuse is 
> happening. It is very hard to find out now.
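
As an illustration of how the proposed counter would be read, here is a small sketch. TOTAL_LAUNCHED_TASKS=32976 comes from the counters above; the container count of 4122 is a made-up value, since TOTAL_LAUNCHED_CONTAINERS does not exist yet, and the class and method names are hypothetical. The arithmetic also assumes every launched container ran at least one task, which may not hold with pre-warm.

```java
// Hypothetical illustration of the reuse arithmetic described in the issue.
final class ContainerReuse {
    // Task launches that landed in a container which had already run a task.
    static long reusedLaunches(long launchedTasks, long launchedContainers) {
        return launchedTasks - launchedContainers;
    }

    // Average number of tasks each launched container ended up running.
    static double tasksPerContainer(long launchedTasks, long launchedContainers) {
        return (double) launchedTasks / launchedContainers;
    }

    public static void main(String[] args) {
        long launchedTasks = 32_976L;      // TOTAL_LAUNCHED_TASKS from the counters above
        long launchedContainers = 4_122L;  // assumed value, for illustration only

        System.out.println(reusedLaunches(launchedTasks, launchedContainers));    // 28854
        System.out.println(tasksPerContainer(launchedTasks, launchedContainers)); // 8.0
    }
}
```

With no reuse at all the two counters would be equal and the difference would be zero.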





[jira] [Created] (TEZ-2305) MR compatibility sleep job fails with IOException: Undefined job output-path

2015-04-09 Thread Tassapol Athiapinya (JIRA)
Tassapol Athiapinya created TEZ-2305:


 Summary: MR compatibility sleep job fails with IOException: 
Undefined job output-path
 Key: TEZ-2305
 URL: https://issues.apache.org/jira/browse/TEZ-2305
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Tassapol Athiapinya
Priority: Critical


Running the MR sleep job fails with an IOException.

{code}
15/04/09 20:52:25 INFO mapreduce.Job: Job job_1428612196442_0002 failed with 
state FAILED due to: Vertex failed, vertexName=initialmap, 
vertexId=vertex_1428612196442_0002_1_00, diagnostics=[Task failed, 
taskId=task_1428612196442_0002_1_00_01, diagnostics=[TaskAttempt 0 failed, 
info=[Error: Failure while running task:java.io.IOException: Undefined job 
output-path
at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248)
at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121)
at org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
], TaskAttempt 1 failed, info=[Error: Failure while running 
task:java.io.IOException: Undefined job output-path
at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248)
at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121)
at org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
], TaskAttempt 2 failed, info=[Error: Failure while running 
task:java.io.IOException: Undefined job output-path
at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248)
at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121)
at org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
], TaskAttempt 3 failed, info=[Error: Failure while running 
task:java.io.IOException: Undefined job output-path
at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248)
at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121)
at org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputC

[jira] [Comment Edited] (TEZ-2305) MR compatibility sleep job fails with IOException: Undefined job output-path

2015-04-09 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488323#comment-14488323
 ] 

Hitesh Shah edited comment on TEZ-2305 at 4/9/15 9:34 PM:
--

[~tassapola] Can you provide the parameters to the sleep job? 

{code}
${HADOOP_COMMON_HOME}/bin/hadoop jar \
  hadoop-mapreduce-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar \
  sleep -Dmapreduce.framework.name=yarn-tez -m 1 -mt 1 -r 1 -rt 1
{code}

The above command works for me when running against master branch. Are you 
running a map-only sleep job? 


was (Author: hitesh):
[~tassapola] Can you provide the parameters to the sleep job? 

{code}
${HADOOP_COMMON_HOME}/bin/hadoop jar \
  hadoop-mapreduce-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar \
  sleep -Dmapreduce.framework.name=yarn-tez -m 1 -mt 1 -r 1 -rt 1
{code}

The above command works for me when running against master branch. 

> MR compatibility sleep job fails with IOException: Undefined job output-path
> 
>
> Key: TEZ-2305
> URL: https://issues.apache.org/jira/browse/TEZ-2305
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Tassapol Athiapinya
> Priority: Critical
>
> Running the MR sleep job fails with an IOException.
> {code}
> 15/04/09 20:52:25 INFO mapreduce.Job: Job job_1428612196442_0002 failed with 
> state FAILED due to: Vertex failed, vertexName=initialmap, 
> vertexId=vertex_1428612196442_0002_1_00, diagnostics=[Task failed, 
> taskId=task_1428612196442_0002_1_00_01, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Failure while running task:java.io.IOException: 
> Undefined job output-path
>   at 
> org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248)
>   at 
> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121)
>   at 
> org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> ], TaskAttempt 1 failed, info=[Error: Failure while running 
> task:java.io.IOException: Undefined job output-path
>   at 
> org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248)
>   at 
> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121)
>   at 
> org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> ], TaskAttempt 2 failed, info=[Error: Failure while running 
> task:java.io.IOException: Undefined job output-path
>   at 
> org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248)
>   at 
> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121)
>   at 
> org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415)
>   at org.apache.tez.common.CallableWithNdc

[jira] [Updated] (TEZ-2300) TezClient.stop() takes a lot of time or does not work sometimes

2015-04-09 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated TEZ-2300:

Summary: TezClient.stop() takes a lot of time or does not work sometimes  
(was: TezClient.stop() takes a lot of time)

> TezClient.stop() takes a lot of time or does not work sometimes
> ---
>
> Key: TEZ-2300
> URL: https://issues.apache.org/jira/browse/TEZ-2300
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rohini Palaniswamy
>
>   Noticed this with a couple of Pig scripts which were not behaving well (AM 
> close to OOM, etc.) and even with some that were running fine. Pig calls 
> TezClient.stop() in a shutdown hook. Ctrl+C to the Pig script either exits 
> immediately or hangs. In both cases it takes a long time for the 
> YARN application to go to the KILLED state. Many times I just end up calling yarn 
> application -kill separately after waiting for 5 mins or more for it to get 
> killed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-2119) Counter for launched containers

2015-04-09 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488060#comment-14488060
 ] 

Rohini Palaniswamy commented on TEZ-2119:
-

So ALLOCATED, LAUNCHED and USED containers? Launched and Used will be the same in 
the case of no pre-warm but could differ with pre-warm. Is my understanding right?

> Counter for launched containers
> ---
>
> Key: TEZ-2119
> URL: https://issues.apache.org/jira/browse/TEZ-2119
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Jeff Zhang
>
> org.apache.tez.common.counters.DAGCounter
> NUM_SUCCEEDED_TASKS=32976
> TOTAL_LAUNCHED_TASKS=32976
> OTHER_LOCAL_TASKS=2
> DATA_LOCAL_TASKS=9147
> RACK_LOCAL_TASKS=23761
> It would be very nice to have TOTAL_LAUNCHED_CONTAINERS counter added to 
> this. The difference between TOTAL_LAUNCHED_CONTAINERS and 
> TOTAL_LAUNCHED_TASKS should make it easy to see how much container reuse is 
> happening. It is very hard to find out now.
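
As a rough illustration of the comparison described above, the two counters could be combined as follows. This is only a sketch: TOTAL_LAUNCHED_CONTAINERS is the counter being proposed here, the container count used in main is a made-up value, and the class and method names are hypothetical rather than part of Tez.

```java
final class ContainerReuseSketch {

    /** Average tasks run per launched container; values above 1.0 suggest reuse. */
    static double tasksPerContainer(long launchedTasks, long launchedContainers) {
        if (launchedContainers <= 0) {
            throw new IllegalArgumentException("launchedContainers must be positive");
        }
        return (double) launchedTasks / launchedContainers;
    }

    /** Task launches that did not need a fresh container (never negative). */
    static long reusedLaunches(long launchedTasks, long launchedContainers) {
        // With pre-warm, more containers than tasks may be launched, so clamp at zero.
        return Math.max(0L, launchedTasks - launchedContainers);
    }

    public static void main(String[] args) {
        long launchedTasks = 32976L;      // TOTAL_LAUNCHED_TASKS from the counters above
        long launchedContainers = 4122L;  // assumed value for the proposed counter
        System.out.println("tasks per container: "
            + tasksPerContainer(launchedTasks, launchedContainers));
        System.out.println("launches served by reuse: "
            + reusedLaunches(launchedTasks, launchedContainers));
    }
}
```

A ratio close to 1.0 would indicate little or no reuse, while larger values would indicate that containers are being reused across tasks.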





[jira] [Resolved] (TEZ-2302) Allow TaskCommunicators to subscribe for Vertex updates

2015-04-09 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth resolved TEZ-2302.
-
   Resolution: Fixed
Fix Version/s: TEZ-2003

> Allow TaskCommunicators to subscribe for Vertex updates
> ---
>
> Key: TEZ-2302
> URL: https://issues.apache.org/jira/browse/TEZ-2302
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Fix For: TEZ-2003
>
> Attachments: TEZ-2302.1.txt
>
>






[jira] [Commented] (TEZ-2303) ConcurrentModificationException while processing recovery

2015-04-09 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488219#comment-14488219
 ] 

Jason Lowe commented on TEZ-2303:
-

{noformat}
2015-04-09 19:36:11,231 INFO [main] app.RecoveryParser: Recovering from event, 
eventType=VERTEX_INITIALIZED, event=vertexName=scope-1973, 
vertexId=vertex_1428329756093_168563_1_43, initRequestedTime=1428606011138, 
initedTime=1428606011166, numTasks=769, processorName=null, 
additionalInputsCount=0
2015-04-09 19:36:11,231 INFO [main] impl.VertexImpl: Setting vertexManager to 
ShuffleVertexManager for vertex_1428329756093_168563_1_43 [scope-1973]
2015-04-09 19:36:11,242 INFO [main] vertexmanager.ShuffleVertexManager: Shuffle 
Vertex Manager: settings minFrac:0.25 maxFrac:0.75 auto:false 
desiredTaskIput:104857600 minTasks:1
2015-04-09 19:36:11,251 WARN [IPC Server handler 0 on x] ipc.Server: IPC Server 
handler 0 on x, call 
org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.getDAGStatus 
from x Call#1965 Retry#0
java.util.ConcurrentModificationException
at 
java.util.LinkedHashMap$LinkedHashIterator.nextEntry(LinkedHashMap.java:394)
at java.util.LinkedHashMap$ValueIterator.next(LinkedHashMap.java:409)
at 
org.apache.tez.dag.app.dag.impl.VertexImpl.getRunningTasks(VertexImpl.java:892)
at 
org.apache.tez.dag.app.dag.impl.VertexImpl.getVertexProgress(VertexImpl.java:988)
at 
org.apache.tez.dag.app.dag.impl.DAGImpl.getDAGStatus(DAGImpl.java:694)
at 
org.apache.tez.dag.api.client.DAGClientHandler.getDAGStatus(DAGClientHandler.java:62)
at 
org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getDAGStatus(DAGClientAMProtocolBlockingPBServerImpl.java:98)
at 
org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7375)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
{noformat}

Looks like a client trying to obtain status from the new attempt is sneaking in 
and walking the list of tasks as the recovery process is building that list.
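
The failure mode can be reproduced in isolation. The sketch below is not the VertexImpl code and not necessarily how this will be fixed in Tez; the class, field, and method names are hypothetical. It shows that a LinkedHashMap iterator fails once the map is structurally modified during the walk, and one possible way to avoid it by handing readers a snapshot taken under the same lock that writers use.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TaskMapIterationSketch {
    // Hypothetical stand-in for a vertex's task map.
    private final Map<Integer, String> tasks = new LinkedHashMap<>();
    private final Object lock = new Object();

    /** Writer side, e.g. recovery adding tasks. */
    void addTask(int id, String state) {
        synchronized (lock) {
            tasks.put(id, state);
        }
    }

    /** Reader side: copy under the lock, then iterate the copy freely. */
    List<String> snapshotStates() {
        synchronized (lock) {
            return new ArrayList<>(tasks.values());
        }
    }

    /** Returns true if walking a map while adding to it throws CME. */
    static boolean unsafeWalkThrows() {
        Map<Integer, String> m = new LinkedHashMap<>();
        m.put(1, "RUNNING");
        m.put(2, "RUNNING");
        try {
            for (String ignored : m.values()) {
                m.put(3, "NEW"); // structural modification during iteration
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("unsafe walk throws: " + unsafeWalkThrows());
        TaskMapIterationSketch sketch = new TaskMapIterationSketch();
        sketch.addTask(1, "RUNNING");
        System.out.println("snapshot: " + sketch.snapshotStates());
    }
}
```

The single-threaded loop only makes the failure deterministic; with a client RPC thread and a recovery thread the same check fires whenever the two interleave.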

> ConcurrentModificationException while processing recovery
> -
>
> Key: TEZ-2303
> URL: https://issues.apache.org/jira/browse/TEZ-2303
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Jason Lowe
>
> Saw a Tez AM log a few ConcurrentModificationException messages while trying 
> to recover from a previous attempt that crashed.  Exception details to follow.





[jira] [Created] (TEZ-2299) Invalid dag creation in MRRSleepJob post TEZ-2293

2015-04-09 Thread Hitesh Shah (JIRA)
Hitesh Shah created TEZ-2299:


 Summary: Invalid dag creation in MRRSleepJob post TEZ-2293
 Key: TEZ-2299
 URL: https://issues.apache.org/jira/browse/TEZ-2299
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah


java.lang.ArrayIndexOutOfBoundsException: 3
at 
org.apache.tez.mapreduce.examples.MRRSleepJob.createDAG(MRRSleepJob.java:584)
at 
org.apache.tez.mapreduce.examples.MRRSleepJob.run(MRRSleepJob.java:748)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at 
org.apache.tez.mapreduce.examples.MRRSleepJob.main(MRRSleepJob.java:399)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)





[jira] [Updated] (TEZ-2299) Invalid dag creation in MRRSleepJob post TEZ-2293

2015-04-09 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2299:
-
Description: 
When running: "mrrsleep -m 10 -mt 5000 -r 10 -irs 3 -ir 10 -irt 3000 -rt 5000"

java.lang.ArrayIndexOutOfBoundsException: 3
at 
org.apache.tez.mapreduce.examples.MRRSleepJob.createDAG(MRRSleepJob.java:584)
at 
org.apache.tez.mapreduce.examples.MRRSleepJob.run(MRRSleepJob.java:748)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at 
org.apache.tez.mapreduce.examples.MRRSleepJob.main(MRRSleepJob.java:399)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)

  was:
java.lang.ArrayIndexOutOfBoundsException: 3
at 
org.apache.tez.mapreduce.examples.MRRSleepJob.createDAG(MRRSleepJob.java:584)
at 
org.apache.tez.mapreduce.examples.MRRSleepJob.run(MRRSleepJob.java:748)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at 
org.apache.tez.mapreduce.examples.MRRSleepJob.main(MRRSleepJob.java:399)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)


> Invalid dag creation in MRRSleepJob post TEZ-2293
> -
>
> Key: TEZ-2299
> URL: https://issues.apache.org/jira/browse/TEZ-2299
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
>
> When running: "mrrsleep -m 10 -mt 5000 -r 10 -irs 3 -ir 10 -irt 3000 -rt 5000"
> java.lang.ArrayIndexOutOfBoundsException: 3
>   at 
> org.apache.tez.mapreduce.examples.MRRSleepJob.createDAG(MRRSleepJob.java:584)
>   at 
> org.apache.tez.mapreduce.examples.MRRSleepJob.run(MRRSleepJob.java:748)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at 
> org.apache.tez.mapreduce.examples.MRRSleepJob.main(MRRSleepJob.java:399)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>   at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)





[jira] [Updated] (TEZ-2259) Push additional data to Timeline for Recovery for better consumption in UI

2015-04-09 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2259:
-
Attachment: TEZ-2259.1.patch

[~zjffdu] [~pramachandran] review please. 

> Push additional data to Timeline for Recovery for better consumption in UI
> --
>
> Key: TEZ-2259
> URL: https://issues.apache.org/jira/browse/TEZ-2259
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
> Attachments: TEZ-2259.1.patch
>
>
> Some things I can think of: 
>  
>- applicationAttemptId in which the dag was submitted
>- appAttemptId in which the dag was completed 
> Above provides implicit information on how many app attempts the dag spanned 
> ( and therefore recovered how many times ).
>   
>- Maybe an implicit event mentioning that the DAG was recovered and in 
> which attempt it was recovered. Possibly add information on what state was 
> recovered?





[jira] [Commented] (TEZ-2234) Allow vertex managers to get output size per source vertex

2015-04-09 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488269#comment-14488269
 ] 

Bikas Saha commented on TEZ-2234:
-

bq. might want to remove unwanted import in LogicalIOProcessorRuntimeTask
Removed

bq.  SHUFFLE_BYTES_DECOMPRESSED is considered. For Outputs, OUTPUT_BYTES is 
considered. 
We want to give logical data written/read (at least for the current APIs). 
That's why OUTPUT_BYTES and SHUFFLE_BYTES_DECOMPRESSED have been 
used. However, from a local run, their values don't match up. Wondering why? 
Do you have any clues?

bq. progress (with speculation on), TaskImpl.getStatistics() chooses the best 
progressed attempt 
Yes, the documentation says that these are point-in-time values and can change 
with a refresh.

bq. Why should IOIndices be a map and not a set?. Will indices be used later?
It's currently not used, but a map is put in place to add more info later on if 
needed. The current integer value can be used to create an array of statistics 
values (per logical edge) instead of using a map (in the TaskStatistics 
object). However, memory overhead is small even with a map, so the array-based 
impl with these indices was not needed.
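
To make the trade-off concrete, here is a minimal sketch of the array-based layout that the indices would allow. It is not the code in the patch: the class, its fields, and its methods are hypothetical, and it only assumes that each input/output name is assigned a stable integer index.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

final class IoStatsLayoutSketch {
    // Name -> index, similar in spirit to the IOIndices map discussed above.
    private final Map<String, Integer> ioIndices = new LinkedHashMap<>();
    // Per-edge statistics stored by index instead of in a second map.
    private long[] dataSizes = new long[0];

    /** Returns the index for the name, assigning the next free one if new. */
    int register(String ioName) {
        Integer existing = ioIndices.get(ioName);
        if (existing != null) {
            return existing;
        }
        int index = ioIndices.size();
        ioIndices.put(ioName, index);
        dataSizes = Arrays.copyOf(dataSizes, index + 1); // new slot starts at 0
        return index;
    }

    void addDataSize(String ioName, long bytes) {
        dataSizes[register(ioName)] += bytes;
    }

    long getDataSize(String ioName) {
        Integer index = ioIndices.get(ioName);
        return index == null ? 0L : dataSizes[index];
    }

    public static void main(String[] args) {
        IoStatsLayoutSketch stats = new IoStatsLayoutSketch();
        stats.addDataSize("input-edge", 100L);
        stats.addDataSize("output-edge", 250L);
        System.out.println(stats.getDataSize("output-edge"));
    }
}
```

The array saves a map entry per edge, which only matters if the number of edges or statistics objects is large.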

bq. Can you plz share more details on the TODO in ShuffleUtils (or create a 
separate JIRA)?
The todo is orthogonal to the patch. I was not sure if finalMergeEnabled would 
cause multiple VM events to be sent out (one per spill) before isLastEvent 
becomes true. If that is the case, then I will open a new jira to track that. 
What do you think? The solution would be to move the VM event sending code to 
the close() method.
{code}if (finalMergeEnabled || isLastEvent) {
  ShuffleUserPayloads.VertexManagerEventPayloadProto.Builder vmBuilder =
      ShuffleUserPayloads.VertexManagerEventPayloadProto.newBuilder();

  long outputSize =
      context.getCounters().findCounter(TaskCounter.OUTPUT_BYTES).getValue();

  // Set this information only when required. In pipelined shuffle, multiple
  // events would end up adding up to final outputsize. This is needed for
  // auto-reduce parallelism to work properly.
  vmBuilder.setOutputSize(outputSize);
  VertexManagerEvent vmEvent = VertexManagerEvent.create(
      context.getDestinationVertexName(),
      vmBuilder.build().toByteString().asReadOnlyByteBuffer());
  eventList.add(vmEvent);
}{code}
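
The idea of moving the VM event sending to close() can be sketched without any Tez types. Everything below is hypothetical (class name, methods, and the use of a plain list of sizes in place of VertexManagerEvent objects); it only illustrates accumulating per-spill sizes and reporting the total once.

```java
import java.util.ArrayList;
import java.util.List;

final class VmEventAtCloseSketch {
    private long totalOutputBytes;
    // Stands in for the list of events that would be sent to the vertex manager.
    private final List<Long> reportedSizes = new ArrayList<>();

    /** Called once per spill: accumulate only, do not report yet. */
    void onSpill(long spillBytes) {
        totalOutputBytes += spillBytes;
    }

    /** Called once when the output is closed: report the total a single time. */
    List<Long> close() {
        reportedSizes.add(totalOutputBytes);
        return reportedSizes;
    }

    public static void main(String[] args) {
        VmEventAtCloseSketch output = new VmEventAtCloseSketch();
        output.onSpill(10L);
        output.onSpill(20L);
        output.onSpill(30L);
        System.out.println(output.close());
    }
}
```

With this shape the receiver sees exactly one size report per task output, regardless of how many spills occurred.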



> Allow vertex managers to get output size per source vertex
> --
>
> Key: TEZ-2234
> URL: https://issues.apache.org/jira/browse/TEZ-2234
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Bikas Saha
>Assignee: Bikas Saha
> Attachments: TEZ-2234.1.patch, TEZ-2234.2.patch
>
>
> Vertex managers may need per source vertex output stats to make 
> reconfiguration decisions.





[jira] [Commented] (TEZ-2299) Invalid dag creation in MRRSleepJob post TEZ-2293

2015-04-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488035#comment-14488035
 ] 

Hadoop QA commented on TEZ-2299:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12724288/TEZ-2299.1.patch
  against master revision 936ff8d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/428//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/428//console

This message is automatically generated.

> Invalid dag creation in MRRSleepJob post TEZ-2293
> -
>
> Key: TEZ-2299
> URL: https://issues.apache.org/jira/browse/TEZ-2299
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
> Attachments: TEZ-2299.1.patch
>
>
> When running: "mrrsleep -m 10 -mt 5000 -r 10 -irs 3 -ir 10 -irt 3000 -rt 5000"
> java.lang.ArrayIndexOutOfBoundsException: 3
>   at 
> org.apache.tez.mapreduce.examples.MRRSleepJob.createDAG(MRRSleepJob.java:584)
>   at 
> org.apache.tez.mapreduce.examples.MRRSleepJob.run(MRRSleepJob.java:748)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at 
> org.apache.tez.mapreduce.examples.MRRSleepJob.main(MRRSleepJob.java:399)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>   at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)





[jira] [Updated] (TEZ-2234) Allow vertex managers to get output size per source vertex

2015-04-09 Thread Bikas Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-2234:

Attachment: TEZ-2234.3.patch

> Allow vertex managers to get output size per source vertex
> --
>
> Key: TEZ-2234
> URL: https://issues.apache.org/jira/browse/TEZ-2234
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Bikas Saha
>Assignee: Bikas Saha
> Attachments: TEZ-2234.1.patch, TEZ-2234.2.patch, TEZ-2234.3.patch
>
>
> Vertex managers may need per source vertex output stats to make 
> reconfiguration decisions.





[jira] [Updated] (TEZ-2301) Switch Tez Pre-commit builds to use tezqa user

2015-04-09 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2301:
-
Attachment: TEZ-2301.1.patch

> Switch Tez Pre-commit builds to use tezqa user 
> ---
>
> Key: TEZ-2301
> URL: https://issues.apache.org/jira/browse/TEZ-2301
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
> Attachments: TEZ-2301.1.patch
>
>
> There are potential experiments happening in hadoop land that might cause 
> repercussions on tez pre-commit builds. 





Success: TEZ-714 PreCommit Build #425

2015-04-09 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-714
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/425/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2815 lines...]
[INFO] Final Memory: 68M/851M
[INFO] 




{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12724164/TEZ-714-11.patch
  against master revision 936ff8d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/425//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/425//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
f0474320b75ba959bacd5438107cb2930de11d5e logged out


==
==
Finished build.
==
==


Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #424
Archived 44 artifacts
Archive block size is 32768
Received 4 blocks and 2612115 bytes
Compression is 4.8%
Took 0.74 sec
Description set: TEZ-714
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed