[jira] [Updated] (TEZ-2758) Remove append API in RecoveryService after TEZ-1909

2015-09-29 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-2758:

Attachment: TEZ-2758-3.patch

> Remove append API in RecoveryService after TEZ-1909
> ---
>
> Key: TEZ-2758
> URL: https://issues.apache.org/jira/browse/TEZ-2758
> Project: Apache Tez
>  Issue Type: Improvement
>Affects Versions: 0.6.2
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-2758-1.patch, TEZ-2758-2.patch, TEZ-2758-3.patch
>
>
> After TEZ-1909, there is no longer any case in which the recovery file is 
> appended to. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-2758) Remove append API in RecoveryService after TEZ-1909

2015-09-29 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934791#comment-14934791
 ] 

Jeff Zhang commented on TEZ-2758:
-

Comments addressed.

bq. Test ( testMultipleDAGFinishedEvent ) should contain at least one assertTrue 
check in addition to the checks that stream is closed and removed.
I cannot verify that the stream is closed, because the stream object can no 
longer be obtained once it has been removed. 






[jira] [Commented] (TEZ-2724) Tez Client keeps on showing old status when application is finished but RM is shutdown

2015-09-29 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935177#comment-14935177
 ] 

Jeff Zhang commented on TEZ-2724:
-

[~hitesh] Please help review

> Tez Client keeps on showing old status when application is finished but RM is 
> shutdown
> --
>
> Key: TEZ-2724
> URL: https://issues.apache.org/jira/browse/TEZ-2724
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.4
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-2724-1.patch, TEZ-2724-2.patch, 
> amrecovery_mutlipleamrestart.txt
>
>
> From the logs, it seems the IPC retry interval is set to 20 seconds and the 
> IPC max retries to 45. This means the client will keep retrying the RPC 
> connection for a total of 900 (20*45) seconds. During this period, the 
> application may already have completed and an RM restart may have been 
> triggered, as described in the jira description. I think RM recovery is not 
> enabled, so even after the new RM starts, the original application info is 
> lost. The client can therefore never get the correct application report, 
> which makes it show the old status forever. 
> {code}
> 15/05/07 19:13:43 INFO ipc.Client: Retrying connect to server: 
> maint22-tez12/100.79.80.19:52822. Already tried 26 time(s); maxRetries=45
> Deleted /user/hadoopqa/Input1
> RUNNING: call D:\hdp\hadoop-2.6.0.2.2.6.0-2782\bin\hdfs.cmd dfs -ls 
> /user/hadoopqa/Input2
> RUNNING: call D:\hdp\hadoop-2.6.0.2.2.6.0-2782\bin\hdfs.cmd dfs  -rm -r 
> -skipTrash /user/hadoopqa/Input2
> 15/05/07 19:14:03 INFO ipc.Client: Retrying connect to server: 
> maint22-tez12/100.79.80.19:52822. Already tried 27 time(s); maxRetries=45
> {code}
> Configuration to reproduce this issue
> * disable generic application history 
> (yarn.timeline-service.generic-application-history.enabled)
> * disable rm recovery (yarn.resourcemanager.recovery.enabled)
> * increase the ipc retry interval and max retries 
> (ipc.client.connect.retry.interval & ipc.client.connect.max.retries)
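
The 900-second figure in the description is simply the retry interval 
multiplied by the retry count. A minimal sketch of that arithmetic follows; 
the RetryWindow class and its method name are hypothetical and are not part 
of Hadoop or Tez.

```java
import java.time.Duration;

// Hypothetical helper for illustration only; not a Hadoop or Tez class.
final class RetryWindow {
    private RetryWindow() {}

    /** Worst-case time spent waiting between connect attempts. */
    static Duration total(Duration interval, int maxRetries) {
        return interval.multipliedBy(maxRetries);
    }

    public static void main(String[] args) {
        // Values taken from the log above: 20 s interval, maxRetries=45.
        Duration window = total(Duration.ofSeconds(20), 45);
        System.out.println(window.getSeconds() + " seconds");
    }
}
```

With the values from the log, the client can stay in the retry loop for 15 
minutes, which is long enough for the application to finish and the RM to be 
restarted.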





[jira] [Updated] (TEZ-2096) TEZ-UI : Add link to view AM log of finished & running apps

2015-09-29 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-2096:
-
Attachment: TEZ-2096.1.patch

[~Sreenath], could you take a look at this patch to see if this is a good 
starter patch for this jira?

> TEZ-UI : Add link to view AM log of finished & running apps
> ---
>
> Key: TEZ-2096
> URL: https://issues.apache.org/jira/browse/TEZ-2096
> Project: Apache Tez
>  Issue Type: Improvement
>  Components: UI
>Reporter: Rajesh Balamohan
>Assignee: Sreenath Somarajapuram
> Attachments: TEZ-2096.1.patch
>
>
> Currently, users can view the logs of task attempts via tez-ui. It would be 
> good to provide a similar feature for viewing AM logs as well (e.g. when a 
> user wants to check the AM log of a failed DAG for exceptions).





[jira] [Commented] (TEZ-2716) DefaultSorter.isRleNeeded not thread safe

2015-09-29 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936097#comment-14936097
 ] 

Rajesh Balamohan commented on TEZ-2716:
---

It does not cause task failures. During spills, DefaultSorter decides whether 
RLE is needed based on the total number of keys gathered and the number of 
identical keys seen so far. These counters are updated in the main thread. 
Without proper synchronization, the spill thread is not guaranteed to see 
their current values, so isRLENeeded can be computed from stale data. The 
patch tries to address this scenario.
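
To illustrate the visibility problem, here is a minimal sketch of counters 
that one thread updates and another reads. The KeyStats class, its method 
names, and the ratio threshold are assumptions made for this example; they 
are not the DefaultSorter code and not necessarily what the patch does.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the sorter's key statistics; not the Tez class.
final class KeyStats {
    private final AtomicLong totalKeys = new AtomicLong();
    private final AtomicLong sameKeys = new AtomicLong();

    /** Called by the main (collector) thread for every key it gathers. */
    void record(boolean sameAsPrevious) {
        totalKeys.incrementAndGet();
        if (sameAsPrevious) {
            sameKeys.incrementAndGet();
        }
    }

    /** Can be called from the spill thread; atomic reads see earlier updates. */
    boolean isRleNeeded(double minRatio) {
        long total = totalKeys.get();
        return total > 0 && (double) sameKeys.get() / total >= minRatio;
    }
}
```

The two reads are not taken as one atomic snapshot, so the ratio can be 
slightly off while the main thread is still writing. For a heuristic such as 
this one that is probably acceptable; a lock would be needed for an exact 
snapshot.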

> DefaultSorter.isRleNeeded not thread safe
> -
>
> Key: TEZ-2716
> URL: https://issues.apache.org/jira/browse/TEZ-2716
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Siddharth Seth
>Assignee: Rajesh Balamohan
> Fix For: 0.7.1, 0.8.1
>
> Attachments: TEZ-2716.1.patch, TEZ-2716.2.patch, 
> TEZ-2716.branch-0.6-and-0.5.patch
>
>
> TEZ-1997.
> Should be targeted at the same set of versions that TEZ-1997 goes into.





[jira] [Commented] (TEZ-2850) Tez MergeManager OOM for small Map Outputs

2015-09-29 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935959#comment-14935959
 ] 

Gopal V commented on TEZ-2850:
--

I've been trying to understand why we even have a reference to IFileInputStream 
from the Segment.

The shuffleToMemory() should throw away the IFileInputStream as soon as it 
copies the data into memory.

From my understanding of the merger code, for in-memory segments, this buffer 
is assumed to be already thrown away after the reader pulls it into memory.

Only disk segments should have 4kb chunks attached to them (a total of 4Mb 
with a 100 sort factor).

> Tez MergeManager OOM for small Map Outputs
> --
>
> Key: TEZ-2850
> URL: https://issues.apache.org/jira/browse/TEZ-2850
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Saikat
>Assignee: Saikat
> Attachments: OOM_1.png, OOM_2.png, OOM_3.png, TEZ-2850_test.patch
>
>






[jira] [Updated] (TEZ-2855) NPE while routing events

2015-09-29 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-2855:

Attachment: TEZ-2855.1.txt

Patch for master to fix the VM NPE.
On the logging changes - that's a bigger problem since we aren't handling 
RuntimeExceptions - I created TEZ-2862 to track this. For exceptions we do 
handle - the vertex name and id are already logged.

[~bikassaha], [~hitesh], [~zjffdu] - please review.

> NPE while routing events
> 
>
> Key: TEZ-2855
> URL: https://issues.apache.org/jira/browse/TEZ-2855
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Critical
> Attachments: 2855log.gz, TEZ-2855.1.txt
>
>
> Observed while running against 0.8.0-alpha. This will likely affect 0.7 as 
> well - that'll be known after debugging.
> {code}
> 2015-09-24T12:13:42,675 ERROR [Dispatcher thread: Central] 
> common.AsyncDispatcher: Error in dispatcher thread
> java.lang.NullPointerException
>   at 
> org.apache.tez.dag.app.dag.impl.VertexImpl.handleRoutedTezEvents(VertexImpl.java:4429)
>  ~[TezAppJar.jar:0.8.0-alpha]
>   at 
> org.apache.tez.dag.app.dag.impl.VertexImpl.access$4000(VertexImpl.java:203) 
> ~[TezAppJar.jar:0.8.0-alpha]
>   at 
> org.apache.tez.dag.app.dag.impl.VertexImpl$RouteEventTransition.transition(VertexImpl.java:4175)
>  ~[TezAppJar.jar:0.8.0-alpha]
>   at 
> org.apache.tez.dag.app.dag.impl.VertexImpl$RouteEventTransition.transition(VertexImpl.java:4167)
>  ~[TezAppJar.jar:0.8.0-alpha]
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>  ~[hadoop-yarn-common-2.6.0.jar:?]
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>  ~[hadoop-yarn-common-2.6.0.jar:?]
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>  ~[hadoop-yarn-common-2.6.0.jar:?]
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>  ~[hadoop-yarn-common-2.6.0.jar:?]
>   at 
> org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57) 
> ~[TezAppJar.jar:0.8.0-alpha]
>   at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1906) 
> ~[TezAppJar.jar:0.8.0-alpha]
>   at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:202) 
> ~[TezAppJar.jar:0.8.0-alpha]
>   at 
> org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2069)
>  ~[TezAppJar.jar:0.8.0-alpha]
>   at 
> org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2055)
>  ~[TezAppJar.jar:0.8.0-alpha]
>   at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) 
> [tez-common-0.8.0-alpha.jar:0.8.0-alpha]
>   at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114) 
> [tez-common-0.8.0-alpha.jar:0.8.0-alpha]
>   at java.lang.Thread.run(Thread.java:745) [?:1.8.0_40]
> 2015-09-24T12:13:42,681 INFO [HistoryEventHandlingThread] 
> impl.SimpleHistoryLoggingService: Writing event TASK_ATTEMPT_FINISHED to 
> history file
> {code}
> Looks like the VertexManager was null.





[jira] [Commented] (TEZ-2781) Fallback to send only TaskAttemptFailedEvent if taskFailed heartbeat fails

2015-09-29 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936017#comment-14936017
 ] 

Hitesh Shah commented on TEZ-2781:
--

[~sseth] Mind taking a look? This will help improve diagnostics where counters 
exceed limits. 

> Fallback to send only TaskAttemptFailedEvent if taskFailed heartbeat fails
> --
>
> Key: TEZ-2781
> URL: https://issues.apache.org/jira/browse/TEZ-2781
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.4
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-2781-1.patch, TEZ-2781-2.patch
>
>
> It is possible for the taskFailed heartbeat to fail to reach the AM (because 
> the counter limit is exceeded). In that case the client cannot get the right 
> diagnostic info. 
> {code}
> hive> select gencounter(2500) from (select count(*) from abc) a;
> Query ID = hrt_qa_2015083122_1956a7d6-1d41-406b-9266-af56ed21883c
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id 
> application_1440915851419_0007)
> 
> VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> 
> Map 1      SUCCEEDED      0          0        0        0       0       0
> Reducer 2     FAILED      1          0        0        1       4       0
> 
> VERTICES: 01/02  [>>--] 0%  ELAPSED TIME: 25.44 s
> 
> Status: Failed
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1440915851419_0007_2_01, 
> diagnostics=[Task failed, taskId=task_1440915851419_0007_2_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Container 
> container_e02_1440915851419_0007_01_02 finished with diagnostics set to 
> [Container failed. ]], TaskAttempt 1 failed, info=[Container 
> container_e02_1440915851419_0007_01_03 finished with diagnostics set to 
> [Container failed. ]], TaskAttempt 2 failed, info=[Container 
> container_e02_1440915851419_0007_01_04 finished with diagnostics set to 
> [Container failed. ]], TaskAttempt 3 failed, info=[Container 
> container_e02_1440915851419_0007_01_05 finished with diagnostics set to 
> [Container failed. ]]], Vertex failed as one or more tasks failed. 
> failedTasks:1, Vertex vertex_1440915851419_0007_2_01 [Reducer 2] 
> killed/failed due to:null]
> DAG failed due to vertex failure. failedVertices:1 killedVertices:0
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.tez.TezTask
> {code}
> {code}
> 2015-08-31 22:00:27,528 WARN [TezChild] task.TezTaskRunner: Heartbeat failure 
> caused by communication failure
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcServerException):
>  IPC server unable to read call parameters: Too many counters: 2001 max=2000
> at org.apache.hadoop.ipc.Client.call(Client.java:1469)
> at org.apache.hadoop.ipc.Client.call(Client.java:1400)
> at 
> org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:244)
> at com.sun.proxy.$Proxy9.heartbeat(Unknown Source)
> at 
> org.apache.tez.runtime.task.TaskReporter$HeartbeatCallable.heartbeat(TaskReporter.java:249)
> at 
> org.apache.tez.runtime.task.TaskReporter$HeartbeatCallable.taskFailed(TaskReporter.java:344)
> at 
> org.apache.tez.runtime.task.TaskReporter$HeartbeatCallable.access$300(TaskReporter.java:119)
> at 
> org.apache.tez.runtime.task.TaskReporter.taskFailed(TaskReporter.java:381)
> at 
> org.apache.tez.runtime.task.TezTaskRunner.sendFailure(TezTaskRunner.java:257)
> at 
> org.apache.tez.runtime.task.TezTaskRunner.access$600(TezTaskRunner.java:51)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:224)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> 

[jira] [Commented] (TEZ-2850) Tez MergeManager OOM for small Map Outputs

2015-09-29 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936021#comment-14936021
 ] 

Siddharth Seth commented on TEZ-2850:
-

That is a very good point. The checksum has already been computed/verified 
while writing the segment to a buffer. Looks like setting up the constructors 
correctly will take care of this.







[jira] [Updated] (TEZ-2855) NPE while routing events

2015-09-29 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-2855:

         Assignee: Siddharth Seth
Affects Version/s: 0.5.0  (was: 0.8.0-alpha)
 Target Version/s: 0.7.1, 0.6.3, 0.8.1  (was: 0.8.1)

This goes all the way back to 0.5.
If a Vertex's initialization is delayed - likely due to a large number of 
upstream vertices - and a task from a started vertex finishes very fast and 
generates an event for the uninitialized vertex, we try handling the event 
before the VM is set up.
InputInitializerEvents are not affected - since these events are cached while a 
vertex is in state NEW.

This was hit running LLAP unit tests - where task assignment and execution can 
be faster. The faster assignment and execution allows for the condition to be 
hit.
It is possible to hit this in regular jobs as well - but less likely since 
there's generally a delay in a container getting work. Hitting it in local mode 
is possible though. Targeting the fix up to 0.6.
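
For readers unfamiliar with the pattern, the caching described for 
InputInitializerEvents can be sketched as below: events that arrive before 
the manager exists are held and replayed once it is installed. Everything 
here (BufferingVertex, route, initialize, String events) is hypothetical and 
only illustrates the idea. It is not the VertexImpl code, and the attached 
patch may fix the NPE differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration; the real VertexImpl and the TEZ-2855 patch differ.
final class BufferingVertex {
    private final List<String> pending = new ArrayList<>();
    private Consumer<String> manager; // null until initialization completes

    /** Routes an event, or holds it if the manager is not set up yet. */
    void route(String event) {
        if (manager == null) {
            pending.add(event);
        } else {
            manager.accept(event);
        }
    }

    /** Installs the manager and replays held events in arrival order. */
    void initialize(Consumer<String> newManager) {
        manager = newManager;
        pending.forEach(manager);
        pending.clear();
    }
}
```

The sketch assumes both methods run on a single dispatcher thread, so it does 
no locking.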






[jira] [Commented] (TEZ-2855) NPE while routing events

2015-09-29 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936074#comment-14936074
 ] 

TezQA commented on TEZ-2855:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12764327/TEZ-2855.1.txt
  against master revision 773.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1184//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1184//console

This message is automatically generated.






Success: TEZ-2855 PreCommit Build #1184

2015-09-29 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2855
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1184/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 3451 lines...]
[INFO] Final Memory: 80M/857M
[INFO] 






==
==
Adding comment to Jira.
==
==


Comment added.
3425c3ab66da2d1cf2475d77001e1efc4c4e6c03 logged out


==
==
Finished build.
==
==


Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #1183
Archived 53 artifacts
Archive block size is 32768
Received 6 blocks and 3060693 bytes
Compression is 6.0%
Took 1.3 sec
Description set: TEZ-2855
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-2781) Fallback to send only TaskAttemptFailedEvent if taskFailed heartbeat fails

2015-09-29 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936404#comment-14936404
 ] 

Siddharth Seth commented on TEZ-2781:
-

Couple of things.
- The ordering of the events should be StatusUpdate followed by 
TaskAttemptFailedEvent. TaskAttemptFailedEvent will cause the attempt to move 
into a Failed state, at which point status updates are ignored. Otherwise 
we'll end up dropping counters for failed tasks. (It looks like there are no 
tests which cover this - will create a separate jira to create such a test.)
- Instead of catch (Exception) - is it possible to catch 
(LimitExceededException)? Will this harm anything?
- The test would be more robust if the number of counters generated were 
based on values from TezConfiguration, instead of 2000.

[~zjffdu] - in the absence of this patch, what behaviour are you seeing? The 
processor reports failure because of the LimitExceededException. The 
TaskReporter then fails while trying to report the error to the AM - and the AM 
waits for the timeout to kill the task?
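
To make the first two points concrete, here is a rough sketch of the fallback 
with the suggested event ordering and a narrow catch. All types in it 
(FailureReporter, Heartbeat, the local LimitExceededException, String events) 
are hypothetical stand-ins, not the TaskReporter API. Whether the real 
failure surfaces on the task side as a LimitExceededException or only as the 
RemoteException in the stack trace is exactly the open question above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical types for illustration; not the Tez TaskReporter API.
final class FailureReporter {
    /** Local stand-in for the counter-limit failure discussed above. */
    static final class LimitExceededException extends RuntimeException {
        LimitExceededException(String message) {
            super(message);
        }
    }

    /** Stand-in for the heartbeat RPC to the AM. */
    interface Heartbeat {
        void send(List<String> events);
    }

    /**
     * Sends the status update followed by the failure event. If that is
     * rejected for exceeding the limit, resends the failure event alone.
     * Returns true when the fallback path was taken.
     */
    static boolean reportFailure(Heartbeat rpc) {
        try {
            rpc.send(Arrays.asList("StatusUpdate", "TaskAttemptFailedEvent"));
            return false;
        } catch (LimitExceededException e) {
            rpc.send(Collections.singletonList("TaskAttemptFailedEvent"));
            return true;
        }
    }
}
```

Catching only the limit exception leaves other failures, such as a genuine 
communication error, to propagate as they do today.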


[jira] [Created] (TEZ-2862) State transitions do not handle unchecked exceptions

2015-09-29 Thread Siddharth Seth (JIRA)
Siddharth Seth created TEZ-2862:
---

 Summary: State transitions do not handle unchecked exceptions
 Key: TEZ-2862
 URL: https://issues.apache.org/jira/browse/TEZ-2862
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Siddharth Seth
Priority: Critical


An unchecked exception such as an NPE (TEZ-2855) goes all the way up to the 
AsyncDispatcher. This causes the AM to exit without unregistering from the RM 
or moving the DAG into an ERROR state.

Ideally, unchecked exceptions should cause the DAG to error out.
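
One possible shape for this, as a sketch only: guard each transition so that 
a RuntimeException is turned into an ERROR state with diagnostics instead of 
escaping to the dispatcher. The GuardedDag class and its State enum are 
hypothetical. They are not the Tez or YARN state machine API, and the 
eventual fix may look different.

```java
// Hypothetical sketch; not the Tez or YARN state machine API.
final class GuardedDag {
    enum State { RUNNING, SUCCEEDED, ERROR }

    private State state = State.RUNNING;
    private String diagnostics = "";

    State getState() {
        return state;
    }

    String getDiagnostics() {
        return diagnostics;
    }

    /** Runs a transition; an unchecked exception moves the DAG to ERROR. */
    void handle(Runnable transition) {
        try {
            transition.run();
        } catch (RuntimeException e) {
            state = State.ERROR;
            diagnostics = "Unchecked exception in transition: " + e;
        }
    }
}
```

Once the DAG is in ERROR, the AM can go through its normal shutdown path and 
unregister from the RM.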





[jira] [Commented] (TEZ-2758) Remove append API in RecoveryService after TEZ-1909

2015-09-29 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935854#comment-14935854
 ] 

TezQA commented on TEZ-2758:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12764200/TEZ-2758-3.patch
  against master revision 773.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1182//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1182//console

This message is automatically generated.

> Remove append API in RecoveryService after TEZ-1909
> ---
>
> Key: TEZ-2758
> URL: https://issues.apache.org/jira/browse/TEZ-2758
> Project: Apache Tez
>  Issue Type: Improvement
>Affects Versions: 0.6.2
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-2758-1.patch, TEZ-2758-2.patch, TEZ-2758-3.patch
>
>
> After TEZ-1909, there would be no case for append recovery file. 





Success: TEZ-2758 PreCommit Build #1182

2015-09-29 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2758
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1182/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 3450 lines...]
[INFO] Final Memory: 98M/1411M
[INFO] 






==
==
Adding comment to Jira.
==
==


Comment added.
c58356fe20f1d5e4d315b1852858ea06c2ea72ad logged out


==
==
Finished build.
==
==


Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #1177
Archived 53 artifacts
Archive block size is 32768
Received 6 blocks and 3057517 bytes
Compression is 6.0%
Took 1 sec
Description set: TEZ-2758
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-2758) Remove append API in RecoveryService after TEZ-1909

2015-09-29 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935873#comment-14935873
 ] 

TezQA commented on TEZ-2758:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12764200/TEZ-2758-3.patch
  against master revision 773.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1183//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1183//console

This message is automatically generated.

> Remove append API in RecoveryService after TEZ-1909
> ---
>
> Key: TEZ-2758
> URL: https://issues.apache.org/jira/browse/TEZ-2758
> Project: Apache Tez
>  Issue Type: Improvement
>Affects Versions: 0.6.2
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-2758-1.patch, TEZ-2758-2.patch, TEZ-2758-3.patch
>
>
> After TEZ-1909, there would be no case for append recovery file. 





Success: TEZ-2758 PreCommit Build #1183

2015-09-29 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2758
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1183/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 3453 lines...]
[INFO] Final Memory: 89M/860M
[INFO] 






==
==
Adding comment to Jira.
==
==


Comment added.
aa80a4e2d53294095221a3c2aa92a3da99e12051 logged out


==
==
Finished build.
==
==


Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #1182
Archived 53 artifacts
Archive block size is 32768
Received 10 blocks and 2926395 bytes
Compression is 10.1%
Took 1.8 sec
Description set: TEZ-2758
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-2758) Remove append API in RecoveryService after TEZ-1909

2015-09-29 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936314#comment-14936314
 ] 

TezQA commented on TEZ-2758:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12764360/TEZ-2758-4.patch
  against master revision 773.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1185//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1185//console

This message is automatically generated.

> Remove append API in RecoveryService after TEZ-1909
> ---
>
> Key: TEZ-2758
> URL: https://issues.apache.org/jira/browse/TEZ-2758
> Project: Apache Tez
>  Issue Type: Improvement
>Affects Versions: 0.6.2
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Fix For: 0.7.1, 0.6.3, 0.8.1
>
> Attachments: TEZ-2758-1.patch, TEZ-2758-2.patch, TEZ-2758-3.patch, 
> TEZ-2758-4.patch
>
>
> After TEZ-1909, there would be no case for append recovery file. 





[jira] [Commented] (TEZ-2855) NPE while routing events

2015-09-29 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936394#comment-14936394
 ] 

Siddharth Seth commented on TEZ-2855:
-

Thanks for the review. I don't think it matters whether events are processed in 
the middle of initialization or at the end. All that matters is that the 
VMPlugin is set up and has been initialized. VertexManagers will typically 
subscribe for CONFIGURED notifications before they try to obtain information 
like num tasks. There are also additional notifications, such as parallelism 
updated, which they can subscribe to. Moving it to the end of initialization 
doesn't necessarily work, since initialization can be deferred until edge 
managers etc. are set up.
Updating the patch with the unnecessary v3 INIT removed. Please take another 
look.
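
The subscribe-before-read pattern described above can be sketched roughly as follows. This is a self-contained model with hypothetical types (ConfiguredNotificationDemo, Vertex, Manager); it does not use the real Tez plugin API, and it only illustrates why a manager that waits for a CONFIGURED notification never observes an unset task count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical illustration only; these are not the Tez plugin interfaces. */
class ConfiguredNotificationDemo {

    enum VertexState { CONFIGURED, RUNNING }

    /** Stand-in for a vertex that publishes state updates to subscribers. */
    static final class Vertex {
        private final List<Consumer<VertexState>> listeners = new ArrayList<>();
        private int numTasks = -1; // unknown until the vertex is configured

        void subscribe(Consumer<VertexState> listener) { listeners.add(listener); }

        int getNumTasks() { return numTasks; }

        /** Sets the task count, then notifies subscribers that it is safe to read. */
        void configure(int tasks) {
            numTasks = tasks;
            for (Consumer<VertexState> listener : listeners) {
                listener.accept(VertexState.CONFIGURED);
            }
        }
    }

    /** Stand-in for a manager that reads numTasks only after CONFIGURED. */
    static final class Manager {
        private int observedTasks = -1;

        Manager(Vertex vertex) {
            vertex.subscribe(state -> {
                if (state == VertexState.CONFIGURED) {
                    observedTasks = vertex.getNumTasks();
                }
            });
        }

        int getObservedTasks() { return observedTasks; }
    }

    public static void main(String[] args) {
        Vertex vertex = new Vertex();
        Manager manager = new Manager(vertex);
        System.out.println("before configure: " + manager.getObservedTasks());
        vertex.configure(4);
        System.out.println("after configure: " + manager.getObservedTasks());
    }
}
```

In this model the manager can be created and receive other events at any point during initialization; it simply holds no task count until the notification arrives.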

> NPE while routing events
> 
>
> Key: TEZ-2855
> URL: https://issues.apache.org/jira/browse/TEZ-2855
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Critical
> Attachments: 2855log.gz, TEZ-2855.1.txt
>
>
> Observed while running against 0.8.0-alpha. This will likely affect 0.7 as 
> well - that'll be known after debugging.
> {code}
> 2015-09-24T12:13:42,675 ERROR [Dispatcher thread: Central] 
> common.AsyncDispatcher: Error in dispatcher thread
> java.lang.NullPointerException
>   at 
> org.apache.tez.dag.app.dag.impl.VertexImpl.handleRoutedTezEvents(VertexImpl.java:4429)
>  ~[TezAppJar.jar:0.8.0-alpha]
>   at 
> org.apache.tez.dag.app.dag.impl.VertexImpl.access$4000(VertexImpl.java:203) 
> ~[TezAppJar.jar:0.8.0-alpha]
>   at 
> org.apache.tez.dag.app.dag.impl.VertexImpl$RouteEventTransition.transition(VertexImpl.java:4175)
>  ~[TezAppJar.jar:0.8.0-alpha]
>   at 
> org.apache.tez.dag.app.dag.impl.VertexImpl$RouteEventTransition.transition(VertexImpl.java:4167)
>  ~[TezAppJar.jar:0.8.0-alpha]
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>  ~[hadoop-yarn-common-2.6.0.jar:?]
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>  ~[hadoop-yarn-common-2.6.0.jar:?]
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>  ~[hadoop-yarn-common-2.6.0.jar:?]
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>  ~[hadoop-yarn-common-2.6.0.jar:?]
>   at 
> org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57) 
> ~[TezAppJar.jar:0.8.0-alpha]
>   at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1906) 
> ~[TezAppJar.jar:0.8.0-alpha]
>   at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:202) 
> ~[TezAppJar.jar:0.8.0-alpha]
>   at 
> org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2069)
>  ~[TezAppJar.jar:0.8.0-alpha]
>   at 
> org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2055)
>  ~[TezAppJar.jar:0.8.0-alpha]
>   at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) 
> [tez-common-0.8.0-alpha.jar:0.8.0-alpha]
>   at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114) 
> [tez-common-0.8.0-alpha.jar:0.8.0-alpha]
>   at java.lang.Thread.run(Thread.java:745) [?:1.8.0_40]
> 2015-09-24T12:13:42,681 INFO [HistoryEventHandlingThread] 
> impl.SimpleHistoryLoggingService: Writing event TASK_ATTEMPT_FINISHED to 
> history file
> {code}
> Looks like the VertexManager was null.





[jira] [Updated] (TEZ-2855) NPE while routing events

2015-09-29 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-2855:

Attachment: TEZ-2855.2.txt

> NPE while routing events
> 
>
> Key: TEZ-2855
> URL: https://issues.apache.org/jira/browse/TEZ-2855
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Critical
> Attachments: 2855log.gz, TEZ-2855.1.txt, TEZ-2855.2.txt





[jira] [Commented] (TEZ-2855) NPE while routing events

2015-09-29 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936160#comment-14936160
 ] 

Jeff Zhang commented on TEZ-2855:
-

Comments:
* Should we put the VertexManager initialization code at the end of vertex 
initialization? With the code change in this patch, a vertex may 
handle a VMEvent while it is still uninitialized. That means 
VM#onVertexManagerEvent may not behave correctly, as some info in the 
VertexManagerContext (such as numTasks) may not be set correctly yet.
* TestVertexImpl: it is not necessary to send init to V3 explicitly. V3's init 
should be triggered by the init of V1 & V2.
{code}
dispatcher.getEventHandler().handle(new VertexEvent(v3.getVertexId(),
VertexEventType.V_INIT));  // remove this
{code}

> NPE while routing events
> 
>
> Key: TEZ-2855
> URL: https://issues.apache.org/jira/browse/TEZ-2855
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Critical
> Attachments: 2855log.gz, TEZ-2855.1.txt





[jira] [Updated] (TEZ-2758) Remove append API in RecoveryService after TEZ-1909

2015-09-29 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-2758:

Attachment: TEZ-2758-4.patch

Minor update, committing soon

> Remove append API in RecoveryService after TEZ-1909
> ---
>
> Key: TEZ-2758
> URL: https://issues.apache.org/jira/browse/TEZ-2758
> Project: Apache Tez
>  Issue Type: Improvement
>Affects Versions: 0.6.2
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-2758-1.patch, TEZ-2758-2.patch, TEZ-2758-3.patch, 
> TEZ-2758-4.patch
>
>
> After TEZ-1909, there would be no case for append recovery file. 





Success: TEZ-2758 PreCommit Build #1185

2015-09-29 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2758
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1185/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 3451 lines...]
[INFO] Final Memory: 90M/932M
[INFO] 






==
==
Adding comment to Jira.
==
==


Comment added.
784146cfa51cee62de74a8443b119b559aa8ae72 logged out


==
==
Finished build.
==
==


Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #1184
Archived 53 artifacts
Archive block size is 32768
Received 24 blocks and 2469429 bytes
Compression is 24.2%
Took 1.1 sec
Description set: TEZ-2758
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Updated] (TEZ-2758) Remove append API in RecoveryService after TEZ-1909

2015-09-29 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-2758:

Fix Version/s: 0.8.1
   0.7.1

> Remove append API in RecoveryService after TEZ-1909
> ---
>
> Key: TEZ-2758
> URL: https://issues.apache.org/jira/browse/TEZ-2758
> Project: Apache Tez
>  Issue Type: Improvement
>Affects Versions: 0.6.2
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Fix For: 0.7.1, 0.6.3, 0.8.1
>
> Attachments: TEZ-2758-1.patch, TEZ-2758-2.patch, TEZ-2758-3.patch, 
> TEZ-2758-4.patch
>
>
> After TEZ-1909, there would be no case for append recovery file. 





[jira] [Commented] (TEZ-2758) Remove append API in RecoveryService after TEZ-1909

2015-09-29 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936277#comment-14936277
 ] 

Jeff Zhang commented on TEZ-2758:
-

Committed to 0.6/0.7/master

> Remove append API in RecoveryService after TEZ-1909
> ---
>
> Key: TEZ-2758
> URL: https://issues.apache.org/jira/browse/TEZ-2758
> Project: Apache Tez
>  Issue Type: Improvement
>Affects Versions: 0.6.2
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Fix For: 0.6.3
>
> Attachments: TEZ-2758-1.patch, TEZ-2758-2.patch, TEZ-2758-3.patch, 
> TEZ-2758-4.patch
>
>
> After TEZ-1909, there would be no case for append recovery file. 





[jira] [Commented] (TEZ-2716) DefaultSorter.isRleNeeded not thread safe

2015-09-29 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935678#comment-14935678
 ] 

Jonathan Eagles commented on TEZ-2716:
--

[~rajesh.balamohan], I don't have context on the severity of this issue. Does 
this cause task failures in certain conditions?

> DefaultSorter.isRleNeeded not thread safe
> -
>
> Key: TEZ-2716
> URL: https://issues.apache.org/jira/browse/TEZ-2716
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Siddharth Seth
>Assignee: Rajesh Balamohan
> Fix For: 0.7.1, 0.8.1
>
> Attachments: TEZ-2716.1.patch, TEZ-2716.2.patch, 
> TEZ-2716.branch-0.6-and-0.5.patch
>
>
> TEZ-1997.
> Should be targeted at the same set of versions that TEZ-1997 goes into.





Failed: TEZ-2758 PreCommit Build #1181

2015-09-29 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-2758
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1181/

###
## LAST 60 LINES OF THE CONSOLE 
###
Started by user hitesh
[EnvInject] - Loading node environment variables.
Building remotely on H7 (Mapreduce Falcon Hadoop Pig Zookeeper Tez Hdfs) in 
workspace /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://git-wip-us.apache.org/repos/asf/tez.git 
 > # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
Fetching upstream changes from https://git-wip-us.apache.org/repos/asf/tez.git
 > git --version # timeout=10
 > git fetch --tags --progress https://git-wip-us.apache.org/repos/asf/tez.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/master^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/master^{commit} # timeout=10
Checking out Revision 77312f8ea586939cb85c140c94251162e731 
(refs/remotes/origin/master)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 77312f8ea586939cb85c140c94251162e731
 > git rev-list 8b412ee66fe042db60a567ff71639839af5fa854 # timeout=10
No emails were triggered.
[PreCommit-TEZ-Build] $ /bin/bash /tmp/hudson3999682515022635994.sh
Running in Jenkins mode


==
==
Testing patch for TEZ-2758.
==
==


HEAD is now at 773 TEZ-2851. Support a way for upstream applications to 
pass in a caller context to Tez. (hitesh)
Previous HEAD position was 773... TEZ-2851. Support a way for upstream 
applications to pass in a caller context to Tez. (hitesh)
Switched to branch 'master'
Your branch is behind 'origin/master' by 5 commits, and can be fast-forwarded.
  (use "git pull" to update your local branch)
First, rewinding head to replay your work on top of it...
Fast-forwarded master to 77312f8ea586939cb85c140c94251162e731.
TEZ-2758 is not "Patch Available".  Exiting.


==
==
Finished build.
==
==


Archiving artifacts
ERROR: No artifacts found that match the file pattern "patchprocess/*.*". 
Configuration error?
ERROR: 'patchprocess/*.*' doesn't match anything, but '*.*' does. Perhaps 
that's what you mean?
Build step 'Archive the artifacts' changed build result to FAILURE
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
No tests ran.

[jira] [Commented] (TEZ-2758) Remove append API in RecoveryService after TEZ-1909

2015-09-29 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935763#comment-14935763
 ] 

Hitesh Shah commented on TEZ-2758:
--

Re-triggered precommit.

> Remove append API in RecoveryService after TEZ-1909
> ---
>
> Key: TEZ-2758
> URL: https://issues.apache.org/jira/browse/TEZ-2758
> Project: Apache Tez
>  Issue Type: Improvement
>Affects Versions: 0.6.2
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-2758-1.patch, TEZ-2758-2.patch, TEZ-2758-3.patch
>
>
> After TEZ-1909, there would be no case for append recovery file. 


