[jira] [Updated] (TEZ-2855) NPE while routing events

2015-09-30 Thread Siddharth Seth (JIRA)

 [ https://issues.apache.org/jira/browse/TEZ-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated TEZ-2855:

Attachment: TEZ-2855.3.txt

Thanks for the reviews. Updated the patch to send one more event after moving 
into the INITED state. Committing in a bit, will post patches for 0.7 and 0.6 
in a while as well.

> NPE while routing events
>
> Key: TEZ-2855
> URL: https://issues.apache.org/jira/browse/TEZ-2855
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.5.0
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Priority: Critical
> Attachments: 2855log.gz, TEZ-2855.1.txt, TEZ-2855.2.txt, TEZ-2855.3.txt
>
> Observed while running against 0.8.0-alpha. This will likely affect 0.7 as 
> well - that'll be known after debugging.
> {code}
> 2015-09-24T12:13:42,675 ERROR [Dispatcher thread: Central] common.AsyncDispatcher: Error in dispatcher thread
> java.lang.NullPointerException
>   at org.apache.tez.dag.app.dag.impl.VertexImpl.handleRoutedTezEvents(VertexImpl.java:4429) ~[TezAppJar.jar:0.8.0-alpha]
>   at org.apache.tez.dag.app.dag.impl.VertexImpl.access$4000(VertexImpl.java:203) ~[TezAppJar.jar:0.8.0-alpha]
>   at org.apache.tez.dag.app.dag.impl.VertexImpl$RouteEventTransition.transition(VertexImpl.java:4175) ~[TezAppJar.jar:0.8.0-alpha]
>   at org.apache.tez.dag.app.dag.impl.VertexImpl$RouteEventTransition.transition(VertexImpl.java:4167) ~[TezAppJar.jar:0.8.0-alpha]
>   at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) ~[hadoop-yarn-common-2.6.0.jar:?]
>   at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) ~[hadoop-yarn-common-2.6.0.jar:?]
>   at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) ~[hadoop-yarn-common-2.6.0.jar:?]
>   at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) ~[hadoop-yarn-common-2.6.0.jar:?]
>   at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57) ~[TezAppJar.jar:0.8.0-alpha]
>   at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1906) ~[TezAppJar.jar:0.8.0-alpha]
>   at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:202) ~[TezAppJar.jar:0.8.0-alpha]
>   at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2069) ~[TezAppJar.jar:0.8.0-alpha]
>   at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2055) ~[TezAppJar.jar:0.8.0-alpha]
>   at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) [tez-common-0.8.0-alpha.jar:0.8.0-alpha]
>   at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114) [tez-common-0.8.0-alpha.jar:0.8.0-alpha]
>   at java.lang.Thread.run(Thread.java:745) [?:1.8.0_40]
> 2015-09-24T12:13:42,681 INFO [HistoryEventHandlingThread] impl.SimpleHistoryLoggingService: Writing event TASK_ATTEMPT_FINISHED to history file
> {code}
> Looks like the VertexManager was null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-2855) NPE while routing events

2015-09-29 Thread Siddharth Seth (JIRA)

 [ https://issues.apache.org/jira/browse/TEZ-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated TEZ-2855:

Attachment: TEZ-2855.2.txt

> NPE while routing events
>
> Key: TEZ-2855
> URL: https://issues.apache.org/jira/browse/TEZ-2855
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.5.0
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Priority: Critical
> Attachments: 2855log.gz, TEZ-2855.1.txt, TEZ-2855.2.txt
>


[jira] [Updated] (TEZ-2855) NPE while routing events

2015-09-29 Thread Siddharth Seth (JIRA)

 [ https://issues.apache.org/jira/browse/TEZ-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated TEZ-2855:

Attachment: TEZ-2855.1.txt

Patch for master to fix the VM (VertexManager) NPE.
On the logging changes: that's a bigger problem, since we aren't handling 
RuntimeExceptions. Created TEZ-2862 to track this. For exceptions we do handle, 
the vertex name and ID are already logged.

[~bikassaha], [~hitesh], [~zjffdu] - please review.

> NPE while routing events
>
> Key: TEZ-2855
> URL: https://issues.apache.org/jira/browse/TEZ-2855
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.5.0
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Priority: Critical
> Attachments: 2855log.gz, TEZ-2855.1.txt
>


[jira] [Updated] (TEZ-2855) NPE while routing events

2015-09-29 Thread Siddharth Seth (JIRA)

 [ https://issues.apache.org/jira/browse/TEZ-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated TEZ-2855:

         Assignee: Siddharth Seth
Affects Version/s: 0.5.0  (was: 0.8.0-alpha)
 Target Version/s: 0.7.1, 0.6.3, 0.8.1  (was: 0.8.1)

This goes all the way back to 0.5.
If a vertex's initialization is delayed (likely due to a large number of 
upstream vertices) and a task from an already-started vertex finishes very 
quickly, generating an event for the uninitialized vertex, we try to handle the 
event before the VM (VertexManager) is set up.
InputInitializerEvents are not affected, since these events are cached while a 
vertex is in state NEW.

This was hit running LLAP unit tests, where task assignment and execution can 
be faster. The faster assignment and execution allows the condition to be 
hit.
It is possible to hit this in regular jobs as well, but less likely since 
there's generally a delay in a container getting work. Hitting it in local mode 
is possible though. Targeting the fix as far back as 0.6.
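
For readers unfamiliar with the race described above, here is a minimal Java sketch of the general idea of caching events that arrive before a vertex is initialized and replaying them afterwards. It is not the attached patch and does not use the Tez API: the class DeferredEventVertex, its nested Manager type, and the methods routeEvent, initialize, pendingCount and delivered are all hypothetical names chosen for illustration, and events are plain strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical, simplified stand-in for a vertex; this is NOT Tez's VertexImpl. */
public class DeferredEventVertex {
    enum State { NEW, INITED }

    /** Hypothetical stand-in for a vertex manager that consumes routed events. */
    static class Manager {
        final List<String> received = new ArrayList<>();
        void onEvent(String event) { received.add(event); }
    }

    private State state = State.NEW;
    private Manager manager;                       // stays null until initialize() runs
    private final List<String> pending = new ArrayList<>();

    /** Route an event; while still NEW, cache it instead of touching the null manager. */
    void routeEvent(String event) {
        if (state == State.NEW) {
            pending.add(event);
            return;
        }
        manager.onEvent(event);
    }

    /** Set up the manager, move to INITED, then replay cached events in arrival order. */
    void initialize() {
        manager = new Manager();
        state = State.INITED;
        for (String event : pending) {
            manager.onEvent(event);
        }
        pending.clear();
    }

    int pendingCount() { return pending.size(); }

    List<String> delivered() {
        return manager == null ? Collections.<String>emptyList() : manager.received;
    }

    public static void main(String[] args) {
        DeferredEventVertex vertex = new DeferredEventVertex();
        vertex.routeEvent("event-from-fast-upstream-task"); // arrives before init
        System.out.println("pending before init: " + vertex.pendingCount());
        vertex.initialize();
        vertex.routeEvent("event-after-init");
        System.out.println("delivered after init: " + vertex.delivered());
    }
}
```

Without the state check in routeEvent, the first call would dereference the null manager, which is the shape of the failure in the stack trace.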

> NPE while routing events
>
> Key: TEZ-2855
> URL: https://issues.apache.org/jira/browse/TEZ-2855
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.5.0
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Priority: Critical
> Attachments: 2855log.gz
>

[jira] [Updated] (TEZ-2855) NPE while routing events

2015-09-24 Thread Siddharth Seth (JIRA)

 [ https://issues.apache.org/jira/browse/TEZ-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated TEZ-2855:

Attachment: 2855log.gz

Logs.

> NPE while routing events
>
> Key: TEZ-2855
> URL: https://issues.apache.org/jira/browse/TEZ-2855
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.8.0-alpha
> Reporter: Siddharth Seth
> Priority: Critical
> Attachments: 2855log.gz
>