date:20141029

[
https://issues.apache.org/jira/browse/TEZ-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188050#comment-14188050
]

Siddharth Seth commented on TEZ-1703:
-

[~zjffdu] - I don't think we should be changing the StateChangeNotifier at all
as part of this patch. That's just a mechanism for notifying interested
entities when the state of vertices / tasks changes. The StateChangeNotifier
has no context beyond this - whether the notification was sent out to a
VMPlugin / EMPlugin / InputInitializer or maybe other entities later on. It
can't really take a decision on what needs to be done in case of failure. It's
not a public API and is meant to be invoked by Tez internal components - which
would have context information on how to handle errors.

onStateUpdated(VertexStateUpdate) in RootInputInitializerManager can just catch
the exception from the user code and inform the Vertex via an event -
indicating ROOT_INPUT_INITIALIZER failures (VertexEventRootInputFailed). It
could potentially interrupt the corresponding Initializer thread as well - but
that will eventually happen via the state machines in any case.

Similarly for handleInputInitializerEvents and onTaskSucceeded (sendEvents).
These exceptions should not make it back to the StateChangeNotifier since it
wouldn't know how to handle them. Eventually, the InputInitializerManager will
likely have a separate thread to send the events to the user (instead of using
the AsyncDispatcher thread / StateChangeNotifier thread). It'll be better to
use the same mechanism of sending a VertexEventRootInputFailed event IMHO.
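
A minimal sketch of the approach described above, not the actual Tez code: the manager wraps the call into user code, catches whatever it throws, and records a failure event for the vertex instead of letting the exception travel back to the notifier. InitializerCallback, FailureEvent and SafeInitializerManager are hypothetical stand-ins for the InputInitializer, VertexEventRootInputFailed and RootInputInitializerManager.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for user code (e.g. an InputInitializer callback).
interface InitializerCallback {
    void onStateUpdated(String vertexName, String newState) throws Exception;
}

// Hypothetical stand-in for a failure event such as VertexEventRootInputFailed.
class FailureEvent {
    final String inputName;
    final Throwable cause;

    FailureEvent(String inputName, Throwable cause) {
        this.inputName = inputName;
        this.cause = cause;
    }
}

// Hypothetical stand-in for the manager that sits between the notifier and user code.
class SafeInitializerManager {
    private final String inputName;
    private final InitializerCallback callback;
    // Events that would be handed to the vertex; a plain list keeps the sketch simple.
    final List<FailureEvent> eventsForVertex = new ArrayList<>();

    SafeInitializerManager(String inputName, InitializerCallback callback) {
        this.inputName = inputName;
        this.callback = callback;
    }

    // Called by the notifier. Never lets a user-code exception escape.
    void onStateUpdated(String vertexName, String newState) {
        try {
            callback.onStateUpdated(vertexName, newState);
        } catch (Exception e) {
            // Report the failure to the vertex instead of rethrowing to the notifier.
            eventsForVertex.add(new FailureEvent(inputName, e));
        }
    }
}
```

The notifier stays unaware of who it is notifying; only the manager, which knows the input name, decides what a failure means.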

- Does DAG have to change to add getAppContext() ?

Exception handling for InputInitializer
---

Key: TEZ-1703
URL: https://issues.apache.org/jira/browse/TEZ-1703
Project: Apache Tez
Issue Type: Bug
Reporter: Jeff Zhang
Attachments: TEZ-1703.patch

For handleInputInitializerEvent - this should be fairly straightforward to
handle. At the moment this is an inline call from within the AsyncDispatcher,
and will end up causing a RuntimeException. The RuntimeException can be
changed to an AMUserCodeException which will take care of this.
For onVertexStateUpdated, this eventually gets invoked from within
RootInputInitializerManager. Catching exceptions there and sending a
RootInputInitializerFailedEvent should be enough to fix this? It may require
some state machine changes to handle this event in a few more states.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (TEZ-1547) Make use of state change notifier in VertexManagerPlugins


[ 
https://issues.apache.org/jira/browse/TEZ-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188079#comment-14188079
 ] 

Rajesh Balamohan commented on TEZ-1547:
---

Issue is already captured in TEZ-1714

 Make use of state change notifier in VertexManagerPlugins
 -

 Key: TEZ-1547
 URL: https://issues.apache.org/jira/browse/TEZ-1547
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Bikas Saha
 Attachments: TEZ-1547.1.patch, TEZ-1547.3.patch, TEZ-1547.4.patch, 
 TEZ-1547.5.patch


 Instead of the various APIs like onVertexStarted, simple notifications could 
 be sent.
 Some existing APIs could end up being deprecated.
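
One way to picture the idea, as a sketch only: a single generic callback receives state updates, and each plugin registers for the states it cares about. All names below (SketchVertexState, SketchStateListener, SketchStateNotifier) are hypothetical and do not reflect the Tez API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical vertex states; the real Tez enum may differ.
enum SketchVertexState { CONFIGURED, RUNNING, SUCCEEDED, FAILED }

// One generic callback replaces per-state methods such as onVertexStarted.
interface SketchStateListener {
    void onStateUpdated(String vertexName, SketchVertexState state);
}

// Hypothetical notifier: plugins register for the states they care about.
class SketchStateNotifier {
    private static class Registration {
        final String vertexName;
        final Set<SketchVertexState> states;
        final SketchStateListener listener;

        Registration(String vertexName, Set<SketchVertexState> states,
                     SketchStateListener listener) {
            this.vertexName = vertexName;
            this.states = states;
            this.listener = listener;
        }
    }

    private final List<Registration> registrations = new ArrayList<>();

    void register(String vertexName, Set<SketchVertexState> states,
                  SketchStateListener listener) {
        registrations.add(new Registration(vertexName, states, listener));
    }

    // Delivers the update only to listeners registered for this vertex and state.
    void stateChanged(String vertexName, SketchVertexState state) {
        for (Registration r : registrations) {
            if (r.vertexName.equals(vertexName) && r.states.contains(state)) {
                r.listener.onStateUpdated(vertexName, state);
            }
        }
    }
}
```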




[jira] [Assigned] (TEZ-1711) Don't cache outputSpecList in VertexImpl.getOutputSpecList(taskIndex)


 [ 
https://issues.apache.org/jira/browse/TEZ-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang reassigned TEZ-1711:
---

Assignee: Jeff Zhang

 Don't cache outputSpecList in VertexImpl.getOutputSpecList(taskIndex)
 -

 Key: TEZ-1711
 URL: https://issues.apache.org/jira/browse/TEZ-1711
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.1
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Attachments: TEZ-1711.patch


 VertexImpl.getOutputSpecList(taskIndex) caches the outputSpecList, but I don't
 think we should cache it, because the result depends on the taskIndex. In all
 the current EdgeManagerPlugin implementations the value is the same no matter
 what the taskIndex is, but a new EdgeManagerPlugin with different behavior
 would break this. Or, if this case can never happen, just remove the
 taskIndex from the method parameters.
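
To illustrate the risk, here is a small sketch under stated assumptions: OutputSpecSource and its plugin hook are hypothetical, and the specs are plain strings rather than Tez OutputSpec objects. The cached variant returns whatever was computed for the first taskIndex it saw.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical stand-in for a vertex that derives output specs from a plugin.
class OutputSpecSource {
    // Hypothetical plugin hook: number of physical outputs for a given task index.
    private final IntFunction<Integer> physicalOutputsForTask;
    private List<String> cached; // single cached list, shared by every task index

    OutputSpecSource(IntFunction<Integer> physicalOutputsForTask) {
        this.physicalOutputsForTask = physicalOutputsForTask;
    }

    // Cached variant: computes once and then ignores taskIndex.
    List<String> getOutputSpecListCached(int taskIndex) {
        if (cached == null) {
            cached = build(taskIndex);
        }
        return cached;
    }

    // Uncached variant: always asks the plugin for this task index.
    List<String> getOutputSpecList(int taskIndex) {
        return build(taskIndex);
    }

    private List<String> build(int taskIndex) {
        List<String> specs = new ArrayList<>();
        specs.add("output:physicalCount=" + physicalOutputsForTask.apply(taskIndex));
        return specs;
    }
}
```

With a plugin whose answer varies by task index, the cached variant keeps returning the value computed for the first index it was asked about, while the uncached one follows the plugin.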




[jira] [Commented] (TEZ-1711) Don't cache outputSpecList in VertexImpl.getOutputSpecList(taskIndex)


[ 
https://issues.apache.org/jira/browse/TEZ-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188285#comment-14188285
 ] 

Jeff Zhang commented on TEZ-1711:
-

Attached the patch.
* Just remove the cache in getOutputSpecList
* One unit test is affected; fixed it.

[~sseth], [~bikassaha] please help review.  


[jira] [Updated] (TEZ-1703) Exception handling for InputInitializer


 [ 
https://issues.apache.org/jira/browse/TEZ-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-1703:

Attachment: TEZ-1703-2.patch


[jira] [Commented] (TEZ-1703) Exception handling for InputInitializer


[ 
https://issues.apache.org/jira/browse/TEZ-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188297#comment-14188297
 ] 

Jeff Zhang commented on TEZ-1703:
-

[~sseth] Thanks for your review and suggestion.

You are right: catching the exception in RootInputInitializerManager and
sending VertexEventRootInputFailed is much simpler and cleaner.
I have attached a new patch; please help review.

bq. Does DAG have to change to add getAppContext() ?
Reverted the change, as we don't need it in the new patch.


[jira] [Updated] (TEZ-1689) Exception handling for EdgeManagerPlugin


 [ 
https://issues.apache.org/jira/browse/TEZ-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-1689:

Attachment: TEZ-1689-addendum.patch

 Exception handling for EdgeManagerPlugin
 

 Key: TEZ-1689
 URL: https://issues.apache.org/jira/browse/TEZ-1689
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Priority: Critical
 Fix For: 0.5.2

 Attachments: TEZ-1689-2.patch, TEZ-1689-3.patch, TEZ-1689-4.patch, 
 TEZ-1689-addendum.patch, TEZ-1689.patch


 EdgeManagerPlugin and InputInitializer are both user code that could throw
 exceptions. We should handle them, fail the DAG and display the exception on
 the client side.




[jira] [Commented] (TEZ-1689) Exception handling for EdgeManagerPlugin


[ 
https://issues.apache.org/jira/browse/TEZ-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188302#comment-14188302
 ] 

Jeff Zhang commented on TEZ-1689:
-

Attached the addendum patch to fix the unit test.

commit 1ffbc1935646f7c422b551e6e0ffdc001311d074 (HEAD, origin/master, 
origin/HEAD, master, TEZ-1711, TEZ-1703, TEZ-1689)
Author: Jeff Zhang zjf...@apache.org
Date:   Wed Oct 29 18:58:49 2014 +0800

TEZ-1689. addendum - fix unit test failure. (zjffdu)



[jira] [Commented] (TEZ-1703) Exception handling for InputInitializer


[ 
https://issues.apache.org/jira/browse/TEZ-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188306#comment-14188306
 ] 

Jeff Zhang commented on TEZ-1703:
-

BTW, I think we should rename RootInputInitializerManager to
InputInitializerManager, because not only root vertices have an
InputInitializer; non-root vertices can have one as well.


[jira] [Created] (TEZ-1719) Allow IFile reducer merge-sort to disable crc32 checksums

2014-10-29 Thread Gopal V (JIRA)

Gopal V created TEZ-1719:


 Summary: Allow IFile reducer merge-sort to disable crc32 checksums
 Key: TEZ-1719
 URL: https://issues.apache.org/jira/browse/TEZ-1719
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Gopal V


Next-gen filesystems like BTRFS and ZFS provide their own checksumming for disk 
data.

Using PureJavaCrc32 for data written for temporary spills to such filesystems 
is a complete waste of CPU resources.
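
A rough sketch of what an opt-out could look like, assuming a simple boolean switch. SpillWriter and checksumEnabled are hypothetical names, the layout is not the IFile format, and java.util.zip.CRC32 is used here in place of Hadoop's PureJavaCrc32.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical spill writer; the real IFile format and its option name are not shown here.
class SpillWriter {
    // Returns the payload and, only if requested, a trailing 8-byte CRC32 value.
    static byte[] write(byte[] payload, boolean checksumEnabled) {
        int trailer = checksumEnabled ? Long.BYTES : 0;
        ByteBuffer buf = ByteBuffer.allocate(payload.length + trailer);
        buf.put(payload);
        if (checksumEnabled) {
            // The CPU cost being discussed is paid only on this branch.
            CRC32 crc = new CRC32();
            crc.update(payload, 0, payload.length);
            buf.putLong(crc.getValue());
        }
        return buf.array();
    }
}
```

A reader of such data would need the same setting to know whether a trailer is present, so the switch has to be applied consistently on both sides.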




[jira] [Updated] (TEZ-1719) Allow IFile reducer merge-sort to disable crc32 checksums

2014-10-29 Thread Gopal V (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1719:
-
Labels: Performance  (was: )


[jira] [Updated] (TEZ-1547) Make use of state change notifier in VertexManagerPlugins


 [ 
https://issues.apache.org/jira/browse/TEZ-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-1547:

Attachment: TEZ-1547.6.patch

Patch comments out the done notification for now until TEZ-1714 is fixed. This 
will not affect functionality.


[jira] [Updated] (TEZ-1547) Make use of state change notifier in VertexManagerPlugins


 [ 
https://issues.apache.org/jira/browse/TEZ-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-1547:

Attachment: TEZ-1547.6.patch

Rebased


[jira] [Updated] (TEZ-1547) Make use of state change notifier in VertexManagerPlugins


 [ 
https://issues.apache.org/jira/browse/TEZ-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-1547:

Attachment: (was: TEZ-1547.6.patch)


[jira] [Updated] (TEZ-1547) Make use of state change notifier in VertexManagerPlugins


 [ 
https://issues.apache.org/jira/browse/TEZ-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-1547:

Attachment: (was: TEZ-1547.6.patch)


[jira] [Commented] (TEZ-1702) Hive : With Auto reduce parallelism enabled TPC-DS query 31 gets stuck in Reducer 12

2014-10-29 Thread Mostafa Mokhtar (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188758#comment-14188758
 ] 

Mostafa Mokhtar commented on TEZ-1702:
--

[~rajesh.balamohan]
Query runs fine on latest.



[jira] [Resolved] (TEZ-1702) Hive : With Auto reduce parallelism enabled TPC-DS query 31 gets stuck in Reducer 12

2014-10-29 Thread Mostafa Mokhtar (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar resolved TEZ-1702.
--
Resolution: Fixed

 Hive : With Auto reduce parallelism enabled TPC-DS query 31 gets stuck in 
 Reducer 12 
 -

 Key: TEZ-1702
 URL: https://issues.apache.org/jira/browse/TEZ-1702
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.2
Reporter: Mostafa Mokhtar
Priority: Critical
 Attachments: Logs for container_1414029100044_0150_01_01.zip, 
 json_events.log, query31_logs_stuck.txt.gz, tez-1702-am.log


 Issue found in branch-0.5 , with latest commit as 
 {code}
 commit 2e65de88af709d30207403fea881b697a4853dd6
 Author: Bikas Saha bi...@apache.org
 Date:   Tue Oct 21 14:59:56 2014 -0700
 {code}
 Running TPC-DS Query 31 with Auto reduce parallelism enabled the query gets 
 stuck in Reducer 12 
 Call Stack for stuck thread
 {code}
 Thread 14575: (state = BLOCKED)
  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Interpreted frame)
  - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=186 (Interpreted frame)
  - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() @bci=42, line=2043 (Interpreted frame)
  - java.util.concurrent.LinkedBlockingQueue.take() @bci=29, line=442 (Interpreted frame)
  - org.apache.tez.runtime.library.shuffle.common.impl.ShuffleManager.getNextInput() @bci=67, line=663 (Interpreted frame)
  - org.apache.tez.runtime.library.common.readers.UnorderedKVReader.moveToNextInput() @bci=26, line=176 (Interpreted frame)
  - org.apache.tez.runtime.library.common.readers.UnorderedKVReader.next() @bci=30, line=117 (Interpreted frame)
  - org.apache.hadoop.hive.ql.exec.tez.HashTableLoader.load(org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainer[], org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe[]) @bci=259, line=112 (Compiled frame)
  - org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable() @bci=86, line=190 (Interpreted frame)
  - org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(java.lang.Object, int) @bci=12, line=244 (Interpreted frame)
  - org.apache.hadoop.hive.ql.exec.Operator.forward(java.lang.Object, org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector) @bci=63, line=815 (Interpreted frame)
  - org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(java.lang.Object, int) @bci=121, line=84 (Interpreted frame)
  - org.apache.hadoop.hive.ql.exec.Operator.forward(java.lang.Object, org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector) @bci=63, line=815 (Interpreted frame)
  - org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(java.lang.Object[], org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator$AggregationBuffer[]) @bci=97, line=1072 (Interpreted frame)
  - org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(java.lang.Object, org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector, org.apache.hadoop.hive.ql.exec.KeyWrapper) @bci=71, line=881 (Interpreted frame)
  - org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(java.lang.Object, org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector) @bci=34, line=741 (Interpreted frame)
  - org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(java.lang.Object, int) @bci=457, line=809 (Interpreted frame)
  - org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processKeyValues(java.lang.Iterable, byte) @bci=174, line=308 (Interpreted frame)
  - org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord() @bci=218, line=252 (Interpreted frame)
  - org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run() @bci=155, line=168 (Interpreted frame)
  - org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(java.util.Map, java.util.Map) @bci=224, line=163 (Interpreted frame)
  - org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(java.util.Map, java.util.Map) @bci=86, line=138 (Interpreted frame)
  - org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run() @bci=76, line=324 (Interpreted frame)
  -

[jira] [Created] (TEZ-1720) Allow filters in all tables and also to pass in filters using url params

2014-10-29 Thread Prakash Ramachandran (JIRA)

Prakash Ramachandran created TEZ-1720:
-

 Summary: Allow filters in all tables and also to pass in filters 
using url params
 Key: TEZ-1720
 URL: https://issues.apache.org/jira/browse/TEZ-1720
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Prakash Ramachandran
Assignee: Prakash Ramachandran


Need to make sure that all the tables in the UI can use filters and allow them
to be set through the URL. This is needed, for example, to show the failed
tasks for a DAG/vertex and to bookmark searches.
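
As a rough illustration of filters carried in URL params (written in Java for consistency with the rest of this thread, not in the UI's own language): a query string is parsed into a filter map and applied to table rows. TableFilters, the parameter names and the row representation are all hypothetical, and values are assumed to be already URL-decoded.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of the Tez UI.
class TableFilters {
    // Parses "status=FAILED&vertex=v1" into an ordered map of filters.
    static Map<String, String> fromQuery(String query) {
        Map<String, String> filters = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return filters;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip entries without '=' or with an empty key
            }
            filters.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return filters;
    }

    // Keeps only the rows whose columns match every filter exactly.
    static List<Map<String, String>> apply(List<Map<String, String>> rows,
                                           Map<String, String> filters) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> row : rows) {
            boolean matches = true;
            for (Map.Entry<String, String> f : filters.entrySet()) {
                if (!f.getValue().equals(row.get(f.getKey()))) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                out.add(row);
            }
        }
        return out;
    }
}
```

Because the whole filter state lives in the query string, a filtered view such as the failed tasks of one vertex can be bookmarked or shared as a plain link.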




[jira] [Updated] (TEZ-1720) Allow filters in all tables and also to pass in filters using url params

2014-10-29 Thread Prakash Ramachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Ramachandran updated TEZ-1720:
--
Attachment: tez-1720.1.patch

- added filtering to all tables using url params.



[jira] [Updated] (TEZ-1547) Make use of state change notifier in VertexManagerPlugins


 [ 
https://issues.apache.org/jira/browse/TEZ-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-1547:

Attachment: TEZ-1547.7.patch

Fixing newly added master branch tests to work with new changes.


[jira] [Updated] (TEZ-1716) Additional ATS data for UI


 [ 
https://issues.apache.org/jira/browse/TEZ-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1716:
-
Summary: Additional ATS data for UI  (was: Add failed attempts info to 
History at Vertex and DAG level.)

 Additional ATS data for UI
 --

 Key: TEZ-1716
 URL: https://issues.apache.org/jira/browse/TEZ-1716
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah






[jira] [Updated] (TEZ-1716) Additional ATS data for UI


 [ 
https://issues.apache.org/jira/browse/TEZ-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1716:
-
Description: 
Add failed and killed attempt info at DAG and Vertex Level.
Add tez-site configuration contents to Tez App Entity


[jira] [Commented] (TEZ-1703) Exception handling for InputInitializer

[
https://issues.apache.org/jira/browse/TEZ-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188956#comment-14188956
]

Siddharth Seth commented on TEZ-1703:
-

Comments on the patch.

{code}
-DAGTerminationCause.VERTEX_FAILURE,
-vertexEvent.getVertexTerminationCause() == null ? VertexTerminationCause.OTHER_VERTEX_FAILURE
-: vertexEvent.getVertexTerminationCause());
+DAGTerminationCause.VERTEX_FAILURE, VertexTerminationCause.OTHER_VERTEX_FAILURE);
{code}
This is required so that all vertices don't get the same termination cause as
the first vertex to fail ?
We should remove getVertexTerminationCause in a follow up jira, since that
seems to be of no use.

{code}
+String diagnosticMsg = Vertex failed/killed due to
VertexManagerPlugin/EdgeManagerPlugin failed.
{code}
Will inputInitializer failures never go through this transition ? It may be
better to set this up based on the SOURCE information available in the
exception.

There are some race conditions possible in the InputInitializer.
Prior to the patch
- It's possible for events/notifications to be sent to a completed Initializer
since the initializers / events are handled in separate threads. The
setComplete() and isComplete checks aren't sufficient to avoid this.
- Ideally, completed initializers should just handle these events gracefully,
but that's not something that Tez can guarantee. We need to handle such
situations, likely in a separate jira.

With the patch,
It's possible for an INITIALIZER_FAILED event to go out after an
INITIALIZER_SUCCESS goes out. Sequence: T1: initializer running, T2:
eventReceived/VertexUpdateReceived, throws Exception. T1: completes (the event
could be partially handled which triggers completion of initialize()).
Similarly it's possible to get INITIALIZER_SUCCEEDED messages after an
INITIALIZER_FAILED message (in a FAILED etc. state). This isn't as harmful.
This means we could end up getting INITIALIZER_FAILED messages in the INITED /
RUNNING and possibly other states.

The state machine in VertexImpl will need to change to handle INITIALIZER_FAILED
in some more states, and fail the vertex.
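
A toy state machine, far simpler than VertexImpl and using hypothetical names (VState, VEvent, VertexSketch), may help show what handling the failure event in more states means: a late failure fails the vertex from any of these states, while a late success after a failure is ignored.

```java
// Hypothetical, reduced set of states and events; VertexImpl has many more.
enum VState { INITIALIZING, INITED, RUNNING, FAILED }

enum VEvent { INITIALIZER_SUCCEEDED, INITIALIZER_FAILED, START }

class VertexSketch {
    private VState state = VState.INITIALIZING;

    // Applies one event and returns the resulting state.
    synchronized VState handle(VEvent event) {
        switch (event) {
            case INITIALIZER_FAILED:
                // May arrive after a success, so it is accepted in every state here.
                state = VState.FAILED;
                break;
            case INITIALIZER_SUCCEEDED:
                // Only meaningful while initializing; a late success is ignored.
                if (state == VState.INITIALIZING) {
                    state = VState.INITED;
                }
                break;
            case START:
                if (state == VState.INITED) {
                    state = VState.RUNNING;
                }
                break;
        }
        return state;
    }
}
```

The sequence success, start, failure ends in FAILED, which is the out-of-order case described above; a following success leaves the state at FAILED.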

Exception handling for InputInitializer
---

Key: TEZ-1703
URL: https://issues.apache.org/jira/browse/TEZ-1703
Project: Apache Tez
Issue Type: Bug
Affects Versions: 0.5.1
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Attachments: TEZ-1703-2.patch, TEZ-1703.patch

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (TEZ-1703) Exception handling for InputInitializer


[ 
https://issues.apache.org/jira/browse/TEZ-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188974#comment-14188974
 ] 

Siddharth Seth commented on TEZ-1703:
-

And +1 for renaming the file. Please do that just before the commit though - 
not as part of iterative patches.

 Exception handling for InputInitializer
 ---

 Key: TEZ-1703
 URL: https://issues.apache.org/jira/browse/TEZ-1703
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.1
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Attachments: TEZ-1703-2.patch, TEZ-1703.patch


 For handleInputInitializerEvent - this should be fairly straightforward to 
 handle. At the moment this is an inline call from within the AsyncDispatcher, 
 and will end up causing a RuntimeException. The RuntimeException can be 
 changed to an AMUserCodeException which will take care of this.
 For onVertexStateUpdated, this eventually gets invoked from within 
 RootInputInitializerManager. Catching exceptions there and sending a 
 RootInputInitializerFailedEvent should be enough to fix this? May require 
 some state machine changes to handle this event on a few more states.




[jira] [Commented] (TEZ-1666) UserPayload should be null if the payload is not specified


[ 
https://issues.apache.org/jira/browse/TEZ-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188981#comment-14188981
 ] 

Hitesh Shah commented on TEZ-1666:
--

+1. With the version check in place, 0.5.1 clients cannot use the 0.5.2 tarball 
in any case. I think we can assume everyone is using the same version of client 
jars with the HDFS tarball ( but may need some updates in the INSTALL 
instructions ). 

 UserPayload should be null if the payload is not specified
 --

 Key: TEZ-1666
 URL: https://issues.apache.org/jira/browse/TEZ-1666
 Project: Apache Tez
  Issue Type: Bug
Reporter: Siddharth Seth
Assignee: Siddharth Seth
Priority: Critical
 Attachments: TEZ-1666.1.txt, TEZ-1666.2.txt


 As an example in the ProcessorDescriptor - if no payload is specified, 
 context.getUserPayload should return null.
 SleepProcessor has an explicit check for a null payload, to enable default 
 sleep value - which fails.
 Marking as critical since this is an API behaviour inconsistency.




[jira] [Created] (TEZ-1721) Update INSTALL instructions for clarifying tez client jars compatibility with runtime tarball on HDFS

Hitesh Shah created TEZ-1721:


 Summary: Update INSTALL instructions for clarifying tez client 
jars compatibility with runtime tarball on HDFS
 Key: TEZ-1721
 URL: https://issues.apache.org/jira/browse/TEZ-1721
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah







[jira] [Updated] (TEZ-1721) Update INSTALL instructions for clarifying tez client jars compatibility with runtime tarball on HDFS


 [ 
https://issues.apache.org/jira/browse/TEZ-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1721:
-
Priority: Critical  (was: Major)

 Update INSTALL instructions for clarifying tez client jars compatibility with 
 runtime tarball on HDFS
 -

 Key: TEZ-1721
 URL: https://issues.apache.org/jira/browse/TEZ-1721
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Priority: Critical






[jira] [Updated] (TEZ-1721) Update INSTALL instructions for clarifying tez client jars compatibility with runtime tarball on HDFS


 [ 
https://issues.apache.org/jira/browse/TEZ-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1721:
-
Target Version/s: 0.5.2

 Update INSTALL instructions for clarifying tez client jars compatibility with 
 runtime tarball on HDFS
 -

 Key: TEZ-1721
 URL: https://issues.apache.org/jira/browse/TEZ-1721
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Priority: Critical






[jira] [Commented] (TEZ-1547) Make use of state change notifier in VertexManagerPlugins


[ 
https://issues.apache.org/jira/browse/TEZ-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189059#comment-14189059
 ] 

Bikas Saha commented on TEZ-1547:
-

[~rajesh.balamohan] [~sseth] [~hitesh] Please review.

 Make use of state change notifier in VertexManagerPlugins
 -

 Key: TEZ-1547
 URL: https://issues.apache.org/jira/browse/TEZ-1547
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Bikas Saha
 Attachments: TEZ-1547.1.patch, TEZ-1547.3.patch, TEZ-1547.4.patch, 
 TEZ-1547.5.patch, TEZ-1547.6.patch, TEZ-1547.7.patch


 Instead of the various APIs like onVertexStarted, simple notifications could 
 be sent.
 Some existing APIs could end up being deprecated.




[jira] [Comment Edited] (TEZ-1547) Make use of state change notifier in VertexManagerPlugins


[ 
https://issues.apache.org/jira/browse/TEZ-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189059#comment-14189059
 ] 

Bikas Saha edited comment on TEZ-1547 at 10/29/14 9:50 PM:
---

[~rajesh.balamohan] [~sseth] [~hitesh] Please review.
The patch has a small issue of the canInit() precondition check to be inside 
the try block. Will fix in the next iteration.


was (Author: bikassaha):
[~rajesh.balamohan] [~sseth] [~hitesh] Please review.

 Make use of state change notifier in VertexManagerPlugins
 -

 Key: TEZ-1547
 URL: https://issues.apache.org/jira/browse/TEZ-1547
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Bikas Saha
 Attachments: TEZ-1547.1.patch, TEZ-1547.3.patch, TEZ-1547.4.patch, 
 TEZ-1547.5.patch, TEZ-1547.6.patch, TEZ-1547.7.patch


 Instead of the various APIs like onVertexStarted, simple notifications could 
 be sent.
 Some existing APIs could end up being deprecated.




[jira] [Commented] (TEZ-1547) Make use of state change notifier in VertexManagerPlugins


[ 
https://issues.apache.org/jira/browse/TEZ-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189085#comment-14189085
 ] 

Siddharth Seth commented on TEZ-1547:
-

In process. The patch has a bunch of changes unrelated to state notification 
which we've discussed in the past. Can probably close 2-3 old jiras after this 
goes in.

 Make use of state change notifier in VertexManagerPlugins
 -

 Key: TEZ-1547
 URL: https://issues.apache.org/jira/browse/TEZ-1547
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Bikas Saha
 Attachments: TEZ-1547.1.patch, TEZ-1547.3.patch, TEZ-1547.4.patch, 
 TEZ-1547.5.patch, TEZ-1547.6.patch, TEZ-1547.7.patch


 Instead of the various APIs like onVertexStarted, simple notifications could 
 be sent.
 Some existing APIs could end up being deprecated.




[jira] [Commented] (TEZ-1547) Make use of state change notifier in VertexManagerPlugins


[ 
https://issues.apache.org/jira/browse/TEZ-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189115#comment-14189115
 ] 

Bikas Saha commented on TEZ-1547:
-

Patch 1 has just the state change notification and can be reviewed separately. 
Subsequent patches add usage for the notifications but I lost my git history 
due to hard drive loss. So I put up the combined patch next.

 Make use of state change notifier in VertexManagerPlugins
 -

 Key: TEZ-1547
 URL: https://issues.apache.org/jira/browse/TEZ-1547
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Bikas Saha
 Attachments: TEZ-1547.1.patch, TEZ-1547.3.patch, TEZ-1547.4.patch, 
 TEZ-1547.5.patch, TEZ-1547.6.patch, TEZ-1547.7.patch


 Instead of the various APIs like onVertexStarted, simple notifications could 
 be sent.
 Some existing APIs could end up being deprecated.




[jira] [Updated] (TEZ-1699) Vertex.setParallelism should throw an exception for invalid invocations


 [ 
https://issues.apache.org/jira/browse/TEZ-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-1699:

Attachment: TEZ-1699.1.patch

Removes the boolean return value and throws exceptions instead. Tests added. 
Marked incompatible. AFAIK no one uses the return value, so it's best to remove 
it now.
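
For illustration only, a minimal sketch of the shape of this change: a void setter that rejects invalid invocations with unchecked exceptions instead of returning false. The class SketchParallelismVertex and both validity conditions are assumptions made for the example; they are not the actual Tez Vertex API or the checks in the patch.

```java
// Hypothetical, simplified vertex; NOT the real Tez Vertex API.
final class SketchParallelismVertex {
  private int parallelism;
  private boolean tasksScheduled;

  SketchParallelismVertex(int initialParallelism) {
    this.parallelism = initialParallelism;
  }

  void markTasksScheduled() {
    tasksScheduled = true;
  }

  int getParallelism() {
    return parallelism;
  }

  // Returns void: an invalid invocation can no longer be silently ignored,
  // because it surfaces as an unchecked exception instead of a false return.
  void setParallelism(int newParallelism) {
    if (newParallelism < 0) {
      // Assumed precondition for the sketch.
      throw new IllegalArgumentException(
          "Parallelism must be >= 0, got " + newParallelism);
    }
    if (tasksScheduled) {
      // Assumed precondition for the sketch.
      throw new IllegalStateException(
          "Parallelism cannot be changed after tasks are scheduled");
    }
    parallelism = newParallelism;
  }
}
```

Because the exceptions are unchecked, existing callers keep compiling at the source level; only callers that relied on the boolean return value would need to change.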

[~sseth] [~hitesh] Please review.

 Vertex.setParallelism should throw an exception for invalid invocations
 ---

 Key: TEZ-1699
 URL: https://issues.apache.org/jira/browse/TEZ-1699
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Priority: Critical
 Attachments: TEZ-1699.1.patch


 There is a return value of false when setParallelism is not successful. 
 However that may be ignored and in some cases the invocation is actually 
 incorrect and it's better to throw an exception than return false. Throwing an 
 unchecked exception can allow doing this compatibly.




[jira] [Commented] (TEZ-1700) Replace containerId from TaskLocationHint with [TaskIndex+Vertex] based affinity


[ 
https://issues.apache.org/jira/browse/TEZ-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189155#comment-14189155
 ] 

Hitesh Shah commented on TEZ-1700:
--

Comments: 
  
Not sure if this is really safe from a binary compatibility point of view. 
Might be worth testing a job that uses TaskLocationHints compiled against 0.5.0 
and run using a 0.5.2-SNAPSHOT runtime. 

{code}
+if (affinitizedTask != null) {
+  if (affinitizedTask.getTaskIndex() != other.affinitizedTask.getTaskIndex()) {
+    return false;
+  } else if (!affinitizedTask.getVertexName().equals(other.affinitizedTask.getVertexName())) {
     return false;
   }
-} else if (other.containerId != null) {
+} else if (other.affinitizedTask != null) {
   return false;
 }
{code}
   - I believe the other.affinitizedTask != null should be done earlier 
before doing the != comparisons for vertex name and task index

{code}
+  taskScheduler.allocateTask(taskAttempt,
+  event.getCapability(),
+  taskAttempt.getAssignedContainerID(),
+  Priority.newInstance(event.getPriority()),
+  event.getContainerContext(),
+  event);
{code}
  -  should this be using the affinityAttempt's container id? If the unit tests 
are not catching this, maybe add one more test? 






 Replace containerId from TaskLocationHint with [TaskIndex+Vertex] based 
 affinity
 

 Key: TEZ-1700
 URL: https://issues.apache.org/jira/browse/TEZ-1700
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: TEZ-1700.1.patch


 Today 1-1 dependencies are affinitized by creating a task location hint with 
 the producer task container id. It can be created by affinitizing to the 
 producer task-index+vertexname combination instead and internally Tez can map 
 it to the container. This also allows this dependency to be specified before 
 the container is assigned. This allows the dependency to be generic.




[jira] [Comment Edited] (TEZ-1700) Replace containerId from TaskLocationHint with [TaskIndex+Vertex] based affinity


[ 
https://issues.apache.org/jira/browse/TEZ-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189155#comment-14189155
 ] 

Hitesh Shah edited comment on TEZ-1700 at 10/29/14 10:32 PM:
-

Comments: 
  
Not sure if this is really safe from a binary compatibility point of view. 
Might be worth testing a job that uses TaskLocationHints compiled against 0.5.0 
and run using a 0.5.2-SNAPSHOT runtime. 

{code}
+if (affinitizedTask != null) {
+  if (affinitizedTask.getTaskIndex() != other.affinitizedTask.getTaskIndex()) {
+    return false;
+  } else if (!affinitizedTask.getVertexName().equals(other.affinitizedTask.getVertexName())) {
     return false;
   }
-} else if (other.containerId != null) {
+} else if (other.affinitizedTask != null) {
   return false;
 }
{code}
   - I believe the other.affinitizedTask != null check should be done earlier 
before doing the != comparisons for vertex name and task index

{code}
+  taskScheduler.allocateTask(taskAttempt,
+  event.getCapability(),
+  taskAttempt.getAssignedContainerID(),
+  Priority.newInstance(event.getPriority()),
+  event.getContainerContext(),
+  event);
{code}
  -  should this be using the affinityAttempt's container id? If the unit tests 
are not catching this, maybe add one more test? 







was (Author: hitesh):
Comments: 
  
Not sure if this is really safe from a binary compatibility point of view. 
Might be worth testing a job that uses TaskLocationHints compiled against 0.5.0 
and run using a 0.5.2-SNAPSHOT runtime. 

{code}
+if (affinitizedTask != null) {
+  if (affinitizedTask.getTaskIndex() != other.affinitizedTask.getTaskIndex()) {
+    return false;
+  } else if (!affinitizedTask.getVertexName().equals(other.affinitizedTask.getVertexName())) {
     return false;
   }
-} else if (other.containerId != null) {
+} else if (other.affinitizedTask != null) {
   return false;
 }
{code}
   - I believe the other.affinitizedTask != null should be done earlier 
before doing the != comparisons for vertex name and task index

{code}
+  taskScheduler.allocateTask(taskAttempt,
+  event.getCapability(),
+  taskAttempt.getAssignedContainerID(),
+  Priority.newInstance(event.getPriority()),
+  event.getContainerContext(),
+  event);
{code}
  -  should this be using the affinityAttempt's container id? If the unit tests 
are not catching this, maybe add one more test? 






 Replace containerId from TaskLocationHint with [TaskIndex+Vertex] based 
 affinity
 

 Key: TEZ-1700
 URL: https://issues.apache.org/jira/browse/TEZ-1700
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: TEZ-1700.1.patch


 Today 1-1 dependencies are affinitized by creating a task location hint with 
 the producer task container id. It can be created by affinitizing to the 
 producer task-index+vertexname combination instead and internally Tez can map 
 it to the container. This also allows this dependency to be specified before 
 the container is assigned. This allows the dependency to be generic.




[jira] [Updated] (TEZ-1716) Additional ATS data for UI


 [ 
https://issues.apache.org/jira/browse/TEZ-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1716:
-
Attachment: TEZ-1716.1.patch

[~bikassaha] [~sseth] [~gopalv] please review.

 Additional ATS data for UI
 --

 Key: TEZ-1716
 URL: https://issues.apache.org/jira/browse/TEZ-1716
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
 Attachments: TEZ-1716.1.patch


 Add failed and killed attempt info at DAG and Vertex Level.
 Add tez-site configuration contents to Tez App Entity




[jira] [Updated] (TEZ-1720) Allow filters in all tables and also to pass in filters using url params


 [ 
https://issues.apache.org/jira/browse/TEZ-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1720:
-
Fix Version/s: 0.6.0

 Allow filters in all tables and also to pass in filters using url params
 

 Key: TEZ-1720
 URL: https://issues.apache.org/jira/browse/TEZ-1720
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Prakash Ramachandran
Assignee: Prakash Ramachandran
 Fix For: 0.6.0

 Attachments: tez-1720.1.patch


 Need to make sure that all the tables in the UI can use filters and allow 
 them to be set through the URL. This is needed for showing, for example, the 
 failed tasks for a dag/vertex and for bookmarking searches. 




[jira] [Commented] (TEZ-1720) Allow filters in all tables and also to pass in filters using url params


[ 
https://issues.apache.org/jira/browse/TEZ-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189211#comment-14189211
 ] 

Hitesh Shah commented on TEZ-1720:
--

Committed to branch TEZ-8

 Allow filters in all tables and also to pass in filters using url params
 

 Key: TEZ-1720
 URL: https://issues.apache.org/jira/browse/TEZ-1720
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Prakash Ramachandran
Assignee: Prakash Ramachandran
 Fix For: 0.6.0

 Attachments: tez-1720.1.patch


 Need to make sure that all the tables in the UI can use filters and allow 
 them to be set through the URL. This is needed for showing, for example, the 
 failed tasks for a dag/vertex and for bookmarking searches. 




[jira] [Commented] (TEZ-1700) Replace containerId from TaskLocationHint with [TaskIndex+Vertex] based affinity


[ 
https://issues.apache.org/jira/browse/TEZ-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189239#comment-14189239
 ] 

Bikas Saha commented on TEZ-1700:
-

Binary compatibility should be fine. Would be hard to do since 0.5.1 and 0.5.2 
are already incompatible.

bq. other.affinitizedTask != null
Was just following the existing code flow. Will check this.

bq. affinityAttempt's container id?
Good catch. Will check why the existing test for affinity did not catch this.


 Replace containerId from TaskLocationHint with [TaskIndex+Vertex] based 
 affinity
 

 Key: TEZ-1700
 URL: https://issues.apache.org/jira/browse/TEZ-1700
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: TEZ-1700.1.patch


 Today 1-1 dependencies are affinitized by creating a task location hint with 
 the producer task container id. It can be created by affinitizing to the 
 producer task-index+vertexname combination instead and internally Tez can map 
 it to the container. This also allows this dependency to be specified before 
 the container is assigned. This allows the dependency to be generic.




[jira] [Commented] (TEZ-1700) Replace containerId from TaskLocationHint with [TaskIndex+Vertex] based affinity


[ 
https://issues.apache.org/jira/browse/TEZ-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189327#comment-14189327
 ] 

Hitesh Shah commented on TEZ-1700:
--

bq. Binary compatibility should be fine. Would be hard to do since 0.5.1 and 
0.5.2 are already incompatible.

They are incompatible in that a 0.5.1 client cannot use a 0.5.2 AM. But a job 
compiled against either 0.5.0 or 0.5.1 should work when used with the 0.5.2 
jars (both client and AM).

 Replace containerId from TaskLocationHint with [TaskIndex+Vertex] based 
 affinity
 

 Key: TEZ-1700
 URL: https://issues.apache.org/jira/browse/TEZ-1700
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: TEZ-1700.1.patch


 Today 1-1 dependencies are affinitized by creating a task location hint with 
 the producer task container id. It can be created by affinitizing to the 
 producer task-index+vertexname combination instead and internally Tez can map 
 it to the container. This also allows this dependency to be specified before 
 the container is assigned. This allows the dependency to be generic.




[jira] [Updated] (TEZ-1716) Additional ATS data for UI


 [ 
https://issues.apache.org/jira/browse/TEZ-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1716:
-
Attachment: TEZ-1716.2.patch

Additional changes to push successful attempt id to ATS. 

 Additional ATS data for UI
 --

 Key: TEZ-1716
 URL: https://issues.apache.org/jira/browse/TEZ-1716
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
 Attachments: TEZ-1716.1.patch, TEZ-1716.2.patch


 Add failed and killed attempt info at DAG and Vertex Level.
 Add tez-site configuration contents to Tez App Entity




[jira] [Updated] (TEZ-1716) Additional ATS data for UI


 [ 
https://issues.apache.org/jira/browse/TEZ-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1716:
-
Description: 
Add failed and killed attempt info at DAG and Vertex Level.
Add tez-site configuration contents to Tez App Entity
Add task's successful attempt id in task data. 

  was:
Add failed and killed attempt info at DAG and Vertex Level.
Add tez-site configuration contents to Tez App Entity


 Additional ATS data for UI
 --

 Key: TEZ-1716
 URL: https://issues.apache.org/jira/browse/TEZ-1716
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
 Attachments: TEZ-1716.1.patch, TEZ-1716.2.patch


 Add failed and killed attempt info at DAG and Vertex Level.
 Add tez-site configuration contents to Tez App Entity
 Add task's successful attempt id in task data. 




[jira] [Commented] (TEZ-1700) Replace containerId from TaskLocationHint with [TaskIndex+Vertex] based affinity


[ 
https://issues.apache.org/jira/browse/TEZ-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189382#comment-14189382
 ] 

Bikas Saha commented on TEZ-1700:
-

Made the changes. Tried with broadcastAndOneToOneExample from master branch and 
it worked fine.
The existing test case (broadcastAndOneToOneExample) does not catch this 
because in the mini cluster there isn't enough parallelism and the preferred 
container gets matched by chance because there are only 2 containers around. 
The test might have become flaky after this.
I added a new test case in TestTaskSchedulerEventHandler to test that it's doing 
the translation.
Please take another look.

 Replace containerId from TaskLocationHint with [TaskIndex+Vertex] based 
 affinity
 

 Key: TEZ-1700
 URL: https://issues.apache.org/jira/browse/TEZ-1700
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: TEZ-1700.1.patch


 Today 1-1 dependencies are affinitized by creating a task location hint with 
 the producer task container id. It can be created by affinitizing to the 
 producer task-index+vertexname combination instead and internally Tez can map 
 it to the container. This also allows this dependency to be specified before 
 the container is assigned. This allows the dependency to be generic.




[jira] [Updated] (TEZ-1700) Replace containerId from TaskLocationHint with [TaskIndex+Vertex] based affinity


 [ 
https://issues.apache.org/jira/browse/TEZ-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-1700:

Attachment: TEZ-1700.2.patch

 Replace containerId from TaskLocationHint with [TaskIndex+Vertex] based 
 affinity
 

 Key: TEZ-1700
 URL: https://issues.apache.org/jira/browse/TEZ-1700
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: TEZ-1700.1.patch, TEZ-1700.2.patch


 Today 1-1 dependencies are affinitized by creating a task location hint with 
 the producer task container id. It can be created by affinitizing to the 
 producer task-index+vertexname combination instead and internally Tez can map 
 it to the container. This also allows this dependency to be specified before 
 the container is assigned. This allows the dependency to be generic.




[jira] [Commented] (TEZ-1700) Replace containerId from TaskLocationHint with [TaskIndex+Vertex] based affinity


[ 
https://issues.apache.org/jira/browse/TEZ-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189392#comment-14189392
 ] 

Hitesh Shah commented on TEZ-1700:
--

Mostly looks good except for the equals() check. It does not handle other being 
non-null and this.affinity being null. I think equals() probably deserves a 
unit test now. 
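
As a rough illustration of the null handling being asked for, the sketch below compares a nullable affinity field symmetrically using java.util.Objects.equals. SketchTaskAffinity and SketchLocationHint are hypothetical stand-ins invented for this example; they are not the classes touched by the patch.

```java
import java.util.Objects;

// Hypothetical stand-in for the (vertexName, taskIndex) affinity pair.
final class SketchTaskAffinity {
  private final String vertexName;
  private final int taskIndex;

  SketchTaskAffinity(String vertexName, int taskIndex) {
    this.vertexName = vertexName;
    this.taskIndex = taskIndex;
  }

  @Override
  public boolean equals(Object obj) {
    if (this == obj) {
      return true;
    }
    if (!(obj instanceof SketchTaskAffinity)) {
      return false;
    }
    SketchTaskAffinity other = (SketchTaskAffinity) obj;
    return taskIndex == other.taskIndex
        && Objects.equals(vertexName, other.vertexName);
  }

  @Override
  public int hashCode() {
    return Objects.hash(vertexName, taskIndex);
  }
}

// Hypothetical stand-in for a location hint whose affinity may be absent.
final class SketchLocationHint {
  private final SketchTaskAffinity affinitizedTask; // may be null

  SketchLocationHint(SketchTaskAffinity affinitizedTask) {
    this.affinitizedTask = affinitizedTask;
  }

  @Override
  public boolean equals(Object obj) {
    if (this == obj) {
      return true;
    }
    if (!(obj instanceof SketchLocationHint)) {
      return false;
    }
    SketchLocationHint other = (SketchLocationHint) obj;
    // Objects.equals covers null/null, null/non-null and non-null/null.
    return Objects.equals(affinitizedTask, other.affinitizedTask);
  }

  @Override
  public int hashCode() {
    return Objects.hashCode(affinitizedTask);
  }
}
```

A unit test along these lines would compare a hint without affinity against one with affinity in both directions, in addition to the equal and not-equal cases.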

 Replace containerId from TaskLocationHint with [TaskIndex+Vertex] based 
 affinity
 

 Key: TEZ-1700
 URL: https://issues.apache.org/jira/browse/TEZ-1700
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: TEZ-1700.1.patch, TEZ-1700.2.patch


 Today 1-1 dependencies are affinitized by creating a task location hint with 
 the producer task container id. It can be created by affinitizing to the 
 producer task-index+vertexname combination instead and internally Tez can map 
 it to the container. This also allows this dependency to be specified before 
 the container is assigned. This allows the dependency to be generic.




[jira] [Commented] (TEZ-1699) Vertex.setParallelism should throw an exception for invalid invocations


[ 
https://issues.apache.org/jira/browse/TEZ-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189395#comment-14189395
 ] 

Hitesh Shah commented on TEZ-1699:
--

+1

 Vertex.setParallelism should throw an exception for invalid invocations
 ---

 Key: TEZ-1699
 URL: https://issues.apache.org/jira/browse/TEZ-1699
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
Priority: Critical
 Attachments: TEZ-1699.1.patch


 There is a return value of false when setParallelism is not successful. 
 However that may be ignored and in some cases the invocation is actually 
 incorrect and it's better to throw an exception than return false. Throwing an 
 unchecked exception can allow doing this compatibly.




[jira] [Commented] (TEZ-1716) Additional ATS data for UI


[ 
https://issues.apache.org/jira/browse/TEZ-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189406#comment-14189406
 ] 

Rajesh Balamohan commented on TEZ-1716:
---

Minor comments.

In DAGImpl.java, taskStats computation can be done in separate method to avoid 
code duplication?
{code}
Map<String, Integer> taskStats = new HashMap<String, Integer>();
ProgressBuilder progressBuilder = getDAGProgress();
taskStats.put(ATSConstants.NUM_COMPLETED_TASKS, 
progressBuilder.getTotalTaskCount());
...
{code}
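
A minimal sketch of what such a shared helper could look like. SketchProgress, SketchTaskStats and the map keys are placeholders invented for this example; the real ProgressBuilder and ATSConstants are not reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical progress holder standing in for the DAG/vertex progress object.
final class SketchProgress {
  private final int total;
  private final int succeeded;
  private final int failed;
  private final int killed;

  SketchProgress(int total, int succeeded, int failed, int killed) {
    this.total = total;
    this.succeeded = succeeded;
    this.failed = failed;
    this.killed = killed;
  }

  int getTotal() { return total; }
  int getSucceeded() { return succeeded; }
  int getFailed() { return failed; }
  int getKilled() { return killed; }
}

final class SketchTaskStats {
  private SketchTaskStats() {
  }

  // Single place that builds the stats map, so every call site shares the same
  // key-to-counter mapping. The key names are placeholders for this sketch.
  static Map<String, Integer> build(SketchProgress progress) {
    Map<String, Integer> stats = new LinkedHashMap<String, Integer>();
    stats.put("numTotalTasks", progress.getTotal());
    stats.put("numSucceededTasks", progress.getSucceeded());
    stats.put("numFailedTasks", progress.getFailed());
    stats.put("numKilledTasks", progress.getKilled());
    return stats;
  }
}
```

Each call site then reduces to a single SketchTaskStats.build(progress) call, and a wrong key/counter pairing only has to be fixed once.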

In DAGSubmittedEvent.java, can vertexNameIDMap be removed?


 Additional ATS data for UI
 --

 Key: TEZ-1716
 URL: https://issues.apache.org/jira/browse/TEZ-1716
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
 Attachments: TEZ-1716.1.patch, TEZ-1716.2.patch


 Add failed and killed attempt info at DAG and Vertex Level.
 Add tez-site configuration contents to Tez App Entity
 Add task's successful attempt id in task data. 




[jira] [Commented] (TEZ-1716) Additional ATS data for UI


[ 
https://issues.apache.org/jira/browse/TEZ-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189484#comment-14189484
 ] 

Bikas Saha commented on TEZ-1716:
-

Shouldn't this come from YARN AHS?
{code}
+AppLaunchedEvent appLaunchedEvent = new AppLaunchedEvent(appAttemptID.getApplicationId(),
+startTime, appSubmitTime, appMasterUgi.getShortUserName(), this.amConf);
+historyEventHandler.handle(
{code}

Why is vertexName To Id mapping moved to inited from submitted event? Can this 
mapping be passed in the vertex initialized event instead of via an initial 
map? Doing it via the vertex initialized event will make it continue to work 
when we add vertices at runtime.



 Additional ATS data for UI
 --

 Key: TEZ-1716
 URL: https://issues.apache.org/jira/browse/TEZ-1716
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
 Attachments: TEZ-1716.1.patch, TEZ-1716.2.patch


 Add failed and killed attempt info at DAG and Vertex Level.
 Add tez-site configuration contents to Tez App Entity
 Add task's successful attempt id in task data. 




[jira] [Comment Edited] (TEZ-1716) Additional ATS data for UI

[
https://issues.apache.org/jira/browse/TEZ-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189484#comment-14189484
]

Bikas Saha edited comment on TEZ-1716 at 10/30/14 2:04 AM:
---

Why is vertexNameToId mapping moved to inited from submitted event? Can this
mapping be passed in the vertex initialized event instead of via an initial
map? Doing it via the vertex initialized event will make it continue to work
when we add vertices at runtime.

was (Author: bikassaha):
Shouldn't this come from YARN AHS?
{code}
+AppLaunchedEvent appLaunchedEvent = new AppLaunchedEvent(appAttemptID.getApplicationId(),
+startTime, appSubmitTime, appMasterUgi.getShortUserName(), this.amConf);
+historyEventHandler.handle(
{code}

Why is vertexName To Id mapping moved to inited from submitted event? Can this
mapping be passed in the vertex initialized event instead of via an initial
map? Doing it via the vertex initialized event will make it continue to work
when we add vertices at runtime.

Additional ATS data for UI
--

Key: TEZ-1716
URL: https://issues.apache.org/jira/browse/TEZ-1716
Project: Apache Tez
Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
Attachments: TEZ-1716.1.patch, TEZ-1716.2.patch

Add failed and killed attempt info at DAG and Vertex Level.
Add tez-site configuration contents to Tez App Entity
Add task's successful attempt id in task data.


[jira] [Commented] (TEZ-1716) Additional ATS data for UI


[ 
https://issues.apache.org/jira/browse/TEZ-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189506#comment-14189506
 ] 

Hitesh Shah commented on TEZ-1716:
--

[~bikassaha] It is moving to DAGInitialized as that is where the ids are being 
generated. Cannot use the VertexInit event as that will result in one more 
additional call to ATS to update the DAG entity. The main objective of this was 
to get the name to id mapping from the dag entity instead of querying all 
vertices to do the correlation. 

[~rajesh.balamohan] Will address the comments in the next patch. 

 Additional ATS data for UI
 --

 Key: TEZ-1716
 URL: https://issues.apache.org/jira/browse/TEZ-1716
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
 Attachments: TEZ-1716.1.patch, TEZ-1716.2.patch


 Add failed and killed attempt info at DAG and Vertex Level.
 Add tez-site configuration contents to Tez App Entity
 Add task's successful attempt id in task data. 




[jira] [Updated] (TEZ-1711) Don't cache outputSpecList in VertexImpl.getOutputSpecList(taskIndex)


 [ 
https://issues.apache.org/jira/browse/TEZ-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-1711:

Attachment: TEZ-1711-2.patch

 Don't cache outputSpecList in VertexImpl.getOutputSpecList(taskIndex)
 -

 Key: TEZ-1711
 URL: https://issues.apache.org/jira/browse/TEZ-1711
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.1
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Attachments: TEZ-1711-2.patch, TEZ-1711.patch


 The outputSpecList is currently cached in
 VertexImpl.getOutputSpecList(taskIndex), but I don't think we should cache it,
 as it depends on the taskIndex. In all the existing EdgeManagerPlugin
 implementations the value is the same no matter what the taskIndex is, but
 there is a risk that a new EdgeManagerPlugin behaves differently. Or, if that
 case can never happen, just remove the taskIndex from the method parameters.




[jira] [Updated] (TEZ-1716) Additional ATS data for UI


 [ 
https://issues.apache.org/jira/browse/TEZ-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1716:
-
Attachment: TEZ-1716.3.patch

Patch with [~rajesh.balamohan]'s comments addressed. 

 Additional ATS data for UI
 --

 Key: TEZ-1716
 URL: https://issues.apache.org/jira/browse/TEZ-1716
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
 Attachments: TEZ-1716.1.patch, TEZ-1716.2.patch, TEZ-1716.3.patch



[jira] [Commented] (TEZ-1711) Don't cache outputSpecList in VertexImpl.getOutputSpecList(taskIndex)

[
https://issues.apache.org/jira/browse/TEZ-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189534#comment-14189534
]

Jeff Zhang commented on TEZ-1711:
-

[~bikassaha] Attached a new patch.

bq. Given this change, should we remove inputSpecList and outputSpecList as
member vars of VertexImpl?
They are still members of VertexImpl, but they are created in the constructor
and cleared in getInputSpecList and getOutputSpecList. This avoids creating a
new List on every call, especially for a large job. Does that make sense to you?

bq. Why is this change making the test DAG fail?
The affected test case is for TEZ-1689 (Exception handling for
EdgeManagerPlugin). Without this patch, only the first task attempt fails on
the AM side. The following task attempts are not affected on the AM side
(because we cache the outputSpecList), but they will throw an exception in
TezChild since they don't get the correct outputSpecList. That cannot be
simulated in a unit test case, which can only simulate behavior on the AM side.
So without this patch, the AM would think the DAG is still running, while with
this patch all the task attempts fail on the AM side and finally cause the DAG
to fail.
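
To illustrate the caching concern, here is a minimal Java sketch. It is not the
VertexImpl code: SpecListSketch, outputsForTask, getCached and getFresh are
hypothetical names, and the per-task function only stands in for whatever an
EdgeManagerPlugin would report for a given taskIndex.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical stand-in for the caching pattern discussed above; not Tez code.
class SpecListSketch {
    // Assumption: the number of physical outputs may depend on the task index.
    private final IntFunction<Integer> outputsForTask;
    private List<Integer> cached; // filled on the first call only

    SpecListSketch(IntFunction<Integer> outputsForTask) {
        this.outputsForTask = outputsForTask;
    }

    // Cached variant: whichever taskIndex is asked first decides the answer for all.
    List<Integer> getCached(int taskIndex) {
        if (cached == null) {
            cached = Collections.singletonList(outputsForTask.apply(taskIndex));
        }
        return cached;
    }

    // Uncached variant: recomputed for every taskIndex.
    List<Integer> getFresh(int taskIndex) {
        return Collections.singletonList(outputsForTask.apply(taskIndex));
    }
}
```

With a function that varies by index, the cached variant keeps returning the
value computed for the first caller, while the uncached variant tracks each
taskIndex. If the function also throws, the cached variant would hide the
failure from every call after the first successful one.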


[jira] [Updated] (TEZ-1703) Exception handling for InputInitializer


 [ 
https://issues.apache.org/jira/browse/TEZ-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-1703:

Attachment: TEZ-1703-3.patch

 Exception handling for InputInitializer
 ---

 Key: TEZ-1703
 URL: https://issues.apache.org/jira/browse/TEZ-1703
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.1
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Attachments: TEZ-1703-2.patch, TEZ-1703-3.patch, TEZ-1703.patch


 For handleInputInitializerEvent - this should be fairly straightforward to
 handle. At the moment this is an inline call from within the AsyncDispatcher,
 and will end up causing a RuntimeException. The RuntimeException can be
 changed to an AMUserCodeException which will take care of this.
 For onVertexStateUpdated, this eventually gets invoked from within
 RootInputInitializerManager. Catching exceptions there and sending a
 RootInputInitializerFailedEvent should be enough to fix this ? May require
 some state machine changes to handle this event on a few more states.
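
The catch-and-send-event idea can be sketched roughly as follows. This is an
illustration only: InitializerGuardSketch, UserCallback, InputFailedEvent and
notifyStateUpdate are hypothetical names, not the Tez classes, and the
Consumer stands in for whatever delivers events to the vertex.

```java
import java.util.function.Consumer;

// Hypothetical sketch; none of these types are the real Tez classes.
class InitializerGuardSketch {
    // Stand-in for user-supplied initializer code that may throw.
    interface UserCallback {
        void onVertexStateUpdated(String vertexName) throws Exception;
    }

    // Stand-in for a "root input failed" event delivered to the vertex.
    static final class InputFailedEvent {
        final String inputName;
        final Throwable cause;

        InputFailedEvent(String inputName, Throwable cause) {
            this.inputName = inputName;
            this.cause = cause;
        }
    }

    private final Consumer<InputFailedEvent> vertexEventHandler;

    InitializerGuardSketch(Consumer<InputFailedEvent> vertexEventHandler) {
        this.vertexEventHandler = vertexEventHandler;
    }

    // Invoke user code; turn any exception into an event instead of rethrowing.
    void notifyStateUpdate(String inputName, UserCallback callback, String vertexName) {
        try {
            callback.onVertexStateUpdated(vertexName);
        } catch (Exception e) {
            vertexEventHandler.accept(new InputFailedEvent(inputName, e));
        }
    }
}
```

The point of the design is that the caller (for example a notifier thread)
never sees the user exception. The component that has the context decides how
to report it.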




[jira] [Commented] (TEZ-1703) Exception handling for InputInitializer

[
https://issues.apache.org/jira/browse/TEZ-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189576#comment-14189576
]

Jeff Zhang commented on TEZ-1703:
-

bq. {code}
DAGTerminationCause.VERTEX_FAILURE,
vertexEvent.getVertexTerminationCause() == null ?
VertexTerminationCause.OTHER_VERTEX_FAILURE
: vertexEvent.getVertexTerminationCause());
DAGTerminationCause.VERTEX_FAILURE,
VertexTerminationCause.OTHER_VERTEX_FAILURE);
{code}
bq. This is required so that all vertices don't get the same termination cause
as the first vertex to fail ?
Yes, otherwise all the vertices would get the same termination cause, which
doesn't make sense to me. Besides, there would be an issue in
VertexImpl.checkVertexForCompletion, where we check the termination cause but
don't check for ROOT_INPUT_INIT_FAILURE.

bq. Prior to the patch
bq. It's possible for events/notifications to be sent to a complete Initializer
since the initializers / events are handled in separate threads. The
setComplete() and isComplete checks aren't sufficient to avoid this.
bq. Ideally, completed initializers should just handle these events gracefully,
but that's not something that Tez can guarantee. We need to handle such
situations, likely in a separate jira.
After initialization completes, the InputInitializerManager is shut down. Will
that solve this issue?

bq. Will inputInitializer failures never go through this transition ? It may be
better to set this up based on the SOURCE information available in the
exception.
An InputInitializer failure sets the termination cause to
ROOT_INPUT_INIT_FAILURE rather than AM_USERCODE_EXCEPTION, which is a special
cause. Maybe we could still split AMUserCodeException into
VertexManagerException/EdgeManagerException; that would be clearer and more
consistent.

bq. The state machine in VertexImpl will need to change to handle
INITIALIER_FAILED in some more states, and fail the vertex.
Add more transitions to the state machine. But there is one tricky case:
INIT_SUCCEEDED followed by INIT_FAILURE. Because INIT_SUCCEEDED shuts down the
InputInitializerManager, in that case the InputInitializer thread would be
interrupted, and

bq. And +1 for renaming the file. Please do that just before the commit though
- not as part of iterative patches.
Actually it is more about renaming RootInputInitializerManager. It involves the
following changes:
* RootInputInitializerManager -> InputInitializerManager
* TezRootInputInitializerContextImpl -> TezInputInitializerContextImpl
* VertexEventRootInputInitialized -> VertexEventInputInitialized
* VertexEventRootInputFailed -> VertexEventInputFailed
* VertexTerminationCause.ROOT_INPUT_INIT_FAILURE ->
VertexTerminationCause.INPUT_INIT_FAILURE
* EventType.ROOT_INPUT_DATA_INFORMATION_EVENT ->
EventType.INPUT_DATA_INFORMATION_EVENT
* EventType.ROOT_INPUT_INITIALIZER_EVENT -> EventType.INPUT_INITIALIZER_EVENT
* VertexEventType.V_ROOT_INPUT_INITIALIZED ->
VertexEventType.V_INPUT_INITIALIZED
* VertexEventType.V_ROOT_INPUT_FAILED -> VertexEventType.V_INPUT_INIT_FAILED


[jira] [Comment Edited] (TEZ-1703) Exception handling for InputInitializer

[
https://issues.apache.org/jira/browse/TEZ-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189576#comment-14189576
]

Jeff Zhang edited comment on TEZ-1703 at 10/30/14 3:28 AM:
---

bq. The state machine in VertexImpl will need to change to handle
INITIALIER_FAILED in some more states, and fail the vertex.
Add more transitions to the state machine.


[jira] [Commented] (TEZ-1547) Make use of state change notifier in VertexManagerPlugins


[ 
https://issues.apache.org/jira/browse/TEZ-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189606#comment-14189606
 ] 

Rajesh Balamohan commented on TEZ-1547:
---

Corner case in ImmediateStartVertexManager:

1. onVertexStarted() gets called and is in the middle of populating
srcVertexConfigured. Assume it has to populate 2 items in srcVertexConfigured
and has populated 1 item in the map.
2. In the meantime, onVertexStateUpdated() gets called with
COMPLETELY_CONFIGURED for the item already in srcVertexConfigured.
3. In this case, canScheduleTasks() would return true (without being aware of
the 2nd item that is yet to be populated in srcVertexConfigured).
4. If the source pertaining to the 2nd item changes its parallelism, the DAG
can hang indefinitely.

{code}
e.g. log:

2014-10-29 20:38:46,172 INFO [AsyncDispatcher event handler] impl.ImmediateStartVertexManager: Task count in Map_7: 1
2014-10-29 20:38:46,173 INFO [AsyncDispatcher event handler] impl.ImmediateStartVertexManager: Received configured notification : COMPLETELY_CONFIGURED for vertex: Map_7
2014-10-29 20:38:46,173 INFO [AsyncDispatcher event handler] impl.ImmediateStartVertexManager: Starting 10 in Map_5
2014-10-29 20:38:46,173 INFO [AsyncDispatcher event handler] impl.ImmediateStartVertexManager: Task count in Reducer_3: 2
...
...
2014-10-29 20:39:18,682 INFO [AsyncDispatcher event handler] vertexmanager.ShuffleVertexManager: Reduce auto parallelism for vertex: Reducer_3 to 1 from 2 . Expected output: 0 based on actual output: 0 from 1 vertex manager events.  desiredTaskInputSize: 104857600 max slow start tasks:0.1 num sources completed:1
{code}

In short, a check should be added in scheduleTasks() to ensure that
srcVertexConfigured is completely populated in onVertexStarted().
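
One possible shape for such a check is sketched below. It is not the
ImmediateStartVertexManager code: StartGuardSketch, registerSource,
finishRegistration and the allSourcesRegistered flag are hypothetical, and the
vertex names are placeholders. The idea is simply that scheduling is refused
until registration of all sources has finished.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the suggested check; not the real ImmediateStartVertexManager.
class StartGuardSketch {
    // Source vertex name -> whether it has reported as configured.
    private final Map<String, Boolean> srcVertexConfigured = new HashMap<>();
    // Assumed extra flag: set once onVertexStarted() has registered every source.
    private boolean allSourcesRegistered = false;

    // Called while onVertexStarted() walks the source vertices.
    synchronized void registerSource(String vertexName) {
        // putIfAbsent keeps a "configured" notification that arrived early.
        srcVertexConfigured.putIfAbsent(vertexName, Boolean.FALSE);
    }

    // Called at the end of onVertexStarted().
    synchronized void finishRegistration() {
        allSourcesRegistered = true;
    }

    // Called when a source reports that it is completely configured.
    synchronized void onVertexStateUpdated(String vertexName) {
        srcVertexConfigured.put(vertexName, Boolean.TRUE);
    }

    // Scheduling is allowed only after registration is complete
    // and every registered source is configured.
    synchronized boolean canScheduleTasks() {
        return allSourcesRegistered && !srcVertexConfigured.containsValue(Boolean.FALSE);
    }
}
```

In the interleaving from steps 1 to 4, the early notification for the first
source no longer lets canScheduleTasks() return true, because registration has
not finished yet.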


 Make use of state change notifier in VertexManagerPlugins
 -

 Key: TEZ-1547
 URL: https://issues.apache.org/jira/browse/TEZ-1547
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Bikas Saha
 Attachments: TEZ-1547.1.patch, TEZ-1547.3.patch, TEZ-1547.4.patch, 
 TEZ-1547.5.patch, TEZ-1547.6.patch, TEZ-1547.7.patch


 Instead of the various APIs like onVertexStarted, simple notifications could 
 be sent.
 Some existing APIs could end up being deprecated.




[jira] [Commented] (TEZ-1716) Additional ATS data for UI