[jira] [Updated] (TEZ-1493) WordCount example fails in recovery

2014-08-25 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-1493:


Description: 
{code}
14/08/25 17:37:03 INFO client.TezClient: Submitting DAG to YARN, applicationId=application_1408499461970_0053, dagName=WordCount
14/08/25 17:37:03 INFO impl.YarnClientImpl: Submitted application application_1408499461970_0053
14/08/25 17:37:03 INFO client.TezClient: The url to track the Tez AM: http://jzhangMBPr.local:8088/proxy/application_1408499461970_0053/
14/08/25 17:37:03 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/08/25 17:37:03 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200
14/08/25 17:37:03 INFO rpc.DAGClientRPCImpl: Waiting for DAG to start running
14/08/25 17:37:07 INFO rpc.DAGClientRPCImpl: DAG: State: RUNNING Progress: 0% TotalTasks: 2 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
14/08/25 17:37:15 INFO rpc.DAGClientRPCImpl: DAG: State: RUNNING Progress: 50% TotalTasks: 2 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
14/08/25 17:37:17 INFO rpc.DAGClientRPCImpl: DAG completed. FinalState=SUBMITTED
WordCount failed with diagnostics: []
{code}

The client side reports that the job failed, but the logs show that recovery 
worked on the server side and the job eventually finished successfully.

  was:
{code}
14/08/25 17:37:03 INFO client.TezClient: Submitting DAG to YARN, applicationId=application_1408499461970_0053, dagName=WordCount
14/08/25 17:37:03 INFO impl.YarnClientImpl: Submitted application application_1408499461970_0053
14/08/25 17:37:03 INFO client.TezClient: The url to track the Tez AM: http://jzhangMBPr.local:8088/proxy/application_1408499461970_0053/
14/08/25 17:37:03 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/08/25 17:37:03 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200
14/08/25 17:37:03 INFO rpc.DAGClientRPCImpl: Waiting for DAG to start running
14/08/25 17:37:07 INFO rpc.DAGClientRPCImpl: DAG: State: RUNNING Progress: 0% TotalTasks: 2 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
14/08/25 17:37:15 INFO rpc.DAGClientRPCImpl: DAG: State: RUNNING Progress: 50% TotalTasks: 2 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
14/08/25 17:37:17 INFO rpc.DAGClientRPCImpl: DAG completed. FinalState=SUBMITTED
WordCount failed with diagnostics: []
{code}


 WordCount example fails in recovery
 ---

 Key: TEZ-1493
 URL: https://issues.apache.org/jira/browse/TEZ-1493
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (TEZ-1493) WordCount example fails in recovery

2014-08-25 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-1493:


Attachment: Tez-1493.patch

 WordCount example fails in recovery
 ---

 Key: TEZ-1493
 URL: https://issues.apache.org/jira/browse/TEZ-1493
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Attachments: Tez-1493.patch







[jira] [Commented] (TEZ-1493) WordCount example fails in recovery

2014-08-25 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109015#comment-14109015
 ] 

Jeff Zhang commented on TEZ-1493:
-

Attached the patch.

The cause of this issue is that when the second AM attempt starts, the 
DAGClient first fetches the status from the AM, which is still in the SUBMITTED 
state, so the client reports "DAG completed. FinalState=SUBMITTED".
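As an illustration only (this is not the attached patch), a minimal Java sketch of a status-polling loop that stops only on a terminal state. The `DagState` enum is hypothetical: RUNNING and SUBMITTED appear in the log, and the other names are assumed to resemble Tez's DAG states.

```java
import java.util.Iterator;

class DagPollingSketch {
    // Hypothetical mirror of the DAG states a client can observe.
    enum DagState { SUBMITTED, INITING, RUNNING, SUCCEEDED, FAILED, KILLED, ERROR }

    // Only these states mean the DAG has really finished.
    static boolean isTerminal(DagState s) {
        switch (s) {
            case SUCCEEDED:
            case FAILED:
            case KILLED:
            case ERROR:
                return true;
            default:
                return false;
        }
    }

    // Consumes one observed state per poll and returns the first terminal one.
    // A SUBMITTED state seen after RUNNING (e.g. from a restarted AM attempt)
    // is treated as "still in progress", not as the final state.
    static DagState waitForCompletion(Iterator<DagState> polls) {
        while (polls.hasNext()) {
            DagState s = polls.next();
            if (isTerminal(s)) {
                return s;
            }
            // A real client would sleep between polls here.
        }
        throw new IllegalStateException("status stream ended before a terminal state");
    }
}
```

Fed the sequence RUNNING, SUBMITTED (from the second AM attempt), RUNNING, SUCCEEDED, the loop keeps waiting through SUBMITTED and returns SUCCEEDED.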

 WordCount example fails in recovery
 ---

 Key: TEZ-1493
 URL: https://issues.apache.org/jira/browse/TEZ-1493
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Attachments: Tez-1493.patch







[jira] [Created] (TEZ-1494) DAG hangs waiting for ShuffleManager.getNextInput()

2014-08-25 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created TEZ-1494:
-

 Summary: DAG hangs waiting for ShuffleManager.getNextInput()
 Key: TEZ-1494
 URL: https://issues.apache.org/jira/browse/TEZ-1494
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan


Attaching the DAG and the stack trace of the hung process.  

digraph rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1 {
graph [ label=rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1, fontsize=24, fontname=Helvetica];
node [fontsize=12, fontname=Helvetica];
edge [fontsize=9, fontcolor=blue, fontname=Arial];
"rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Map_1" [ label = "Map_1[MapTezProcessor]" ];
"rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Map_1" -> "rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Map_2" [ label = "[input=UnorderedKVOutput,\n output=UnorderedKVInput,\n dataMovement=BROADCAST,\n schedulingType=SEQUENTIAL]" ];
"rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Map_7" [ label = "Map_7[MapTezProcessor]" ];
"rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Map_7" -> "rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Map_5" [ label = "[input=UnorderedKVOutput,\n output=UnorderedKVInput,\n dataMovement=BROADCAST,\n schedulingType=SEQUENTIAL]" ];
"rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Reducer_6_out_Reducer_6" [ label = "Reducer_6[out_Reducer_6]", shape = box ];
"rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Map_8" [ label = "Map_8[MapTezProcessor]" ];
"rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Map_8" -> "rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Map_2" [ label = "[input=UnorderedKVOutput,\n output=UnorderedKVInput,\n dataMovement=BROADCAST,\n schedulingType=SEQUENTIAL]" ];
"rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Map_4_date_dim" [ label = "Map_4[date_dim]", shape = box ];
"rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Map_4_date_dim" -> "rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Map_4" [ label = "Input [inputClass=MRInputLegacy,\n initializer=HiveSplitGenerator]" ];
"rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Map_5" [ label = "Map_5[MapTezProcessor]" ];
"rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Map_5" -> "rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Reducer_6" [ label = "[input=OrderedPartitionedKVOutput,\n output=OrderedGroupedKVInput,\n dataMovement=SCATTER_GATHER,\n schedulingType=SEQUENTIAL]" ];
"rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Map_8_customer_address" [ label = "Map_8[customer_address]", shape = box ];
"rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Map_8_customer_address" -> "rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Map_8" [ label = "Input [inputClass=MRInputLegacy,\n initializer=HiveSplitGenerator]" ];
"rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Map_2_store_sales" [ label = "Map_2[store_sales]", shape = box ];
"rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Map_2_store_sales" -> "rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Map_2" [ label = "Input [inputClass=MRInputLegacy,\n initializer=HiveSplitGenerator]" ];
"rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Map_1_household_demographics" [ label = "Map_1[household_demographics]", shape = box ];
"rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Map_1_household_demographics" -> "rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Map_1" [ label = "Input [inputClass=MRInputLegacy,\n initializer=HiveSplitGenerator]" ];
"rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Map_9" [ label = "Map_9[MapTezProcessor]" ];
"rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Map_9" -> "rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Map_2" [ label = "[input=UnorderedKVOutput,\n output=UnorderedKVInput,\n dataMovement=BROADCAST,\n schedulingType=SEQUENTIAL]" ];
"rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Map_5_customer" [ label = "Map_5[customer]", shape = box ];
"rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Map_5_customer" -> "rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Map_5" [ label = "Input [inputClass=MRInputLegacy,\n initializer=HiveSplitGenerator]" ];
"rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Map_7_current_addr" [ label = "Map_7[current_addr]", shape = box ];
"rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Map_7_current_addr" -> "rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Map_7" [ label = "Input [inputClass=MRInputLegacy,\n initializer=HiveSplitGenerator]" ];
"rajesh_20140825050909_6206d911_7de1_47aa_8788_dd9ffcc9ad36_1.Reducer_6" [ label = "Reducer_6[ReduceTezProcessor]" ];

[jira] [Updated] (TEZ-1494) DAG hangs waiting for ShuffleManager.getNextInput()

2014-08-25 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-1494:
--

Description: 
Attaching the DAG and the stack trace of the hung process.  


Thread 30071: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Interpreted frame)
 - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=186 (Interpreted frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() @bci=42, line=2043 (Interpreted frame)
 - java.util.concurrent.LinkedBlockingQueue.take() @bci=29, line=442 (Interpreted frame)
 - org.apache.tez.runtime.library.shuffle.common.impl.ShuffleManager.getNextInput() @bci=67, line=610 (Interpreted frame)
 - org.apache.tez.runtime.library.common.readers.UnorderedKVReader.moveToNextInput() @bci=26, line=176 (Interpreted frame)
 - org.apache.tez.runtime.library.common.readers.UnorderedKVReader.next() @bci=30, line=117 (Interpreted frame)
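The ticket does not say why the next input never arrives. As a hedged illustration of the JDK frames at the top of the trace only: LinkedBlockingQueue.take() parks the calling thread until an element is available, so a consumer whose producer never enqueues another input waits indefinitely, whereas poll() with a timeout returns null. In this minimal sketch the helper name is hypothetical and String stands in for a fetched input:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class BlockingTakeSketch {
    // take() would park here until another element is put; if none ever is,
    // the caller hangs. poll(timeout) gives up after the timeout and returns
    // null, so the caller can log progress or re-check whether more inputs
    // are still expected.
    static String nextInputOrNull(LinkedBlockingQueue<String> completedInputs, long timeoutMs) {
        try {
            return completedInputs.poll(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to callers
            return null;
        }
    }
}
```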


[jira] [Updated] (TEZ-1494) DAG hangs waiting for ShuffleManager.getNextInput()

2014-08-25 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-1494:
--

Attachment: TEZ-1494-DAG.dot

 DAG hangs waiting for ShuffleManager.getNextInput()
 ---

 Key: TEZ-1494
 URL: https://issues.apache.org/jira/browse/TEZ-1494
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
  Labels: performance
 Attachments: TEZ-1494-DAG.dot







[jira] [Commented] (TEZ-1490) dagid reported is incorrect in TezClient.java

2014-08-25 Thread Prakash Ramachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109136#comment-14109136
 ] 

Prakash Ramachandran commented on TEZ-1490:
---

Thanks for the patch. Yes, it's mostly a display issue for now. 

 dagid reported is incorrect in TezClient.java
 -

 Key: TEZ-1490
 URL: https://issues.apache.org/jira/browse/TEZ-1490
 Project: Apache Tez
  Issue Type: Bug
Reporter: Prakash Ramachandran
Assignee: Jonathan Eagles
 Attachments: TEZ-1490-v1.patch


 The format TezClient.java uses to build the dagid and appid does not match 
 the one used in TezDagId.java.
 For example, TezClient.java reports the dagid as dag_1408740248751_3_01, 
 while the dagid reported in the logs is dag_1408740248751_0003_1.
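The two example IDs suggest the mismatch is only in zero-padding. A sketch that reproduces both strings follows; the format patterns are inferred from the two example IDs rather than taken from TezClient.java or TezDagId.java, and the helper names are hypothetical.

```java
import java.util.Locale;

class DagIdFormatSketch {
    // Shape seen in the AM logs: app id zero-padded to 4 digits, dag id unpadded.
    static String logStyleDagId(long clusterTimestamp, int appId, int dagId) {
        return String.format(Locale.ROOT, "dag_%d_%04d_%d", clusterTimestamp, appId, dagId);
    }

    // Shape TezClient.java reported: app id unpadded, dag id zero-padded to 2 digits.
    static String clientStyleDagId(long clusterTimestamp, int appId, int dagId) {
        return String.format(Locale.ROOT, "dag_%d_%d_%02d", clusterTimestamp, appId, dagId);
    }
}
```

For cluster timestamp 1408740248751, application 3 and DAG 1, these produce dag_1408740248751_0003_1 and dag_1408740248751_3_01 respectively.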





[jira] [Updated] (TEZ-1486) TezUncheckedException when using dynamic partition pruning

2014-08-25 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1486:


Attachment: TEZ-1486.1.txt

The patch skips routing events if the target vertex parallelism is 0. I've left 
the exception in place for now: it's technically possible to route events and 
generate an empty list, but that is highly unlikely.
[~bikassaha], please review.
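A minimal sketch of the guard described above, assuming a simplified broadcast edge that sends one copy of each event to every destination task index. The class and method are hypothetical and are not Tez's Edge or BroadcastEdgeManager code.

```java
import java.util.ArrayList;
import java.util.List;

class BroadcastRoutingSketch {
    // Returns the destination task indices that should receive one event.
    static List<Integer> route(int destinationParallelism) {
        List<Integer> targets = new ArrayList<>();
        if (destinationParallelism == 0) {
            // Nothing to route to (e.g. dynamic pruning left zero splits):
            // skip routing instead of failing.
            return targets;
        }
        for (int task = 0; task < destinationParallelism; task++) {
            targets.add(task); // broadcast: every task gets a copy
        }
        if (targets.isEmpty()) {
            // Only reachable for a negative parallelism; kept as a sanity check,
            // analogous to the "Event must be routed" exception.
            throw new IllegalStateException("Event must be routed");
        }
        return targets;
    }
}
```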

 TezUncheckedException when using dynamic partition pruning
 --

 Key: TEZ-1486
 URL: https://issues.apache.org/jira/browse/TEZ-1486
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Gunther Hagleitner
Assignee: Siddharth Seth
 Attachments: TEZ-1486.1.txt


 I'm working on using the AM event mechanism to dynamically prune partitions 
 at DAG runtime for certain queries. The query is:
 select count(*) from srcpart join srcpart_double_hour on (srcpart.hr*2 = srcpart_double_hour.hr) where srcpart_double_hour.hour = 11;
 This results in two vertices connected through a broadcast edge. The first 
 vertex prepares two things: the list of partition keys (hr) that is sent to 
 the AM for dynamic pruning, and the records to be used in the hash join.
 The second vertex blocks until its initializer has received all events, and 
 then loads and processes the hash join.
 Queries like this can result in zero splits on the second vertex (i.e., no 
 matching rows for the join).
 The exception I get when this is run is:
 org.apache.tez.dag.api.TezUncheckedException: Event must be routed. sourceVertex=vertex_1408686217936_0003_3_00 srcIndex = 0 destAttemptId=vertex_1408686217936_0003_3_01 edgeManager=org.apache.tez.dag.app.dag.impl.BroadcastEdgeManager Event type=DATA_MOVEMENT_EVENT
   at org.apache.tez.dag.app.dag.impl.Edge.sendTezEventToDestinationTasks(Edge.java:371)
   at org.apache.tez.dag.app.dag.impl.VertexImpl$RouteEventTransition.transition(VertexImpl.java:3372)
   at org.apache.tez.dag.app.dag.impl.VertexImpl.scheduleTasks(VertexImpl.java:1088)
   at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerPluginContextImpl.scheduleVertexTasks(VertexManager.java:111)
   at org.apache.tez.dag.app.dag.impl.ImmediateStartVertexManager.onVertexStarted(ImmediateStartVertexManager.java:49)
   at org.apache.tez.dag.app.dag.impl.VertexManager.onVertexStarted(VertexManager.java:244)
   at org.apache.tez.dag.app.dag.impl.VertexImpl.startVertex(VertexImpl.java:2923)
   at org.apache.tez.dag.app.dag.impl.VertexImpl.access$5900(VertexImpl.java:169)
   at org.apache.tez.dag.app.dag.impl.VertexImpl$StartTransition.transition(VertexImpl.java:2914)
   at org.apache.tez.dag.app.dag.impl.VertexImpl$StartTransition.transition(VertexImpl.java:2906)
   at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
   at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1355)
   at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:168)
   at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1650)
   at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1636)
   at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:695)





[jira] [Commented] (TEZ-1083) Enable IFile RLE for DefaultSorter

2014-08-25 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109453#comment-14109453
 ] 

Gopal V commented on TEZ-1083:
--

This looks alright; it just needs a roll-over (overflow) check for the sameKey 
long variable.

Its worst-case value is near O(n^2), so it might overflow before totalKeys 
does.

For performance, instead of checking within the loop, it can be assumed that 
isRLENeeded == true if sameKeys > 0.
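A hedged Java sketch of both suggestions: Math.addExact detects the roll-over, the counter saturates instead of wrapping negative, and the RLE decision is made once after the loop. The counting rule (adding the current run length, which grows quadratically for n identical keys) is an assumption chosen to show how such a counter can outgrow a plain key count; it is not taken from the patch.

```java
class RleCounterSketch {
    // Adds delta (assumed >= 0) to counter, saturating at Long.MAX_VALUE
    // instead of wrapping around to a negative number.
    static long saturatingAdd(long counter, long delta) {
        try {
            return Math.addExact(counter, delta);
        } catch (ArithmeticException overflow) {
            return Long.MAX_VALUE;
        }
    }

    // Hypothetical counting rule: each repeated key adds the length of the
    // current run, so n identical keys sum to 1 + 2 + ... + (n - 1), i.e. O(n^2).
    static long countSameKeys(long[] sortedKeys) {
        long sameKeys = 0;
        long runLength = 0;
        for (int i = 1; i < sortedKeys.length; i++) {
            if (sortedKeys[i] == sortedKeys[i - 1]) {
                runLength++;
                sameKeys = saturatingAdd(sameKeys, runLength);
            } else {
                runLength = 0;
            }
        }
        return sameKeys;
    }

    // Decide once, after the loop, instead of testing inside it.
    static boolean isRleNeeded(long[] sortedKeys) {
        return countSameKeys(sortedKeys) > 0;
    }
}
```

For keys {5, 5, 5, 5} the counter is 1 + 2 + 3 = 6 and isRleNeeded returns true; for {1, 2, 3} it is 0.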

 Enable IFile RLE for DefaultSorter
 --

 Key: TEZ-1083
 URL: https://issues.apache.org/jira/browse/TEZ-1083
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Gopal V
 Attachments: TEZ-1083.1.patch


 Generate RLE IFiles for DefaultSorter and use it to fast-forward map-side 
 merge.





[jira] [Commented] (TEZ-1494) DAG hangs waiting for ShuffleManager.getNextInput()

2014-08-25 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109454#comment-14109454
 ] 

Hitesh Shah commented on TEZ-1494:
--

[~rajesh.balamohan] Is this issue present in the 0.5.0 RC? 

 DAG hangs waiting for ShuffleManager.getNextInput()
 ---

 Key: TEZ-1494
 URL: https://issues.apache.org/jira/browse/TEZ-1494
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
  Labels: performance
 Attachments: TEZ-1494-DAG.dot







[jira] [Updated] (TEZ-1493) WordCount example fails in recovery

2014-08-25 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1493:
-

Priority: Blocker  (was: Major)
Target Version/s: 0.5.1

 WordCount example fails in recovery
 ---

 Key: TEZ-1493
 URL: https://issues.apache.org/jira/browse/TEZ-1493
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Priority: Blocker
 Attachments: Tez-1493.patch







[jira] [Resolved] (TEZ-1473) TEZ_RUNTIME_SHUFFLE_BUFFER is too large by default

2014-08-25 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth resolved TEZ-1473.
-

Resolution: Invalid

 TEZ_RUNTIME_SHUFFLE_BUFFER is too large by default
 --

 Key: TEZ-1473
 URL: https://issues.apache.org/jira/browse/TEZ-1473
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: TEZ-1473.1.patch


 TEZ_RUNTIME_SHUFFLE_BUFFER is 8GB by default, while 
 TEZ_TASK_RESOURCE_MEMORY_MB_DEFAULT is 1GB. This leads to OOM errors or the 
 container being killed.





[jira] [Commented] (TEZ-1492) IFile RLE not kicking in due to bug in BufferUtils.compare()

2014-08-25 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109461#comment-14109461
 ] 

Gopal V commented on TEZ-1492:
--

The BufferUtils class needs re-namespacing as well, as part of this patch.

 IFile RLE not kicking in due to bug in BufferUtils.compare()
 

 Key: TEZ-1492
 URL: https://issues.apache.org/jira/browse/TEZ-1492
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
  Labels: performance
 Attachments: TEZ-1492.1.patch, TEZ-1492.2.patch








[jira] [Commented] (TEZ-1489) Broadcast Shuffle should call freeResources() on FetchedInput

2014-08-25 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109472#comment-14109472
 ] 

Gopal V commented on TEZ-1489:
--

The buffer is being cleared up correctly, but unreserve() is not being called, 
so the internal check switches to disk even though the buffers are unused.
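The failure mode can be sketched with a hypothetical allocator (not SimpleFetchedInputAllocator's actual code) that places an input in memory only while the reserved total stays under a limit. If a released input's buffer is cleared but unreserve() is never called, the reserved total never drops, so later inputs go to disk even though no buffers are in use:

```java
class ReservingAllocatorSketch {
    enum Placement { MEMORY, DISK }

    private final long memoryLimit;
    private long reserved;

    ReservingAllocatorSketch(long memoryLimit) {
        this.memoryLimit = memoryLimit;
    }

    // Reserve in-memory space if it fits; otherwise fall back to disk.
    synchronized Placement allocate(long size) {
        if (reserved + size <= memoryLimit) {
            reserved += size;
            return Placement.MEMORY;
        }
        return Placement.DISK;
    }

    // Must be called when a MEMORY input is released. Clearing the buffer
    // alone does not lower `reserved`, so later allocations would go to disk.
    synchronized void unreserve(long size) {
        reserved -= size;
    }
}
```

With a 100-byte limit, allocate(60) returns MEMORY, a second allocate(60) returns DISK, and only after unreserve(60) does allocate(60) return MEMORY again.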

 Broadcast Shuffle should call freeResources() on FetchedInput
 -

 Key: TEZ-1489
 URL: https://issues.apache.org/jira/browse/TEZ-1489
 Project: Apache Tez
  Issue Type: Bug
Reporter: Gopal V

 BroadcastShuffle does not seem to free up the buffer space allocated by the 
 FetchedInputs during the task runtime.
 According to my logging, SimpleFetchedInputAllocator::freeResources is never 
 called.





[jira] [Commented] (TEZ-1489) Broadcast Shuffle should call freeResources() on FetchedInput

2014-08-25 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109476#comment-14109476
 ] 

Gopal V commented on TEZ-1489:
--

Might UnorderedKVReader::moveToNextInput() be a good place for this?
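A minimal sketch of that idea, assuming a hypothetical FetchedBuffer interface with a free() method in place of Tez's FetchedInput: the reader releases the input it has finished with at the start of moveToNextInput(), before taking the next one.

```java
import java.util.Deque;

class FreeOnAdvanceReaderSketch {
    // Hypothetical stand-in for a fetched input that holds a buffer.
    interface FetchedBuffer {
        void free();
    }

    private final Deque<FetchedBuffer> pending;
    private FetchedBuffer current;

    FreeOnAdvanceReaderSketch(Deque<FetchedBuffer> pending) {
        this.pending = pending;
    }

    // Release the input we finished reading before moving on, so its memory
    // goes back to the allocator as soon as it is no longer needed.
    boolean moveToNextInput() {
        if (current != null) {
            current.free();
        }
        current = pending.poll(); // null when no inputs remain
        return current != null;
    }
}
```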

 Broadcast Shuffle should call freeResources() on FetchedInput
 -

 Key: TEZ-1489
 URL: https://issues.apache.org/jira/browse/TEZ-1489
 Project: Apache Tez
  Issue Type: Bug
Reporter: Gopal V






[jira] [Commented] (TEZ-1489) Broadcast Shuffle should call freeResources() on FetchedInput

2014-08-25 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109481#comment-14109481
 ] 

Siddharth Seth commented on TEZ-1489:
-

Do you have some logs that can be looked at? The code flow suggests it will 
end up calling unreserve on SimpleFetchedInputAllocator.



[jira] [Commented] (TEZ-1360) Provide vertex parallelism to each vertex task

2014-08-25 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109484#comment-14109484
 ] 

Siddharth Seth commented on TEZ-1360:
-

+1. Committing. Thanks [~gopalv], [~oae], [~rajesh.balamohan]

 Provide vertex parallelism to each vertex task
 --

 Key: TEZ-1360
 URL: https://issues.apache.org/jira/browse/TEZ-1360
 Project: Apache Tez
  Issue Type: Bug
Reporter: Johannes Zillmann
Assignee: Gopal V
 Fix For: 0.5.1

 Attachments: TEZ-1360.1.patch, TEZ-1360.2.patch, TEZ-1360.4.patch, 
 TEZ-1360.5.patch, TEZ-1360.6.patch


 It would be good for a task to get information about the total task count of 
 its vertex.
 This would provide an equivalent of MapReduce's {{mapred.map.tasks}} and 
 {{mapred.reduce.tasks}}, so MR applications that use them could be ported to 
 Tez more easily.
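With the vertex task count available, a task can do what MR code did with mapred.map.tasks, for example carving a shared key space into per-task ranges. The TaskInfo interface below is a hypothetical stand-in for whatever context object exposes the task index and vertex parallelism; it is not the Tez API.

```java
/** Hypothetical view of what a task needs from its context. */
interface TaskInfo {
    int getTaskIndex();         // 0-based index of this task within its vertex
    int getVertexParallelism(); // total number of tasks in the vertex
}

/** Splits [0, totalItems) into contiguous, near-equal ranges, one per task. */
final class RangeSplitter {
    private RangeSplitter() {}

    /** Returns {start, endExclusive} for the given task. */
    static long[] rangeFor(TaskInfo task, long totalItems) {
        int n = task.getVertexParallelism();
        int i = task.getTaskIndex();
        if (n <= 0 || i < 0 || i >= n) {
            throw new IllegalArgumentException("bad task index " + i + " of " + n);
        }
        long base = totalItems / n;
        long extra = totalItems % n; // the first `extra` tasks get one more item
        long start = i * base + Math.min(i, extra);
        long size = base + (i < extra ? 1 : 0);
        return new long[] {start, start + size};
    }

    public static void main(String[] args) {
        for (int idx = 0; idx < 3; idx++) {
            final int taskIndex = idx;
            TaskInfo info = new TaskInfo() {
                public int getTaskIndex() { return taskIndex; }
                public int getVertexParallelism() { return 3; }
            };
            long[] r = rangeFor(info, 10);
            System.out.println("task " + taskIndex + " -> [" + r[0] + ", " + r[1] + ")");
        }
    }
}
```

For 10 items and 3 tasks this yields [0, 4), [4, 7) and [7, 10).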





[jira] [Commented] (TEZ-1471) Additional supplement for TEZ local mode document

2014-08-25 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109518#comment-14109518
 ] 

Siddharth Seth commented on TEZ-1471:
-

[~airbots] - the changes look good. Could you please move some of them to a 
separate section, though? "Running a DAG in Local Mode" should be limited to 
the changes users need to make; it's fairly cluttered at this point.

Something like a "things to watch out for" section, which can contain:
- LargeData belongs here.
- TezConfiguration.TEZ_AM_INLINE_TASK_EXECUTION_MAX_TASKS (tez.am.inline.task.execution.max-tasks) 
should not be changed (defaults to 1).
- tez.history.logging.service.class should keep its default value, 
org.apache.tez.dag.history.logging.impl.SimpleHistoryLoggingService. This means 
ATS is disabled in the current Local Mode.

I don't think we need to call out NodeBlacklisting being disabled. If it is 
mentioned, move it to the section about moving to a real cluster.
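The "things to watch out for" rules could also be checked mechanically. A minimal sketch, assuming the settings are available as a plain Map<String, String>; the two property names and their expected values come from the comment above, while LocalModeCheck and its messages are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical check for settings that should keep their defaults in Local Mode. */
final class LocalModeCheck {
    static final String INLINE_MAX_TASKS = "tez.am.inline.task.execution.max-tasks";
    static final String HISTORY_LOGGING = "tez.history.logging.service.class";
    static final String SIMPLE_HISTORY =
        "org.apache.tez.dag.history.logging.impl.SimpleHistoryLoggingService";

    private LocalModeCheck() {}

    /** Returns one warning per setting that deviates from the documented Local Mode value. */
    static List<String> warnings(Map<String, String> conf) {
        List<String> out = new ArrayList<>();
        String maxTasks = conf.getOrDefault(INLINE_MAX_TASKS, "1");
        if (!"1".equals(maxTasks.trim())) {
            out.add(INLINE_MAX_TASKS + " should not be changed (defaults to 1), found " + maxTasks);
        }
        String history = conf.getOrDefault(HISTORY_LOGGING, SIMPLE_HISTORY);
        if (!SIMPLE_HISTORY.equals(history.trim())) {
            out.add(HISTORY_LOGGING + " should be " + SIMPLE_HISTORY
                + " (ATS is disabled in Local Mode)");
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(INLINE_MAX_TASKS, "4");
        warnings(conf).forEach(System.out::println);
    }
}
```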



 Additional supplement for TEZ local mode document
 -

 Key: TEZ-1471
 URL: https://issues.apache.org/jira/browse/TEZ-1471
 Project: Apache Tez
  Issue Type: Sub-task
Affects Versions: 0.4.0
Reporter: Chen He
Assignee: Chen He
 Attachments: TEZ-1471-2.patch, TEZ-1471-3.patch, TEZ-1471-4.patch, 
 TEZ-1471.patch


 some supplements for Local mode document





[jira] [Created] (TEZ-1495) ATS integration for TezClient

2014-08-25 Thread Prakash Ramachandran (JIRA)
Prakash Ramachandran created TEZ-1495:
-

 Summary: ATS integration for TezClient
 Key: TEZ-1495
 URL: https://issues.apache.org/jira/browse/TEZ-1495
 Project: Apache Tez
  Issue Type: Bug
Reporter: Prakash Ramachandran
Assignee: Prakash Ramachandran


The Tez client should automatically redirect to ATS when the AM is not running.
All APIs exposed by the DAGClient (DAG status, counters, etc.) should 
continue to work after the AM has shut down.
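The requested behaviour amounts to an "ask the AM, otherwise ask ATS" lookup. The sketch below is hypothetical: StatusSource, the use of an empty Optional to mean "this source cannot answer", and the plain string state are placeholders, not DAGClient or ATS APIs.

```java
import java.util.Optional;

/** Hypothetical source of DAG status (the live AM or the timeline store). */
interface StatusSource {
    /** Returns the DAG state, or empty if this source cannot answer. */
    Optional<String> getDagState(String dagId);
}

/** Hypothetical client that asks the AM first and falls back to ATS when the AM is gone. */
final class FallbackStatusClient {
    private final StatusSource am;
    private final StatusSource ats;

    FallbackStatusClient(StatusSource am, StatusSource ats) {
        this.am = am;
        this.ats = ats;
    }

    String getDagState(String dagId) {
        Optional<String> live = am.getDagState(dagId);
        if (live.isPresent()) {
            return live.get();
        }
        // AM not running (finished, timed out or unreachable): use the history store.
        return ats.getDagState(dagId)
            .orElseThrow(() -> new IllegalStateException("no status for " + dagId));
    }

    public static void main(String[] args) {
        StatusSource deadAm = dagId -> Optional.empty();
        StatusSource ats = dagId -> Optional.of("SUCCEEDED");
        System.out.println(new FallbackStatusClient(deadAm, ats).getDagState("dag_1_0001_1"));
    }
}
```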





[jira] [Updated] (TEZ-1495) ATS integration for TezClient

2014-08-25 Thread Prakash Ramachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Ramachandran updated TEZ-1495:
--

Attachment: TEZ-1495.WIP.1.patch

Changes in TEZ-1495.WIP.1.patch:
- getDagStatus and getVertexStatus fall back to ATS

 ATS integration for TezClient
 -

 Key: TEZ-1495
 URL: https://issues.apache.org/jira/browse/TEZ-1495
 Project: Apache Tez
  Issue Type: Bug
Reporter: Prakash Ramachandran
Assignee: Prakash Ramachandran
 Attachments: TEZ-1495.WIP.1.patch


 The Tez client should automatically redirect to ATS when the AM is not running.
 All APIs exposed by the DAGClient (DAG status, counters, etc.) should 
 continue to work after the AM has shut down.





[jira] [Commented] (TEZ-1476) DAGClient waitForCompletion output is confusing

2014-08-25 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109649#comment-14109649
 ] 

Siddharth Seth commented on TEZ-1476:
-

+1. Looks good.

 DAGClient waitForCompletion output is confusing
 ---

 Key: TEZ-1476
 URL: https://issues.apache.org/jira/browse/TEZ-1476
 Project: Apache Tez
  Issue Type: Bug
Reporter: Siddharth Seth
Assignee: Jonathan Eagles
Priority: Critical
 Attachments: TEZ-1476-v1.patch, TEZ-1476-v2.patch


 When a DAG is submitted, "2014-08-21 16:38:06,153 INFO  [main] 
 rpc.DAGClientRPCImpl (DAGClientRPCImpl.java:log(428)) - Waiting for DAG to 
 start running" is logged.
 After this, nothing seems to get logged until the first task completes.
 It would be useful to log when the state changes to RUNNING, as well as at 
 least one line stating the number of tasks, etc. (a 0% progress line). Also, 
 progress could be logged every few seconds, whether or not it has changed, 
 so that it is clear the job has not simply gotten stuck.
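The proposed policy (log the transition to RUNNING, then log whenever progress changes or a fixed interval has passed) can be sketched as a small decision helper. ProgressLogPolicy and its interval are hypothetical; the clock is passed in so the behaviour is easy to test.

```java
/** Hypothetical helper deciding when a waitForCompletion-style loop should log. */
final class ProgressLogPolicy {
    private final long intervalMillis;
    private String lastState;
    private String lastProgress;
    private long lastLogTime = Long.MIN_VALUE; // MIN_VALUE means "never logged"

    ProgressLogPolicy(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    /** Returns true if the given snapshot should be logged at time nowMillis. */
    boolean shouldLog(String state, String progress, long nowMillis) {
        boolean changed = !state.equals(lastState) || !progress.equals(lastProgress);
        boolean stale = lastLogTime == Long.MIN_VALUE || nowMillis - lastLogTime >= intervalMillis;
        if (changed || stale) {
            lastState = state;
            lastProgress = progress;
            lastLogTime = nowMillis;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ProgressLogPolicy policy = new ProgressLogPolicy(5000);
        String[][] polls = {
            {"SUBMITTED", "0%"}, {"RUNNING", "0%"}, {"RUNNING", "0%"},
            {"RUNNING", "0%"}, {"RUNNING", "50%"}
        };
        long[] times = {0, 1000, 3000, 7000, 8000};
        for (int i = 0; i < polls.length; i++) {
            if (policy.shouldLog(polls[i][0], polls[i][1], times[i])) {
                System.out.println(times[i] + "ms State: " + polls[i][0] + " Progress: " + polls[i][1]);
            }
        }
    }
}
```

Here the poll at 3000 ms is suppressed (nothing changed, interval not yet elapsed), while the unchanged poll at 7000 ms is logged as a heartbeat.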





[jira] [Commented] (TEZ-1433) Invalid credentials can be used when a DAG is submitted to a session which has timed out

2014-08-25 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109672#comment-14109672
 ] 

Siddharth Seth commented on TEZ-1433:
-

[~jeagles] - it seems like we have some races here, which could be triggered if 
we get the AppState as RUNNING just before the AM session times out. We're not 
really handling that at the moment, but should in the future. In such cases, I 
think it's better to roll back the changes rather than end up with one-off 
failures, which could be really difficult to debug.
DAGSubmissionTimedOut is another case: the client may time out, for 
whatever reason, while the AM is still alive.

I think a fix that keeps working once error reporting and timeouts are 
handled differently would be better.

 Invalid credentials can be used when a DAG is submitted to a session which 
 has timed out
 

 Key: TEZ-1433
 URL: https://issues.apache.org/jira/browse/TEZ-1433
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Siddharth Seth
Assignee: Jonathan Eagles
 Attachments: TEZ-1433-v1.patch


 When a DAG is submitted to a session which has timed out, and the same DAG is 
 then submitted to a new session, credentials associated with the old session 
 can end up getting used.
 Before we know that the session is no longer valid, the DAG is modified to 
 add local resources and credentials.
 On the next submission, since the DAG already has tokens (for HDFS, for 
 example) from the old session, the tokens are not updated.
 Meanwhile, the old token ends up being cancelled by the RM, since the 
 application associated with the previous session has finished.
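The failure mode comes down to mutating the caller's DAG during submission and then skipping the update on resubmission because tokens are already present. A stripped-down, hypothetical model (DagSpec, the token map and the session ids are illustrative, not Tez types) shows why the stale token survives:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, stripped-down DAG that carries per-service credentials. */
final class DagSpec {
    final Map<String, String> tokens = new HashMap<>();
}

final class StaleTokenDemo {
    /** Mirrors the problematic pattern: mutate the caller's DAG, add tokens only if absent. */
    static void submit(DagSpec dag, String sessionId) {
        // Token obtained for this session; it is cancelled when the session's app finishes.
        dag.tokens.putIfAbsent("hdfs", "token-for-" + sessionId);
    }

    public static void main(String[] args) {
        DagSpec dag = new DagSpec();
        submit(dag, "session-1"); // session-1 had already timed out, so submission fails
        submit(dag, "session-2"); // resubmitted to a fresh session
        // Still the session-1 token, which the RM cancels when session-1's application ends.
        System.out.println(dag.tokens.get("hdfs"));
    }
}
```

Running main prints token-for-session-1, the token that is about to be cancelled.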





[jira] [Commented] (TEZ-1471) Additional supplement for TEZ local mode document

2014-08-25 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109800#comment-14109800
 ] 

Siddharth Seth commented on TEZ-1471:
-

+1. Thanks [~airbots]

 Additional supplement for TEZ local mode document
 -

 Key: TEZ-1471
 URL: https://issues.apache.org/jira/browse/TEZ-1471
 Project: Apache Tez
  Issue Type: Sub-task
Affects Versions: 0.4.0
Reporter: Chen He
Assignee: Chen He
 Attachments: TEZ-1471-2.patch, TEZ-1471-3.patch, TEZ-1471-4.patch, 
 TEZ-1471-5.patch, TEZ-1471.patch


 some supplements for Local mode document





[jira] [Commented] (TEZ-1486) TezUncheckedException when using dynamic partition pruning

2014-08-25 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109823#comment-14109823
 ] 

Bikas Saha commented on TEZ-1486:
-

Looks good. Some comments.

Log message needs fixing. Please also add which events are not being routed.
{code}  LOG.info("Num Destination Tasks is . Not routing events");{code}

The if statement can be made common, outside the existing if-else block:
{code}if (isDataMovementEvent) {
  DataMovementEvent dmEvent = (DataMovementEvent) tezEvent.getEvent();
  if (routingRequired) {
    edgeManager.routeDataMovementEventToDestination(dmEvent,
        srcTaskIndex, dmEvent.getSourceIndex(),
        destTaskAndInputIndices);
  }
} else {
  if (routingRequired) {
    edgeManager.routeInputSourceTaskFailedEventToDestination(srcTaskIndex,
        destTaskAndInputIndices);
  }
}{code}

Extra whitespace
{code}
break;{code}
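The restructuring suggested above (test routingRequired once, outside the event-type branch) together with the zero-destination-task case from the description can be shown in a self-contained sketch. SketchEdgeManager, SketchBroadcastManager and SketchEdge are hypothetical simplifications, and treating "no destination tasks" as "log and skip" is an assumption here, not a statement of what the patch does.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, trimmed-down edge manager contract (not the Tez EdgeManager API). */
interface SketchEdgeManager {
    void routeDataMovementEvent(int srcTaskIndex, List<Integer> destTasks);
    void routeInputSourceTaskFailedEvent(int srcTaskIndex, List<Integer> destTasks);
}

/** Broadcast semantics: every destination task receives every event. */
final class SketchBroadcastManager implements SketchEdgeManager {
    private final int numDestinationTasks;

    SketchBroadcastManager(int numDestinationTasks) {
        this.numDestinationTasks = numDestinationTasks;
    }

    public void routeDataMovementEvent(int srcTaskIndex, List<Integer> destTasks) {
        for (int t = 0; t < numDestinationTasks; t++) destTasks.add(t);
    }

    public void routeInputSourceTaskFailedEvent(int srcTaskIndex, List<Integer> destTasks) {
        for (int t = 0; t < numDestinationTasks; t++) destTasks.add(t);
    }
}

/** Hypothetical edge showing the hoisted routingRequired check. */
final class SketchEdge {
    final List<String> log = new ArrayList<>();
    private final SketchEdgeManager edgeManager;
    private final int numDestinationTasks;

    SketchEdge(SketchEdgeManager edgeManager, int numDestinationTasks) {
        this.edgeManager = edgeManager;
        this.numDestinationTasks = numDestinationTasks;
    }

    List<Integer> route(boolean isDataMovementEvent, int srcTaskIndex) {
        List<Integer> destTasks = new ArrayList<>();
        boolean routingRequired = numDestinationTasks > 0;
        if (routingRequired) {
            // Single check; the inner branch only selects the routing call.
            if (isDataMovementEvent) {
                edgeManager.routeDataMovementEvent(srcTaskIndex, destTasks);
            } else {
                edgeManager.routeInputSourceTaskFailedEvent(srcTaskIndex, destTasks);
            }
        } else {
            // The message includes the count and which event is skipped, as the review asks.
            log.add("Num destination tasks is " + numDestinationTasks + ". Not routing "
                + (isDataMovementEvent ? "data movement event" : "input source task failed event")
                + " from source task " + srcTaskIndex);
        }
        return destTasks;
    }

    public static void main(String[] args) {
        SketchEdge empty = new SketchEdge(new SketchBroadcastManager(0), 0);
        System.out.println(empty.route(true, 0) + " " + empty.log);
    }
}
```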

 TezUncheckedException when using dynamic partition pruning
 --

 Key: TEZ-1486
 URL: https://issues.apache.org/jira/browse/TEZ-1486
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Gunther Hagleitner
Assignee: Siddharth Seth
 Attachments: TEZ-1486.1.txt


 I'm working on using the AM event mechanism to dynamically prune partitions 
 at DAG runtime for certain queries. The query is:
 select count(*) from srcpart join srcpart_double_hour on (srcpart.hr*2 = 
 srcpart_double_hour.hr) where srcpart_double_hour.hour = 11;
 This results in two vertices connected through a broadcast edge. The first 
 vertex prepares two things: the list of partition keys (hr) that are sent 
 to the AM for dynamic pruning, and the records to be used in the hash join.
 The second vertex blocks until all events are received (in its initializer); 
 then it loads and processes the hash join.
 It's possible for queries like this to result in zero splits on the second 
 vertex (i.e., no matching rows for the join).
 The exception I get when this is run is:
 org.apache.tez.dag.api.TezUncheckedException: Event must be routed. sourceVertex=vertex_1408686217936_0003_3_00 srcIndex = 0 destAttemptId=vertex_1408686217936_0003_3_01 edgeManager=org.apache.tez.dag.app.dag.impl.BroadcastEdgeManager Event type=DATA_MOVEMENT_EVENT
   at org.apache.tez.dag.app.dag.impl.Edge.sendTezEventToDestinationTasks(Edge.java:371)
   at org.apache.tez.dag.app.dag.impl.VertexImpl$RouteEventTransition.transition(VertexImpl.java:3372)
   at org.apache.tez.dag.app.dag.impl.VertexImpl.scheduleTasks(VertexImpl.java:1088)
   at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerPluginContextImpl.scheduleVertexTasks(VertexManager.java:111)
   at org.apache.tez.dag.app.dag.impl.ImmediateStartVertexManager.onVertexStarted(ImmediateStartVertexManager.java:49)
   at org.apache.tez.dag.app.dag.impl.VertexManager.onVertexStarted(VertexManager.java:244)
   at org.apache.tez.dag.app.dag.impl.VertexImpl.startVertex(VertexImpl.java:2923)
   at org.apache.tez.dag.app.dag.impl.VertexImpl.access$5900(VertexImpl.java:169)
   at org.apache.tez.dag.app.dag.impl.VertexImpl$StartTransition.transition(VertexImpl.java:2914)
   at org.apache.tez.dag.app.dag.impl.VertexImpl$StartTransition.transition(VertexImpl.java:2906)
   at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
   at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1355)
   at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:168)
   at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1650)
   at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1636)
   at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:695)





[jira] [Created] (TEZ-1496) Multi MR inputs can not be configured without accessing internal proto structures

2014-08-25 Thread Vikram Dixit K (JIRA)
Vikram Dixit K created TEZ-1496:
---

 Summary: Multi MR inputs can not be configured without accessing 
internal proto structures
 Key: TEZ-1496
 URL: https://issues.apache.org/jira/browse/TEZ-1496
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.1
Reporter: Vikram Dixit K
Priority: Blocker


With all the new API changes, the multi-MR input can no longer be configured 
cleanly without accessing internal structures in Tez.





[jira] [Created] (TEZ-1497) Add tez-broadcast-example into tez-examples/

2014-08-25 Thread Gopal V (JIRA)
Gopal V created TEZ-1497:


 Summary: Add tez-broadcast-example into tez-examples/
 Key: TEZ-1497
 URL: https://issues.apache.org/jira/browse/TEZ-1497
 Project: Apache Tez
  Issue Type: Bug
Reporter: Gopal V
Assignee: Gopal V


Modify https://github.com/t3rmin4t0r/tez-broadcast-example into a usable 
example inside tez-examples.






[jira] [Commented] (TEZ-1492) IFile RLE not kicking in due to bug in BufferUtils.compare()

2014-08-25 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109874#comment-14109874
 ] 

Rajesh Balamohan commented on TEZ-1492:
---

[~gopalv] BufferUtils is placed in o.a.h.io as it relies on FastByteComparisons 
which is not a public class in Hadoop.  
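For context: IFile RLE depends on detecting that a key is byte-for-byte equal to the previous one, so if compare() does not report 0 for equal key ranges, the repeated-key path is never taken. The exact bug fixed in the patch is not described here; the sketch below is a hypothetical, plain-Java unsigned lexicographic compare over byte ranges (the kind of comparison FastByteComparisons provides, without its optimisations), used to count keys an RLE-style writer could elide.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical plain-Java byte-range comparison (unsigned, lexicographic). */
final class ByteRanges {
    private ByteRanges() {}

    static int compare(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int x = a[aOff + i] & 0xff; // treat bytes as unsigned 0..255
            int y = b[bOff + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return aLen - bLen; // a shorter prefix sorts first
    }

    /** Counts keys equal to their predecessor, i.e. keys an RLE-style writer could elide. */
    static int repeatedKeys(byte[][] sortedKeys) {
        int repeats = 0;
        for (int i = 1; i < sortedKeys.length; i++) {
            byte[] prev = sortedKeys[i - 1];
            byte[] cur = sortedKeys[i];
            if (compare(prev, 0, prev.length, cur, 0, cur.length) == 0) {
                repeats++;
            }
        }
        return repeats;
    }

    public static void main(String[] args) {
        byte[][] keys = {
            "apple".getBytes(StandardCharsets.UTF_8),
            "apple".getBytes(StandardCharsets.UTF_8),
            "pear".getBytes(StandardCharsets.UTF_8),
        };
        System.out.println("repeated keys: " + repeatedKeys(keys));
    }
}
```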

 IFile RLE not kicking in due to bug in BufferUtils.compare()
 

 Key: TEZ-1492
 URL: https://issues.apache.org/jira/browse/TEZ-1492
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
  Labels: performance
 Attachments: TEZ-1492.1.patch, TEZ-1492.2.patch








[jira] [Comment Edited] (TEZ-1492) IFile RLE not kicking in due to bug in BufferUtils.compare()

2014-08-25 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109874#comment-14109874
 ] 

Rajesh Balamohan edited comment on TEZ-1492 at 8/25/14 10:28 PM:
-

[~gopalv] BufferUtils is placed in o.a.h.io as it relies on FastByteComparisons, 
which is a package-private class in Hadoop.  


was (Author: rajesh.balamohan):
[~gopalv] BufferUtils is placed in o.a.h.io as it relies on FastByteComparisons 
which is not a public class in Hadoop.  



[jira] [Commented] (TEZ-1494) DAG hangs waiting for ShuffleManager.getNextInput()

2014-08-25 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109872#comment-14109872
 ] 

Siddharth Seth commented on TEZ-1494:
-

[~rajesh.balamohan] - have you investigated this any further? Were all the 
DataMovementEvents received? Was task retry in play?

 DAG hangs waiting for ShuffleManager.getNextInput()
 ---

 Key: TEZ-1494
 URL: https://issues.apache.org/jira/browse/TEZ-1494
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
  Labels: performance
 Attachments: TEZ-1494-DAG.dot


 Attaching the DAG and the stack trace of the hung process.  
 Thread 30071: (state = BLOCKED)
  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Interpreted frame)
  - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=186 (Interpreted frame)
  - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() @bci=42, line=2043 (Interpreted frame)
  - java.util.concurrent.LinkedBlockingQueue.take() @bci=29, line=442 (Interpreted frame)
  - org.apache.tez.runtime.library.shuffle.common.impl.ShuffleManager.getNextInput() @bci=67, line=610 (Interpreted frame)
  - org.apache.tez.runtime.library.common.readers.UnorderedKVReader.moveToNextInput() @bci=26, line=176 (Interpreted frame)
  - org.apache.tez.runtime.library.common.readers.UnorderedKVReader.next() @bci=30, line=117 (Interpreted frame)
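The trace shows the reader parked in LinkedBlockingQueue.take(), which blocks until another completed input is queued. If the consumer expects more inputs than will ever arrive (for instance, if a DataMovementEvent never shows up), take() never returns. Below is a hypothetical sketch of a defensive variant that tracks the expected input count and uses poll() with a timeout so that a stall is reported instead of hanging; InputQueue and its timeout are assumptions, not ShuffleManager code.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical input queue that knows how many inputs it should eventually deliver. */
final class InputQueue {
    private final LinkedBlockingQueue<String> completed = new LinkedBlockingQueue<>();
    private final int expectedInputs;
    private int delivered; // only touched by the single consumer thread

    InputQueue(int expectedInputs) {
        this.expectedInputs = expectedInputs;
    }

    /** Called by fetcher threads when an input has been fetched. */
    void inputFetched(String input) {
        completed.add(input);
    }

    /**
     * Returns the next input, null when all expected inputs were delivered, or throws if
     * nothing arrives within the timeout (where take() would simply block forever).
     */
    String getNextInput(long timeout, TimeUnit unit) throws InterruptedException {
        if (delivered == expectedInputs) {
            return null;
        }
        String next = completed.poll(timeout, unit);
        if (next == null) {
            throw new IllegalStateException("Stalled: received " + delivered
                + " of " + expectedInputs + " inputs");
        }
        delivered++;
        return next;
    }

    public static void main(String[] args) throws InterruptedException {
        InputQueue q = new InputQueue(2);
        q.inputFetched("input-0"); // the second input never arrives
        System.out.println(q.getNextInput(100, TimeUnit.MILLISECONDS));
        try {
            q.getNextInput(100, TimeUnit.MILLISECONDS);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A timeout does not fix a missing event; it only turns a silent hang into an error that names how many inputs were received.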





[jira] [Commented] (TEZ-1476) DAGClient waitForCompletion output is confusing

2014-08-25 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109873#comment-14109873
 ] 

Jonathan Eagles commented on TEZ-1476:
--

Thanks [~sseth] and [~zjffdu] for the reviews. I committed this to master and 
branch-0.5.



[jira] [Commented] (TEZ-1044) Need to consider map/reduce java.opts values for container reuse

2014-08-25 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109951#comment-14109951
 ] 

Hitesh Shah commented on TEZ-1044:
--

[~yeshavora] Please provide more details on this issue. 

 Need to consider map/reduce java.opts values for container reuse
 

 Key: TEZ-1044
 URL: https://issues.apache.org/jira/browse/TEZ-1044
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Yesha Vora

 Currently, mapreduce.map.java.opts and mapreduce.reduce.java.opts are not 
 taken into account for container reuse. These properties should be 
 considered when deciding whether a container can be reused.
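One way to take the java.opts values into account is to treat them as part of a container's reuse signature, so a container launched with one set of JVM options is only handed to tasks that want the same options. ContainerSignature and its fields are hypothetical, and exact string matching is a deliberate simplification; the sketch does not reflect how Tez actually matches containers.

```java
import java.util.Objects;

/** Hypothetical description of what a container was launched with. */
final class ContainerSignature {
    final int memoryMb;
    final String javaOpts;

    ContainerSignature(int memoryMb, String javaOpts) {
        this.memoryMb = memoryMb;
        this.javaOpts = javaOpts == null ? "" : javaOpts.trim();
    }

    /** A container can be reused only if it is large enough and its JVM options match. */
    boolean canRunTask(ContainerSignature wanted) {
        return memoryMb >= wanted.memoryMb && Objects.equals(javaOpts, wanted.javaOpts);
    }

    public static void main(String[] args) {
        ContainerSignature mapContainer = new ContainerSignature(2048, "-Xmx1638m");
        ContainerSignature reduceTask = new ContainerSignature(2048, "-Xmx1638m -XX:+UseG1GC");
        System.out.println("reuse map container for reduce task? " + mapContainer.canRunTask(reduceTask));
    }
}
```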





[jira] [Updated] (TEZ-1496) Multi MR inputs can not be configured without accessing internal proto structures

2014-08-25 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1496:
-

Target Version/s: 0.5.0

 Multi MR inputs can not be configured without accessing internal proto 
 structures
 -

 Key: TEZ-1496
 URL: https://issues.apache.org/jira/browse/TEZ-1496
 Project: Apache Tez
  Issue Type: Bug
Reporter: Vikram Dixit K
Priority: Blocker

 With all the new API changes, the multi-MR input can no longer be configured 
 cleanly without accessing internal structures in Tez.





[jira] [Commented] (TEZ-1496) Multi MR inputs can not be configured without accessing internal proto structures

2014-08-25 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109960#comment-14109960
 ] 

Hitesh Shah commented on TEZ-1496:
--

[~vikram.dixit] Can you provide more details on what you are trying to 
configure and what APIs are lacking? 



[jira] [Updated] (TEZ-1496) Multi MR inputs can not be configured without accessing internal proto structures

2014-08-25 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1496:
-

Affects Version/s: (was: 0.5.1)



[jira] [Updated] (TEZ-1476) DAGClient waitForCompletion output is confusing

2014-08-25 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1476:


Fix Version/s: 0.5.1

 DAGClient waitForCompletion output is confusing
 ---

 Key: TEZ-1476
 URL: https://issues.apache.org/jira/browse/TEZ-1476
 Project: Apache Tez
  Issue Type: Bug
Reporter: Siddharth Seth
Assignee: Jonathan Eagles
Priority: Critical
 Fix For: 0.5.1

 Attachments: TEZ-1476-v1.patch, TEZ-1476-v2.patch


 When a DAG is submitted, "2014-08-21 16:38:06,153 INFO  [main] 
 rpc.DAGClientRPCImpl (DAGClientRPCImpl.java:log(428)) - Waiting for DAG to 
 start running" is logged.
 After this, nothing seems to get logged until the first task completes.
 It would be useful to log when the state changes to RUNNING, as well as at 
 least one line stating the number of tasks, etc. (a 0% progress line). Also, 
 progress could be logged every few seconds, whether or not it has changed, 
 so that it is clear the job has not simply gotten stuck.





[jira] [Commented] (TEZ-1490) dagid reported is incorrect in TezClient.java

2014-08-25 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110023#comment-14110023
 ] 

Jonathan Eagles commented on TEZ-1490:
--

[~bikassaha], could you review this and also provide input on the target 
version?

 dagid reported is incorrect in TezClient.java
 -

 Key: TEZ-1490
 URL: https://issues.apache.org/jira/browse/TEZ-1490
 Project: Apache Tez
  Issue Type: Bug
Reporter: Prakash Ramachandran
Assignee: Jonathan Eagles
 Attachments: TEZ-1490-v1.patch


 The format used to get the dagid and appid in TezClient.java does not match 
 the one used in TezDagId.java.
 For example, TezClient.java reports the dagid as dag_1408740248751_3_01, 
 whereas the dagid reported in the logs is dag_1408740248751_0003_1.
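The two strings differ only in formatting: in the log form the application id is zero-padded to four digits and the DAG sequence number is not. A single shared helper, used by both TezClient and the DAG id class, would keep them in sync. The sketch assumes the dag_&lt;clusterTimestamp&gt;_&lt;appId&gt;_&lt;dagId&gt; layout visible in the log example; DagIdFormat is a hypothetical name, not the Tez class.

```java
import java.util.Locale;

/** Hypothetical single source of truth for DAG id strings. */
final class DagIdFormat {
    private DagIdFormat() {}

    /** dag_<clusterTimestamp>_<appId zero-padded to 4 digits>_<dagId unpadded> */
    static String format(long clusterTimestamp, int appId, int dagId) {
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "dag_%d_%04d_%d", clusterTimestamp, appId, dagId);
    }

    public static void main(String[] args) {
        System.out.println(format(1408740248751L, 3, 1)); // dag_1408740248751_0003_1
    }
}
```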





[jira] [Commented] (TEZ-1497) Add tez-broadcast-example into tez-examples/

2014-08-25 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110042#comment-14110042
 ] 

Bikas Saha commented on TEZ-1497:
-

The JoinExample in tez-examples is already illustrating the broadcast edge with 
a broadcast join user story. Does that suffice or is this example illustrating 
some other concepts?

 Add tez-broadcast-example into tez-examples/
 

 Key: TEZ-1497
 URL: https://issues.apache.org/jira/browse/TEZ-1497
 Project: Apache Tez
  Issue Type: Bug
Reporter: Gopal V
Assignee: Gopal V

 Modify https://github.com/t3rmin4t0r/tez-broadcast-example into a usable 
 example inside tez-examples.





[jira] [Commented] (TEZ-1433) Invalid credentials can be used when a DAG is submitted to a session which has timed out

2014-08-25 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110053#comment-14110053
 ] 

Jonathan Eagles commented on TEZ-1433:
--

A true rollback can be difficult to implement (how do we distinguish user changes 
from client changes?) without significant instrumentation or keeping a copy of 
the unmodified DAG that was submitted. [~sseth], how would you feel about changing to 
submitting a copy of the client DAG? In that case it becomes safe to submit a 
DAG multiple times per session, and even across sessions, without any side effects.
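The copy-on-submit proposal can be sketched with a hypothetical DAG holder that carries credentials: session-specific additions go onto a copy, so the user's object is never mutated. SubmittableDag and its token map are illustrative only; a real DAG would need a deep copy of much more state.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical user-facing DAG that only carries a credential map. */
final class SubmittableDag {
    private final Map<String, String> tokens = new HashMap<>();

    SubmittableDag() {}

    private SubmittableDag(SubmittableDag other) {
        tokens.putAll(other.tokens); // copy, so later additions stay on the copy
    }

    SubmittableDag copy() {
        return new SubmittableDag(this);
    }

    Map<String, String> tokens() {
        return Collections.unmodifiableMap(tokens);
    }

    void addToken(String service, String token) {
        tokens.put(service, token);
    }

    /** Client-side submission: session-specific tokens are added to a copy only. */
    static SubmittableDag prepareForSession(SubmittableDag userDag, String sessionId) {
        SubmittableDag submitted = userDag.copy();
        submitted.addToken("hdfs", "token-for-" + sessionId);
        return submitted;
    }

    public static void main(String[] args) {
        SubmittableDag userDag = new SubmittableDag();
        SubmittableDag first = prepareForSession(userDag, "session-1");  // times out
        SubmittableDag second = prepareForSession(userDag, "session-2"); // resubmitted
        System.out.println(userDag.tokens() + " " + first.tokens() + " " + second.tokens());
    }
}
```

Because each submission starts from the untouched user DAG, the second session gets its own token instead of inheriting the one about to be cancelled.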



[jira] [Commented] (TEZ-1490) dagid reported is incorrect in TezClient.java

2014-08-25 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110060#comment-14110060
 ] 

Bikas Saha commented on TEZ-1490:
-

Clearly, the "DO NOT CHANGE this" comment did not help keep the two identical 
pieces of code in sync. If we can make this common in the API project (and 
reference it from the DAG project), that would be ideal. Otherwise, the patch 
looks good to me as is. Given that the current RC is cancelled, we should 
probably bring this change all the way into 0.5.0.



[jira] [Commented] (TEZ-1430) Javadoc generation should not generate docs for classes annotated as private

2014-08-25 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110226#comment-14110226
 ] 

Jonathan Eagles commented on TEZ-1430:
--

[~hitesh], can you please review, as you added the original javadoc generation 
to Tez?

 Javadoc generation should not generate docs for classes annotated as private
 

 Key: TEZ-1430
 URL: https://issues.apache.org/jira/browse/TEZ-1430
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Jonathan Eagles
 Attachments: TEZ-1430-v2.patch


 mvn javadoc:javadoc generates javadoc for everything. Haven't tried mvn site 
 though.





[jira] [Updated] (TEZ-1492) IFile RLE not kicking in due to bug in BufferUtils.compare()

2014-08-25 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-1492:
--

Attachment: TEZ-1492.3.patch

Moved BufferUtils and FastByteComparisons to o.a.t.runtime.library.utils in the 
latest patch.

 IFile RLE not kicking in due to bug in BufferUtils.compare()
 

 Key: TEZ-1492
 URL: https://issues.apache.org/jira/browse/TEZ-1492
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
  Labels: performance
 Attachments: TEZ-1492.1.patch, TEZ-1492.2.patch, TEZ-1492.3.patch








[jira] [Updated] (TEZ-1493) Tez examples fail in recovery sometimes

2014-08-25 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-1493:


Summary: Tez examples fail in recovery sometimes  (was: WordCount example 
fails in recovery)

 Tez examples fail in recovery sometimes
 ---

 Key: TEZ-1493
 URL: https://issues.apache.org/jira/browse/TEZ-1493
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Priority: Blocker
 Attachments: Tez-1493.patch


 {code}
 14/08/25 17:37:03 INFO client.TezClient: Submitting DAG to YARN, applicationId=application_1408499461970_0053, dagName=WordCount
 14/08/25 17:37:03 INFO impl.YarnClientImpl: Submitted application application_1408499461970_0053
 14/08/25 17:37:03 INFO client.TezClient: The url to track the Tez AM: http://jzhangMBPr.local:8088/proxy/application_1408499461970_0053/
 14/08/25 17:37:03 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
 14/08/25 17:37:03 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200
 14/08/25 17:37:03 INFO rpc.DAGClientRPCImpl: Waiting for DAG to start running
 14/08/25 17:37:07 INFO rpc.DAGClientRPCImpl: DAG: State: RUNNING Progress: 0% TotalTasks: 2 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
 14/08/25 17:37:15 INFO rpc.DAGClientRPCImpl: DAG: State: RUNNING Progress: 50% TotalTasks: 2 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
 14/08/25 17:37:17 INFO rpc.DAGClientRPCImpl: DAG completed. FinalState=SUBMITTED
 WordCount failed with diagnostics: []
 {code}
 The client side reports that the job failed, but the logs show that recovery 
 works on the server side and the job eventually finishes successfully.
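The log ends with "DAG completed. FinalState=SUBMITTED", which suggests the client saw the recovered DAG in a non-terminal state after the AM restart and stopped waiting. A hypothetical sketch of a wait loop that only stops on terminal states; the State enum here is an assumption (SUBMITTED and RUNNING from the log plus the usual end states), and the iterator stands in for repeated status polls.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Iterator;
import java.util.Set;

/** Hypothetical wait loop that only stops on terminal DAG states. */
final class CompletionWaiter {
    // Assumed state names for illustration.
    enum State { SUBMITTED, INITING, RUNNING, SUCCEEDED, FAILED, KILLED, ERROR }

    static final Set<State> TERMINAL =
        EnumSet.of(State.SUCCEEDED, State.FAILED, State.KILLED, State.ERROR);

    /** Consumes status polls until a terminal state is seen. */
    static State waitForCompletion(Iterator<State> statusPolls) {
        State last = null;
        while (statusPolls.hasNext()) {
            last = statusPolls.next();
            if (TERMINAL.contains(last)) {
                return last;
            }
            // Non-terminal (including SUBMITTED after an AM restart): keep polling.
        }
        throw new IllegalStateException("status polls ended in non-terminal state " + last);
    }

    public static void main(String[] args) {
        Iterator<State> polls = Arrays.asList(
            State.RUNNING, State.RUNNING, State.SUBMITTED, State.RUNNING, State.SUCCEEDED).iterator();
        System.out.println("final state: " + waitForCompletion(polls));
    }
}
```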





[jira] [Created] (TEZ-1498) Usage info is not printed when error number of arguments for JoinExample

2014-08-25 Thread Jeff Zhang (JIRA)
Jeff Zhang created TEZ-1498:
---

 Summary: Usage info is not printed when error number of arguments for JoinExample
 Key: TEZ-1498
 URL: https://issues.apache.org/jira/browse/TEZ-1498
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang
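The issue has no description beyond its summary. A minimal sketch of the usual pattern it asks for, assuming a hypothetical three-argument command line: check `args.length` before doing any work, and print usage with a non-zero exit code when the count is wrong. `ArgCheckSketch`, `EXPECTED_ARGS` and the usage string are illustrative placeholders, not code from JoinExample.

```java
import java.io.PrintStream;

/**
 * Hypothetical sketch: validate the argument count up front and print usage.
 * EXPECTED_ARGS and the usage text are placeholders, not JoinExample's real CLI.
 */
public class ArgCheckSketch {

    static final int EXPECTED_ARGS = 3; // hypothetical: <input1> <input2> <output>

    static void printUsage(PrintStream out) {
        out.println("Usage: joinexample <input1> <input2> <output>");
    }

    /** Returns a process exit code: 2 for a usage error, 0 otherwise. */
    static int run(String[] args) {
        if (args.length != EXPECTED_ARGS) {
            System.err.println("Expected " + EXPECTED_ARGS + " arguments but got " + args.length);
            printUsage(System.err);
            return 2;
        }
        // Building and submitting the DAG would happen here (omitted in this sketch).
        return 0;
    }

    public static void main(String[] args) {
        System.exit(run(args));
    }
}
```

Checking the count before any setup means the user sees the usage text instead of a later, less obvious failure.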


[jira] [Commented] (TEZ-1492) IFile RLE not kicking in due to bug in BufferUtils.compare()

2014-08-25 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110264#comment-14110264
 ] 

Gopal V commented on TEZ-1492:
--

Thanks [~rajesh.balamohan], can I have the same diff with git mv instead of 
the big change-sets?

 IFile RLE not kicking in due to bug in BufferUtils.compare()
 

 Key: TEZ-1492
 URL: https://issues.apache.org/jira/browse/TEZ-1492
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
  Labels: performance
 Attachments: TEZ-1492.1.patch, TEZ-1492.2.patch, TEZ-1492.3.patch, TEZ-1492.4.patch


[jira] [Updated] (TEZ-1492) IFile RLE not kicking in due to bug in BufferUtils.compare()

2014-08-25 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-1492:
--

Attachment: TEZ-1492.4.patch

No code change (but regenerated the patch with git diff -M --no-prefix HEAD)

 IFile RLE not kicking in due to bug in BufferUtils.compare()
 

 Key: TEZ-1492
 URL: https://issues.apache.org/jira/browse/TEZ-1492
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
  Labels: performance
 Attachments: TEZ-1492.1.patch, TEZ-1492.2.patch, TEZ-1492.3.patch, TEZ-1492.4.patch


[jira] [Resolved] (TEZ-1497) Add tez-broadcast-example into tez-examples/

2014-08-25 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V resolved TEZ-1497.
--

Resolution: Not a Problem

 Add tez-broadcast-example into tez-examples/
 

 Key: TEZ-1497
 URL: https://issues.apache.org/jira/browse/TEZ-1497
 Project: Apache Tez
  Issue Type: Bug
Reporter: Gopal V
Assignee: Gopal V

 Modify https://github.com/t3rmin4t0r/tez-broadcast-example into a usable 
 example inside tez-examples.