[jira] [Created] (TEZ-1945) Remove 2 GB memlimit restriction in MergeManager

2015-01-14 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created TEZ-1945:
-

 Summary: Remove 2 GB memlimit restriction in MergeManager
 Key: TEZ-1945
 URL: https://issues.apache.org/jira/browse/TEZ-1945
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan


In certain situations (data arriving in large chunks, but the fetch not yet 
complete), fetchers might wait in MergeManager.waitForShuffleToMergeMemory() 
for memory to become available.

Removing the 2 GB restriction on MergeManager.memlimit would help in such 
situations.
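
For context, here is a minimal sketch of the memory gate involved; the names 
are illustrative and do not mirror the actual MergeManager fields.

{code}
// Hypothetical sketch of the shuffle-to-merge memory gate described above.
class MergeMemoryGate {
  private final long memLimit; // capped at 2 GB today; the cap this issue removes
  private long usedMemory = 0;

  MergeMemoryGate(long memLimit) { this.memLimit = memLimit; }

  // Fetchers block here until merges release enough in-memory shuffle data.
  synchronized void waitForShuffleToMergeMemory(long requested)
      throws InterruptedException {
    while (usedMemory + requested > memLimit) {
      wait(); // woken when a merge completes and releases memory
    }
    usedMemory += requested;
  }

  synchronized void release(long bytes) {
    usedMemory -= bytes;
    notifyAll();
  }
}
{code}

With memLimit capped at 2 GB, fetchers on large containers can block in the 
loop above even though plenty of heap is still free.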





[jira] [Updated] (TEZ-1945) Remove 2 GB memlimit restriction in MergeManager

2015-01-14 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-1945:
--
Attachment: TEZ-1945.1.patch

Attaching initial patch to remove the 2 GB check. There is a corner case wherein 
this can break intermediate mem-to-mem merging, as it relies on InMemoryWriter, 
and InMemoryWriter is currently bound by the 2 GB limit.  One option could be to 
do intermediate mem-to-mem merging only up to 2 GB in createInMemorySegments() 
for processing.
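
A rough sketch of that option follows; the types and names are illustrative, 
not the actual createInMemorySegments() code.

{code}
// Rough sketch of capping mem-to-mem merging at 2 GB, as suggested above.
// Types and names are illustrative, not the actual MergeManager code.
import java.util.Deque;
import java.util.List;

class InMemorySegmentCap {
  interface InMemoryOutput { long size(); }

  static long capInMemorySegments(Deque<InMemoryOutput> inputs,
                                  List<InMemoryOutput> segments) {
    final long MAX_IN_MEM_MERGE = Integer.MAX_VALUE; // ~2 GB: InMemoryWriter's bound
    long total = 0;
    while (!inputs.isEmpty() && total + inputs.peek().size() <= MAX_IN_MEM_MERGE) {
      InMemoryOutput out = inputs.poll();
      total += out.size();
      segments.add(out); // selected for this in-memory merge pass
    }
    return total; // anything left in 'inputs' waits for a later merge pass
  }
}
{code}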

> Remove 2 GB memlimit restriction in MergeManager
> 
>
> Key: TEZ-1945
> URL: https://issues.apache.org/jira/browse/TEZ-1945
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
> Attachments: TEZ-1945.1.patch
>
>
> In certain situations (data arriving in large chunks, but the fetch not yet 
> complete), fetchers might wait in MergeManager.waitForShuffleToMergeMemory() 
> for memory to become available.
> Removing the 2 GB restriction on MergeManager.memlimit would help in such 
> situations.





[jira] [Assigned] (TEZ-1945) Remove 2 GB memlimit restriction in MergeManager

2015-01-14 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan reassigned TEZ-1945:
-

Assignee: Rajesh Balamohan

> Remove 2 GB memlimit restriction in MergeManager
> 
>
> Key: TEZ-1945
> URL: https://issues.apache.org/jira/browse/TEZ-1945
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: TEZ-1945.1.patch
>
>
> In certain situations (data arriving in large chunks, but the fetch not yet 
> complete), fetchers might wait in MergeManager.waitForShuffleToMergeMemory() 
> for memory to become available.
> Removing the 2 GB restriction on MergeManager.memlimit would help in such 
> situations.





[jira] [Created] (TEZ-1946) Show all counters in the counter selector UI.

2015-01-14 Thread Sreenath Somarajapuram (JIRA)
Sreenath Somarajapuram created TEZ-1946:
---

 Summary: Show all counters in the counter selector UI.
 Key: TEZ-1946
 URL: https://issues.apache.org/jira/browse/TEZ-1946
 Project: Apache Tez
  Issue Type: Task
Reporter: Sreenath Somarajapuram
Assignee: Sreenath Somarajapuram








[jira] [Created] (TEZ-1947) Failing fast when DAG configs have wrong values can save cluster resources

2015-01-14 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created TEZ-1947:
-

 Summary: Failing fast when DAG configs have wrong values can save 
cluster resources
 Key: TEZ-1947
 URL: https://issues.apache.org/jira/browse/TEZ-1947
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan


It would be beneficial to do certain config checks upfront rather than having 
the job fail later downstream.  For example, in the following case the DAG 
failed after 400+ seconds due to a config issue.

{code}
Status: Running (Executing on YARN cluster with App id 
application_1421164610335_0060)


VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED

Map 1 ..  KILLED2511700   81   0  81
Reducer 2 FAILED   1009  00 1009  231008

VERTICES: 00/02  [===>>---] 13%   ELAPSED TIME: 449.01 s

Status: Failed
Vertex failed, vertexName=Reducer 2, vertexId=vertex_1421164610335_0060_1_01, 
diagnostics=[Task failed, taskId=task_1421164610335_0060_1_01_04, 
diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
task:java.lang.RuntimeException: Invlaid configuration: maxSingleShuffleLimit 
should be less than mergeThresholdmaxSingleShuffleLimit: 238251152, 
mergeThreshold: 148668720
at 
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.(MergeManager.java:260)
at 
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle.(Shuffle.java:206)
at 
org.apache.tez.runtime.library.input.OrderedGroupedKVInput.start(OrderedGroupedKVInput.java:124)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.call(LogicalIOProcessorRuntimeTask.java:405)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.call(LogicalIOProcessorRuntimeTask.java:393)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
], TaskAttempt 1 failed, info=[Error: Failure while running 
task:java.lang.RuntimeException: Invlaid configuration: maxSingleShuffleLimit 
should be less than mergeThresholdmaxSingleShuffleLimit: 238251152, 
mergeThreshold: 148668720
at 
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.(MergeManager.java:260)
at 
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle.(Shuffle.java:206)
at 
org.apache.tez.runtime.library.input.OrderedGroupedKVInput.start(OrderedGroupedKVInput.java:124)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.call(LogicalIOProcessorRuntimeTask.java:405)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.call(LogicalIOProcessorRuntimeTask.java:393)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
{code}





[jira] [Updated] (TEZ-1947) Failing fast when DAG configs have wrong values can save cluster resources

2015-01-14 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-1947:
--
Description: 
It would be beneficial to do certain config checks (whereever possible) 
upfront rather than having the job fail later downstream.  For example, in the 
following case the DAG failed after 400+ seconds due to a config issue.

{code}
Status: Running (Executing on YARN cluster with App id 
application_1421164610335_0060)


VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED

Map 1 ..  KILLED2511700   81   0  81
Reducer 2 FAILED   1009  00 1009  231008

VERTICES: 00/02  [===>>---] 13%   ELAPSED TIME: 449.01 s

Status: Failed
Vertex failed, vertexName=Reducer 2, vertexId=vertex_1421164610335_0060_1_01, 
diagnostics=[Task failed, taskId=task_1421164610335_0060_1_01_04, 
diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
task:java.lang.RuntimeException: Invlaid configuration: maxSingleShuffleLimit 
should be less than mergeThresholdmaxSingleShuffleLimit: 238251152, 
mergeThreshold: 148668720
at 
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.(MergeManager.java:260)
at 
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle.(Shuffle.java:206)
at 
org.apache.tez.runtime.library.input.OrderedGroupedKVInput.start(OrderedGroupedKVInput.java:124)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.call(LogicalIOProcessorRuntimeTask.java:405)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.call(LogicalIOProcessorRuntimeTask.java:393)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
], TaskAttempt 1 failed, info=[Error: Failure while running 
task:java.lang.RuntimeException: Invlaid configuration: maxSingleShuffleLimit 
should be less than mergeThresholdmaxSingleShuffleLimit: 238251152, 
mergeThreshold: 148668720
at 
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.(MergeManager.java:260)
at 
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle.(Shuffle.java:206)
at 
org.apache.tez.runtime.library.input.OrderedGroupedKVInput.start(OrderedGroupedKVInput.java:124)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.call(LogicalIOProcessorRuntimeTask.java:405)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.call(LogicalIOProcessorRuntimeTask.java:393)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
{code}

  was:
It would be beneficial to do certain config checks upfront rather than having 
the job fail later downstream.  For example, in the following case the DAG 
failed after 400+ seconds due to a config issue.

{code}
Status: Running (Executing on YARN cluster with App id 
application_1421164610335_0060)


VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED

Map 1 ..  KILLED2511700   81   0  81
Reducer 2 FAILED   1009  00 1009  231008

VERTICES: 00/02  [===>>---] 13%   ELAPSED TIME: 449.01 s

Status: Failed
Vertex failed, vertexName=Reducer 2, vertexId=vertex_1421164610335_0060_1_01, 
diagnostics=[Task failed, taskId=task_1421164610335_0060_1_01_04, 
diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
task:java.lang.RuntimeException: Invlaid configuration: maxSingleShuffleL

[jira] [Updated] (TEZ-1947) Failing fast when DAG configs have wrong values can save cluster resources

2015-01-14 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-1947:
--
Description: 
It would be beneficial to do certain config checks (wherever possible) upfront 
rather than having the job fail later downstream.  For example, in the 
following case the DAG failed after 400+ seconds due to a config issue.

{code}
Status: Running (Executing on YARN cluster with App id 
application_1421164610335_0060)


VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED

Map 1 ..  KILLED2511700   81   0  81
Reducer 2 FAILED   1009  00 1009  231008

VERTICES: 00/02  [===>>---] 13%   ELAPSED TIME: 449.01 s

Status: Failed
Vertex failed, vertexName=Reducer 2, vertexId=vertex_1421164610335_0060_1_01, 
diagnostics=[Task failed, taskId=task_1421164610335_0060_1_01_04, 
diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
task:java.lang.RuntimeException: Invlaid configuration: maxSingleShuffleLimit 
should be less than mergeThresholdmaxSingleShuffleLimit: 238251152, 
mergeThreshold: 148668720
at 
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.(MergeManager.java:260)
at 
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle.(Shuffle.java:206)
at 
org.apache.tez.runtime.library.input.OrderedGroupedKVInput.start(OrderedGroupedKVInput.java:124)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.call(LogicalIOProcessorRuntimeTask.java:405)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.call(LogicalIOProcessorRuntimeTask.java:393)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
], TaskAttempt 1 failed, info=[Error: Failure while running 
task:java.lang.RuntimeException: Invlaid configuration: maxSingleShuffleLimit 
should be less than mergeThresholdmaxSingleShuffleLimit: 238251152, 
mergeThreshold: 148668720
at 
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.(MergeManager.java:260)
at 
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle.(Shuffle.java:206)
at 
org.apache.tez.runtime.library.input.OrderedGroupedKVInput.start(OrderedGroupedKVInput.java:124)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.call(LogicalIOProcessorRuntimeTask.java:405)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.call(LogicalIOProcessorRuntimeTask.java:393)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
{code}

  was:
It would be beneficial to do certain config checks (whereever possible) 
upfront rather than having the job fail later downstream.  For example, in the 
following case the DAG failed after 400+ seconds due to a config issue.

{code}
Status: Running (Executing on YARN cluster with App id 
application_1421164610335_0060)


VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED

Map 1 ..  KILLED2511700   81   0  81
Reducer 2 FAILED   1009  00 1009  231008

VERTICES: 00/02  [===>>---] 13%   ELAPSED TIME: 449.01 s

Status: Failed
Vertex failed, vertexName=Reducer 2, vertexId=vertex_1421164610335_0060_1_01, 
diagnostics=[Task failed, taskId=task_1421164610335_0060_1_01_04, 
diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
task:java.lang.RuntimeException: Invlaid configuratio

[jira] [Commented] (TEZ-1069) Support ability to re-size a task attempt when previous attempts fail due to resource constraints

2015-01-14 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276901#comment-14276901
 ] 

Jeff Zhang commented on TEZ-1069:
-

Yes, [~hitesh]

I have an initial patch that works.  Here's the main flow:

* Identify whether the TaskAttempt failed due to OOM, in 2 ways:
** ContainerExitStatus
** TaskAttemptCompleteEvent through heartbeat (an OOM exception may be caught 
and passed through the heartbeat)
* Track how many OOM-failed task attempts each task has had, and compute the 
max value across the vertex. 
* Update the Resource of the vertex and all its tasks based on the max number 
of OOM-failed attempts, scaling by 
pow(1 + increase_percent_per_OOM_failed_attempt, max_failed_attempt); see the 
sketch below.

For task attempts that are in START_WAIT (being scheduled by 
TaskSchedulerService), I haven't changed anything yet. This may be the most 
complicated part, if it turns out to be required.
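
A minimal sketch of that scaling step, with illustrative names (not the patch 
itself):

{code}
// Hypothetical sketch of the resource scaling described above; the class and
// field names are illustrative and not taken from the actual patch.
import org.apache.hadoop.yarn.api.records.Resource;

class OomResourceScaler {
  // e.g. 0.5 means +50% memory per observed OOM-failed attempt
  private final double increasePercentPerOomFailedAttempt;

  OomResourceScaler(double increasePercent) {
    this.increasePercentPerOomFailedAttempt = increasePercent;
  }

  Resource scale(Resource original, int maxOomFailedAttempts) {
    double factor =
        Math.pow(1 + increasePercentPerOomFailedAttempt, maxOomFailedAttempts);
    return Resource.newInstance(
        (int) (original.getMemory() * factor), // memory in MB
        original.getVirtualCores());           // cores left unchanged here
  }
}
{code}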

 




> Support ability to re-size a task attempt when previous attempts fail due to 
> resource constraints
> -
>
> Key: TEZ-1069
> URL: https://issues.apache.org/jira/browse/TEZ-1069
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Hitesh Shah
>Assignee: Jeff Zhang
>
> Consider a case where attempts for the final stage in a long DAG fail due 
> to out of memory. In such a scenario, the framework (or the base vertex 
> manager) should be able to change the task specifications on the fly to 
> trigger a re-run with modified specs. 
> Changes could include both java opts and container resource requirements. 





[jira] [Updated] (TEZ-1069) Support ability to re-size a task attempt when previous attempts fail due to resource constraints

2015-01-14 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-1069:

Attachment: TEZ-1069-1.patch

> Support ability to re-size a task attempt when previous attempts fail due to 
> resource constraints
> -
>
> Key: TEZ-1069
> URL: https://issues.apache.org/jira/browse/TEZ-1069
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Hitesh Shah
>Assignee: Jeff Zhang
> Attachments: TEZ-1069-1.patch
>
>
> Consider a case where attempts for the final stage in a long DAG fail due 
> to out of memory. In such a scenario, the framework (or the base vertex 
> manager) should be able to change the task specifications on the fly to 
> trigger a re-run with modified specs. 
> Changes could include both java opts and container resource requirements. 





[jira] [Commented] (TEZ-1069) Support ability to re-size a task attempt when previous attempts fail due to resource constraints

2015-01-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276928#comment-14276928
 ] 

Hadoop QA commented on TEZ-1069:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12692217/TEZ-1069-1.patch
  against master revision 6bc500f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 260 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/18//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/18//artifact/patchprocess/newPatchFindbugsWarningstez-tests.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/18//artifact/patchprocess/newPatchFindbugsWarningstez-mapreduce.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/18//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/18//artifact/patchprocess/newPatchFindbugsWarningstez-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/18//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-internals.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/18//console

This message is automatically generated.

> Support ability to re-size a task attempt when previous attempts fail due to 
> resource constraints
> -
>
> Key: TEZ-1069
> URL: https://issues.apache.org/jira/browse/TEZ-1069
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Hitesh Shah
>Assignee: Jeff Zhang
> Attachments: TEZ-1069-1.patch
>
>
> Consider a case where attempts for the final stage in a long DAG fail due 
> to out of memory. In such a scenario, the framework (or the base vertex 
> manager) should be able to change the task specifications on the fly to 
> trigger a re-run with modified specs. 
> Changes could include both java opts and container resource requirements. 





[jira] [Updated] (TEZ-1942) Number of tasks show in Tez UI with auto-reduce parallelism is misleading

2015-01-14 Thread Prakash Ramachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Ramachandran updated TEZ-1942:
--
Attachment: result_with_primary_filter.png
result_with_direct_vertex.png

Did some analysis. It looks like the results returned change based on the 
query parameters. 
When queried for "get all vertexes for this dag", it returns 1009 (numTasks in 
the screenshot):
{code}
http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11&primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1
{code}

When queried for "get for a particular dag", it returns 253:
{code}
http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/
{code}

> Number of tasks show in Tez UI with auto-reduce parallelism is misleading
> -
>
> Key: TEZ-1942
> URL: https://issues.apache.org/jira/browse/TEZ-1942
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
> Attachments: Screen Shot 2015-01-14 at 9.18.21 AM.png, Screen Shot 
> 2015-01-14 at 9.18.54 AM.png, output.json, result_with_direct_vertex.png, 
> result_with_primary_filter.png
>
>
> Ran a simple hive query (with tez) and "--hiveconf 
> hive.tez.auto.reducer.parallelism=true".  This internally turns on tez's 
> auto reduce parallelism.  
> - Job started off with 1009 reduce tasks
> - Tez reduced the number of reducers to 253
> - Job completed successfully, but the Tez UI shows 1009 as the number of 
> reducers (and 253 tasks as successful).  This can be a little misleading.
> I will attach the screenshots soon.





[jira] [Comment Edited] (TEZ-1942) Number of tasks show in Tez UI with auto-reduce parallelism is misleading

2015-01-14 Thread Prakash Ramachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276929#comment-14276929
 ] 

Prakash Ramachandran edited comment on TEZ-1942 at 1/14/15 2:18 PM:


Did some analysis. It looks like the results returned change based on the 
query parameters. 
When queried for "get all vertexes for this dag", it returns 1009 (numTasks in 
the screenshot).
See screenshot result_with_primary_filter.png:
{code}
http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11&primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1
{code}

When queried for "get for a particular dag", it returns 253.
See screenshot result_with_direct_vertex.png:
{code}
http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/
{code}


was (Author: pramachandran):
Did some analysis. It looks like the results returned change based on the 
query parameters. 
When queried for "get all vertexes for this dag", it returns 1009 (numTasks in 
the screenshot):
{code}
http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11&primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1
{code}

When queried for "get for a particular dag", it returns 253:
{code}
http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/
{code}

> Number of tasks show in Tez UI with auto-reduce parallelism is misleading
> -
>
> Key: TEZ-1942
> URL: https://issues.apache.org/jira/browse/TEZ-1942
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
> Attachments: Screen Shot 2015-01-14 at 9.18.21 AM.png, Screen Shot 
> 2015-01-14 at 9.18.54 AM.png, output.json, result_with_direct_vertex.png, 
> result_with_primary_filter.png
>
>
> Ran a simple hive query (with tez) and "--hiveconf 
> hive.tez.auto.reducer.parallelism=true".  This internally turns on tez's 
> auto reduce parallelism.  
> - Job started off with 1009 reduce tasks
> - Tez reduced the number of reducers to 253
> - Job completed successfully, but the Tez UI shows 1009 as the number of 
> reducers (and 253 tasks as successful).  This can be a little misleading.
> I will attach the screenshots soon.





[jira] [Comment Edited] (TEZ-1942) Number of tasks show in Tez UI with auto-reduce parallelism is misleading

2015-01-14 Thread Prakash Ramachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276929#comment-14276929
 ] 

Prakash Ramachandran edited comment on TEZ-1942 at 1/14/15 2:21 PM:


Did some analysis. It looks like the results returned change based on the 
query parameters. 
When queried for "get all vertexes for this dag", it returns 1009 (numTasks in 
the screenshot).
See screenshot result_with_primary_filter.png:
{code}
http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11&primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1
{code}

When queried for "get for a particular vertex", it returns 253.
See screenshot result_with_direct_vertex.png:
{code}
http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/
{code}


was (Author: pramachandran):
Did some analysis. It looks like the results returned change based on the 
query parameters. 
When queried for "get all vertexes for this dag", it returns 1009 (numTasks in 
the screenshot).
See screenshot result_with_primary_filter.png:
{code}
http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11&primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1
{code}

When queried for "get for a particular dag", it returns 253.
See screenshot result_with_direct_vertex.png:
{code}
http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/
{code}

> Number of tasks show in Tez UI with auto-reduce parallelism is misleading
> -
>
> Key: TEZ-1942
> URL: https://issues.apache.org/jira/browse/TEZ-1942
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
> Attachments: Screen Shot 2015-01-14 at 9.18.21 AM.png, Screen Shot 
> 2015-01-14 at 9.18.54 AM.png, output.json, result_with_direct_vertex.png, 
> result_with_primary_filter.png
>
>
> Ran a simple hive query (with tez) and "--hiveconf 
> hive.tez.auto.reducer.parallelism=true".  This internally turns on tez's 
> auto reduce parallelism.  
> - Job started off with 1009 reduce tasks
> - Tez reduced the number of reducers to 253
> - Job completed successfully, but the Tez UI shows 1009 as the number of 
> reducers (and 253 tasks as successful).  This can be a little misleading.
> I will attach the screenshots soon.





[jira] [Commented] (TEZ-1890) tez-ui web.tar.gz also being uploaded to maven repository

2015-01-14 Thread Prakash Ramachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277003#comment-14277003
 ] 

Prakash Ramachandran commented on TEZ-1890:
---

+1, patch looks good. Tried extracting from the war file.

> tez-ui web.tar.gz also being uploaded to maven repository
> -
>
> Key: TEZ-1890
> URL: https://issues.apache.org/jira/browse/TEZ-1890
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Priority: Blocker
> Attachments: TEZ-1890-v1.patch
>
>
> Not sure if we should be uploading the web tar.gz as part of maven deploy. 





[jira] [Commented] (TEZ-1945) Remove 2 GB memlimit restriction in MergeManager

2015-01-14 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277132#comment-14277132
 ] 

Rajesh Balamohan commented on TEZ-1945:
---

*Job:*
1. 10 TB scale
2. Hive query with tez: "create table testData as select * from lineitem 
distribute by l_shipdate;"

Saw a 5-7% improvement in job runtime.  Counter details are given below, which 
show a good reduction in resource usage during shuffle (e.g. 
NUM_MEM_TO_DISK_MERGES, ADDITIONAL_SPILLS_BYTES_WRITTEN, SPILLED_RECORDS).

Counter details for TaskCounter_Reducer_2_INPUT_Map_1:

||counter||4 GB, tez.runtime.shuffle.fetch.buffer.percent=0.5, 
tez.runtime.shuffle.merge.percent=0.5,application_1421164610335_0059 ||4 GB, 
tez.runtime.shuffle.fetch.buffer.percent=0.8, 
tez.runtime.shuffle.merge.percent=0.8,application_1421164610335_0064||8 GB, 
tez.runtime.shuffle.memory.limit.percent=0.1, 
tez.runtime.shuffle.fetch.buffer.percent=0.14,application_1421164610335_0063||8 
GB, tez.runtime.shuffle.memory.limit.percent=0.2, 
tez.runtime.shuffle.fetch.buffer.percent=0.5,application_1421164610335_0058||
|ADDITIONAL_SPILLS_BYTES_READ|200812472683|125413261965|331929593129|31373505945|
|ADDITIONAL_SPILLS_BYTES_WRITTEN|181649974257|106277188725|312660112747|12149251314|
|COMBINE_INPUT_RECORDS|0|0|0|0|
|FIRST_EVENT_RECEIVED|10292|12048|7404|6012|
|LAST_EVENT_RECEIVED|31296182|28215975|10513984|7342057|
|MERGED_MAP_OUTPUTS|244976|244976|244976|244976|
|MERGE_PHASE_TIME|39177076|36337714|15940783|11425071|
|NUM_DISK_TO_DISK_MERGES|0|0|0|0|
|NUM_FAILED_SHUFFLE_INPUTS|0|0|0|0|
|NUM_MEM_TO_DISK_MERGES|491|3|4537|0|
|NUM_SHUFFLED_INPUTS|244976|244976|244976|244976|
|NUM_SKIPPED_INPUTS|8283|8283|8283|8283|
|REDUCE_INPUT_GROUPS|0|0|0|0|
|REDUCE_INPUT_RECORDS|589709|589709|589709|589709|
|SHUFFLE_BYTES|365219732545|365204956818|365241417228|365215810254|
|SHUFFLE_BYTES_DECOMPRESSED|801646699974|801646699974|801646699974|801646699974|
|SHUFFLE_BYTES_DISK_DIRECT|19162498426|19136073240|19269480382|19224254631|
|SHUFFLE_BYTES_TO_DISK|0|0|0|0|
|SHUFFLE_BYTES_TO_MEM|346057234119|346068883578|345971936846|345991555623|
|SHUFFLE_PHASE_TIME|38339256|34248855|15332154|11018423|
|SPILLED_RECORDS|3272861488|2042317909|5452541545|511585624|

*Merge memory details for the above runs (applicationIds for reference)*

4 GB container Runs:
application_1421164610335_0059:
MergerManager: memoryLimit=1564475392, maxSingleShuffleLimit=391118848, 
mergeThreshold=782237696, ioSortFactor=200, memToMemMergeOutputsThreshold=200

application_1421164610335_0064:
memoryLimit=2296271339, maxSingleShuffleLimit=574067840, 
mergeThreshold=1837017088, ioSortFactor=200, memToMemMergeOutputsThreshold=200

8 GB container Runs:
application_1421164610335_0058:
MergerManager: memoryLimit=4437280030, maxSingleShuffleLimit=1109320064, 
mergeThreshold=3993552128, ioSortFactor=200, memToMemMergeOutputsThreshold=200

application_1421164610335_0063:
MergerManager: memoryLimit=891079872, maxSingleShuffleLimit=89107992, 
mergeThreshold=139008464, ioSortFactor=200, memToMemMergeOutputsThreshold=200


> Remove 2 GB memlimit restriction in MergeManager
> 
>
> Key: TEZ-1945
> URL: https://issues.apache.org/jira/browse/TEZ-1945
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: TEZ-1945.1.patch
>
>
> In certain situations (data arriving in large chunks, but the fetch not yet 
> complete), fetchers might wait in MergeManager.waitForShuffleToMergeMemory() 
> for memory to become available.
> Removing the 2 GB restriction on MergeManager.memlimit would help in such 
> situations.





[jira] [Commented] (TEZ-1890) tez-ui web.tar.gz also being uploaded to maven repository

2015-01-14 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277138#comment-14277138
 ] 

Jonathan Eagles commented on TEZ-1890:
--

[~Sreenath], [~pramachandran], any concerns with Ambari integration over this 
change?

> tez-ui web.tar.gz also being uploaded to maven repository
> -
>
> Key: TEZ-1890
> URL: https://issues.apache.org/jira/browse/TEZ-1890
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Priority: Blocker
> Attachments: TEZ-1890-v1.patch
>
>
> Not sure if we should be uploading the web tar.gz as part of maven deploy. 





[jira] [Commented] (TEZ-1069) Support ability to re-size a task attempt when previous attempts fail due to resource constraints

2015-01-14 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277179#comment-14277179
 ] 

Hitesh Shah commented on TEZ-1069:
--

I am not sure if that is the approach I would have taken. My thinking was more 
along the lines of querying the VertexManager to allow it to modify the task 
specifications in such cases. Changing the resource is not enough; one would 
also need to change the java opts. For the latter, we would need to write a 
java opts parser. 

Isn't it better to set up hooks for a VertexManager to resize the task in case 
of OOM failures? Furthermore, a lot of OOM failures are due to data skew, where 
one task is affected but the rest are not. 
 
Last question: when should this increase be done? Should it be done on each 
attempt failure or only on the last attempt? 

> Support ability to re-size a task attempt when previous attempts fail due to 
> resource constraints
> -
>
> Key: TEZ-1069
> URL: https://issues.apache.org/jira/browse/TEZ-1069
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Hitesh Shah
>Assignee: Jeff Zhang
> Attachments: TEZ-1069-1.patch
>
>
> Consider a case where attempts for the final stage in a long DAG fail due 
> to out of memory. In such a scenario, the framework (or the base vertex 
> manager) should be able to change the task specifications on the fly to 
> trigger a re-run with modified specs. 
> Changes could include both java opts and container resource requirements. 





[jira] [Comment Edited] (TEZ-1069) Support ability to re-size a task attempt when previous attempts fail due to resource constraints

2015-01-14 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277179#comment-14277179
 ] 

Hitesh Shah edited comment on TEZ-1069 at 1/14/15 4:34 PM:
---

I am not sure if that is the approach I would have taken. My thinking was more 
along the lines of querying the VertexManager to allow it to modify the task 
specifications in such cases. Changing the resource is not enough; one would 
also need to change the java opts. For the latter, we would need to write a 
java opts parser if the user had specified their own java opts (Xmx, etc.). 

Isn't it better to set up hooks for a VertexManager to resize the task in case 
of OOM failures? Furthermore, a lot of OOM failures are due to data skew, where 
one task is affected but the rest are not. 
 
Last question: when should this increase be done? Should it be done on each 
attempt failure or only on the last attempt? 


was (Author: hitesh):
I am not sure if that is the approach I would have taken. My thinking was more 
along the lines of querying the VertexManager to allow it to modify the task 
specifications in such cases. Changing the resource is not enough; one would 
also need to change the java opts. For the latter, we would need to write a 
java opts parser. 

Isn't it better to set up hooks for a VertexManager to resize the task in case 
of OOM failures? Furthermore, a lot of OOM failures are due to data skew, where 
one task is affected but the rest are not. 
 
Last question: when should this increase be done? Should it be done on each 
attempt failure or only on the last attempt? 

> Support ability to re-size a task attempt when previous attempts fail due to 
> resource constraints
> -
>
> Key: TEZ-1069
> URL: https://issues.apache.org/jira/browse/TEZ-1069
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Hitesh Shah
>Assignee: Jeff Zhang
> Attachments: TEZ-1069-1.patch
>
>
> Consider a case where attempts for the final stage in a long DAG fail due 
> to out of memory. In such a scenario, the framework (or the base vertex 
> manager) should be able to change the task specifications on the fly to 
> trigger a re-run with modified specs. 
> Changes could include both java opts and container resource requirements. 





[jira] [Commented] (TEZ-1890) tez-ui web.tar.gz also being uploaded to maven repository

2015-01-14 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277186#comment-14277186
 ] 

Hitesh Shah commented on TEZ-1890:
--

Do any of the docs published need to change? 

> tez-ui web.tar.gz also being uploaded to maven repository
> -
>
> Key: TEZ-1890
> URL: https://issues.apache.org/jira/browse/TEZ-1890
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Priority: Blocker
> Attachments: TEZ-1890-v1.patch
>
>
> Not sure if we should be uploading the web tar.gz as part of maven deploy. 





[jira] [Commented] (TEZ-1947) Failing fast when DAG configs have wrong values can save cluster resources

2015-01-14 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277201#comment-14277201
 ] 

Hitesh Shah commented on TEZ-1947:
--

The code also has a typo that could be fixed: "Invlaid configuration:". 

MR had a notion of checking job specifications before anything is run. This was 
done on the client as part of submission. We could probably do something 
similar, but it would affect all runtime library components. Also, the question 
is whether to run this on the client or in the AM; the AM need not have all the 
necessary jars to instantiate all custom objects.
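
As one example, an upfront client-side check along these lines would have 
caught the MergeManager failure above at submission time. This is only a 
sketch; the way the two values are derived here is illustrative, not Tez's 
exact formula.

{code}
// Sketch of an upfront check for the MergeManager case above. The derivation
// of the two values is illustrative, not Tez's exact formula.
import org.apache.hadoop.conf.Configuration;

class ShuffleConfigValidator {
  static void validate(Configuration conf, long shuffleMemory) {
    long maxSingleShuffleLimit = (long) (shuffleMemory
        * conf.getFloat("tez.runtime.shuffle.memory.limit.percent", 0.25f));
    long mergeThreshold = (long) (shuffleMemory
        * conf.getFloat("tez.runtime.shuffle.merge.percent", 0.90f));
    if (maxSingleShuffleLimit >= mergeThreshold) {
      // fail at submission instead of 400+ seconds into the DAG
      throw new IllegalArgumentException("Invalid configuration: "
          + "maxSingleShuffleLimit (" + maxSingleShuffleLimit
          + ") should be less than mergeThreshold (" + mergeThreshold + ")");
    }
  }
}
{code}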

> Failing fast when DAG configs have wrong values can save cluster resources
> --
>
> Key: TEZ-1947
> URL: https://issues.apache.org/jira/browse/TEZ-1947
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>
> It would be beneficial to do certain config checks (wherever possible) 
> upfront rather than having the job fail later downstream.  For example, in 
> the following case the DAG failed after 400+ seconds due to a config issue.
> {code}
> Status: Running (Executing on YARN cluster with App id 
> application_1421164610335_0060)
> 
> VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
> KILLED
> 
> Map 1 ..  KILLED2511700   81   0  
> 81
> Reducer 2 FAILED   1009  00 1009  23
> 1008
> 
> VERTICES: 00/02  [===>>---] 13%   ELAPSED TIME: 449.01 s
> 
> Status: Failed
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1421164610335_0060_1_01, 
> diagnostics=[Task failed, taskId=task_1421164610335_0060_1_01_04, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: Invlaid configuration: maxSingleShuffleLimit 
> should be less than mergeThresholdmaxSingleShuffleLimit: 238251152, 
> mergeThreshold: 148668720
> at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.(MergeManager.java:260)
> at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle.(Shuffle.java:206)
> at 
> org.apache.tez.runtime.library.input.OrderedGroupedKVInput.start(OrderedGroupedKVInput.java:124)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.call(LogicalIOProcessorRuntimeTask.java:405)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.call(LogicalIOProcessorRuntimeTask.java:393)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> ], TaskAttempt 1 failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: Invlaid configuration: maxSingleShuffleLimit 
> should be less than mergeThresholdmaxSingleShuffleLimit: 238251152, 
> mergeThreshold: 148668720
> at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.(MergeManager.java:260)
> at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle.(Shuffle.java:206)
> at 
> org.apache.tez.runtime.library.input.OrderedGroupedKVInput.start(OrderedGroupedKVInput.java:124)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.call(LogicalIOProcessorRuntimeTask.java:405)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.call(LogicalIOProcessorRuntimeTask.java:393)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> {code}





[jira] [Commented] (TEZ-1942) Number of tasks show in Tez UI with auto-reduce parallelism is misleading

2015-01-14 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277203#comment-14277203
 ] 

Hitesh Shah commented on TEZ-1942:
--

[~pramachandran] Does a YARN jira need to be filed for this timeline issue? 

> Number of tasks show in Tez UI with auto-reduce parallelism is misleading
> -
>
> Key: TEZ-1942
> URL: https://issues.apache.org/jira/browse/TEZ-1942
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
> Attachments: Screen Shot 2015-01-14 at 9.18.21 AM.png, Screen Shot 
> 2015-01-14 at 9.18.54 AM.png, output.json, result_with_direct_vertex.png, 
> result_with_primary_filter.png
>
>
> Ran a simple hive query (with tez) and "--hiveconf 
> hive.tez.auto.reducer.parallelism=true".  This internally turns on tez's 
> auto reduce parallelism.  
> - Job started off with 1009 reduce tasks
> - Tez reduced the number of reducers to 253
> - Job completed successfully, but the Tez UI shows 1009 as the number of 
> reducers (and 253 tasks as successful).  This can be a little misleading.
> I will attach the screenshots soon.





[jira] [Commented] (TEZ-1890) tez-ui web.tar.gz also being uploaded to maven repository

2015-01-14 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277225#comment-14277225
 ] 

Jonathan Eagles commented on TEZ-1890:
--

Good catch, [~hitesh]. Will post a patch soon.

> tez-ui web.tar.gz also being uploaded to maven repository
> -
>
> Key: TEZ-1890
> URL: https://issues.apache.org/jira/browse/TEZ-1890
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Priority: Blocker
> Attachments: TEZ-1890-v1.patch
>
>
> Not sure if we should be uploading the web tar.gz as part of maven deploy. 





[jira] [Updated] (TEZ-1890) tez-ui web.tar.gz also being uploaded to maven repository

2015-01-14 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-1890:
-
Attachment: TEZ-1890-v2.patch

Addressed site doc with v2 patch. [~hitesh], [~pramachandran], [~Sreenath], 
please review.

> tez-ui web.tar.gz also being uploaded to maven repository
> -
>
> Key: TEZ-1890
> URL: https://issues.apache.org/jira/browse/TEZ-1890
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Priority: Blocker
> Attachments: TEZ-1890-v1.patch, TEZ-1890-v2.patch
>
>
> Not sure if we should be uploading the web tar.gz as part of maven deploy. 





[jira] [Commented] (TEZ-1890) tez-ui web.tar.gz also being uploaded to maven repository

2015-01-14 Thread Sreenath Somarajapuram (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277286#comment-14277286
 ] 

Sreenath Somarajapuram commented on TEZ-1890:
-

The src/assembly directory can also be removed; without the 
maven-assembly-plugin it serves no purpose.

> tez-ui web.tar.gz also being uploaded to maven repository
> -
>
> Key: TEZ-1890
> URL: https://issues.apache.org/jira/browse/TEZ-1890
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Priority: Blocker
> Attachments: TEZ-1890-v1.patch, TEZ-1890-v2.patch
>
>
> Not sure if we should be uploading the web tar.gz as part of maven deploy. 





[jira] [Commented] (TEZ-1890) tez-ui web.tar.gz also being uploaded to maven repository

2015-01-14 Thread Sreenath Somarajapuram (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277315#comment-14277315
 ] 

Sreenath Somarajapuram commented on TEZ-1890:
-

Sorry, my bad. Saw your git rm comment just now.
+1

The changes won't affect the Ambari view; moreover, the Ambari view pom looks 
for 0.6.0-SNAPSHOT.

> tez-ui web.tar.gz also being uploaded to maven repository
> -
>
> Key: TEZ-1890
> URL: https://issues.apache.org/jira/browse/TEZ-1890
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Priority: Blocker
> Attachments: TEZ-1890-v1.patch, TEZ-1890-v2.patch
>
>
> Not sure if we should be uploading the web tar.gz as part of maven deploy. 





[jira] [Commented] (TEZ-1890) tez-ui web.tar.gz also being uploaded to maven repository

2015-01-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277322#comment-14277322
 ] 

Hadoop QA commented on TEZ-1890:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12692262/TEZ-1890-v2.patch
  against master revision 6bc500f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 260 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/19//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/19//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/19//artifact/patchprocess/newPatchFindbugsWarningstez-mapreduce.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/19//artifact/patchprocess/newPatchFindbugsWarningstez-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/19//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-internals.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/19//artifact/patchprocess/newPatchFindbugsWarningstez-tests.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/19//console

This message is automatically generated.

> tez-ui web.tar.gz also being uploaded to maven repository
> -
>
> Key: TEZ-1890
> URL: https://issues.apache.org/jira/browse/TEZ-1890
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Priority: Blocker
> Attachments: TEZ-1890-v1.patch, TEZ-1890-v2.patch
>
>
> Not sure if we should be uploading the web tar.gz as part of maven deploy. 





[jira] [Commented] (TEZ-1942) Number of tasks show in Tez UI with auto-reduce parallelism is misleading

2015-01-14 Thread Prakash Ramachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277331#comment-14277331
 ] 

Prakash Ramachandran commented on TEZ-1942:
---

raised YARN-2444 for the same. 

> Number of tasks show in Tez UI with auto-reduce parallelism is misleading
> -
>
> Key: TEZ-1942
> URL: https://issues.apache.org/jira/browse/TEZ-1942
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
> Attachments: Screen Shot 2015-01-14 at 9.18.21 AM.png, Screen Shot 
> 2015-01-14 at 9.18.54 AM.png, output.json, result_with_direct_vertex.png, 
> result_with_primary_filter.png
>
>
> Ran a simple hive query (with tez) and "--hiveconf 
> hive.tez.auto.reducer.parallelism=true".  This internally turns on tez's 
> auto reduce parallelism.  
> - Job started off with 1009 reduce tasks
> - Tez reduced the number of reducers to 253
> - Job completed successfully, but the Tez UI shows 1009 as the number of 
> reducers (and 253 tasks as successful).  This can be a little misleading.
> I will attach the screenshots soon.





[jira] [Comment Edited] (TEZ-1942) Number of tasks show in Tez UI with auto-reduce parallelism is misleading

2015-01-14 Thread Prakash Ramachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277331#comment-14277331
 ] 

Prakash Ramachandran edited comment on TEZ-1942 at 1/14/15 6:00 PM:


raised YARN-3062 for the same. 


was (Author: pramachandran):
raised YARN-2444 for the same. 

> Number of tasks show in Tez UI with auto-reduce parallelism is misleading
> -
>
> Key: TEZ-1942
> URL: https://issues.apache.org/jira/browse/TEZ-1942
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
> Attachments: Screen Shot 2015-01-14 at 9.18.21 AM.png, Screen Shot 
> 2015-01-14 at 9.18.54 AM.png, output.json, result_with_direct_vertex.png, 
> result_with_primary_filter.png
>
>
> Ran a simple hive query (with tez) and "--hiveconf 
> hive.tez.auto.reducer.parallelism=true".  This internally turns on tez's 
> auto reduce parallelism.  
> - Job started off with 1009 reduce tasks
> - Tez reduced the number of reducers to 253
> - Job completed successfully, but the Tez UI shows 1009 as the number of 
> reducers (and 253 tasks as successful).  This can be a little misleading.
> I will attach the screenshots soon.





[jira] [Created] (TEZ-1948) mergedMapOutput counters can be incorrect in case of on disk merges

2015-01-14 Thread Siddharth Seth (JIRA)
Siddharth Seth created TEZ-1948:
---

 Summary: mergedMapOutput counters can be incorrect in case of on 
disk merges
 Key: TEZ-1948
 URL: https://issues.apache.org/jira/browse/TEZ-1948
 Project: Apache Tez
  Issue Type: Bug
Reporter: Siddharth Seth


In TezMerger
{code}
Constants.MERGED_OUTPUT_PREFIX) ? 
null : mergedMapOutputsCounter)));
{code}

The MergeManager now writes out merge files with an id at the end, which can 
cause this counter to be incorrect.

There's another jira to move the merge file name generation into TezOutputFiles.

The fix may be as simple as moving the id appended to the merged file names so 
that it comes before the suffix; see the sketch below.
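
To illustrate the idea (the file names and suffix value are made up, and the 
endsWith-style check is assumed from the fragment above):

{code}
// Illustrative only: if the merged-output check keys off the end of the file
// name, appending the merge id *after* the suffix breaks the match.
public class MergedNameCheck {
  public static void main(String[] args) {
    String base = "attempt_0_src.out";
    String suffix = ".merged"; // stand-in for Constants.MERGED_OUTPUT_PREFIX
    int mergeId = 3;

    String today = base + suffix + "." + mergeId; // "attempt_0_src.out.merged.3"
    String fixed = base + "_" + mergeId + suffix; // "attempt_0_src.out_3.merged"

    System.out.println(today.endsWith(suffix)); // false: wrong counter chosen
    System.out.println(fixed.endsWith(suffix)); // true: id before suffix, match kept
  }
}
{code}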





[jira] [Commented] (TEZ-1945) Remove 2 GB memlimit restriction in MergeManager

2015-01-14 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277456#comment-14277456
 ] 

Siddharth Seth commented on TEZ-1945:
-

+1. Patch looks good to me. Thanks [~rajesh.balamohan].

Does the NUM_MEM_TO_DISK_MERGES counter seem incorrect? I'd expect 
SPILLED_RECORDS to be 0 in the cases where NUM_MEM_TO_DISK_MERGES, 
NUM_DISK_TO_DISK_MERGES and SHUFFLE_BYTES_TO_DISK are 0. Could be because of 
SHUFFLE_BYTES_DISK_DIRECT.

> Remove 2 GB memlimit restriction in MergeManager
> 
>
> Key: TEZ-1945
> URL: https://issues.apache.org/jira/browse/TEZ-1945
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: TEZ-1945.1.patch
>
>
> In certain situations (data arriving in large chunks, but the fetch not yet 
> complete), fetchers might wait in MergeManager.waitForShuffleToMergeMemory() 
> for memory to become available.
> Removing the 2 GB restriction on MergeManager.memlimit would help in such 
> situations.





[jira] [Comment Edited] (TEZ-1945) Remove 2 GB memlimit restriction in MergeManager

2015-01-14 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277456#comment-14277456
 ] 

Siddharth Seth edited comment on TEZ-1945 at 1/14/15 7:03 PM:
--

+1. Patch looks good to me. Thanks [~rajesh.balamohan].

Does the NUM_MEM_TO_DISK_MERGES counter seem incorrect? I'd expect 
SPILLED_RECORDS to be 0 in the cases where NUM_MEM_TO_DISK_MERGES, 
NUM_DISK_TO_DISK_MERGES and SHUFFLE_BYTES_TO_DISK are 0. Could be because of 
SHUFFLE_BYTES_DISK_DIRECT.
Also, adjusting POST_MERGE may make the runs faster.


was (Author: sseth):
+1. Patch looks good to me. Thanks [~rajesh.balamohan].

Does the NUM_MEM_TO_DISK_MERGES counter seem incorrect? I'd expect 
SPILLED_RECORDS to be 0 in the cases where NUM_MEM_TO_DISK_MERGES, 
NUM_DISK_TO_DISK_MERGES and SHUFFLE_BYTES_TO_DISK are 0. Could be because of 
SHUFFLE_BYTES_DISK_DIRECT.

> Remove 2 GB memlimit restriction in MergeManager
> 
>
> Key: TEZ-1945
> URL: https://issues.apache.org/jira/browse/TEZ-1945
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: TEZ-1945.1.patch
>
>
> In certain situations (data arriving in large chunks, but the fetch not yet 
> complete), fetchers might wait in MergeManager.waitForShuffleToMergeMemory() 
> for memory to become available.
> Removing the 2 GB restriction on MergeManager.memlimit would help in such 
> situations.





[jira] [Commented] (TEZ-1890) tez-ui web.tar.gz also being uploaded to maven repository

2015-01-14 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277464#comment-14277464
 ] 

Jonathan Eagles commented on TEZ-1890:
--

Thanks, everybody, for the feedback. Committed this to master and branch-0.6.

> tez-ui web.tar.gz also being uploaded to maven repository
> -
>
> Key: TEZ-1890
> URL: https://issues.apache.org/jira/browse/TEZ-1890
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Priority: Blocker
> Fix For: 0.6.0
>
> Attachments: TEZ-1890-v1.patch, TEZ-1890-v2.patch
>
>
> Not sure if we should be uploading the web tar.gz as part of maven deploy. 





[jira] [Assigned] (TEZ-1890) tez-ui web.tar.gz also being uploaded to maven repository

2015-01-14 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles reassigned TEZ-1890:


Assignee: Jonathan Eagles

> tez-ui web.tar.gz also being uploaded to maven repository
> -
>
> Key: TEZ-1890
> URL: https://issues.apache.org/jira/browse/TEZ-1890
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Jonathan Eagles
>Priority: Blocker
> Fix For: 0.6.0
>
> Attachments: TEZ-1890-v1.patch, TEZ-1890-v2.patch
>
>
> Not sure if we should be uploading the web tar.gz as part of maven deploy. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1945) Remove 2 GB memlimit restriction in MergeManager

2015-01-14 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277482#comment-14277482
 ] 

Siddharth Seth commented on TEZ-1945:
-

Ideally, we should be changing the POST_MERGE_MEM_LIMIT to be a long as well. 
Separate jira?
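
For context, a minimal sketch of the 2 GB ceiling that an int-typed limit 
imposes; the heap size and shuffle fraction below are assumed values, and this 
is not the actual MergeManager code:

{code}
// Hypothetical illustration of the 2 GB ceiling; not the actual
// MergeManager code, and the fraction below is an assumed config value.
public class MemLimitSketch {
  public static void main(String[] args) {
    long maxHeap = 8L * 1024 * 1024 * 1024;  // assume an 8 GB task heap
    double shuffleFraction = 0.9;            // assumed shuffle memory fraction

    // int-typed limit: silently capped at Integer.MAX_VALUE (~2 GB)
    int intLimit = (int) Math.min(maxHeap * shuffleFraction, Integer.MAX_VALUE);

    // long-typed limit: scales with the real heap
    long longLimit = (long) (maxHeap * shuffleFraction);

    System.out.println("int limit:  " + intLimit);   // 2147483647
    System.out.println("long limit: " + longLimit);  // 7730941132
  }
}
{code}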

> Remove 2 GB memlimit restriction in MergeManager
> 
>
> Key: TEZ-1945
> URL: https://issues.apache.org/jira/browse/TEZ-1945
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: TEZ-1945.1.patch
>
>
> In certain situations (data coming in larger chunks, but not yet complete), 
> fetchers might wait in MergeManager.waitForShuffleToMergeMemory() for memory 
> to become available.
> Removing the 2 GB restriction on MergeManager.memlimit would help in such 
> situations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1949) Whitelist TEZ_RUNTIME_OPTIMIZE_SHARED_FETCH for broadcast edges

2015-01-14 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1949:
-
Affects Version/s: 0.7.0

> Whitelist TEZ_RUNTIME_OPTIMIZE_SHARED_FETCH for broadcast edges
> ---
>
> Key: TEZ-1949
> URL: https://issues.apache.org/jira/browse/TEZ-1949
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.6.0, 0.5.2, 0.7.0
>Reporter: Gopal V
>Assignee: Gopal V
>
> Tez configuration whitelisting is missing TEZ_RUNTIME_OPTIMIZE_SHARED_FETCH 
> for broadcast edges (UnorderedKVInput).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-1949) Whitelist TEZ_RUNTIME_OPTIMIZE_SHARED_FETCH for broadcast edges

2015-01-14 Thread Gopal V (JIRA)
Gopal V created TEZ-1949:


 Summary: Whitelist TEZ_RUNTIME_OPTIMIZE_SHARED_FETCH for broadcast 
edges
 Key: TEZ-1949
 URL: https://issues.apache.org/jira/browse/TEZ-1949
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.2, 0.6.0
Reporter: Gopal V
Assignee: Gopal V


Tez configuration whitelisting is missing TEZ_RUNTIME_OPTIMIZE_SHARED_FETCH for 
broadcast edges (UnorderedKVInput).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1949) Whitelist TEZ_RUNTIME_OPTIMIZE_SHARED_FETCH for broadcast edges

2015-01-14 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1949:
-
Attachment: TEZ-1949.1.patch

> Whitelist TEZ_RUNTIME_OPTIMIZE_SHARED_FETCH for broadcast edges
> ---
>
> Key: TEZ-1949
> URL: https://issues.apache.org/jira/browse/TEZ-1949
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.6.0, 0.5.2, 0.7.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: TEZ-1949.1.patch
>
>
> Tez configuration whitelisting is missing TEZ_RUNTIME_OPTIMIZE_SHARED_FETCH 
> for broadcast edges (UnorderedKVInput).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1949) Whitelist TEZ_RUNTIME_OPTIMIZE_SHARED_FETCH for broadcast edges

2015-01-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277791#comment-14277791
 ] 

Hadoop QA commented on TEZ-1949:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12692335/TEZ-1949.1.patch
  against master revision adcfb84.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 260 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/20//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/20//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/20//artifact/patchprocess/newPatchFindbugsWarningstez-mapreduce.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/20//artifact/patchprocess/newPatchFindbugsWarningstez-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/20//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-internals.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/20//artifact/patchprocess/newPatchFindbugsWarningstez-tests.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/20//console

This message is automatically generated.

> Whitelist TEZ_RUNTIME_OPTIMIZE_SHARED_FETCH for broadcast edges
> ---
>
> Key: TEZ-1949
> URL: https://issues.apache.org/jira/browse/TEZ-1949
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.6.0, 0.5.2, 0.7.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: TEZ-1949.1.patch
>
>
> Tez configuration whitelisting is missing TEZ_RUNTIME_OPTIMIZE_SHARED_FETCH 
> for broadcast edges (UnorderedKVInput).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1945) Remove 2 GB memlimit restriction in MergeManager

2015-01-14 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277905#comment-14277905
 ] 

Rajesh Balamohan commented on TEZ-1945:
---

SPILLED_RECORDS can be > 0  as it is accounted for in finalMerge (mem + disk).  
Will create a separate JIRA for post_merge_mem_limit.

> Remove 2 GB memlimit restriction in MergeManager
> 
>
> Key: TEZ-1945
> URL: https://issues.apache.org/jira/browse/TEZ-1945
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: TEZ-1945.1.patch
>
>
> In certain situations (data coming in larger chunks, but not yet complete), 
> fetchers might wait in MergeManager.waitForShuffleToMergeMemory() for memory 
> to become available.
> Removing the 2 GB restriction on MergeManager.memlimit would help in such 
> situations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-1950) Incorrect handling of counters in TaskAttemptImpl

2015-01-14 Thread Hitesh Shah (JIRA)
Hitesh Shah created TEZ-1950:


 Summary: Incorrect handling of counters in TaskAttemptImpl
 Key: TEZ-1950
 URL: https://issues.apache.org/jira/browse/TEZ-1950
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah


To maintain task attempt counters, we are using 
TaskAttempt.TaskAttemptStatus.counters.

However, counters is not accessed in a thread-safe manner.

Counters are updated in StatusUpdaterTransition, or modified as part of 
TaskAttempt.TaskAttemptStatus::setLocalityCounter().

In a scenario where TaskAttempt::getCounters() is called before any status 
update transition comes back, the locality counter is lost because the atomic 
boolean flag is never reset.
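
A hypothetical sketch of the race described above; the class, method, and 
counter names are illustrative and do not match the real Tez types:

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the described race, not the actual implementation.
public class LocalityCounterRaceSketch {
  private final AtomicBoolean localityCounterSet = new AtomicBoolean(false);
  private final Map<String, Long> stagedStatusCounters = new HashMap<>();

  // Called once when the attempt starts; the one-shot flag is never reset.
  void setLocalityCounter() {
    if (localityCounterSet.compareAndSet(false, true)) {
      stagedStatusCounters.put("DATA_LOCAL_TASKS", 1L); // staged, not published
    }
  }

  // Counters are only published when a status-update transition fires.
  Map<String, Long> getCounters(boolean statusUpdateArrived) {
    if (statusUpdateArrived) {
      return stagedStatusCounters;  // locality counter included
    }
    // Called before any status update: the staged value is missing, and
    // because the AtomicBoolean already flipped, it is never re-staged.
    return new HashMap<>();
  }
}
{code}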





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1950) Incorrect handling of locality counter in TaskAttemptImpl

2015-01-14 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1950:
-
Summary: Incorrect handling of locality counter in TaskAttemptImpl  (was: 
Incorrect handling of counters in TaskAttemptImpl)

> Incorrect handling of locality counter in TaskAttemptImpl
> -
>
> Key: TEZ-1950
> URL: https://issues.apache.org/jira/browse/TEZ-1950
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>
> To maintain task attempt counters, we are using 
> TaskAttempt.TaskAttemptStatus.counters.
> However, counters is not accessed in a thread-safe manner.
> Counters are updated in StatusUpdaterTransition, or modified as part of 
> TaskAttempt.TaskAttemptStatus::setLocalityCounter().
> In a scenario where TaskAttempt::getCounters() is called before any status 
> update transition comes back, the locality counter is lost because the 
> atomic boolean flag is never reset.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1945) Remove 2 GB memlimit restriction in MergeManager

2015-01-14 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-1945:
--
Target Version/s: 0.7.0
   Fix Version/s: (was: 0.7.0)

> Remove 2 GB memlimit restriction in MergeManager
> 
>
> Key: TEZ-1945
> URL: https://issues.apache.org/jira/browse/TEZ-1945
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: TEZ-1945.1.patch
>
>
> In certain situations (data coming in larger chunks, but not yet complete), 
> fetchers might wait in MergeManager.waitForShuffleToMergeMemory() for memory 
> to become available.
> Removing the 2 GB restriction on MergeManager.memlimit would help in such 
> situations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1950) Incorrect handling of locality counter in TaskAttemptImpl

2015-01-14 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277919#comment-14277919
 ] 

Hitesh Shah commented on TEZ-1950:
--

[~rajesh.balamohan] [~sseth] Does the above analysis seem correct?

> Incorrect handling of locality counter in TaskAttemptImpl
> -
>
> Key: TEZ-1950
> URL: https://issues.apache.org/jira/browse/TEZ-1950
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>
> To maintain task attempt counters, we are using 
> TaskAttempt.TaskAttemptStatus.counters.
> However, counters is not accessed in a thread-safe manner.
> Counters are updated in StatusUpdaterTransition, or modified as part of 
> TaskAttempt.TaskAttemptStatus::setLocalityCounter().
> In a scenario where TaskAttempt::getCounters() is called before any status 
> update transition comes back, the locality counter is lost because the 
> atomic boolean flag is never reset.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1900) Fix findbugs warnings in tez-dag

2015-01-14 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1900:
-
Issue Type: Bug  (was: Sub-task)
Parent: (was: TEZ-316)

> Fix findbugs warnings in tez-dag
> 
>
> Key: TEZ-1900
> URL: https://issues.apache.org/jira/browse/TEZ-1900
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
>
> Might need to be split out more. 
> https://builds.apache.org/job/PreCommit-Tez-Build/8/artifact/patchprocess/newPatchFindbugsWarningstez-dag.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-1951) Fix general findbugs warnings in tez-dag

2015-01-14 Thread Hitesh Shah (JIRA)
Hitesh Shah created TEZ-1951:


 Summary: Fix general findbugs warnings in tez-dag
 Key: TEZ-1951
 URL: https://issues.apache.org/jira/browse/TEZ-1951
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Hitesh Shah






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-1953) Inconsistent synchronization of org.apache.tez.dag.app.dag.impl.VertexImpl.groupInputSpecList

2015-01-14 Thread Hitesh Shah (JIRA)
Hitesh Shah created TEZ-1953:


 Summary: Inconsistent synchronization of 
org.apache.tez.dag.app.dag.impl.VertexImpl.groupInputSpecList
 Key: TEZ-1953
 URL: https://issues.apache.org/jira/browse/TEZ-1953
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Hitesh Shah


Inconsistent synchronization of 
org.apache.tez.dag.app.dag.impl.VertexImpl.groupInputSpecList; locked 50% of 
time



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-1952) Inconsistent synchronization of org.apache.tez.dag.app.dag.impl.Edge.edgeProperty

2015-01-14 Thread Hitesh Shah (JIRA)
Hitesh Shah created TEZ-1952:


 Summary: Inconsistent synchronization of 
org.apache.tez.dag.app.dag.impl.Edge.edgeProperty
 Key: TEZ-1952
 URL: https://issues.apache.org/jira/browse/TEZ-1952
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Hitesh Shah


Inconsistent synchronization of 
org.apache.tez.dag.app.dag.impl.Edge.edgeProperty; locked 78% of time

In class org.apache.tez.dag.app.dag.impl.Edge
Field org.apache.tez.dag.app.dag.impl.Edge.edgeProperty
Synchronized 78% of the time
Unsynchronized access at Edge.java:[line 212]
Unsynchronized access at Edge.java:[line 184]
Unsynchronized access at Edge.java:[line 226]
Synchronized access at Edge.java:[line 117]
Synchronized access at Edge.java:[line 131]
Synchronized access at Edge.java:[line 144]
Synchronized access at Edge.java:[line 133]
Synchronized access at Edge.java:[line 134]
Synchronized access at Edge.java:[line 137]
Synchronized access at Edge.java:[line 167]
Synchronized access at Edge.java:[line 167]
Synchronized access at Edge.java:[line 167]
Synchronized access at Edge.java:[line 167]
Synchronized access at Edge.java:[line 173]
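
For reference, a hedged sketch of what this findbugs pattern looks like and 
one common remedy (making the field volatile so the lock-free reads are safe); 
this is illustrative code, not the actual Edge implementation:

{code}
// Hypothetical sketch of the findbugs pattern and one common remedy; this is
// not the actual org.apache.tez.dag.app.dag.impl.Edge code.
public class EdgeSketch {
  public static class EdgeProperty {}  // placeholder type for the sketch

  // volatile makes the lock-free reads safe; writes still use the monitor
  private volatile EdgeProperty edgeProperty;

  public synchronized void setEdgeProperty(EdgeProperty p) {
    this.edgeProperty = p;
  }

  public EdgeProperty getEdgeProperty() {
    return edgeProperty;  // reading without the monitor is now safe
  }
}
{code}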



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-1954) Multiple instances of Inconsistent synchronization in org.apache.tez.dag.app.DAGAppMaster.

2015-01-14 Thread Hitesh Shah (JIRA)
Hitesh Shah created TEZ-1954:


 Summary: Multiple instances of Inconsistent synchronization in 
org.apache.tez.dag.app.DAGAppMaster.
 Key: TEZ-1954
 URL: https://issues.apache.org/jira/browse/TEZ-1954
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Hitesh Shah


Inconsistent synchronization of org.apache.tez.dag.app.DAGAppMaster.amTokens; 
locked 50% of time
Inconsistent synchronization of 
org.apache.tez.dag.app.DAGAppMaster.appMasterUgi; locked 66% of time
Inconsistent synchronization of org.apache.tez.dag.app.DAGAppMaster.context; 
locked 65% of time
Inconsistent synchronization of org.apache.tez.dag.app.DAGAppMaster.currentDAG; 
locked 72% of time
Inconsistent synchronization of org.apache.tez.dag.app.DAGAppMaster.state; 
locked 80% of time
Inconsistent synchronization of 
org.apache.tez.dag.app.DAGAppMaster.taskSchedulerEventHandler; locked 78% of 
time
Inconsistent synchronization of 
org.apache.tez.dag.app.DAGAppMaster.versionMismatch; locked 83% of time
Inconsistent synchronization of 
org.apache.tez.dag.app.DAGAppMaster.versionMismatchDiagnostics; locked 80% of 
time



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-1955) Inconsistent synchronization of org.apache.tez.dag.app.rm.TaskSchedulerEventHandler.taskScheduler

2015-01-14 Thread Hitesh Shah (JIRA)
Hitesh Shah created TEZ-1955:


 Summary: Inconsistent synchronization of 
org.apache.tez.dag.app.rm.TaskSchedulerEventHandler.taskScheduler
 Key: TEZ-1955
 URL: https://issues.apache.org/jira/browse/TEZ-1955
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Hitesh Shah


Inconsistent synchronization of 
org.apache.tez.dag.app.rm.TaskSchedulerEventHandler.taskScheduler; locked 47% 
of time



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-1956) Multiple instances: Inconsistent synchronization of org.apache.tez.dag.app.rm.YarnTaskSchedulerService

2015-01-14 Thread Hitesh Shah (JIRA)
Hitesh Shah created TEZ-1956:


 Summary: Multiple instances: Inconsistent synchronization of 
org.apache.tez.dag.app.rm.YarnTaskSchedulerService
 Key: TEZ-1956
 URL: https://issues.apache.org/jira/browse/TEZ-1956
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Hitesh Shah


Inconsistent synchronization of 
org.apache.tez.dag.app.rm.YarnTaskSchedulerService.delayedContainerManager; 
locked 80% of time
Inconsistent synchronization of 
org.apache.tez.dag.app.rm.YarnTaskSchedulerService.heartbeatAtLastPreemption; 
locked 66% of time
Inconsistent synchronization of 
org.apache.tez.dag.app.rm.YarnTaskSchedulerService.localitySchedulingDelay; 
locked 91% of time
Inconsistent synchronization of 
org.apache.tez.dag.app.rm.YarnTaskSchedulerService.preemptionPercentage; locked 
85% of time
Inconsistent synchronization of 
org.apache.tez.dag.app.rm.YarnTaskSchedulerService.shouldReuseContainers; 
locked 85% of time



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-1957) Multiple instances: Synchronization performed on java.util.concurrent.atomic.AtomicBoolean in org.apache.tez.dag.app.DAGAppMaster

2015-01-14 Thread Hitesh Shah (JIRA)
Hitesh Shah created TEZ-1957:


 Summary: Multiple instances: Synchronization performed on 
java.util.concurrent.atomic.AtomicBoolean in org.apache.tez.dag.app.DAGAppMaster
 Key: TEZ-1957
 URL: https://issues.apache.org/jira/browse/TEZ-1957
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Hitesh Shah


Synchronization performed on java.util.concurrent.atomic.AtomicBoolean in 
org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterShutdownHandler.shutdown(boolean)
Synchronization performed on java.util.concurrent.atomic.AtomicBoolean in 
org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterShutdownHandler$AMShutdownRunnable.run()
Synchronization performed on java.util.concurrent.atomic.AtomicBoolean in 
org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterShutdownHook.run()
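
For context, a small illustrative sketch of the flagged pattern (synchronizing 
on an AtomicBoolean) and a typical remedy (a dedicated lock object, or the 
atomic's own compare-and-set API); the names here are hypothetical:

{code}
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the flagged pattern and one common fix; this is not
// the actual DAGAppMaster shutdown code.
public class ShutdownHandlerSketch {
  private final AtomicBoolean shutdownHandled = new AtomicBoolean(false);
  private final Object shutdownLock = new Object();  // dedicated monitor

  public void shutdownFlagged() {
    synchronized (shutdownHandled) {  // what findbugs warns about
      // ... one-time shutdown work ...
    }
  }

  public void shutdownFixed() {
    // Prefer the atomic's own API for the one-shot guard, and a plain
    // Object monitor if a critical section is still needed.
    if (shutdownHandled.compareAndSet(false, true)) {
      synchronized (shutdownLock) {
        // ... one-time shutdown work ...
      }
    }
  }
}
{code}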



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-1958) Synchronization performed on java.util.concurrent.BlockingQueue in org.apache.tez.dag.app.rm.LocalTaskSchedulerService

2015-01-14 Thread Hitesh Shah (JIRA)
Hitesh Shah created TEZ-1958:


 Summary: Synchronization performed on 
java.util.concurrent.BlockingQueue in 
org.apache.tez.dag.app.rm.LocalTaskSchedulerService
 Key: TEZ-1958
 URL: https://issues.apache.org/jira/browse/TEZ-1958
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Hitesh Shah


Synchronization performed on java.util.concurrent.BlockingQueue in 
org.apache.tez.dag.app.rm.LocalTaskSchedulerService$AsyncDelegateRequestHandler.addDeallocateTaskRequest(Object)
Synchronization performed on java.util.concurrent.BlockingQueue in 
org.apache.tez.dag.app.rm.LocalTaskSchedulerService$AsyncDelegateRequestHandler.run()
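
A hedged sketch of this pattern: a BlockingQueue is already thread-safe, so 
locking on it is redundant, and its blocking take() replaces hand-rolled 
wait/notify. Illustrative code only, not the actual LocalTaskSchedulerService:

{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative sketch only, not the actual LocalTaskSchedulerService code.
public class AsyncRequestHandlerSketch implements Runnable {
  private final BlockingQueue<Runnable> requestQueue =
      new LinkedBlockingQueue<>();

  public void addRequest(Runnable request) {
    requestQueue.add(request);  // the queue is thread-safe; no lock needed
  }

  @Override
  public void run() {
    try {
      while (!Thread.currentThread().isInterrupted()) {
        requestQueue.take().run();  // take() blocks, replacing wait/notify
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();  // restore interrupt status, exit
    }
  }
}
{code}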



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-1959) Synchronization performed on java.util.concurrent.atomic.AtomicBoolean in org.apache.tez.dag.app.rm.YarnTaskSchedulerService$DelayedContainerManager.run()

2015-01-14 Thread Hitesh Shah (JIRA)
Hitesh Shah created TEZ-1959:


 Summary: Synchronization performed on 
java.util.concurrent.atomic.AtomicBoolean in 
org.apache.tez.dag.app.rm.YarnTaskSchedulerService$DelayedContainerManager.run()
 Key: TEZ-1959
 URL: https://issues.apache.org/jira/browse/TEZ-1959
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Hitesh Shah


Synchronization performed on java.util.concurrent.atomic.AtomicBoolean in 
org.apache.tez.dag.app.rm.YarnTaskSchedulerService$DelayedContainerManager.run()

In class 
org.apache.tez.dag.app.rm.YarnTaskSchedulerService$DelayedContainerManager
In method 
org.apache.tez.dag.app.rm.YarnTaskSchedulerService$DelayedContainerManager.run()
Type java.util.concurrent.atomic.AtomicBoolean
Value loaded from field 
org.apache.tez.dag.app.rm.YarnTaskSchedulerService$DelayedContainerManager.drainedDelayedContainersForTest
At YarnTaskSchedulerService.java:[line 1822]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1951) Fix general findbugs warnings in tez-dag

2015-01-14 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1951:
-
Attachment: TEZ-1951.1.patch

> Fix general findbugs warnings in tez-dag
> 
>
> Key: TEZ-1951
> URL: https://issues.apache.org/jira/browse/TEZ-1951
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Hitesh Shah
> Attachments: TEZ-1951.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TEZ-1951) Fix general findbugs warnings in tez-dag

2015-01-14 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah reassigned TEZ-1951:


Assignee: Hitesh Shah

> Fix general findbugs warnings in tez-dag
> 
>
> Key: TEZ-1951
> URL: https://issues.apache.org/jira/browse/TEZ-1951
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
> Attachments: TEZ-1951.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TEZ-1905) Fix findbugs warnings in tez-tests

2015-01-14 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah reassigned TEZ-1905:


Assignee: Hitesh Shah

> Fix findbugs warnings in tez-tests
> --
>
> Key: TEZ-1905
> URL: https://issues.apache.org/jira/browse/TEZ-1905
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
>
> https://builds.apache.org/job/PreCommit-Tez-Build/8/artifact/patchprocess/newPatchFindbugsWarningstez-tests.html
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1900) Fix findbugs warnings in tez-dag

2015-01-14 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1900:
-
Assignee: (was: Hitesh Shah)

> Fix findbugs warnings in tez-dag
> 
>
> Key: TEZ-1900
> URL: https://issues.apache.org/jira/browse/TEZ-1900
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>
> Might need to be split out more. 
> https://builds.apache.org/job/PreCommit-Tez-Build/8/artifact/patchprocess/newPatchFindbugsWarningstez-dag.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1945) Remove 2 GB memlimit restriction in MergeManager

2015-01-14 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277971#comment-14277971
 ] 

Siddharth Seth commented on TEZ-1945:
-

bq. SPILLED_RECORDS can be > 0 as it is accounted for in finalMerge (mem + 
disk). Will create a separate JIRA for post_merge_mem_limit.
So it's a merge triggered not by the fetchMemoryLimit but by the 
postMergeMemoryLimit. It should be accounted for somehow in the counters; will 
create a follow-up jira.

> Remove 2 GB memlimit restriction in MergeManager
> 
>
> Key: TEZ-1945
> URL: https://issues.apache.org/jira/browse/TEZ-1945
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: TEZ-1945.1.patch
>
>
> In certain situations (data coming in larger chunks, but not yet complete), 
> fetchers might wait in MergeManager.waitForShuffleToMergeMemory() for memory 
> to become available.
> Removing the 2 GB restriction on MergeManager.memlimit would help in such 
> situations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-1960) finalMerge spills should be accounted for in some counter

2015-01-14 Thread Siddharth Seth (JIRA)
Siddharth Seth created TEZ-1960:
---

 Summary: finalMerge spills should be accounted for in some counter
 Key: TEZ-1960
 URL: https://issues.apache.org/jira/browse/TEZ-1960
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1945) Remove 2 GB memlimit restriction in MergeManager

2015-01-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277993#comment-14277993
 ] 

Hadoop QA commented on TEZ-1945:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12692195/TEZ-1945.1.patch
  against master revision adcfb84.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 260 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/21//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/21//artifact/patchprocess/newPatchFindbugsWarningstez-mapreduce.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/21//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/21//artifact/patchprocess/newPatchFindbugsWarningstez-tests.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/21//artifact/patchprocess/newPatchFindbugsWarningstez-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/21//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-internals.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/21//console

This message is automatically generated.

> Remove 2 GB memlimit restriction in MergeManager
> 
>
> Key: TEZ-1945
> URL: https://issues.apache.org/jira/browse/TEZ-1945
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: TEZ-1945.1.patch
>
>
> In certain situations (data coming in larger chunks, but not yet complete), 
> fetchers might wait in MergeManager.waitForShuffleToMergeMemory() for memory 
> to become available.
> Removing the 2 GB restriction on MergeManager.memlimit would help in such 
> situations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1943) Move shared OutputCommiter to DAG

2015-01-14 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-1943:

Description: 
Currently, we have one committer for each output of a vertex, even if it is a 
shared output in the case of a Vertex Group. In this scenario, the initialize 
and setupOutput methods of the vertex group's OutputCommitter will be called 
multiple times. Although there is currently no issue with the existing 
OutputCommitter impl, this could cause potential issues for any customized 
OutputCommitter in the future.
So this jira is for moving the shared OutputCommitter to the DAG and letting 
the DAG control the shared OutputCommitter.

  was:
Currently, we have one committer for each output of a vertex, even if it is a 
shared output in the case of a Vertex Group. In this scenario, the initialize 
and setupOutput methods of the OutputCommitter will be called multiple times. 
Although there is currently no issue with that, this could cause potential 
issues for any customized OutputCommitter in the future.
So this jira is for moving the shared OutputCommitter to the DAG and letting 
the DAG control the shared OutputCommitter.


> Move shared OutputCommiter to DAG
> -
>
> Key: TEZ-1943
> URL: https://issues.apache.org/jira/browse/TEZ-1943
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
>
> Currently, we have one committer for each output of a vertex, even if it is 
> a shared output in the case of a Vertex Group. In this scenario, the 
> initialize and setupOutput methods of the vertex group's OutputCommitter 
> will be called multiple times. Although there is currently no issue with the 
> existing OutputCommitter impl, this could cause potential issues for any 
> customized OutputCommitter in the future.
> So this jira is for moving the shared OutputCommitter to the DAG and letting 
> the DAG control the shared OutputCommitter.
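
One possible shape of the proposed change, sketched under the assumption that 
the DAG keeps a registry keyed by shared output name so initialize/setupOutput 
run once per output; the names are hypothetical and this is not the actual 
patch:

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch of the direction described above: the DAG owns a single
// committer per shared output, so initialize/setupOutput run exactly once.
public class DagCommitterRegistrySketch {
  interface OutputCommitter {
    void initialize();
    void setupOutput();
  }

  private final Map<String, OutputCommitter> committers =
      new ConcurrentHashMap<>();

  OutputCommitter getOrCreate(String sharedOutputName,
                              Supplier<OutputCommitter> factory) {
    return committers.computeIfAbsent(sharedOutputName, name -> {
      OutputCommitter c = factory.get();
      c.initialize();   // once per shared output, not once per vertex
      c.setupOutput();
      return c;
    });
  }
}
{code}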



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1905) Fix findbugs warnings in tez-tests

2015-01-14 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1905:
-
Attachment: TEZ-1905.1.patch

> Fix findbugs warnings in tez-tests
> --
>
> Key: TEZ-1905
> URL: https://issues.apache.org/jira/browse/TEZ-1905
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
> Attachments: TEZ-1905.1.patch
>
>
> https://builds.apache.org/job/PreCommit-Tez-Build/8/artifact/patchprocess/newPatchFindbugsWarningstez-tests.html
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1905) Fix findbugs warnings in tez-tests

2015-01-14 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278000#comment-14278000
 ] 

Hitesh Shah commented on TEZ-1905:
--

[~sseth] [~rajesh.balamohan] review please

> Fix findbugs warnings in tez-tests
> --
>
> Key: TEZ-1905
> URL: https://issues.apache.org/jira/browse/TEZ-1905
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
> Attachments: TEZ-1905.1.patch
>
>
> https://builds.apache.org/job/PreCommit-Tez-Build/8/artifact/patchprocess/newPatchFindbugsWarningstez-tests.html
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1951) Fix general findbugs warnings in tez-dag

2015-01-14 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278001#comment-14278001
 ] 

Hitesh Shah commented on TEZ-1951:
--

[~sseth] [~rajesh.balamohan] review please

> Fix general findbugs warnings in tez-dag
> 
>
> Key: TEZ-1951
> URL: https://issues.apache.org/jira/browse/TEZ-1951
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
> Attachments: TEZ-1951.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-1961) Remove misleading exception "No running dag" from AM logs

2015-01-14 Thread Siddharth Seth (JIRA)
Siddharth Seth created TEZ-1961:
---

 Summary: Remove misleading exception "No running dag" from AM logs
 Key: TEZ-1961
 URL: https://issues.apache.org/jira/browse/TEZ-1961
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth


{code}
15/01/14 16:45:06 INFO ipc.Server: IPC Server handler 0 on 51000, call 
org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.getDAGStatus 
from 10.11.3.176:51879 Call#0 Retry#0
org.apache.tez.dag.api.TezException: No running dag at present
at 
org.apache.tez.dag.api.client.DAGClientHandler.getDAG(DAGClientHandler.java:84)
at 
org.apache.tez.dag.api.client.DAGClientHandler.getACLManager(DAGClientHandler.java:151)
at 
org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getDAGStatus(DAGClientAMProtocolBlockingPBServerImpl.java:94)
at 
org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7375)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2041)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2037)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2035)
15/01/14 16:45:06 INFO client.DAGClientImpl: DAG initialized: 
CurrentState=Running
{code}

This exception shows up fairly often and isn't very relevant - it is triggered 
by status queries that arrive before a DAG is submitted to the AM.
This is very misleading, especially for folks new to Tez, and should be removed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1961) Remove misleading exception "No running dag" from AM logs

2015-01-14 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1961:

Description: 
{code}
15/01/14 16:45:06 INFO ipc.Server: IPC Server handler 0 on 51000, call 
org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.getDAGStatus 
from  Call#0 Retry#0
org.apache.tez.dag.api.TezException: No running dag at present
at 
org.apache.tez.dag.api.client.DAGClientHandler.getDAG(DAGClientHandler.java:84)
at 
org.apache.tez.dag.api.client.DAGClientHandler.getACLManager(DAGClientHandler.java:151)
at 
org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getDAGStatus(DAGClientAMProtocolBlockingPBServerImpl.java:94)
at 
org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7375)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2041)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2037)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2035)
15/01/14 16:45:06 INFO client.DAGClientImpl: DAG initialized: 
CurrentState=Running
{code}

This exception shows up fairly often and isn't very relevant - it is triggered 
by status queries that arrive before a DAG is submitted to the AM.
This is very misleading, especially for folks new to Tez, and should be removed.

  was:
{code}
15/01/14 16:45:06 INFO ipc.Server: IPC Server handler 0 on 51000, call 
org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.getDAGStatus 
from 10.11.3.176:51879 Call#0 Retry#0
org.apache.tez.dag.api.TezException: No running dag at present
at 
org.apache.tez.dag.api.client.DAGClientHandler.getDAG(DAGClientHandler.java:84)
at 
org.apache.tez.dag.api.client.DAGClientHandler.getACLManager(DAGClientHandler.java:151)
at 
org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getDAGStatus(DAGClientAMProtocolBlockingPBServerImpl.java:94)
at 
org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7375)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2041)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2037)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2035)
15/01/14 16:45:06 INFO client.DAGClientImpl: DAG initialized: 
CurrentState=Running
{code}

This exception shows up fairly often and isn't very relevant - it is triggered 
by status queries that arrive before a DAG is submitted to the AM.
This is very misleading, especially for folks new to Tez, and should be removed.


> Remove misleading exception "No running dag" from AM logs
> -
>
> Key: TEZ-1961
> URL: https://issues.apache.org/jira/browse/TEZ-1961
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>
> {code}
> 15/01/14 16:45:06 INFO ipc.Server: IPC Server handler 0 on 51000, call 
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.getDAGStatus 
> from  Call#0 Retry#0
> org.apache.tez.dag.api.TezException: No running dag at present
>   at 
> org.apache.tez.dag.api.client.DAGClientHandler.getDAG(DAGClientHandler.java:84)
>   at 
> org.apache.tez.dag.api.client.DAGClientHandler.getACLManager(DAGClientHandler.java:151)
>   at 
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getDAGStatus(DAGClientAMProtocolBlockingPBServerImpl.java:94)
>   at 
> org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7375)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2041)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2037)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subj

[jira] [Assigned] (TEZ-1879) Create local UGI instances for each task and the AM, when running in LocalMode

2015-01-14 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth reassigned TEZ-1879:
---

Assignee: Siddharth Seth

> Create local UGI instances for each task and the AM, when running in LocalMode
> --
>
> Key: TEZ-1879
> URL: https://issues.apache.org/jira/browse/TEZ-1879
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>
> Modifying the client UGI can cause issues when the client tries to submit 
> another job - or has tokens already populated in the UGI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1879) Create local UGI instances for each task and the AM, when running in LocalMode

2015-01-14 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1879:

Attachment: TEZ-1879.1.txt

Fairly straightforward patch which requires Credentials to be moved around. 
UGIs for the AM and the Child are already created explicitly, so not much was 
needed there. No new unit tests, since this functionality is covered by the 
existing local mode and MiniCluster tests.
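
For readers following along, a minimal sketch of the per-task UGI idea using 
stable Hadoop APIs; the actual wiring in the AM/TezChild may differ:

{code}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;

// Hedged sketch of the idea only; not the attached patch.
public class LocalUgiSketch {
  public static void runTask(Credentials taskCredentials, Runnable task)
      throws Exception {
    // A fresh UGI per task keeps task tokens out of the client's own UGI.
    UserGroupInformation taskUgi =
        UserGroupInformation.createRemoteUser(System.getProperty("user.name"));
    taskUgi.addCredentials(taskCredentials);

    taskUgi.doAs((PrivilegedExceptionAction<Void>) () -> {
      task.run();  // code here sees only the task's credentials
      return null;
    });
  }
}
{code}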

[~hitesh] - please review.

> Create local UGI instances for each task and the AM, when running in LocalMode
> --
>
> Key: TEZ-1879
> URL: https://issues.apache.org/jira/browse/TEZ-1879
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: TEZ-1879.1.txt
>
>
> Modifying the client UGI can cause issues when the client tries to submit 
> another job - or has tokens already populated in the UGI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1951) Fix general findbugs warnings in tez-dag

2015-01-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278033#comment-14278033
 ] 

Hadoop QA commented on TEZ-1951:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12692389/TEZ-1951.1.patch
  against master revision adcfb84.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 74 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.test.TestAMRecovery

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/22//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/22//artifact/patchprocess/newPatchFindbugsWarningstez-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/22//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/22//artifact/patchprocess/newPatchFindbugsWarningstez-tests.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/22//artifact/patchprocess/newPatchFindbugsWarningstez-mapreduce.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/22//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-internals.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/22//console

This message is automatically generated.

> Fix general findbugs warnings in tez-dag
> 
>
> Key: TEZ-1951
> URL: https://issues.apache.org/jira/browse/TEZ-1951
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
> Attachments: TEZ-1951.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1942) Number of tasks show in Tez UI with auto-reduce parallelism is misleading

2015-01-14 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1942:
-
 Target Version/s: 0.6.0, 0.5.4
Affects Version/s: 0.5.2

> Number of tasks show in Tez UI with auto-reduce parallelism is misleading
> -
>
> Key: TEZ-1942
> URL: https://issues.apache.org/jira/browse/TEZ-1942
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.2
>Reporter: Rajesh Balamohan
> Attachments: Screen Shot 2015-01-14 at 9.18.21 AM.png, Screen Shot 
> 2015-01-14 at 9.18.54 AM.png, output.json, result_with_direct_vertex.png, 
> result_with_primary_filter.png
>
>
> Ran a simple hive query (with tez) with "--hiveconf 
> hive.tez.auto.reducer.parallelism=true". This internally turns on tez's 
> auto-reduce parallelism.
> - Job started off with 1009 reduce tasks
> - Tez reduces the number of reducers to 253
> - Job completes successfully, but the TEZ UI shows 1009 as the number of 
> reducers (and 253 tasks as successful). This can be a little misleading.
> I will attach the screenshots soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1942) Number of tasks show in Tez UI with auto-reduce parallelism is misleading

2015-01-14 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278040#comment-14278040
 ] 

Hitesh Shah commented on TEZ-1942:
--

[~pramachandran] Looks like we need to add primary filters to the entities on 
every call to timeline as per the conversation on YARN-3062. Seems like a very 
lame solution but probably the only way to get the UI to work correctly against 
timeline data.

Would you like to take a crack at this?  

> Number of tasks show in Tez UI with auto-reduce parallelism is misleading
> -
>
> Key: TEZ-1942
> URL: https://issues.apache.org/jira/browse/TEZ-1942
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.2
>Reporter: Rajesh Balamohan
> Attachments: Screen Shot 2015-01-14 at 9.18.21 AM.png, Screen Shot 
> 2015-01-14 at 9.18.54 AM.png, output.json, result_with_direct_vertex.png, 
> result_with_primary_filter.png
>
>
> Ran a simple hive query (with tez) with "--hiveconf 
> hive.tez.auto.reducer.parallelism=true". This internally turns on tez's 
> auto-reduce parallelism.
> - Job started off with 1009 reduce tasks
> - Tez reduces the number of reducers to 253
> - Job completes successfully, but the TEZ UI shows 1009 as the number of 
> reducers (and 253 tasks as successful). This can be a little misleading.
> I will attach the screenshots soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-1962) Running out of threads in tez local mode

2015-01-14 Thread Gunther Hagleitner (JIRA)
Gunther Hagleitner created TEZ-1962:
---

 Summary: Running out of threads in tez local mode
 Key: TEZ-1962
 URL: https://issues.apache.org/jira/browse/TEZ-1962
 Project: Apache Tez
  Issue Type: Bug
Reporter: Gunther Hagleitner
Priority: Critical


I've been trying to port the hive unit tests to tez local mode. However, local 
mode seems to leak threads, which causes tests to crash after a while (OOM). 
See the attached stack trace - there are a lot of "TezChild" threads still 
hanging around.

([~sseth] as discussed offline)
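
A quick, hypothetical way to confirm the leak from a test harness (not part of 
any attached patch):

{code}
// Hypothetical test-side helper, not part of any attached patch.
public class ThreadLeakCheckSketch {
  public static long tezChildThreadCount() {
    return Thread.getAllStackTraces().keySet().stream()
        .filter(t -> t.getName().contains("TezChild"))
        .count();
  }

  public static void main(String[] args) {
    System.out.println("Lingering TezChild threads: " + tezChildThreadCount());
  }
}
{code}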




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1962) Running out of threads in tez local mode

2015-01-14 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated TEZ-1962:

Attachment: stack5.txt

> Running out of threads in tez local mode
> 
>
> Key: TEZ-1962
> URL: https://issues.apache.org/jira/browse/TEZ-1962
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Priority: Critical
> Attachments: stack5.txt
>
>
> I've been trying to port the hive unit tests to tez local mode. However, 
> local mode seems to leak threads, which causes tests to crash after a while 
> (OOM). See the attached stack trace - there are a lot of "TezChild" threads 
> still hanging around.
> ([~sseth] as discussed offline)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1905) Fix findbugs warnings in tez-tests

2015-01-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278055#comment-14278055
 ] 

Hadoop QA commented on TEZ-1905:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12692395/TEZ-1905.1.patch
  against master revision adcfb84.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 254 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/23//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/23//artifact/patchprocess/newPatchFindbugsWarningstez-mapreduce.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/23//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/23//artifact/patchprocess/newPatchFindbugsWarningstez-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/23//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-internals.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/23//console

This message is automatically generated.

> Fix findbugs warnings in tez-tests
> --
>
> Key: TEZ-1905
> URL: https://issues.apache.org/jira/browse/TEZ-1905
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
> Attachments: TEZ-1905.1.patch
>
>
> https://builds.apache.org/job/PreCommit-Tez-Build/8/artifact/patchprocess/newPatchFindbugsWarningstez-tests.html
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1942) Number of tasks show in Tez UI with auto-reduce parallelism is misleading

2015-01-14 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278069#comment-14278069
 ] 

Hitesh Shah commented on TEZ-1942:
--

An initial fix might be to fix VertexInit, VertexFinished and 
VertexParallelismUpdated events.

> Number of tasks show in Tez UI with auto-reduce parallelism is misleading
> -
>
> Key: TEZ-1942
> URL: https://issues.apache.org/jira/browse/TEZ-1942
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.2
>Reporter: Rajesh Balamohan
> Attachments: Screen Shot 2015-01-14 at 9.18.21 AM.png, Screen Shot 
> 2015-01-14 at 9.18.54 AM.png, output.json, result_with_direct_vertex.png, 
> result_with_primary_filter.png
>
>
> Ran a simple hive query (with tez) with "--hiveconf 
> hive.tez.auto.reducer.parallelism=true". This internally turns on tez's 
> auto-reduce parallelism.
> - Job started off with 1009 reduce tasks
> - Tez reduces the number of reducers to 253
> - Job completes successfully, but the TEZ UI shows 1009 as the number of 
> reducers (and 253 tasks as successful). This can be a little misleading.
> I will attach the screenshots soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1069) Support ability to re-size a task attempt when previous attempts fail due to resource constraints

2015-01-14 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278074#comment-14278074
 ] 

Jeff Zhang commented on TEZ-1069:
-

bq. My thinking was more along the lines for querying the VertexManager to 
allow it to modify the task specifications in such cases. Changing the resource 
is not enough. One would also need to change the java opts. For the latter, we 
would need to write a java opts parser if the user had specified their own java 
opts ( Xmx, etc ).
Agree, the VM is the better place to do this kind of thing, and it will update 
the java opts as well (see the sketch after this comment).

bq. Isn't it better to setup hooks in case of OOM failures for a VertexManager 
to resize the task? Furthermore, a lot of OOM failures are due to data skew 
where one task is affected but the rest are not.
I think I would add one method to the VM to get notified of task attempt 
failures, and decide whether to resize the task. The rough idea is to resize 
only the tasks with OOM attempt failures, and when the number of tasks with 
OOM attempt failures meets some threshold, resize the whole vertex.

bq. Last question on when should this increase be done? Should it be done on 
each attempt failure or only on the last attempt?
If we identify that the task attempt failed due to OOM, I think the next 
attempt will most likely still fail due to OOM.
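
A rough sketch of the java-opts rewrite mentioned above; the helper name and 
its placement are assumptions, not part of the attached patch:

{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the java-opts rewrite; not the patch.
public class JavaOptsResizerSketch {
  private static final Pattern XMX = Pattern.compile("-Xmx\\S+");

  // Replace (or append) -Xmx so a retried attempt gets a bigger heap.
  public static String withNewXmx(String javaOpts, int newHeapMb) {
    String xmx = "-Xmx" + newHeapMb + "m";
    Matcher m = XMX.matcher(javaOpts);
    return m.find() ? m.replaceAll(xmx) : (javaOpts + " " + xmx).trim();
  }

  public static void main(String[] args) {
    System.out.println(withNewXmx("-server -Xmx2048m -XX:+UseParallelGC", 4096));
    // prints: -server -Xmx4096m -XX:+UseParallelGC
  }
}
{code}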


> Support ability to re-size a task attempt when previous attempts fail due to 
> resource constraints
> -
>
> Key: TEZ-1069
> URL: https://issues.apache.org/jira/browse/TEZ-1069
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Hitesh Shah
>Assignee: Jeff Zhang
> Attachments: TEZ-1069-1.patch
>
>
> Consider a case where attempts for the final stage in a long DAG fails due to 
> out of memory. In such a scenario, the framework  ( or via the base vertex 
> manager ) should be able to change the task specifications on the fly to 
> trigger a re-run with modified specs. 
> Changes could be both java opts changes as well as container resource 
> requirements. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1069) Support ability to re-size a task attempt when previous attempts fail due to resource constraints

2015-01-14 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278083#comment-14278083
 ] 

Jeff Zhang commented on TEZ-1069:
-

After more thought, this would change a lot in the VM. So as a next step I 
will first put all of this in the vertex, and move it to the VM later if 
necessary.

> Support ability to re-size a task attempt when previous attempts fail due to 
> resource constraints
> -
>
> Key: TEZ-1069
> URL: https://issues.apache.org/jira/browse/TEZ-1069
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Hitesh Shah
>Assignee: Jeff Zhang
> Attachments: TEZ-1069-1.patch
>
>
> Consider a case where attempts for the final stage in a long DAG fails due to 
> out of memory. In such a scenario, the framework  ( or via the base vertex 
> manager ) should be able to change the task specifications on the fly to 
> trigger a re-run with modified specs. 
> Changes could be both java opts changes as well as container resource 
> requirements. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TEZ-1942) Number of tasks show in Tez UI with auto-reduce parallelism is misleading

2015-01-14 Thread Prakash Ramachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Ramachandran reassigned TEZ-1942:
-

Assignee: Prakash Ramachandran

> Number of tasks show in Tez UI with auto-reduce parallelism is misleading
> -
>
> Key: TEZ-1942
> URL: https://issues.apache.org/jira/browse/TEZ-1942
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.2
>Reporter: Rajesh Balamohan
>Assignee: Prakash Ramachandran
> Attachments: Screen Shot 2015-01-14 at 9.18.21 AM.png, Screen Shot 
> 2015-01-14 at 9.18.54 AM.png, output.json, result_with_direct_vertex.png, 
> result_with_primary_filter.png
>
>
> Ran a simple hive query (with tez) with "--hiveconf 
> hive.tez.auto.reducer.parallelism=true". This internally turns on tez's 
> auto-reduce parallelism.
> - Job started off with 1009 reduce tasks
> - Tez reduces the number of reducers to 253
> - Job completes successfully, but the TEZ UI shows 1009 as the number of 
> reducers (and 253 tasks as successful). This can be a little misleading.
> I will attach the screenshots soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1951) Fix general findbugs warnings in tez-dag

2015-01-14 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278089#comment-14278089
 ] 

Hitesh Shah commented on TEZ-1951:
--

[~zjffdu] Do you see anything in the changes that might make 
TestAMRecovery.testVertexCompletelyFinished_Broadcast flaky?

{code}
java.lang.AssertionError: expected:<1> but was:<0>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.tez.test.TestAMRecovery.testVertexCompletelyFinished_Broadcast(TestAMRecovery.java:246)
{code}

> Fix general findbugs warnings in tez-dag
> 
>
> Key: TEZ-1951
> URL: https://issues.apache.org/jira/browse/TEZ-1951
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
> Attachments: TEZ-1951.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1951) Fix general findbugs warnings in tez-dag

2015-01-14 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278098#comment-14278098
 ] 

Jeff Zhang commented on TEZ-1951:
-

[~hitesh] I saw this several days ago and created TEZ-1934; a patch is 
available, please help review.

> Fix general findbugs warnings in tez-dag
> 
>
> Key: TEZ-1951
> URL: https://issues.apache.org/jira/browse/TEZ-1951
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
> Attachments: TEZ-1951.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1879) Create local UGI instances for each task and the AM, when running in LocalMode

2015-01-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278108#comment-14278108
 ] 

Hadoop QA commented on TEZ-1879:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12692400/TEZ-1879.1.txt
  against master revision 61bb0f8.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 259 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/24//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/24//artifact/patchprocess/newPatchFindbugsWarningstez-mapreduce.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/24//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/24//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-internals.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/24//artifact/patchprocess/newPatchFindbugsWarningstez-tests.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/24//artifact/patchprocess/newPatchFindbugsWarningstez-examples.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/24//console

This message is automatically generated.

> Create local UGI instances for each task and the AM, when running in LocalMode
> --
>
> Key: TEZ-1879
> URL: https://issues.apache.org/jira/browse/TEZ-1879
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: TEZ-1879.1.txt
>
>
> Modifying the client UGI can cause issues when the client tries to submit 
> another job - or has tokens already populated in the UGI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1934) TestAMRecovery may fail because the execution order is not deterministic

2015-01-14 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278121#comment-14278121
 ] 

Hitesh Shah commented on TEZ-1934:
--

Mostly looks good. Does the "onSourceTaskCompleted" function need to be 
synchronized - can it be called concurrently for different tasks finishing? 

> TestAMRecovery may fail because the execution order is not deterministic 
> -
>
> Key: TEZ-1934
> URL: https://issues.apache.org/jira/browse/TEZ-1934
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-1934-1.patch
>
>
> task_1 is not guaranteed to be scheduled before task_0, so task_1 may 
> finish before task_0. Yet in the current TestAMRecovery, the finish of 
> task_1 is treated as the finished signal of the vertex (there are only 2 
> tasks in this vertex).
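
An order-independent completion check is one way to phrase the fix; the sketch 
below is illustrative only (the class and method names are assumptions, not 
taken from TEZ-1934-1.patch):

{code}
import java.util.concurrent.atomic.AtomicInteger;

public class CompletionTracker {
  private final AtomicInteger completedTasks = new AtomicInteger();
  private final int totalTasks;

  public CompletionTracker(int totalTasks) {
    this.totalTasks = totalTasks;
  }

  // Called once per finished task, in any order. Returns true only when the
  // last task reports in, so the "vertex finished" signal no longer depends
  // on whether task_0 or task_1 completes first.
  public boolean onTaskCompleted() {
    return completedTasks.incrementAndGet() == totalTasks;
  }
}
{code}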



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1934) TestAMRecovery may fail because the execution order is not deterministic

2015-01-14 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278123#comment-14278123
 ] 

Hitesh Shah commented on TEZ-1934:
--

Triggered test patch for this jira. 

> TestAMRecovery may fail because the execution order is not deterministic 
> -
>
> Key: TEZ-1934
> URL: https://issues.apache.org/jira/browse/TEZ-1934
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-1934-1.patch
>
>
> task_1 is not guaranteed to be scheduled before task_0, so task_1 may 
> finish before task_0. Yet in the current TestAMRecovery, the finish of 
> task_1 is treated as the finished signal of the vertex (there are only 2 
> tasks in this vertex).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1951) Fix general findbugs warnings in tez-dag

2015-01-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278125#comment-14278125
 ] 

Hadoop QA commented on TEZ-1951:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12692389/TEZ-1951.1.patch
  against master revision 61bb0f8.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 74 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/25//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/25//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-internals.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/25//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/25//artifact/patchprocess/newPatchFindbugsWarningstez-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/25//artifact/patchprocess/newPatchFindbugsWarningstez-tests.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/25//artifact/patchprocess/newPatchFindbugsWarningstez-mapreduce.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/25//console

This message is automatically generated.

> Fix general findbugs warnings in tez-dag
> 
>
> Key: TEZ-1951
> URL: https://issues.apache.org/jira/browse/TEZ-1951
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
> Attachments: TEZ-1951.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1934) TestAMRecovery may fail because the execution order is not deterministic

2015-01-14 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278128#comment-14278128
 ] 

Jeff Zhang commented on TEZ-1934:
-

onSourceTaskCompleted is only called on the main dispatcher thread, so it 
should be fine without synchronized.
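
For readers unfamiliar with the pattern being relied on here, a minimal 
single-dispatcher-thread sketch (illustrative only - this is not the actual 
Tez dispatcher): every handler runs on one thread, so handler state needs no 
locking.

{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SingleThreadDispatcher {
  private final BlockingQueue<Runnable> events = new LinkedBlockingQueue<>();
  private int sourceTasksCompleted; // touched only by the dispatcher thread

  public void start() {
    Thread dispatcher = new Thread(() -> {
      try {
        while (true) {
          // Handlers run here one at a time, so a handler such as
          // onSourceTaskCompleted() can never race with another handler.
          events.take().run();
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "Dispatcher");
    dispatcher.setDaemon(true);
    dispatcher.start();
  }

  // Callers on any thread enqueue an event; the mutation itself still runs
  // on the dispatcher thread, so no synchronized is needed.
  public void onSourceTaskCompleted() {
    events.add(() -> sourceTasksCompleted++);
  }
}
{code}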

> TestAMRecovery may fail because the execution order is not deterministic 
> -
>
> Key: TEZ-1934
> URL: https://issues.apache.org/jira/browse/TEZ-1934
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-1934-1.patch
>
>
> task_1 is not guaranteed to be scheduled before task_0, so task_1 may 
> finish before task_0. Yet in the current TestAMRecovery, the finish of 
> task_1 is treated as the finished signal of the vertex (there are only 2 
> tasks in this vertex).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TEZ-1934) TestAMRecovery may fail because the execution order is not deterministic

2015-01-14 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278131#comment-14278131
 ] 

Hitesh Shah edited comment on TEZ-1934 at 1/15/15 2:27 AM:
---

+1 (pending test-patch results).


was (Author: hitesh):
+1.

> TestAMRecovery may fail because the execution order is not deterministic 
> -
>
> Key: TEZ-1934
> URL: https://issues.apache.org/jira/browse/TEZ-1934
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-1934-1.patch
>
>
> task_1 is not guaranteed to be scheduled before task_0, so task_1 may 
> finish before task_0. Yet in the current TestAMRecovery, the finish of 
> task_1 is treated as the finished signal of the vertex (there are only 2 
> tasks in this vertex).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1934) TestAMRecovery may fail because the execution order is not deterministic

2015-01-14 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278131#comment-14278131
 ] 

Hitesh Shah commented on TEZ-1934:
--

+1.

> TestAMRecovery may fail because the execution order is not deterministic 
> -
>
> Key: TEZ-1934
> URL: https://issues.apache.org/jira/browse/TEZ-1934
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-1934-1.patch
>
>
> task_1 is not guaranteed to be scheduled before task_0, so task_1 may 
> finish before task_0. Yet in the current TestAMRecovery, the finish of 
> task_1 is treated as the finished signal of the vertex (there are only 2 
> tasks in this vertex).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-1963) Fix post memory merge to be > 2 GB

2015-01-14 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created TEZ-1963:
-

 Summary: Fix post memory merge to be > 2 GB
 Key: TEZ-1963
 URL: https://issues.apache.org/jira/browse/TEZ-1963
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1963) Fix post memory merge to be > 2 GB

2015-01-14 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-1963:
--
Attachment: TEZ-1963.1.patch

> Fix post memory merge to be > 2 GB
> --
>
> Key: TEZ-1963
> URL: https://issues.apache.org/jira/browse/TEZ-1963
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
> Attachments: TEZ-1963.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1963) Fix post memory merge to be > 2 GB

2015-01-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278164#comment-14278164
 ] 

Hadoop QA commented on TEZ-1963:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12692421/TEZ-1963.1.patch
  against master revision 61bb0f8.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 260 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.dag.app.rm.TestContainerReuse

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/28//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/28//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-internals.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/28//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/28//artifact/patchprocess/newPatchFindbugsWarningstez-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/28//artifact/patchprocess/newPatchFindbugsWarningstez-tests.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/28//artifact/patchprocess/newPatchFindbugsWarningstez-mapreduce.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/28//console

This message is automatically generated.

> Fix post memory merge to be > 2 GB
> --
>
> Key: TEZ-1963
> URL: https://issues.apache.org/jira/browse/TEZ-1963
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: TEZ-1963.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TEZ-1962) Running out of threads in tez local mode

2015-01-14 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth reassigned TEZ-1962:
---

Assignee: Siddharth Seth

> Running out of threads in tez local mode
> 
>
> Key: TEZ-1962
> URL: https://issues.apache.org/jira/browse/TEZ-1962
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Siddharth Seth
>Priority: Critical
> Attachments: stack5.txt
>
>
> I've been trying to port the hive unit tests to tez local mode. However, local 
> mode seems to leak threads, which causes tests to crash after a while (OOM). 
> See the attached stack trace - there are a lot of "TezChild" threads still 
> hanging around.
> ([~sseth] as discussed offline)
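
A quick way to confirm this kind of leak from inside a test JVM is to count 
live threads by name; a small diagnostic sketch, assuming the leaked threads 
keep the "TezChild" name seen in the attached stack trace:

{code}
public class ThreadLeakCheck {

  // Count live threads whose name contains the given marker.
  static long countThreads(String marker) {
    return Thread.getAllStackTraces().keySet().stream()
        .filter(t -> t.getName().contains(marker))
        .count();
  }

  public static void main(String[] args) {
    // Run between tests: a steadily growing count indicates a leak.
    System.out.println("TezChild threads: " + countThreads("TezChild"));
  }
}
{code}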



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TEZ-1964) `

2015-01-14 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth resolved TEZ-1964.
-
Resolution: Invalid

> `
> -
>
> Key: TEZ-1964
> URL: https://issues.apache.org/jira/browse/TEZ-1964
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1964) `

2015-01-14 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1964:

Issue Type: Bug  (was: Sub-task)
Parent: (was: TEZ-1962)

> `
> -
>
> Key: TEZ-1964
> URL: https://issues.apache.org/jira/browse/TEZ-1964
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Siddharth Seth
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-1964) `

2015-01-14 Thread Siddharth Seth (JIRA)
Siddharth Seth created TEZ-1964:
---

 Summary: `
 Key: TEZ-1964
 URL: https://issues.apache.org/jira/browse/TEZ-1964
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Siddharth Seth






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1962) Running out of threads in tez local mode

2015-01-14 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1962:

Target Version/s: 0.7.0

> Running out of threads in tez local mode
> 
>
> Key: TEZ-1962
> URL: https://issues.apache.org/jira/browse/TEZ-1962
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Gunther Hagleitner
>Assignee: Siddharth Seth
>Priority: Critical
> Attachments: stack5.txt
>
>
> I've been trying to port the hive unit tests to tez local mode. However, local 
> mode seems to leak threads, which causes tests to crash after a while (OOM). 
> See the attached stack trace - there are a lot of "TezChild" threads still 
> hanging around.
> ([~sseth] as discussed offline)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1934) TestAMRecovery may fail because the execution order is not deterministic

2015-01-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278182#comment-14278182
 ] 

Hadoop QA commented on TEZ-1934:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12691636/TEZ-1934-1.patch
  against master revision 61bb0f8.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 260 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/26//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/26//artifact/patchprocess/newPatchFindbugsWarningstez-mapreduce.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/26//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/26//artifact/patchprocess/newPatchFindbugsWarningstez-tests.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/26//artifact/patchprocess/newPatchFindbugsWarningstez-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/26//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-internals.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/26//console

This message is automatically generated.

> TestAMRecovery may fail because the execution order is not deterministic 
> -
>
> Key: TEZ-1934
> URL: https://issues.apache.org/jira/browse/TEZ-1934
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-1934-1.patch
>
>
> task_1 is not guaranteed to be scheduled before task_0, so task_1 may 
> finish before task_0. Yet in the current TestAMRecovery, the finish of 
> task_1 is treated as the finished signal of the vertex (there are only 2 
> tasks in this vertex).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1962) Running out of threads in tez local mode

2015-01-14 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1962:

Issue Type: Sub-task  (was: Bug)
Parent: TEZ-1876

> Running out of threads in tez local mode
> 
>
> Key: TEZ-1962
> URL: https://issues.apache.org/jira/browse/TEZ-1962
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Gunther Hagleitner
>Assignee: Siddharth Seth
>Priority: Critical
> Attachments: stack5.txt
>
>
> I've been trying to port the hive unit tests to tez local mode. However, local 
> mode seems to leak threads, which causes tests to crash after a while (OOM). 
> See the attached stack trace - there are a lot of "TezChild" threads still 
> hanging around.
> ([~sseth] as discussed offline)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1934) TestAMRecovery may fail because the execution order is not deterministic

2015-01-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278185#comment-14278185
 ] 

Hadoop QA commented on TEZ-1934:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12691636/TEZ-1934-1.patch
  against master revision 61bb0f8.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color}.  The applied patch generated 179 javac 
compiler warnings (more than the master's current 171 warnings).

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.
See 
https://builds.apache.org/job/PreCommit-TEZ-Build/27//artifact/patchprocess/diffJavadocWarnings.txt
 for details.

{color:red}-1 findbugs{color}.  The patch appears to introduce 260 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/27//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/27//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-internals.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/27//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/27//artifact/patchprocess/newPatchFindbugsWarningstez-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/27//artifact/patchprocess/newPatchFindbugsWarningstez-tests.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/27//artifact/patchprocess/newPatchFindbugsWarningstez-mapreduce.html
Javac warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/27//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/27//console

This message is automatically generated.

> TestAMRecovery may fail because the execution order is not deterministic 
> -
>
> Key: TEZ-1934
> URL: https://issues.apache.org/jira/browse/TEZ-1934
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-1934-1.patch
>
>
> task_1 is not guaranteed to be scheduled before task_0, so task_1 may 
> finish before task_0. Yet in the current TestAMRecovery, the finish of 
> task_1 is treated as the finished signal of the vertex (there are only 2 
> tasks in this vertex).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1934) TestAMRecovery may fail because the execution order is not deterministic

2015-01-14 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278194#comment-14278194
 ] 

Jeff Zhang commented on TEZ-1934:
-

[~hitesh], new javac compiler warnings and javadoc warnings were generated; 
where can I see them? The link 
https://builds.apache.org/job/PreCommit-TEZ-Build/27//artifact/patchprocess/diffJavadocWarnings.txt
 looks broken.

> TestAMRecovery may fail because the execution order is not deterministic 
> -
>
> Key: TEZ-1934
> URL: https://issues.apache.org/jira/browse/TEZ-1934
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Attachments: TEZ-1934-1.patch
>
>
> task_1 is not guaranteed to be scheduled before task_0, so task_1 may 
> finish before task_0. Yet in the current TestAMRecovery, the finish of 
> task_1 is treated as the finished signal of the vertex (there are only 2 
> tasks in this vertex).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1942) Number of tasks shown in Tez UI with auto-reduce parallelism is misleading

2015-01-14 Thread Prakash Ramachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Ramachandran updated TEZ-1942:
--
Attachment: TEZ-1942.1.patch

[~hitesh] please review

> Number of tasks shown in Tez UI with auto-reduce parallelism is misleading
> -
>
> Key: TEZ-1942
> URL: https://issues.apache.org/jira/browse/TEZ-1942
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.5.2
>Reporter: Rajesh Balamohan
>Assignee: Prakash Ramachandran
> Attachments: Screen Shot 2015-01-14 at 9.18.21 AM.png, Screen Shot 
> 2015-01-14 at 9.18.54 AM.png, TEZ-1942.1.patch, output.json, 
> result_with_direct_vertex.png, result_with_primary_filter.png
>
>
> Ran a simple hive query (with tez) with "--hiveconf 
> hive.tez.auto.reducer.parallelism=true". This internally turns on tez's 
> auto reduce parallelism.  
> - The job started off with 1009 reduce tasks
> - Tez reduced the number of reducers to 253
> - The job completed successfully, but the Tez UI shows 1009 as the number of 
> reducers (and 253 tasks as successful). This can be misleading.
> I will attach the screenshots soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

