[jira] [Created] (TEZ-1945) Remove 2 GB memlimit restriction in MergeManager
Rajesh Balamohan created TEZ-1945:
-------------------------------------

             Summary: Remove 2 GB memlimit restriction in MergeManager
                 Key: TEZ-1945
                 URL: https://issues.apache.org/jira/browse/TEZ-1945
             Project: Apache Tez
          Issue Type: Bug
            Reporter: Rajesh Balamohan

In certain situations (data coming in larger chunks, but yet to complete), fetchers might wait in MergeManager.waitForShuffleToMergeMemory() for memory to become available.

Removing the 2 GB restriction on MergeManager.memlimit would help in such situations.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (TEZ-1945) Remove 2 GB memlimit restriction in MergeManager
[ https://issues.apache.org/jira/browse/TEZ-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated TEZ-1945:
----------------------------------
    Attachment: TEZ-1945.1.patch

Attaching initial patch to remove the 2 GB check. There is a corner case wherein this can break intermediate mem-to-mem merging, as it relies on InMemoryWriter, and InMemoryWriter is currently bound by the 2 GB limit. One option could be to do intermediate mem-to-mem merging "only" up to 2 GB in createInMemorySegments() for processing.

> Remove 2 GB memlimit restriction in MergeManager
> ------------------------------------------------
>
>                 Key: TEZ-1945
>                 URL: https://issues.apache.org/jira/browse/TEZ-1945
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>         Attachments: TEZ-1945.1.patch
>
> In certain situations (data coming in larger chunks, but yet to complete), fetchers might wait in MergeManager.waitForShuffleToMergeMemory() for memory to become available.
> Removing the 2 GB restriction on MergeManager.memlimit would help in such situations.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
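[Editor's note] The corner case above stems from InMemoryWriter targeting a single int-indexed byte[], which caps any one in-memory segment collection at Integer.MAX_VALUE bytes. A minimal sketch of the "merge only up to 2 GB" option, with hypothetical names (this is not the actual Tez code):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the actual Tez implementation: cap the bytes
// handed to a single mem-to-mem merge at Integer.MAX_VALUE, because
// InMemoryWriter writes into one int-indexed byte[].
public class InMemorySegmentPicker {
    static final long IN_MEM_WRITER_LIMIT = Integer.MAX_VALUE; // ~2 GB

    // Segment sizes stand in for segments; returns those that fit in one
    // in-memory merge under the writer limit.
    public static List<Long> pickForMemToMemMerge(List<Long> segmentSizes) {
        List<Long> picked = new ArrayList<>();
        long collected = 0;
        for (long size : segmentSizes) {
            if (collected + size > IN_MEM_WRITER_LIMIT) {
                break; // remaining segments wait for a later merge (or spill)
            }
            collected += size;
            picked.add(size);
        }
        return picked;
    }

    public static void main(String[] args) {
        // Three 800 MB segments: only two fit under the ~2 GB writer limit.
        List<Long> sizes = List.of(800_000_000L, 800_000_000L, 800_000_000L);
        System.out.println(pickForMemToMemMerge(sizes).size()); // prints 2
    }
}
```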
[jira] [Assigned] (TEZ-1945) Remove 2 GB memlimit restriction in MergeManager
[ https://issues.apache.org/jira/browse/TEZ-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan reassigned TEZ-1945:
-------------------------------------
    Assignee: Rajesh Balamohan

> Remove 2 GB memlimit restriction in MergeManager
> ------------------------------------------------
>
>                 Key: TEZ-1945
>                 URL: https://issues.apache.org/jira/browse/TEZ-1945
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-1945.1.patch
>
> In certain situations (data coming in larger chunks, but yet to complete), fetchers might wait in MergeManager.waitForShuffleToMergeMemory() for memory to become available.
> Removing the 2 GB restriction on MergeManager.memlimit would help in such situations.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (TEZ-1946) Show all counters in the counter selector UI.
Sreenath Somarajapuram created TEZ-1946:
---------------------------------------

             Summary: Show all counters in the counter selector UI.
                 Key: TEZ-1946
                 URL: https://issues.apache.org/jira/browse/TEZ-1946
             Project: Apache Tez
          Issue Type: Task
            Reporter: Sreenath Somarajapuram
            Assignee: Sreenath Somarajapuram

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (TEZ-1947) Failing fast when DAG configs have wrong values can save cluster resources
Rajesh Balamohan created TEZ-1947:
-------------------------------------

             Summary: Failing fast when DAG configs have wrong values can save cluster resources
                 Key: TEZ-1947
                 URL: https://issues.apache.org/jira/browse/TEZ-1947
             Project: Apache Tez
          Issue Type: Bug
            Reporter: Rajesh Balamohan

It would be beneficial to do certain config checks upfront rather than failing later downstream. For example, in the following run the DAG failed after 400+ seconds due to a config issue.

{code}
Status: Running (Executing on YARN cluster with App id application_1421164610335_0060)

VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
Map 1 ..      KILLED2511700 81 0 81
Reducer 2     FAILED 1009 00 1009 231008
VERTICES: 00/02 [===>>---] 13% ELAPSED TIME: 449.01 s

Status: Failed
Vertex failed, vertexName=Reducer 2, vertexId=vertex_1421164610335_0060_1_01, diagnostics=[Task failed, taskId=task_1421164610335_0060_1_01_04, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: Invlaid configuration: maxSingleShuffleLimit should be less than mergeThreshold
maxSingleShuffleLimit: 238251152, mergeThreshold: 148668720
	at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.<init>(MergeManager.java:260)
	at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle.<init>(Shuffle.java:206)
	at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.start(OrderedGroupedKVInput.java:124)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.call(LogicalIOProcessorRuntimeTask.java:405)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.call(LogicalIOProcessorRuntimeTask.java:393)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.RuntimeException: Invlaid configuration: maxSingleShuffleLimit should be less than mergeThreshold
maxSingleShuffleLimit: 238251152, mergeThreshold: 148668720
	at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.<init>(MergeManager.java:260)
	at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle.<init>(Shuffle.java:206)
	at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.start(OrderedGroupedKVInput.java:124)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.call(LogicalIOProcessorRuntimeTask.java:405)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.call(LogicalIOProcessorRuntimeTask.java:393)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
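[Editor's note] The constraint that failed above (maxSingleShuffleLimit < mergeThreshold) is derived from configured fractions, so it could in principle be checked before any container runs. A hedged sketch of such a fail-fast check, with hypothetical names (not the actual Tez validation code):

```java
// Hypothetical illustration of a fail-fast config check: evaluate the
// constraint that MergeManager enforces at task runtime
// (maxSingleShuffleLimit < mergeThreshold) from the configured fractions
// at DAG submission time, before cluster resources are spent.
public class ShuffleConfigCheck {

    // memLimit: shuffle memory budget in bytes; the fractions stand in for
    // tez.runtime.shuffle.memory.limit.percent and
    // tez.runtime.shuffle.merge.percent.
    public static void validate(long memLimit, double singleShuffleFraction,
                                double mergeFraction) {
        long maxSingleShuffleLimit = (long) (memLimit * singleShuffleFraction);
        long mergeThreshold = (long) (memLimit * mergeFraction);
        if (maxSingleShuffleLimit >= mergeThreshold) {
            throw new IllegalArgumentException(
                "maxSingleShuffleLimit (" + maxSingleShuffleLimit
                + ") must be less than mergeThreshold (" + mergeThreshold + ")");
        }
    }

    public static void main(String[] args) {
        validate(1_000_000_000L, 0.25, 0.90); // consistent: passes silently
        try {
            // Mirrors the failure above: single-shuffle limit above threshold.
            validate(1_000_000_000L, 0.25, 0.15);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected at submission time: " + e.getMessage());
        }
    }
}
```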
[jira] [Updated] (TEZ-1947) Failing fast when DAG configs have wrong values can save cluster resources
[ https://issues.apache.org/jira/browse/TEZ-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated TEZ-1947:
----------------------------------
    Description: 
It would be beneficial to do certain config checks (wherever possible) upfront rather than failing later downstream. For example, in the following run the DAG failed after 400+ seconds due to a config issue.

{code}
Status: Running (Executing on YARN cluster with App id application_1421164610335_0060)

VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
Map 1 ..      KILLED2511700 81 0 81
Reducer 2     FAILED 1009 00 1009 231008
VERTICES: 00/02 [===>>---] 13% ELAPSED TIME: 449.01 s

Status: Failed
Vertex failed, vertexName=Reducer 2, vertexId=vertex_1421164610335_0060_1_01, diagnostics=[Task failed, taskId=task_1421164610335_0060_1_01_04, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: Invlaid configuration: maxSingleShuffleLimit should be less than mergeThreshold
maxSingleShuffleLimit: 238251152, mergeThreshold: 148668720
	at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.<init>(MergeManager.java:260)
	at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle.<init>(Shuffle.java:206)
	at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.start(OrderedGroupedKVInput.java:124)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.call(LogicalIOProcessorRuntimeTask.java:405)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.call(LogicalIOProcessorRuntimeTask.java:393)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.RuntimeException: Invlaid configuration: maxSingleShuffleLimit should be less than mergeThreshold
maxSingleShuffleLimit: 238251152, mergeThreshold: 148668720
	at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.<init>(MergeManager.java:260)
	at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle.<init>(Shuffle.java:206)
	at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.start(OrderedGroupedKVInput.java:124)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.call(LogicalIOProcessorRuntimeTask.java:405)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.call(LogicalIOProcessorRuntimeTask.java:393)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
{code}

  was:
It would be beneficial to do certain config checks (whereever possible) upfront rather having fail later in the downstream. For e.g, in the following example the DAG failed after 400+ seconds for some config issue.

{code}
Status: Running (Executing on YARN cluster with App id application_1421164610335_0060)

VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
Map 1 ..      KILLED2511700 81 0 81
Reducer 2     FAILED 1009 00 1009 231008
VERTICES: 00/02 [===>>---] 13% ELAPSED TIME: 449.01 s

Status: Failed
Vertex failed, vertexName=Reducer 2, vertexId=vertex_1421164610335_0060_1_01, diagnostics=[Task failed, taskId=task_1421164610335_0060_1_01_04, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: Invlaid configuratio
[jira] [Commented] (TEZ-1069) Support ability to re-size a task attempt when previous attempts fail due to resource constraints
[ https://issues.apache.org/jira/browse/TEZ-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276901#comment-14276901 ]

Jeff Zhang commented on TEZ-1069:
---------------------------------

Yes, [~hitesh], I have an initial patch that works. Here's the main flow:
* Identify whether the TaskAttempt failed due to OOM. 2 ways:
** ContainerExitStatus
** TaskAttemptCompleteEvent through heartbeat (the OOM exception may be caught and passed through heartbeat)
* Track the number of OOM-failed task attempts for each task, and compute the max value across the vertex.
* Update the Resource of the vertex and all its tasks based on the max OOM-failed task attempts: pow(1+increase_percent_per_OOM_failed_attempt, max_failed_attempt)

For a task attempt that is in START_WAIT (being scheduled by TaskSchedulerService), I didn't change it for now. This may be the most complicated part if required.

> Support ability to re-size a task attempt when previous attempts fail due to resource constraints
> -------------------------------------------------------------------------------------------------
>
>                 Key: TEZ-1069
>                 URL: https://issues.apache.org/jira/browse/TEZ-1069
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Hitesh Shah
>            Assignee: Jeff Zhang
>
> Consider a case where attempts for the final stage in a long DAG fail due to out of memory. In such a scenario, the framework (or via the base vertex manager) should be able to change the task specifications on the fly to trigger a re-run with modified specs.
> Changes could be both java opts changes as well as container resource requirements.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
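[Editor's note] The scaling rule described in the comment above, pow(1+increase_percent_per_OOM_failed_attempt, max_failed_attempt), can be sketched as follows. The names and the 50% bump are illustrative, not taken from the patch:

```java
// Illustrative sketch of the proposed resource bump: scale the vertex's task
// memory by (1 + increasePercent)^maxOomFailures, where maxOomFailures is the
// maximum count of OOM-failed attempts across the vertex's tasks.
public class OomResize {

    public static long resizedMemoryMb(long originalMb, double increasePercent,
                                       int maxOomFailures) {
        return (long) (originalMb * Math.pow(1.0 + increasePercent, maxOomFailures));
    }

    public static void main(String[] args) {
        // 1024 MB task, assumed 50% bump per OOM-failed attempt, worst task
        // failed twice: 1024 * 1.5^2 = 2304 MB.
        System.out.println(resizedMemoryMb(1024, 0.50, 2)); // prints 2304
    }
}
```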
[jira] [Updated] (TEZ-1069) Support ability to re-size a task attempt when previous attempts fail due to resource constraints
[ https://issues.apache.org/jira/browse/TEZ-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zhang updated TEZ-1069:
----------------------------
    Attachment: TEZ-1069-1.patch

> Support ability to re-size a task attempt when previous attempts fail due to resource constraints
> -------------------------------------------------------------------------------------------------
>
>                 Key: TEZ-1069
>                 URL: https://issues.apache.org/jira/browse/TEZ-1069
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Hitesh Shah
>            Assignee: Jeff Zhang
>         Attachments: TEZ-1069-1.patch
>
> Consider a case where attempts for the final stage in a long DAG fail due to out of memory. In such a scenario, the framework (or via the base vertex manager) should be able to change the task specifications on the fly to trigger a re-run with modified specs.
> Changes could be both java opts changes as well as container resource requirements.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (TEZ-1069) Support ability to re-size a task attempt when previous attempts fail due to resource constraints
[ https://issues.apache.org/jira/browse/TEZ-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276928#comment-14276928 ]

Hadoop QA commented on TEZ-1069:
--------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12692217/TEZ-1069-1.patch
  against master revision 6bc500f.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:red}-1 findbugs{color}. The patch appears to introduce 260 new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/18//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/18//artifact/patchprocess/newPatchFindbugsWarningstez-tests.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/18//artifact/patchprocess/newPatchFindbugsWarningstez-mapreduce.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/18//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/18//artifact/patchprocess/newPatchFindbugsWarningstez-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/18//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-internals.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/18//console

This message is automatically generated.

> Support ability to re-size a task attempt when previous attempts fail due to resource constraints
> -------------------------------------------------------------------------------------------------
>
>                 Key: TEZ-1069
>                 URL: https://issues.apache.org/jira/browse/TEZ-1069
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Hitesh Shah
>            Assignee: Jeff Zhang
>         Attachments: TEZ-1069-1.patch
>
> Consider a case where attempts for the final stage in a long DAG fail due to out of memory. In such a scenario, the framework (or via the base vertex manager) should be able to change the task specifications on the fly to trigger a re-run with modified specs.
> Changes could be both java opts changes as well as container resource requirements.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (TEZ-1942) Number of tasks show in Tez UI with auto-reduce parallelism is misleading
[ https://issues.apache.org/jira/browse/TEZ-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prakash Ramachandran updated TEZ-1942:
--------------------------------------
    Attachment: result_with_primary_filter.png
                result_with_direct_vertex.png

Did some analysis. It looks like the results returned change based on the query parameters.

When queried for "get all vertexes for this dag" it returns 1009 (numTasks in the screenshot):
{code}
http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11&primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1
{code}

When queried for "get for a particular dag" it returns 253:
{code}
http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/
{code}

> Number of tasks show in Tez UI with auto-reduce parallelism is misleading
> -------------------------------------------------------------------------
>
>                 Key: TEZ-1942
>                 URL: https://issues.apache.org/jira/browse/TEZ-1942
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>         Attachments: Screen Shot 2015-01-14 at 9.18.21 AM.png, Screen Shot 2015-01-14 at 9.18.54 AM.png, output.json, result_with_direct_vertex.png, result_with_primary_filter.png
>
> Ran a simple hive query (with tez) and "--hiveconf hive.tez.auto.reducer.parallelism=true". This internally turns on tez's auto reduce parallelism.
> - Job started off with 1009 reduce tasks
> - Tez reduces the number of reducers to 253
> - Job completes successfully, but TEZ UI shows 1009 as the number of reducers (and 253 tasks as successful tasks). This can be a little misleading.
> I will attach the screenshots soon.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Comment Edited] (TEZ-1942) Number of tasks show in Tez UI with auto-reduce parallelism is misleading
[ https://issues.apache.org/jira/browse/TEZ-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276929#comment-14276929 ]

Prakash Ramachandran edited comment on TEZ-1942 at 1/14/15 2:21 PM:
--------------------------------------------------------------------

Did some analysis. It looks like the results returned change based on the query parameters.

When queried for "get all vertexes for this dag" it returns 1009 (numTasks in the screenshot); see screenshot result_with_primary_filter.png:
{code}
http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11&primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1
{code}

When queried for "get for a particular vertex" it returns 253; see screenshot result_with_direct_vertex.png:
{code}
http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/
{code}

was (Author: pramachandran):
did some analysis. looks like the results returned is changing based on the query parameters.
when queried for "get all vertexes for this dag" it returns 1009 (numTasks in the screenshot) see screenshot result_with_primary_filter.png
{code}
http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11&primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1
{code}
when queried for "get for a particular dag" it returns 253 see screenshot result_with_direct_vertex.png
{code}
http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/
{code}

> Number of tasks show in Tez UI with auto-reduce parallelism is misleading
> -------------------------------------------------------------------------
>
>                 Key: TEZ-1942
>                 URL: https://issues.apache.org/jira/browse/TEZ-1942
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>         Attachments: Screen Shot 2015-01-14 at 9.18.21 AM.png, Screen Shot 2015-01-14 at 9.18.54 AM.png, output.json, result_with_direct_vertex.png, result_with_primary_filter.png
>
> Ran a simple hive query (with tez) and "--hiveconf hive.tez.auto.reducer.parallelism=true". This internally turns on tez's auto reduce parallelism.
> - Job started off with 1009 reduce tasks
> - Tez reduces the number of reducers to 253
> - Job completes successfully, but TEZ UI shows 1009 as the number of reducers (and 253 tasks as successful tasks). This can be a little misleading.
> I will attach the screenshots soon.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (TEZ-1890) tez-ui web.tar.gz also being uploaded to maven repository
[ https://issues.apache.org/jira/browse/TEZ-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277003#comment-14277003 ]

Prakash Ramachandran commented on TEZ-1890:
-------------------------------------------

+1, patch looks good. Tried extracting from the war file.

> tez-ui web.tar.gz also being uploaded to maven repository
> ---------------------------------------------------------
>
>                 Key: TEZ-1890
>                 URL: https://issues.apache.org/jira/browse/TEZ-1890
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Hitesh Shah
>            Priority: Blocker
>         Attachments: TEZ-1890-v1.patch
>
> Not sure if we should be uploading the web tar.gz as part of maven deploy.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (TEZ-1945) Remove 2 GB memlimit restriction in MergeManager
[ https://issues.apache.org/jira/browse/TEZ-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277132#comment-14277132 ]

Rajesh Balamohan commented on TEZ-1945:
---------------------------------------

*Job:*
1. 10 TB scale
2. Hive query with tez: "create table testData as select * from lineitem distribute by l_shipdate;"

Saw a 5-7% improvement in job runtime. Counter details are given below, which show a good reduction in resource usage during shuffle (e.g. NUM_MEM_TO_DISK_MERGES, ADDITIONAL_SPILLS_BYTES_WRITTEN, SPILLED_RECORDS).

Counter details (TaskCounter_Reducer_2_INPUT_Map_1):

||counter||4 GB, tez.runtime.shuffle.fetch.buffer.percent=0.5, tez.runtime.shuffle.merge.percent=0.5, application_1421164610335_0059||4 GB, tez.runtime.shuffle.fetch.buffer.percent=0.8, tez.runtime.shuffle.merge.percent=0.8, application_1421164610335_0064||8 GB, tez.runtime.shuffle.memory.limit.percent=0.1, tez.runtime.shuffle.fetch.buffer.percent=0.14, application_1421164610335_0063||8 GB, tez.runtime.shuffle.memory.limit.percent=0.2, tez.runtime.shuffle.fetch.buffer.percent=0.5, application_1421164610335_0058||
|ADDITIONAL_SPILLS_BYTES_READ|200812472683|125413261965|331929593129|31373505945|
|ADDITIONAL_SPILLS_BYTES_WRITTEN|181649974257|106277188725|312660112747|12149251314|
|COMBINE_INPUT_RECORDS|0|0|0|0|
|FIRST_EVENT_RECEIVED|10292|12048|7404|6012|
|LAST_EVENT_RECEIVED|31296182|28215975|10513984|7342057|
|MERGED_MAP_OUTPUTS|244976|244976|244976|244976|
|MERGE_PHASE_TIME|39177076|36337714|15940783|11425071|
|NUM_DISK_TO_DISK_MERGES|0|0|0|0|
|NUM_FAILED_SHUFFLE_INPUTS|0|0|0|0|
|NUM_MEM_TO_DISK_MERGES|491|3|4537|0|
|NUM_SHUFFLED_INPUTS|244976|244976|244976|244976|
|NUM_SKIPPED_INPUTS|8283|8283|8283|8283|
|REDUCE_INPUT_GROUPS|0|0|0|0|
|REDUCE_INPUT_RECORDS|589709|589709|589709|589709|
|SHUFFLE_BYTES|365219732545|365204956818|365241417228|365215810254|
|SHUFFLE_BYTES_DECOMPRESSED|801646699974|801646699974|801646699974|801646699974|
|SHUFFLE_BYTES_DISK_DIRECT|19162498426|19136073240|19269480382|19224254631|
|SHUFFLE_BYTES_TO_DISK|0|0|0|0|
|SHUFFLE_BYTES_TO_MEM|346057234119|346068883578|345971936846|345991555623|
|SHUFFLE_PHASE_TIME|38339256|34248855|15332154|11018423|
|SPILLED_RECORDS|3272861488|2042317909|5452541545|511585624|

*Merge memory details for the above runs (applicationIds for reference)*

4 GB container runs:
application_1421164610335_0059: MergerManager: memoryLimit=1564475392, maxSingleShuffleLimit=391118848, mergeThreshold=782237696, ioSortFactor=200, memToMemMergeOutputsThreshold=200
application_1421164610335_0064: memoryLimit=2296271339, maxSingleShuffleLimit=574067840, mergeThreshold=1837017088, ioSortFactor=200, memToMemMergeOutputsThreshold=200

8 GB container runs:
application_1421164610335_0058: MergerManager: memoryLimit=4437280030, maxSingleShuffleLimit=1109320064, mergeThreshold=3993552128, ioSortFactor=200, memToMemMergeOutputsThreshold=200
application_1421164610335_0063: MergerManager: memoryLimit=891079872, maxSingleShuffleLimit=89107992, mergeThreshold=139008464, ioSortFactor=200, memToMemMergeOutputsThreshold=200

> Remove 2 GB memlimit restriction in MergeManager
> ------------------------------------------------
>
>                 Key: TEZ-1945
>                 URL: https://issues.apache.org/jira/browse/TEZ-1945
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-1945.1.patch
>
> In certain situations (data coming in larger chunks, but yet to complete), fetchers might wait in MergeManager.waitForShuffleToMergeMemory() for memory to become available.
> Removing the 2 GB restriction on MergeManager.memlimit would help in such situations.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
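[Editor's note] The memoryLimit / maxSingleShuffleLimit / mergeThreshold figures quoted above appear to follow from the configured fractions roughly as sketched below. This is a hedged reconstruction from the numbers in the comment (it reproduces application_1421164610335_0059 exactly, assuming the default 0.25 for tez.runtime.shuffle.memory.limit.percent), not the actual MergeManager code:

```java
// Hedged reconstruction of how the quoted limits relate to the config knobs:
//   memoryLimit           ~ availableMemory * tez.runtime.shuffle.fetch.buffer.percent
//   maxSingleShuffleLimit ~ memoryLimit     * tez.runtime.shuffle.memory.limit.percent
//   mergeThreshold        ~ memoryLimit     * tez.runtime.shuffle.merge.percent
public class MergeMemoryParams {

    public static long[] derive(long availableMemory, double fetchBufferPercent,
                                double memoryLimitPercent, double mergePercent) {
        long memoryLimit = (long) (availableMemory * fetchBufferPercent);
        long maxSingleShuffleLimit = (long) (memoryLimit * memoryLimitPercent);
        long mergeThreshold = (long) (memoryLimit * mergePercent);
        return new long[] { memoryLimit, maxSingleShuffleLimit, mergeThreshold };
    }

    public static void main(String[] args) {
        // Reproduces application_1421164610335_0059 above (4 GB container,
        // fetch.buffer.percent=0.5, merge.percent=0.5; memory.limit.percent
        // assumed at its 0.25 default; available memory back-derived).
        long[] p = derive(3_128_950_784L, 0.5, 0.25, 0.5);
        System.out.println(p[0] + " " + p[1] + " " + p[2]);
        // prints: 1564475392 391118848 782237696
    }
}
```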
[jira] [Commented] (TEZ-1890) tez-ui web.tar.gz also being uploaded to maven repository
[ https://issues.apache.org/jira/browse/TEZ-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277138#comment-14277138 ]

Jonathan Eagles commented on TEZ-1890:
--------------------------------------

[~Sreenath], [~pramachandran], any concerns with Ambari integration over this change?

> tez-ui web.tar.gz also being uploaded to maven repository
> ---------------------------------------------------------
>
>                 Key: TEZ-1890
>                 URL: https://issues.apache.org/jira/browse/TEZ-1890
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Hitesh Shah
>            Priority: Blocker
>         Attachments: TEZ-1890-v1.patch
>
> Not sure if we should be uploading the web tar.gz as part of maven deploy.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Comment Edited] (TEZ-1069) Support ability to re-size a task attempt when previous attempts fail due to resource constraints
[ https://issues.apache.org/jira/browse/TEZ-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277179#comment-14277179 ] Hitesh Shah edited comment on TEZ-1069 at 1/14/15 4:34 PM: --- I am not sure if that is the approach I would have taken. My thinking was more along the lines for querying the VertexManager to allow it to modify the task specifications in such cases. Changing the resource is not enough. One would also need to change the java opts. For the latter, we would need to write a java opts parser if the user had specified their own java opts ( Xmx, etc ). Isn't it better to setup hooks in case of OOM failures for a VertexManager to resize the task? Furthermore, a lot of OOM failures are due to data skew where one task is affected but the rest are not. Last question on when should this increase be done? Should it be done on each attempt failure or only on the last attempt? was (Author: hitesh): I am not sure if that is the approach I would have taken. My thinking was more along the lines for querying the VertexManager to allow it to modify the task specifications in such cases. Changing the resource is not enough. One would also need to change the java opts. For the latter, we would need to write a java opts parser. Isn't it better to setup hooks in case of OOM failures for a VertexManager to resize the task? Furthermore, a lot of OOM failures are due to data skew where one task is affected but the rest are not. Last question on when should this increase be done? Should it be done on each attempt failure or only on the last attempt? 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
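The java opts rewrite discussed in this thread — replacing a user-specified -Xmx when resizing a failed attempt — could look like the following sketch. This is a hypothetical helper, not Tez code; the class and method names are assumptions for illustration.

```java
public class JavaOptsResizer {

    // Replace any existing -Xmx setting with the given heap size in MB,
    // appending one if the user-supplied opts had none.
    public static String withXmxMb(String opts, int heapMb) {
        String cleaned = opts.replaceAll("-Xmx\\S+", " ").trim().replaceAll("\\s+", " ");
        return (cleaned.isEmpty() ? "" : cleaned + " ") + "-Xmx" + heapMb + "m";
    }

    public static void main(String[] args) {
        // User-specified opts keep their other flags; only the heap is resized.
        System.out.println(withXmxMb("-server -Xmx1024m -XX:+UseG1GC", 2048));
    }
}
```

A real parser would also need to handle -Xms, -XX:MaxDirectMemorySize and similar knobs, which is why the comment above treats this as non-trivial.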
[jira] [Commented] (TEZ-1890) tez-ui web.tar.gz also being uploaded to maven repository
[ https://issues.apache.org/jira/browse/TEZ-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277186#comment-14277186 ] Hitesh Shah commented on TEZ-1890: -- Do any of the docs published need to change? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1947) Failing fast when DAG configs have wrong values can save cluster resources
[ https://issues.apache.org/jira/browse/TEZ-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277201#comment-14277201 ] Hitesh Shah commented on TEZ-1947: -- The code also has a typo that could be fixed: "Invlaid configuration:". MR had a notion of checking job specifications before anything is run. This was done on the client as part of submission. We could probably do something similar, but this will affect all runtime library components. Also, there is the question of whether to run this on the client or in the AM. The AM need not have all the necessary jars to instantiate all custom objects. > Failing fast when DAG configs have wrong values can save cluster resources > -- > > Key: TEZ-1947 > URL: https://issues.apache.org/jira/browse/TEZ-1947 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan > > It would be beneficial to do certain config checks (wherever possible) > upfront rather than failing later downstream. For instance, in the > following example the DAG failed after 400+ seconds due to a config issue. > {code} > Status: Running (Executing on YARN cluster with App id > application_1421164610335_0060) > > VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED > KILLED > > Map 1 .. 
KILLED 251 170 0 81 0 81 > Reducer 2 FAILED 1009 0 0 1009 23 1008 > > VERTICES: 00/02 [===>>---] 13% ELAPSED TIME: 449.01 s > > Status: Failed > Vertex failed, vertexName=Reducer 2, vertexId=vertex_1421164610335_0060_1_01, > diagnostics=[Task failed, taskId=task_1421164610335_0060_1_01_04, > diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running > task:java.lang.RuntimeException: Invlaid configuration: maxSingleShuffleLimit > should be less than mergeThresholdmaxSingleShuffleLimit: 238251152, > mergeThreshold: 148668720 > at > org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.(MergeManager.java:260) > at > org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle.(Shuffle.java:206) > at > org.apache.tez.runtime.library.input.OrderedGroupedKVInput.start(OrderedGroupedKVInput.java:124) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.call(LogicalIOProcessorRuntimeTask.java:405) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.call(LogicalIOProcessorRuntimeTask.java:393) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > ], TaskAttempt 1 failed, info=[Error: Failure while running > task:java.lang.RuntimeException: Invlaid configuration: maxSingleShuffleLimit > should be less than mergeThresholdmaxSingleShuffleLimit: 238251152, > mergeThreshold: 148668720 > at > org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.(MergeManager.java:260) > at > org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle.(Shuffle.java:206) > at > 
org.apache.tez.runtime.library.input.OrderedGroupedKVInput.start(OrderedGroupedKVInput.java:124) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.call(LogicalIOProcessorRuntimeTask.java:405) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.call(LogicalIOProcessorRuntimeTask.java:393) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
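The fail-fast idea in this comment thread — running the check that the stack trace above only hits deep inside MergeManager's constructor, 400+ seconds into the run, at submission time instead — might look like the sketch below. Class and method names are assumptions for illustration, not the Tez API.

```java
public class ShuffleConfigValidator {

    // The invariant the log above reports only after the tasks launch:
    // maxSingleShuffleLimit must stay below mergeThreshold.
    public static void validate(long maxSingleShuffleLimit, long mergeThreshold) {
        if (maxSingleShuffleLimit >= mergeThreshold) {
            throw new IllegalArgumentException(
                "Invalid configuration: maxSingleShuffleLimit (" + maxSingleShuffleLimit
                + ") should be less than mergeThreshold (" + mergeThreshold + ")");
        }
    }

    public static void main(String[] args) {
        try {
            // The values from the failed run above would be rejected up front.
            validate(238251152L, 148668720L);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected at submission: " + e.getMessage());
        }
    }
}
```

Whether this runs on the client or in the AM is exactly the open question above; the validation logic itself is the same in either place.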
[jira] [Commented] (TEZ-1942) Number of tasks show in Tez UI with auto-reduce parallelism is misleading
[ https://issues.apache.org/jira/browse/TEZ-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277203#comment-14277203 ] Hitesh Shah commented on TEZ-1942: -- [~pramachandran] Does a YARN jira need to be filed for this timeline issue? > Number of tasks show in Tez UI with auto-reduce parallelism is misleading > - > > Key: TEZ-1942 > URL: https://issues.apache.org/jira/browse/TEZ-1942 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan > Attachments: Screen Shot 2015-01-14 at 9.18.21 AM.png, Screen Shot > 2015-01-14 at 9.18.54 AM.png, output.json, result_with_direct_vertex.png, > result_with_primary_filter.png > > > Ran a simple hive query (with tez) and "--hiveconf > hive.tez.auto.reducer.parallelism=true" . This internally turns on tez's > auto reduce parallelism. > - Job started off with 1009 reduce tasks > - Tez reduces the number of reducers to 253 > - Job completes successfully, but TEZ UI shows 1009 as the number of reducers > (and 253 tasks as successful tasks). This can be a little misleading. > I will attach the screenshots soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1890) tez-ui web.tar.gz also being uploaded to maven repository
[ https://issues.apache.org/jira/browse/TEZ-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277225#comment-14277225 ] Jonathan Eagles commented on TEZ-1890: -- Good catch, [~hitesh]. Will post a patch soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-1890) tez-ui web.tar.gz also being uploaded to maven repository
[ https://issues.apache.org/jira/browse/TEZ-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated TEZ-1890: - Attachment: TEZ-1890-v2.patch Addressed site doc with v2 patch. [~hitesh], [~pramachandran], [~Sreenath], please review. > tez-ui web.tar.gz also being uploaded to maven repository > - > > Key: TEZ-1890 > URL: https://issues.apache.org/jira/browse/TEZ-1890 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Priority: Blocker > Attachments: TEZ-1890-v1.patch, TEZ-1890-v2.patch > > > Not sure if we should be uploading the web tar.gz as part of maven deploy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1890) tez-ui web.tar.gz also being uploaded to maven repository
[ https://issues.apache.org/jira/browse/TEZ-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277286#comment-14277286 ] Sreenath Somarajapuram commented on TEZ-1890: - The src/assembly directory can also be removed; without maven-assembly-plugin it serves no purpose. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1890) tez-ui web.tar.gz also being uploaded to maven repository
[ https://issues.apache.org/jira/browse/TEZ-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277315#comment-14277315 ] Sreenath Somarajapuram commented on TEZ-1890: - Sorry, my bad. Saw your git rm comment just now. +1 The changes won't affect the Ambari view; moreover, the Ambari view pom looks for 0.6.0-SNAPSHOT. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1890) tez-ui web.tar.gz also being uploaded to maven repository
[ https://issues.apache.org/jira/browse/TEZ-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277322#comment-14277322 ] Hadoop QA commented on TEZ-1890: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12692262/TEZ-1890-v2.patch against master revision 6bc500f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 findbugs{color}. The patch appears to introduce 260 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/19//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/19//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/19//artifact/patchprocess/newPatchFindbugsWarningstez-mapreduce.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/19//artifact/patchprocess/newPatchFindbugsWarningstez-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/19//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-internals.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/19//artifact/patchprocess/newPatchFindbugsWarningstez-tests.html Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/19//console This message is automatically generated. > tez-ui web.tar.gz also being uploaded to maven repository > - > > Key: TEZ-1890 > URL: https://issues.apache.org/jira/browse/TEZ-1890 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Priority: Blocker > Attachments: TEZ-1890-v1.patch, TEZ-1890-v2.patch > > > Not sure if we should be uploading the web tar.gz as part of maven deploy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1942) Number of tasks show in Tez UI with auto-reduce parallelism is misleading
[ https://issues.apache.org/jira/browse/TEZ-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277331#comment-14277331 ] Prakash Ramachandran commented on TEZ-1942: --- raised YARN-2444 for the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TEZ-1942) Number of tasks show in Tez UI with auto-reduce parallelism is misleading
[ https://issues.apache.org/jira/browse/TEZ-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277331#comment-14277331 ] Prakash Ramachandran edited comment on TEZ-1942 at 1/14/15 6:00 PM: raised YARN-3062 for the same. was (Author: pramachandran): raised YARN-2444 for the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-1948) mergedMapOutput counters can be incorrect in case of on disk merges
Siddharth Seth created TEZ-1948: --- Summary: mergedMapOutput counters can be incorrect in case of on disk merges Key: TEZ-1948 URL: https://issues.apache.org/jira/browse/TEZ-1948 Project: Apache Tez Issue Type: Bug Reporter: Siddharth Seth In TezMerger {code} Constants.MERGED_OUTPUT_PREFIX) ? null : mergedMapOutputsCounter))); {code} The MergeManager now writes out merge files with an id at the end, which can cause this counter to be incorrect. There's another jira to move the merge file name generation into TezOutputFiles. This may be as simple as moving the id appended to the merged files before the suffix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
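The suffix problem described above can be illustrated with a small sketch. The constant name comes from the snippet in the report; the file-name shapes and the `endsWith` check are assumptions for illustration, not the actual TezMerger code.

```java
public class MergedFileNames {

    static final String MERGED_OUTPUT_PREFIX = ".merged";

    // Counter selection keys off the ".merged" marker at the end of the path.
    static boolean isMergedOutput(String path) {
        return path.endsWith(MERGED_OUTPUT_PREFIX);
    }

    public static void main(String[] args) {
        // An id appended after the marker defeats the check ...
        System.out.println(isMergedOutput("file.out.merged.3"));
        // ... while moving the id before the suffix keeps it working,
        // which is the "simple" fix suggested in the report.
        System.out.println(isMergedOutput("file.out.3.merged"));
    }
}
```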
[jira] [Commented] (TEZ-1945) Remove 2 GB memlimit restriction in MergeManager
[ https://issues.apache.org/jira/browse/TEZ-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277456#comment-14277456 ] Siddharth Seth commented on TEZ-1945: - +1. Patch looks good to me. Thanks [~rajesh.balamohan]. Does the NUM_MEM_TO_DISK_MERGES counter seem incorrect ? I'd expect SPILLED_RECORDS to be 0 in the cases where NUM_MEM_TO_DISK_MERGES, NUM_DISK_TO_DISK_MERGES and SHUFFLE_BYTES_TO_DISK are 0. Could be because of SHUFFLE_BYTES_DISK_DIRECT. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TEZ-1945) Remove 2 GB memlimit restriction in MergeManager
[ https://issues.apache.org/jira/browse/TEZ-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277456#comment-14277456 ] Siddharth Seth edited comment on TEZ-1945 at 1/14/15 7:03 PM: -- +1. Patch looks good to me. Thanks [~rajesh.balamohan]. Does the NUM_MEM_TO_DISK_MERGES counter seem incorrect ? I'd expect SPILLED_RECORDS to be 0 in the cases where NUM_MEM_TO_DISK_MERGES, NUM_DISK_TO_DISK_MERGES and SHUFFLE_BYTES_TO_DISK are 0. Could be because of SHUFFLE_BYTES_DISK_DIRECT. Also, adjusting POST_MERGE may make the runs faster. was (Author: sseth): +1. Patch looks good to me. Thanks [~rajesh.balamohan]. Does the NUM_MEM_TO_DISK_MERGES counter seem incorrect ? I'd expect SPILLED_RECORDS to be 0 in the cases where NUM_MEM_TO_DISK_MERGES, NUM_DISK_TO_DISK_MERGES and SHUFFLE_BYTES_TO_DISK are 0. Could be because of SHUFFLE_BYTES_DISK_DIRECT. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1890) tez-ui web.tar.gz also being uploaded to maven repository
[ https://issues.apache.org/jira/browse/TEZ-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277464#comment-14277464 ] Jonathan Eagles commented on TEZ-1890: -- Thanks, everybody, for the feedback. Committed this to master and branch-0.6. > tez-ui web.tar.gz also being uploaded to maven repository > - > > Key: TEZ-1890 > URL: https://issues.apache.org/jira/browse/TEZ-1890 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Priority: Blocker > Fix For: 0.6.0 > > Attachments: TEZ-1890-v1.patch, TEZ-1890-v2.patch > > > Not sure if we should be uploading the web tar.gz as part of maven deploy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (TEZ-1890) tez-ui web.tar.gz also being uploaded to maven repository
[ https://issues.apache.org/jira/browse/TEZ-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles reassigned TEZ-1890: Assignee: Jonathan Eagles > tez-ui web.tar.gz also being uploaded to maven repository > - > > Key: TEZ-1890 > URL: https://issues.apache.org/jira/browse/TEZ-1890 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Jonathan Eagles >Priority: Blocker > Fix For: 0.6.0 > > Attachments: TEZ-1890-v1.patch, TEZ-1890-v2.patch > > > Not sure if we should be uploading the web tar.gz as part of maven deploy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1945) Remove 2 GB memlimit restriction in MergeManager
[ https://issues.apache.org/jira/browse/TEZ-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277482#comment-14277482 ] Siddharth Seth commented on TEZ-1945: - Ideally, we should be changing the POST_MERGE_MEM_LIMIT to be a long as well. Separate jira ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
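The 2 GB ceiling discussed in this thread falls out of int arithmetic: a byte limit computed above Integer.MAX_VALUE cannot survive a cast to int, which is why fields like the memlimit (and, per the comment above, POST_MERGE_MEM_LIMIT) need to become long. A minimal sketch of the overflow, with illustrative values rather than Tez's actual configuration math:

```java
public class MemLimitDemo {
    public static void main(String[] args) {
        long totalMem = 4L * 1024 * 1024 * 1024;   // e.g. a 4 GB task heap
        float shuffleFraction = 0.9f;              // fraction given to shuffle memory

        // Casting to int clamps the result at Integer.MAX_VALUE (~2 GB),
        // which is effectively the old restriction.
        int intLimit = (int) (totalMem * shuffleFraction);
        // Keeping the limit as a long preserves the full value.
        long longLimit = (long) (totalMem * shuffleFraction);

        System.out.println("int limit: " + intLimit + ", long limit: " + longLimit);
    }
}
```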
[jira] [Updated] (TEZ-1949) Whitelist TEZ_RUNTIME_OPTIMIZE_SHARED_FETCH for broadcast edges
[ https://issues.apache.org/jira/browse/TEZ-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated TEZ-1949: - Affects Version/s: 0.7.0 > Whitelist TEZ_RUNTIME_OPTIMIZE_SHARED_FETCH for broadcast edges > --- > > Key: TEZ-1949 > URL: https://issues.apache.org/jira/browse/TEZ-1949 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.6.0, 0.5.2, 0.7.0 >Reporter: Gopal V >Assignee: Gopal V > > Tez configuration whitelisting is missing TEZ_RUNTIME_OPTIMIZE_SHARED_FETCH > for broadcast edges (UnorderedKVInput). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-1949) Whitelist TEZ_RUNTIME_OPTIMIZE_SHARED_FETCH for broadcast edges
Gopal V created TEZ-1949: Summary: Whitelist TEZ_RUNTIME_OPTIMIZE_SHARED_FETCH for broadcast edges Key: TEZ-1949 URL: https://issues.apache.org/jira/browse/TEZ-1949 Project: Apache Tez Issue Type: Bug Affects Versions: 0.5.2, 0.6.0 Reporter: Gopal V Assignee: Gopal V Tez configuration whitelisting is missing TEZ_RUNTIME_OPTIMIZE_SHARED_FETCH for broadcast edges (UnorderedKVInput). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-1949) Whitelist TEZ_RUNTIME_OPTIMIZE_SHARED_FETCH for broadcast edges
[ https://issues.apache.org/jira/browse/TEZ-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated TEZ-1949: - Attachment: TEZ-1949.1.patch > Whitelist TEZ_RUNTIME_OPTIMIZE_SHARED_FETCH for broadcast edges > --- > > Key: TEZ-1949 > URL: https://issues.apache.org/jira/browse/TEZ-1949 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.6.0, 0.5.2, 0.7.0 >Reporter: Gopal V >Assignee: Gopal V > Attachments: TEZ-1949.1.patch > > > Tez configuration whitelisting is missing TEZ_RUNTIME_OPTIMIZE_SHARED_FETCH > for broadcast edges (UnorderedKVInput). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1949) Whitelist TEZ_RUNTIME_OPTIMIZE_SHARED_FETCH for broadcast edges
[ https://issues.apache.org/jira/browse/TEZ-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277791#comment-14277791 ] Hadoop QA commented on TEZ-1949: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12692335/TEZ-1949.1.patch against master revision adcfb84. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 findbugs{color}. The patch appears to introduce 260 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/20//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/20//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/20//artifact/patchprocess/newPatchFindbugsWarningstez-mapreduce.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/20//artifact/patchprocess/newPatchFindbugsWarningstez-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/20//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-internals.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/20//artifact/patchprocess/newPatchFindbugsWarningstez-tests.html Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/20//console This message is automatically generated. > Whitelist TEZ_RUNTIME_OPTIMIZE_SHARED_FETCH for broadcast edges > --- > > Key: TEZ-1949 > URL: https://issues.apache.org/jira/browse/TEZ-1949 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.6.0, 0.5.2, 0.7.0 >Reporter: Gopal V >Assignee: Gopal V > Attachments: TEZ-1949.1.patch > > > Tez configuration whitelisting is missing TEZ_RUNTIME_OPTIMIZE_SHARED_FETCH > for broadcast edges (UnorderedKVInput). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1945) Remove 2 GB memlimit restriction in MergeManager
[ https://issues.apache.org/jira/browse/TEZ-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277905#comment-14277905 ] Rajesh Balamohan commented on TEZ-1945: --- SPILLED_RECORDS can be > 0 as it is accounted for in finalMerge (mem + disk). Will create a separate JIRA for post_merge_mem_limit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-1950) Incorrect handling of counters in TaskAttemptImpl
Hitesh Shah created TEZ-1950: Summary: Incorrect handling of counters in TaskAttemptImpl Key: TEZ-1950 URL: https://issues.apache.org/jira/browse/TEZ-1950 Project: Apache Tez Issue Type: Bug Reporter: Hitesh Shah To maintain task attempt counters, we are using TaskAttempt.TaskAttemptStatus.counters. Now, counters is not accessed in a thread-safe manner. Counters are updated in StatusUpdaterTransition or modified as part of TaskAttempt.TaskAttemptStatus::setLocalityCounter(). In a scenario where TaskAttempt::getCounters() is called before any status update transition comes back, the locality counter will get lost because the atomic boolean flag is never reset. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-1950) Incorrect handling of locality counter in TaskAttemptImpl
[ https://issues.apache.org/jira/browse/TEZ-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-1950: - Summary: Incorrect handling of locality counter in TaskAttemptImpl (was: Incorrect handling of counters in TaskAttemptImpl) > Incorrect handling of locality counter in TaskAttemptImpl > - > > Key: TEZ-1950 > URL: https://issues.apache.org/jira/browse/TEZ-1950 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah > > To maintain task attempt counters, we are using > TaskAttempt.TaskAttemptStatus.counters > Now, counters is not accessed in a thread safe manner. > Counters are updated in either StatusUpdaterTransition or modified as part of > TaskAttempt.TaskAttemptStatus::setLocalityCounter(). > In a scenario, where TaskAttempt::getCounters() is called before any status > update transition comes back, the locality counter will get lost because the > atomic boolean flag is never reset. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-1945) Remove 2 GB memlimit restriction in MergeManager
[ https://issues.apache.org/jira/browse/TEZ-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-1945: -- Target Version/s: 0.7.0 Fix Version/s: (was: 0.7.0) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1950) Incorrect handling of locality counter in TaskAttemptImpl
[ https://issues.apache.org/jira/browse/TEZ-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277919#comment-14277919 ] Hitesh Shah commented on TEZ-1950: -- [~rajesh.balamohan] [~sseth] Does the above analysis seem correct? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
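The race described in TEZ-1950 can be reproduced with a stripped-down model: a one-shot AtomicBoolean guards folding the locality counter into the reported counters, and because it is never reset, an early getCounters() call consumes the flag and later reads lose the counter. Class and method names here are illustrative, not the actual TaskAttemptImpl code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class LocalityCounterModel {

    private final AtomicBoolean counterPending = new AtomicBoolean(true);
    private final long localityCount = 1;  // set via setLocalityCounter() in the real code

    // Buggy shape: the first getCounters() call flips the one-shot flag, so a
    // later call - e.g. after the first status update arrives - sees nothing.
    public long getCountersBuggy() {
        return counterPending.compareAndSet(true, false) ? localityCount : 0;
    }

    // Safer shape: reading does not consume the flag.
    public long getCountersFixed() {
        return counterPending.get() ? localityCount : 0;
    }
}
```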
[jira] [Updated] (TEZ-1900) Fix findbugs warnings in tez-dag
[ https://issues.apache.org/jira/browse/TEZ-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-1900: - Issue Type: Bug (was: Sub-task) Parent: (was: TEZ-316) > Fix findbugs warnings in tez-dag > > > Key: TEZ-1900 > URL: https://issues.apache.org/jira/browse/TEZ-1900 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Hitesh Shah > > Might need to be split out more. > https://builds.apache.org/job/PreCommit-Tez-Build/8/artifact/patchprocess/newPatchFindbugsWarningstez-dag.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-1951) Fix general findbugs warnings in tez-dag
Hitesh Shah created TEZ-1951: Summary: Fix general findbugs warnings in tez-dag Key: TEZ-1951 URL: https://issues.apache.org/jira/browse/TEZ-1951 Project: Apache Tez Issue Type: Sub-task Reporter: Hitesh Shah -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-1953) Inconsistent synchronization of org.apache.tez.dag.app.dag.impl.VertexImpl.groupInputSpecList
Hitesh Shah created TEZ-1953: Summary: Inconsistent synchronization of org.apache.tez.dag.app.dag.impl.VertexImpl.groupInputSpecList Key: TEZ-1953 URL: https://issues.apache.org/jira/browse/TEZ-1953 Project: Apache Tez Issue Type: Sub-task Reporter: Hitesh Shah Inconsistent synchronization of org.apache.tez.dag.app.dag.impl.VertexImpl.groupInputSpecList; locked 50% of time -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-1952) Inconsistent synchronization of org.apache.tez.dag.app.dag.impl.Edge.edgeProperty
Hitesh Shah created TEZ-1952: Summary: Inconsistent synchronization of org.apache.tez.dag.app.dag.impl.Edge.edgeProperty Key: TEZ-1952 URL: https://issues.apache.org/jira/browse/TEZ-1952 Project: Apache Tez Issue Type: Sub-task Reporter: Hitesh Shah Inconsistent synchronization of org.apache.tez.dag.app.dag.impl.Edge.edgeProperty; locked 78% of time In class org.apache.tez.dag.app.dag.impl.Edge Field org.apache.tez.dag.app.dag.impl.Edge.edgeProperty Synchronized 78% of the time Unsynchronized access at Edge.java:[line 212] Unsynchronized access at Edge.java:[line 184] Unsynchronized access at Edge.java:[line 226] Synchronized access at Edge.java:[line 117] Synchronized access at Edge.java:[line 131] Synchronized access at Edge.java:[line 144] Synchronized access at Edge.java:[line 133] Synchronized access at Edge.java:[line 134] Synchronized access at Edge.java:[line 137] Synchronized access at Edge.java:[line 167] Synchronized access at Edge.java:[line 167] Synchronized access at Edge.java:[line 167] Synchronized access at Edge.java:[line 167] Synchronized access at Edge.java:[line 173] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
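The warning pattern above (IS2_INCONSISTENT_SYNC in FindBugs terms) arises when a field is written under a lock but also read without one. A minimal sketch of the usual remedy, using illustrative names rather than the actual Edge internals: either synchronize every access, or make the reference volatile so unsynchronized readers still observe the latest write.

```java
// Illustrative fix for FindBugs "inconsistent synchronization": the field is
// volatile, so reads outside the lock still see the most recent write, while
// compound updates remain guarded by synchronized.
public class EdgePropertyHolder {
    private volatile String edgeProperty;

    public synchronized void setEdgeProperty(String value) {
        // Guarded write: other compound invariants could be maintained here.
        this.edgeProperty = value;
    }

    public String getEdgeProperty() {
        // Safe unsynchronized read thanks to volatile visibility.
        return edgeProperty;
    }

    // Small self-check used below.
    public static boolean roundTrip() {
        EdgePropertyHolder holder = new EdgePropertyHolder();
        holder.setEdgeProperty("SCATTER_GATHER");
        return "SCATTER_GATHER".equals(holder.getEdgeProperty());
    }
}
```

Which remedy is right depends on whether the unsynchronized reads in Edge.java (lines 184, 212, 226 above) participate in compound invariants; volatile only fixes visibility, not atomicity of multi-step updates.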
[jira] [Created] (TEZ-1954) Multiple instances of Inconsistent synchronization in org.apache.tez.dag.app.DAGAppMaster.
Hitesh Shah created TEZ-1954: Summary: Multiple instances of Inconsistent synchronization in org.apache.tez.dag.app.DAGAppMaster. Key: TEZ-1954 URL: https://issues.apache.org/jira/browse/TEZ-1954 Project: Apache Tez Issue Type: Sub-task Reporter: Hitesh Shah Inconsistent synchronization of org.apache.tez.dag.app.DAGAppMaster.amTokens; locked 50% of time Inconsistent synchronization of org.apache.tez.dag.app.DAGAppMaster.appMasterUgi; locked 66% of time Inconsistent synchronization of org.apache.tez.dag.app.DAGAppMaster.context; locked 65% of time Inconsistent synchronization of org.apache.tez.dag.app.DAGAppMaster.currentDAG; locked 72% of time Inconsistent synchronization of org.apache.tez.dag.app.DAGAppMaster.state; locked 80% of time Inconsistent synchronization of org.apache.tez.dag.app.DAGAppMaster.taskSchedulerEventHandler; locked 78% of time Inconsistent synchronization of org.apache.tez.dag.app.DAGAppMaster.versionMismatch; locked 83% of time Inconsistent synchronization of org.apache.tez.dag.app.DAGAppMaster.versionMismatchDiagnostics; locked 80% of time -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-1955) Inconsistent synchronization of org.apache.tez.dag.app.rm.TaskSchedulerEventHandler.taskScheduler
Hitesh Shah created TEZ-1955: Summary: Inconsistent synchronization of org.apache.tez.dag.app.rm.TaskSchedulerEventHandler.taskScheduler Key: TEZ-1955 URL: https://issues.apache.org/jira/browse/TEZ-1955 Project: Apache Tez Issue Type: Sub-task Reporter: Hitesh Shah Inconsistent synchronization of org.apache.tez.dag.app.rm.TaskSchedulerEventHandler.taskScheduler; locked 47% of time -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-1956) Multiple instances: Inconsistent synchronization of org.apache.tez.dag.app.rm.YarnTaskSchedulerService
Hitesh Shah created TEZ-1956: Summary: Multiple instances: Inconsistent synchronization of org.apache.tez.dag.app.rm.YarnTaskSchedulerService Key: TEZ-1956 URL: https://issues.apache.org/jira/browse/TEZ-1956 Project: Apache Tez Issue Type: Sub-task Reporter: Hitesh Shah Inconsistent synchronization of org.apache.tez.dag.app.rm.YarnTaskSchedulerService.delayedContainerManager; locked 80% of time Inconsistent synchronization of org.apache.tez.dag.app.rm.YarnTaskSchedulerService.heartbeatAtLastPreemption; locked 66% of time Inconsistent synchronization of org.apache.tez.dag.app.rm.YarnTaskSchedulerService.localitySchedulingDelay; locked 91% of time Inconsistent synchronization of org.apache.tez.dag.app.rm.YarnTaskSchedulerService.preemptionPercentage; locked 85% of time Inconsistent synchronization of org.apache.tez.dag.app.rm.YarnTaskSchedulerService.shouldReuseContainers; locked 85% of time -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-1957) Multiple instances: Synchronization performed on java.util.concurrent.atomic.AtomicBoolean in org.apache.tez.dag.app.DAGAppMaster
Hitesh Shah created TEZ-1957: Summary: Multiple instances: Synchronization performed on java.util.concurrent.atomic.AtomicBoolean in org.apache.tez.dag.app.DAGAppMaster Key: TEZ-1957 URL: https://issues.apache.org/jira/browse/TEZ-1957 Project: Apache Tez Issue Type: Sub-task Reporter: Hitesh Shah Synchronization performed on java.util.concurrent.atomic.AtomicBoolean in org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterShutdownHandler.shutdown(boolean) Synchronization performed on java.util.concurrent.atomic.AtomicBoolean in org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterShutdownHandler$AMShutdownRunnable.run() Synchronization performed on java.util.concurrent.atomic.AtomicBoolean in org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterShutdownHook.run() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
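FindBugs flags synchronizing on a java.util.concurrent.atomic.AtomicBoolean because atomics are meant to be used lock-free; taking an intrinsic lock on the same object invites confusion and accidental lock sharing. A hedged sketch of the usual cleanup, with illustrative names rather than DAGAppMaster's actual fields: keep the atomic for its compare-and-set semantics and take any needed mutual exclusion on a dedicated lock object.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative cleanup: don't synchronize on the AtomicBoolean itself.
public class ShutdownHandler {
    private final AtomicBoolean shutdownInvoked = new AtomicBoolean(false);
    private final Object shutdownLock = new Object(); // dedicated monitor

    public boolean shutdown() {
        // compareAndSet already gives one-shot semantics without a lock.
        if (!shutdownInvoked.compareAndSet(false, true)) {
            return false; // someone else already initiated shutdown
        }
        synchronized (shutdownLock) {
            // Multi-step teardown is guarded by the dedicated lock,
            // not by the atomic's monitor.
            return true;
        }
    }

    // Small self-check used below.
    public static boolean oneShot() {
        ShutdownHandler handler = new ShutdownHandler();
        return handler.shutdown() && !handler.shutdown();
    }
}
```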
[jira] [Created] (TEZ-1958) Synchronization performed on java.util.concurrent.BlockingQueue in org.apache.tez.dag.app.rm.LocalTaskSchedulerService
Hitesh Shah created TEZ-1958: Summary: Synchronization performed on java.util.concurrent.BlockingQueue in org.apache.tez.dag.app.rm.LocalTaskSchedulerService Key: TEZ-1958 URL: https://issues.apache.org/jira/browse/TEZ-1958 Project: Apache Tez Issue Type: Sub-task Reporter: Hitesh Shah Synchronization performed on java.util.concurrent.BlockingQueue in org.apache.tez.dag.app.rm.LocalTaskSchedulerService$AsyncDelegateRequestHandler.addDeallocateTaskRequest(Object) Synchronization performed on java.util.concurrent.BlockingQueue in org.apache.tez.dag.app.rm.LocalTaskSchedulerService$AsyncDelegateRequestHandler.run() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
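Similarly, a BlockingQueue already provides thread-safe blocking operations, so wrapping it in synchronized blocks duplicates, and can interfere with, the queue's internal coordination. A sketch of the idiomatic alternative, with illustrative names rather than LocalTaskSchedulerService's actual request types: let put and take do the blocking.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative producer/consumer without external synchronization:
// BlockingQueue.put/take block and coordinate internally.
public class RequestHandler {
    private final BlockingQueue<String> requests = new LinkedBlockingQueue<>();

    public void addDeallocateTaskRequest(String request) throws InterruptedException {
        requests.put(request); // thread-safe; blocks only if the queue is bounded and full
    }

    public String processNext() throws InterruptedException {
        return requests.take(); // blocks until a request is available
    }

    // Small single-threaded self-check used below.
    public static boolean roundTrip() {
        try {
            RequestHandler handler = new RequestHandler();
            handler.addDeallocateTaskRequest("deallocate-task-1");
            return "deallocate-task-1".equals(handler.processNext());
        } catch (InterruptedException e) {
            return false;
        }
    }
}
```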
[jira] [Created] (TEZ-1959) Synchronization performed on java.util.concurrent.atomic.AtomicBoolean in org.apache.tez.dag.app.rm.YarnTaskSchedulerService$DelayedContainerManager.run()
Hitesh Shah created TEZ-1959: Summary: Synchronization performed on java.util.concurrent.atomic.AtomicBoolean in org.apache.tez.dag.app.rm.YarnTaskSchedulerService$DelayedContainerManager.run() Key: TEZ-1959 URL: https://issues.apache.org/jira/browse/TEZ-1959 Project: Apache Tez Issue Type: Sub-task Reporter: Hitesh Shah Synchronization performed on java.util.concurrent.atomic.AtomicBoolean in org.apache.tez.dag.app.rm.YarnTaskSchedulerService$DelayedContainerManager.run() In class org.apache.tez.dag.app.rm.YarnTaskSchedulerService$DelayedContainerManager In method org.apache.tez.dag.app.rm.YarnTaskSchedulerService$DelayedContainerManager.run() Type java.util.concurrent.atomic.AtomicBoolean Value loaded from field org.apache.tez.dag.app.rm.YarnTaskSchedulerService$DelayedContainerManager.drainedDelayedContainersForTest At YarnTaskSchedulerService.java:[line 1822] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-1951) Fix general findbugs warnings in tez-dag
[ https://issues.apache.org/jira/browse/TEZ-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-1951: - Attachment: TEZ-1951.1.patch > Fix general findbugs warnings in tez-dag > > > Key: TEZ-1951 > URL: https://issues.apache.org/jira/browse/TEZ-1951 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Hitesh Shah > Attachments: TEZ-1951.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (TEZ-1951) Fix general findbugs warnings in tez-dag
[ https://issues.apache.org/jira/browse/TEZ-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah reassigned TEZ-1951: Assignee: Hitesh Shah > Fix general findbugs warnings in tez-dag > > > Key: TEZ-1951 > URL: https://issues.apache.org/jira/browse/TEZ-1951 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Hitesh Shah >Assignee: Hitesh Shah > Attachments: TEZ-1951.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (TEZ-1905) Fix findbugs warnings in tez-tests
[ https://issues.apache.org/jira/browse/TEZ-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah reassigned TEZ-1905: Assignee: Hitesh Shah > Fix findbugs warnings in tez-tests > -- > > Key: TEZ-1905 > URL: https://issues.apache.org/jira/browse/TEZ-1905 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Hitesh Shah >Assignee: Hitesh Shah > > https://builds.apache.org/job/PreCommit-Tez-Build/8/artifact/patchprocess/newPatchFindbugsWarningstez-tests.html > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-1900) Fix findbugs warnings in tez-dag
[ https://issues.apache.org/jira/browse/TEZ-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-1900: - Assignee: (was: Hitesh Shah) > Fix findbugs warnings in tez-dag > > > Key: TEZ-1900 > URL: https://issues.apache.org/jira/browse/TEZ-1900 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah > > Might need to be split out more. > https://builds.apache.org/job/PreCommit-Tez-Build/8/artifact/patchprocess/newPatchFindbugsWarningstez-dag.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1945) Remove 2 GB memlimit restriction in MergeManager
[ https://issues.apache.org/jira/browse/TEZ-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277971#comment-14277971 ] Siddharth Seth commented on TEZ-1945: - bq. SPILLED_RECORDS can be > 0 as it is accounted for in finalMerge (mem + disk). Will create a separate JIRA for post_merge_mem_limit. So it's a merge not triggered by the fetchMemoryLimit but by the postMergeMemoryLimit. Should be accounted for somehow in the counters; will create a follow-up jira. > Remove 2 GB memlimit restriction in MergeManager > > > Key: TEZ-1945 > URL: https://issues.apache.org/jira/browse/TEZ-1945 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan > Attachments: TEZ-1945.1.patch > > > In certain situations (data coming in larger chunks, but yet to complete), > fetchers might wait in MergeManager.waitForShuffleToMergeMemory() for memory to > become available. > Removing the 2 GB restriction on MergeManager.memlimit would help in such > situations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
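For context on why a 2 GB ceiling exists at all: in-memory segments are backed by Java byte arrays, which are indexed by int, so any single in-memory buffer is bounded by Integer.MAX_VALUE bytes. A hedged sketch of the arithmetic, using illustrative names rather than MergeManager's actual fields: the overall shuffle budget can be a long well above 2 GB, while each individual in-memory segment still has to be clamped.

```java
// Illustrative arithmetic: the total shuffle budget may exceed 2 GB, but a
// single byte[]-backed in-memory segment cannot, because arrays use int indexes.
public class MergeMemoryBudget {
    public static final long GB = 1024L * 1024L * 1024L;

    // Total memory available to hold fetched map outputs (a long, no 2 GB cap).
    public static long totalBudget(long jvmHeapBytes, double shuffleFraction) {
        return (long) (jvmHeapBytes * shuffleFraction);
    }

    // Any one in-memory segment is still bounded by what a byte[] can hold.
    public static int maxSingleSegment(long requestedBytes) {
        return (int) Math.min(requestedBytes, Integer.MAX_VALUE);
    }
}
```

This is also the corner case noted in the patch discussion: intermediate mem-to-mem merging via InMemoryWriter stays bound by the 2 GB per-segment limit even if the overall memlimit restriction is removed.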
[jira] [Created] (TEZ-1960) finalMerge spills should be accounted for in some counter
Siddharth Seth created TEZ-1960: --- Summary: finalMerge spills should be accounted for in some counter Key: TEZ-1960 URL: https://issues.apache.org/jira/browse/TEZ-1960 Project: Apache Tez Issue Type: Improvement Reporter: Siddharth Seth -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1945) Remove 2 GB memlimit restriction in MergeManager
[ https://issues.apache.org/jira/browse/TEZ-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277993#comment-14277993 ] Hadoop QA commented on TEZ-1945: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12692195/TEZ-1945.1.patch against master revision adcfb84. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 findbugs{color}. The patch appears to introduce 260 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/21//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/21//artifact/patchprocess/newPatchFindbugsWarningstez-mapreduce.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/21//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/21//artifact/patchprocess/newPatchFindbugsWarningstez-tests.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/21//artifact/patchprocess/newPatchFindbugsWarningstez-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/21//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-internals.html Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/21//console This message is automatically generated. > Remove 2 GB memlimit restriction in MergeManager > > > Key: TEZ-1945 > URL: https://issues.apache.org/jira/browse/TEZ-1945 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan > Attachments: TEZ-1945.1.patch > > > In certain situations (data coming in larger chunks, but yet to complete), > fetchers might wait in MergeManager.waitForShuffleToMergeMemory() for memory to > become available. > Removing the 2 GB restriction on MergeManager.memlimit would help in such > situations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-1943) Move shared OutputCommitter to DAG
[ https://issues.apache.org/jira/browse/TEZ-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated TEZ-1943: Description: Currently, we have one committer for each output of a vertex, even if it is a shared output in the case of a Vertex Group. In this scenario, the initialize and setupOutput methods of the vertex group's OutputCommitter will be called multiple times. Although there is no issue with the current OutputCommitter implementation, this could cause potential issues for any customized OutputCommitter in the future. So this jira is for moving the shared OutputCommitter to the DAG and letting the DAG control the shared OutputCommitter. was: Currently, we have one committer for each output of a vertex, even if it is a shared output in the case of a Vertex Group. In this scenario, the initialize and setupOutput methods of the OutputCommitter will be called multiple times. Although there is currently no issue with that, this could cause potential issues for any customized OutputCommitter in the future. So this jira is for moving the shared OutputCommitter to the DAG and letting the DAG control the shared OutputCommitter. > Move shared OutputCommitter to DAG > - > > Key: TEZ-1943 > URL: https://issues.apache.org/jira/browse/TEZ-1943 > Project: Apache Tez > Issue Type: Bug >Reporter: Jeff Zhang >Assignee: Jeff Zhang > > Currently, we have one committer for each output of a vertex, even if it is > a shared output in the case of a Vertex Group. In this scenario, the > initialize and setupOutput methods of the vertex group's OutputCommitter will be called > multiple times. Although there is no issue with the current > OutputCommitter implementation, this could cause potential issues for any customized > OutputCommitter in the future. > So this jira is for moving the shared OutputCommitter to the DAG and letting the DAG > control the shared OutputCommitter. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
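A minimal sketch of the DAG-owned committer idea, with hypothetical names (SharedCommitter, the once-guard) that are not taken from Tez's API: when two vertices in a vertex group share an output, initialization should happen exactly once, no matter how many vertices reference it.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative DAG-level wrapper that makes a shared committer's setup
// idempotent: initialize runs once even if every vertex in the vertex
// group asks for it.
public class SharedCommitter {
    private final AtomicBoolean initialized = new AtomicBoolean(false);
    final AtomicInteger initializeCalls = new AtomicInteger(); // for inspection

    public void initializeOnce() {
        if (initialized.compareAndSet(false, true)) {
            initializeCalls.incrementAndGet(); // real setup would happen here
        }
    }

    public static int callsAfterTwoVertices() {
        SharedCommitter committer = new SharedCommitter();
        committer.initializeOnce(); // vertex 1 of the group
        committer.initializeOnce(); // vertex 2 of the group
        return committer.initializeCalls.get();
    }
}
```

Moving ownership to the DAG, as the jira proposes, achieves the same effect structurally: there is one committer instance with one lifecycle, rather than one per referencing vertex.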
[jira] [Updated] (TEZ-1905) Fix findbugs warnings in tez-tests
[ https://issues.apache.org/jira/browse/TEZ-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-1905: - Attachment: TEZ-1905.1.patch > Fix findbugs warnings in tez-tests > -- > > Key: TEZ-1905 > URL: https://issues.apache.org/jira/browse/TEZ-1905 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Hitesh Shah >Assignee: Hitesh Shah > Attachments: TEZ-1905.1.patch > > > https://builds.apache.org/job/PreCommit-Tez-Build/8/artifact/patchprocess/newPatchFindbugsWarningstez-tests.html > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1905) Fix findbugs warnings in tez-tests
[ https://issues.apache.org/jira/browse/TEZ-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278000#comment-14278000 ] Hitesh Shah commented on TEZ-1905: -- [~sseth] [~rajesh.balamohan] review please > Fix findbugs warnings in tez-tests > -- > > Key: TEZ-1905 > URL: https://issues.apache.org/jira/browse/TEZ-1905 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Hitesh Shah >Assignee: Hitesh Shah > Attachments: TEZ-1905.1.patch > > > https://builds.apache.org/job/PreCommit-Tez-Build/8/artifact/patchprocess/newPatchFindbugsWarningstez-tests.html > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1951) Fix general findbugs warnings in tez-dag
[ https://issues.apache.org/jira/browse/TEZ-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278001#comment-14278001 ] Hitesh Shah commented on TEZ-1951: -- [~sseth] [~rajesh.balamohan] review please > Fix general findbugs warnings in tez-dag > > > Key: TEZ-1951 > URL: https://issues.apache.org/jira/browse/TEZ-1951 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Hitesh Shah >Assignee: Hitesh Shah > Attachments: TEZ-1951.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-1961) Remove misleading exception "No running dag" from AM logs
Siddharth Seth created TEZ-1961: --- Summary: Remove misleading exception "No running dag" from AM logs Key: TEZ-1961 URL: https://issues.apache.org/jira/browse/TEZ-1961 Project: Apache Tez Issue Type: Improvement Reporter: Siddharth Seth {code} 15/01/14 16:45:06 INFO ipc.Server: IPC Server handler 0 on 51000, call org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.getDAGStatus from 10.11.3.176:51879 Call#0 Retry#0 org.apache.tez.dag.api.TezException: No running dag at present at org.apache.tez.dag.api.client.DAGClientHandler.getDAG(DAGClientHandler.java:84) at org.apache.tez.dag.api.client.DAGClientHandler.getACLManager(DAGClientHandler.java:151) at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getDAGStatus(DAGClientAMProtocolBlockingPBServerImpl.java:94) at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7375) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2041) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2037) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2035) 15/01/14 16:45:06 INFO client.DAGClientImpl: DAG initialized: CurrentState=Running {code} This exception shows up fairly often and isn't very relevant - queries before a DAG is submitted to the AM. This is very misleading, especially for folks new to Tez, and should be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-1961) Remove misleading exception "No running dag" from AM logs
[ https://issues.apache.org/jira/browse/TEZ-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-1961: Description: {code} 15/01/14 16:45:06 INFO ipc.Server: IPC Server handler 0 on 51000, call org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.getDAGStatus from Call#0 Retry#0 org.apache.tez.dag.api.TezException: No running dag at present at org.apache.tez.dag.api.client.DAGClientHandler.getDAG(DAGClientHandler.java:84) at org.apache.tez.dag.api.client.DAGClientHandler.getACLManager(DAGClientHandler.java:151) at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getDAGStatus(DAGClientAMProtocolBlockingPBServerImpl.java:94) at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7375) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2041) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2037) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2035) 15/01/14 16:45:06 INFO client.DAGClientImpl: DAG initialized: CurrentState=Running {code} This exception shows up fairly often and isn't very relevant - queries before a DAG is submitted to the AM. This is very misleading, especially for folks new to Tez, and should be removed. 
was: {code} 15/01/14 16:45:06 INFO ipc.Server: IPC Server handler 0 on 51000, call org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.getDAGStatus from 10.11.3.176:51879 Call#0 Retry#0 org.apache.tez.dag.api.TezException: No running dag at present at org.apache.tez.dag.api.client.DAGClientHandler.getDAG(DAGClientHandler.java:84) at org.apache.tez.dag.api.client.DAGClientHandler.getACLManager(DAGClientHandler.java:151) at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getDAGStatus(DAGClientAMProtocolBlockingPBServerImpl.java:94) at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7375) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2041) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2037) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2035) 15/01/14 16:45:06 INFO client.DAGClientImpl: DAG initialized: CurrentState=Running {code} This exception shows up fairly often and isn't very relevant - queries before a DAG is submitted to the AM. This is very misleading, especially for folks new to Tez, and should be removed. 
> Remove misleading exception "No running dag" from AM logs > - > > Key: TEZ-1961 > URL: https://issues.apache.org/jira/browse/TEZ-1961 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth > > {code} > 15/01/14 16:45:06 INFO ipc.Server: IPC Server handler 0 on 51000, call > org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.getDAGStatus > from Call#0 Retry#0 > org.apache.tez.dag.api.TezException: No running dag at present > at > org.apache.tez.dag.api.client.DAGClientHandler.getDAG(DAGClientHandler.java:84) > at > org.apache.tez.dag.api.client.DAGClientHandler.getACLManager(DAGClientHandler.java:151) > at > org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getDAGStatus(DAGClientAMProtocolBlockingPBServerImpl.java:94) > at > org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7375) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2041) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2037) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subj
[jira] [Assigned] (TEZ-1879) Create local UGI instances for each task and the AM, when running in LocalMode
[ https://issues.apache.org/jira/browse/TEZ-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth reassigned TEZ-1879: --- Assignee: Siddharth Seth > Create local UGI instances for each task and the AM, when running in LocalMode > -- > > Key: TEZ-1879 > URL: https://issues.apache.org/jira/browse/TEZ-1879 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > > Modifying the client UGI can cause issues when the client tries to submit > another job - or has tokens already populated in the UGI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-1879) Create local UGI instances for each task and the AM, when running in LocalMode
[ https://issues.apache.org/jira/browse/TEZ-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-1879: Attachment: TEZ-1879.1.txt Fairly straightforward patch which requires Credentials to be moved around. UGI for the AM and Child are already being created explicitly, so didn't need to do much there. No new unit tests since this functionality is covered by the existing local mode and MiniCluster tests. [~hitesh] - please review. > Create local UGI instances for each task and the AM, when running in LocalMode > -- > > Key: TEZ-1879 > URL: https://issues.apache.org/jira/browse/TEZ-1879 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: TEZ-1879.1.txt > > > Modifying the client UGI can cause issues when the client tries to submit > another job - or has tokens already populated in the UGI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
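The idea in the patch can be sketched without Hadoop dependencies: instead of mutating the client's single identity object, each local-mode task works on its own copy of the credentials, so tokens added for one run cannot leak into the client or into the next submission. The names below (ClientIdentity, forTask) are illustrative and are not the Hadoop UserGroupInformation API.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative credential isolation: each task gets a defensive copy of the
// client's tokens, so per-task mutation never touches the client identity.
public class ClientIdentity {
    private final Map<String, String> tokens = new HashMap<>();

    public void addToken(String alias, String token) {
        tokens.put(alias, token);
    }

    // Per-task snapshot, analogous to creating a fresh UGI per task/AM.
    public Map<String, String> forTask() {
        return new HashMap<>(tokens);
    }

    public int tokenCount() {
        return tokens.size();
    }

    // Small self-check used below.
    public static boolean isolationHolds() {
        ClientIdentity client = new ClientIdentity();
        client.addToken("hdfs", "token-A");
        Map<String, String> taskTokens = client.forTask();
        taskTokens.put("am-session", "token-B"); // task-local addition
        return client.tokenCount() == 1 && taskTokens.size() == 2;
    }
}
```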
[jira] [Commented] (TEZ-1951) Fix general findbugs warnings in tez-dag
[ https://issues.apache.org/jira/browse/TEZ-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278033#comment-14278033 ] Hadoop QA commented on TEZ-1951: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12692389/TEZ-1951.1.patch against master revision adcfb84. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 findbugs{color}. The patch appears to introduce 74 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests in : org.apache.tez.test.TestAMRecovery Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/22//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/22//artifact/patchprocess/newPatchFindbugsWarningstez-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/22//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/22//artifact/patchprocess/newPatchFindbugsWarningstez-tests.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/22//artifact/patchprocess/newPatchFindbugsWarningstez-mapreduce.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/22//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-internals.html Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/22//console This message is automatically generated. > Fix general findbugs warnings in tez-dag > > > Key: TEZ-1951 > URL: https://issues.apache.org/jira/browse/TEZ-1951 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Hitesh Shah >Assignee: Hitesh Shah > Attachments: TEZ-1951.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-1942) Number of tasks show in Tez UI with auto-reduce parallelism is misleading
[ https://issues.apache.org/jira/browse/TEZ-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-1942: - Target Version/s: 0.6.0, 0.5.4 Affects Version/s: 0.5.2 > Number of tasks show in Tez UI with auto-reduce parallelism is misleading > - > > Key: TEZ-1942 > URL: https://issues.apache.org/jira/browse/TEZ-1942 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.5.2 >Reporter: Rajesh Balamohan > Attachments: Screen Shot 2015-01-14 at 9.18.21 AM.png, Screen Shot > 2015-01-14 at 9.18.54 AM.png, output.json, result_with_direct_vertex.png, > result_with_primary_filter.png > > > Ran a simple hive query (with tez) and "--hiveconf > hive.tez.auto.reducer.parallelism=true" . This internally turns on tez's > auto reduce parallelism. > - Job started off with 1009 reduce tasks > - Tez reduces the number of reducers to 253 > - Job completes successfully, but TEZ UI shows 1009 as the number of reducers > (and 253 tasks as successful tasks). This can be a little misleading. > I will attach the screenshots soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1942) Number of tasks show in Tez UI with auto-reduce parallelism is misleading
[ https://issues.apache.org/jira/browse/TEZ-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278040#comment-14278040 ] Hitesh Shah commented on TEZ-1942: -- [~pramachandran] Looks like we need to add primary filters to the entities on every call to timeline as per the conversation on YARN-3062. Seems like a very lame solution but probably the only way to get the UI to work correctly against timeline data. Would you like to take a crack at this? > Number of tasks show in Tez UI with auto-reduce parallelism is misleading > - > > Key: TEZ-1942 > URL: https://issues.apache.org/jira/browse/TEZ-1942 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.5.2 >Reporter: Rajesh Balamohan > Attachments: Screen Shot 2015-01-14 at 9.18.21 AM.png, Screen Shot > 2015-01-14 at 9.18.54 AM.png, output.json, result_with_direct_vertex.png, > result_with_primary_filter.png > > > Ran a simple hive query (with tez) and "--hiveconf > hive.tez.auto.reducer.parallelism=true" . This internally turns on tez's > auto reduce parallelism. > - Job started off with 1009 reduce tasks > - Tez reduces the number of reducers to 253 > - Job completes successfully, but TEZ UI shows 1009 as the number of reducers > (and 253 tasks as successful tasks). This can be a little misleading. > I will attach the screenshots soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-1962) Running out of threads in tez local mode
Gunther Hagleitner created TEZ-1962: --- Summary: Running out of threads in tez local mode Key: TEZ-1962 URL: https://issues.apache.org/jira/browse/TEZ-1962 Project: Apache Tez Issue Type: Bug Reporter: Gunther Hagleitner Priority: Critical I've been trying to port the Hive unit tests to Tez local mode. However, local mode seems to leak threads, which causes tests to crash after a while (OOM). See the attached stack trace - there are a lot of "TezChild" threads still hanging around. ([~sseth] as discussed offline) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-1962) Running out of threads in tez local mode
[ https://issues.apache.org/jira/browse/TEZ-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated TEZ-1962: Attachment: stack5.txt > Running out of threads in tez local mode > > > Key: TEZ-1962 > URL: https://issues.apache.org/jira/browse/TEZ-1962 > Project: Apache Tez > Issue Type: Bug >Reporter: Gunther Hagleitner >Priority: Critical > Attachments: stack5.txt > > > I've been trying to port the hive ut to tez local mode. However, local mode > seems to leak threads which causes tests to crash after a while (oom). See > attached stack trace - there are a lot of "TezChild" threads still hanging > around. > ([~sseth] as discussed offline) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
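A leak like the one described in TEZ-1962 can be spotted from a test harness by counting live "TezChild" threads before and after a run. The sketch below is plain Java with a hypothetical class name, not part of Tez or its test code; it only illustrates the detection idea:

```java
// Hypothetical harness-side check for the leak described above: count live
// "TezChild" threads before and after running a DAG in local mode.
public class TezChildThreadCheck {

    // Counts currently live threads whose name starts with "TezChild".
    static long countTezChildThreads() {
        long count = 0;
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getName().startsWith("TezChild")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long before = countTezChildThreads();
        // ... run a DAG in local mode here ...
        long after = countTezChildThreads();
        if (after > before) {
            System.err.println("Possible leak: " + (after - before)
                + " TezChild thread(s) still alive");
        }
    }
}
```

Running such a check between unit tests would turn the slow OOM crash into an immediate, attributable failure.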
[jira] [Commented] (TEZ-1905) Fix findbugs warnings in tez-tests
[ https://issues.apache.org/jira/browse/TEZ-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278055#comment-14278055 ] Hadoop QA commented on TEZ-1905: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12692395/TEZ-1905.1.patch against master revision adcfb84. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 findbugs{color}. The patch appears to introduce 254 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/23//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/23//artifact/patchprocess/newPatchFindbugsWarningstez-mapreduce.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/23//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/23//artifact/patchprocess/newPatchFindbugsWarningstez-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/23//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-internals.html Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/23//console This message is automatically generated. 
> Fix findbugs warnings in tez-tests > -- > > Key: TEZ-1905 > URL: https://issues.apache.org/jira/browse/TEZ-1905 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Hitesh Shah >Assignee: Hitesh Shah > Attachments: TEZ-1905.1.patch > > > https://builds.apache.org/job/PreCommit-Tez-Build/8/artifact/patchprocess/newPatchFindbugsWarningstez-tests.html > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1942) Number of tasks show in Tez UI with auto-reduce parallelism is misleading
[ https://issues.apache.org/jira/browse/TEZ-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278069#comment-14278069 ] Hitesh Shah commented on TEZ-1942: -- An initial fix might be to fix VertexInit, VertexFinished and VertexParallelismUpdated events. > Number of tasks show in Tez UI with auto-reduce parallelism is misleading > - > > Key: TEZ-1942 > URL: https://issues.apache.org/jira/browse/TEZ-1942 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.5.2 >Reporter: Rajesh Balamohan > Attachments: Screen Shot 2015-01-14 at 9.18.21 AM.png, Screen Shot > 2015-01-14 at 9.18.54 AM.png, output.json, result_with_direct_vertex.png, > result_with_primary_filter.png > > > Ran a simple hive query (with tez) and "--hiveconf > hive.tez.auto.reducer.parallelism=true" . This internally turns on tez's > auto reduce parallelism. > - Job started off with 1009 reduce tasks > - Tez reduces the number of reducers to 253 > - Job completes successfully, but TEZ UI shows 1009 as the number of reducers > (and 253 tasks as successful tasks). This can be a little misleading. > I will attach the screenshots soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1069) Support ability to re-size a task attempt when previous attempts fail due to resource constraints
[ https://issues.apache.org/jira/browse/TEZ-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278074#comment-14278074 ] Jeff Zhang commented on TEZ-1069: - bq. My thinking was more along the lines for querying the VertexManager to allow it to modify the task specifications in such cases. Changing the resource is not enough. One would also need to change the java opts. For the latter, we would need to write a java opts parser if the user had specified their own java opts ( Xmx, etc ). Agreed, the VertexManager is the better place to do this kind of thing, and it will update the java opts as well. bq. Isn't it better to setup hooks in case of OOM failures for a VertexManager to resize the task? Furthermore, a lot of OOM failures are due to data skew where one task is affected but the rest are not. I think I would add a method to the VertexManager so it gets notified of task attempt failures and can decide whether to resize the task. The rough idea is to resize only tasks that had an OOM attempt failure, and once the number of such tasks crosses some threshold, resize the whole vertex. bq. Last question on when should this increase be done? Should it be done on each attempt failure or only on the last attempt? If we identify that a task attempt failed due to OOM, the next attempt will most likely fail due to OOM as well. > Support ability to re-size a task attempt when previous attempts fail due to > resource constraints > - > > Key: TEZ-1069 > URL: https://issues.apache.org/jira/browse/TEZ-1069 > Project: Apache Tez > Issue Type: Improvement >Reporter: Hitesh Shah >Assignee: Jeff Zhang > Attachments: TEZ-1069-1.patch > > > Consider a case where attempts for the final stage in a long DAG fails due to > out of memory. In such a scenario, the framework ( or via the base vertex > manager ) should be able to change the task specifications on the fly to > trigger a re-run with modified specs. 
> Changes could be both java opts changes as well as container resource > requirements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1069) Support ability to re-size a task attempt when previous attempts fail due to resource constraints
[ https://issues.apache.org/jira/browse/TEZ-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278083#comment-14278083 ] Jeff Zhang commented on TEZ-1069: - After more thought, this would require a lot of changes in the VertexManager. So as a next step I will put all of this in the Vertex first, and move it to the VertexManager later if necessary. > Support ability to re-size a task attempt when previous attempts fail due to > resource constraints > - > > Key: TEZ-1069 > URL: https://issues.apache.org/jira/browse/TEZ-1069 > Project: Apache Tez > Issue Type: Improvement >Reporter: Hitesh Shah >Assignee: Jeff Zhang > Attachments: TEZ-1069-1.patch > > > Consider a case where attempts for the final stage in a long DAG fails due to > out of memory. In such a scenario, the framework ( or via the base vertex > manager ) should be able to change the task specifications on the fly to > trigger a re-run with modified specs. > Changes could be both java opts changes as well as container resource > requirements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
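The escalation policy sketched in the TEZ-1069 discussion (resize only tasks that failed with OOM, and resize the whole vertex once enough tasks have hit OOM) can be captured as a small standalone helper. This is plain Java illustrating the decision logic only; the class and method names are hypothetical and are not the Tez VertexManager API:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical decision helper for the OOM-resize policy described above:
// remember which tasks hit OOM, and signal a whole-vertex resize once the
// number of OOM-affected tasks crosses a threshold.
public class OomResizePolicy {
    private final Set<Integer> oomTasks = new HashSet<>();
    private final int vertexResizeThreshold;

    public OomResizePolicy(int vertexResizeThreshold) {
        this.vertexResizeThreshold = vertexResizeThreshold;
    }

    // Called when a task attempt fails with OOM.
    // Returns true if the whole vertex should now be resized.
    public boolean onTaskAttemptOom(int taskIndex) {
        oomTasks.add(taskIndex); // repeated OOMs on one task count once
        return oomTasks.size() >= vertexResizeThreshold;
    }

    // Only tasks that actually hit OOM get individually resized.
    public boolean shouldResizeTask(int taskIndex) {
        return oomTasks.contains(taskIndex);
    }
}
```

The threshold keeps the skew case cheap: one skewed task gets a bigger container while the healthy tasks keep their original size.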
[jira] [Assigned] (TEZ-1942) Number of tasks show in Tez UI with auto-reduce parallelism is misleading
[ https://issues.apache.org/jira/browse/TEZ-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Ramachandran reassigned TEZ-1942: - Assignee: Prakash Ramachandran > Number of tasks show in Tez UI with auto-reduce parallelism is misleading > - > > Key: TEZ-1942 > URL: https://issues.apache.org/jira/browse/TEZ-1942 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.5.2 >Reporter: Rajesh Balamohan >Assignee: Prakash Ramachandran > Attachments: Screen Shot 2015-01-14 at 9.18.21 AM.png, Screen Shot > 2015-01-14 at 9.18.54 AM.png, output.json, result_with_direct_vertex.png, > result_with_primary_filter.png > > > Ran a simple hive query (with tez) and "--hiveconf > hive.tez.auto.reducer.parallelism=true" . This internally turns on tez's > auto reduce parallelism. > - Job started off with 1009 reduce tasks > - Tez reduces the number of reducers to 253 > - Job completes successfully, but TEZ UI shows 1009 as the number of reducers > (and 253 tasks as successful tasks). This can be a little misleading. > I will attach the screenshots soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1951) Fix general findbugs warnings in tez-dag
[ https://issues.apache.org/jira/browse/TEZ-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278089#comment-14278089 ] Hitesh Shah commented on TEZ-1951: -- [~zjffdu] Do you see anything in the changes that might make TestAMRecovery.testVertexCompletelyFinished_Broadcast flaky? {code} java.lang.AssertionError: expected:<1> but was:<0> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.tez.test.TestAMRecovery.testVertexCompletelyFinished_Broadcast(TestAMRecovery.java:246) {code} > Fix general findbugs warnings in tez-dag > > > Key: TEZ-1951 > URL: https://issues.apache.org/jira/browse/TEZ-1951 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Hitesh Shah >Assignee: Hitesh Shah > Attachments: TEZ-1951.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1951) Fix general findbugs warnings in tez-dag
[ https://issues.apache.org/jira/browse/TEZ-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278098#comment-14278098 ] Jeff Zhang commented on TEZ-1951: - [~hitesh] I saw this several days ago and created TEZ-1934; a patch is available, please help review. > Fix general findbugs warnings in tez-dag > > > Key: TEZ-1951 > URL: https://issues.apache.org/jira/browse/TEZ-1951 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Hitesh Shah >Assignee: Hitesh Shah > Attachments: TEZ-1951.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1879) Create local UGI instances for each task and the AM, when running in LocalMode
[ https://issues.apache.org/jira/browse/TEZ-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278108#comment-14278108 ] Hadoop QA commented on TEZ-1879: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12692400/TEZ-1879.1.txt against master revision 61bb0f8. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 findbugs{color}. The patch appears to introduce 259 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/24//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/24//artifact/patchprocess/newPatchFindbugsWarningstez-mapreduce.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/24//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/24//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-internals.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/24//artifact/patchprocess/newPatchFindbugsWarningstez-tests.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/24//artifact/patchprocess/newPatchFindbugsWarningstez-examples.html Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/24//console This message is automatically generated. 
> Create local UGI instances for each task and the AM, when running in LocalMode > -- > > Key: TEZ-1879 > URL: https://issues.apache.org/jira/browse/TEZ-1879 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: TEZ-1879.1.txt > > > Modifying the client UGI can cause issues when the client tries to submit > another job - or has tokens already populated in the UGI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1934) TestAMRecovery may fail due to the execution order is not determined
[ https://issues.apache.org/jira/browse/TEZ-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278121#comment-14278121 ] Hitesh Shah commented on TEZ-1934: -- Mostly looks good. Does the "onSourceTaskCompleted" function need to be synchronized - can it be called concurrently for different tasks finishing? > TestAMRecovery may fail due to the execution order is not determined > - > > Key: TEZ-1934 > URL: https://issues.apache.org/jira/browse/TEZ-1934 > Project: Apache Tez > Issue Type: Bug >Reporter: Jeff Zhang >Assignee: Jeff Zhang > Attachments: TEZ-1934-1.patch > > > task_1 is not guaranteed to been scheduled before task_0, so task_1 may > finished before task_0. While in the current TestAMRecovery, the finish of > task_1 is treated as the finished signal of vertex ( only 2 tasks in this > vertex) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1934) TestAMRecovery may fail due to the execution order is not determined
[ https://issues.apache.org/jira/browse/TEZ-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278123#comment-14278123 ] Hitesh Shah commented on TEZ-1934: -- Triggered test patch for this jira. > TestAMRecovery may fail due to the execution order is not determined > - > > Key: TEZ-1934 > URL: https://issues.apache.org/jira/browse/TEZ-1934 > Project: Apache Tez > Issue Type: Bug >Reporter: Jeff Zhang >Assignee: Jeff Zhang > Attachments: TEZ-1934-1.patch > > > task_1 is not guaranteed to been scheduled before task_0, so task_1 may > finished before task_0. While in the current TestAMRecovery, the finish of > task_1 is treated as the finished signal of vertex ( only 2 tasks in this > vertex) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1951) Fix general findbugs warnings in tez-dag
[ https://issues.apache.org/jira/browse/TEZ-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278125#comment-14278125 ] Hadoop QA commented on TEZ-1951: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12692389/TEZ-1951.1.patch against master revision 61bb0f8. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 findbugs{color}. The patch appears to introduce 74 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/25//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/25//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-internals.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/25//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/25//artifact/patchprocess/newPatchFindbugsWarningstez-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/25//artifact/patchprocess/newPatchFindbugsWarningstez-tests.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/25//artifact/patchprocess/newPatchFindbugsWarningstez-mapreduce.html Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/25//console This message is automatically generated. > Fix general findbugs warnings in tez-dag > > > Key: TEZ-1951 > URL: https://issues.apache.org/jira/browse/TEZ-1951 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Hitesh Shah >Assignee: Hitesh Shah > Attachments: TEZ-1951.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1934) TestAMRecovery may fail due to the execution order is not determined
[ https://issues.apache.org/jira/browse/TEZ-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278128#comment-14278128 ] Jeff Zhang commented on TEZ-1934: - onSourceTaskCompleted is only called from the main dispatcher thread, so it should be fine without synchronization. > TestAMRecovery may fail due to the execution order is not determined > - > > Key: TEZ-1934 > URL: https://issues.apache.org/jira/browse/TEZ-1934 > Project: Apache Tez > Issue Type: Bug >Reporter: Jeff Zhang >Assignee: Jeff Zhang > Attachments: TEZ-1934-1.patch > > > task_1 is not guaranteed to been scheduled before task_0, so task_1 may > finished before task_0. While in the current TestAMRecovery, the finish of > task_1 is treated as the finished signal of vertex ( only 2 tasks in this > vertex) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
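The argument in the comment above is the standard thread-confinement one: if every event is delivered on a single dispatcher thread, the handler's state never sees concurrent access and needs no locks. A minimal sketch of that pattern (plain Java with a single-thread executor standing in for the dispatcher; this is not Tez's actual dispatcher code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Thread confinement: all mutation of `completed` happens on the one
// dispatcher thread, so the unsynchronized ArrayList is safe here.
public class SingleDispatcher {
    private final List<Integer> completed = new ArrayList<>();
    private final ExecutorService dispatcher = Executors.newSingleThreadExecutor();

    // May be invoked from any thread; the work itself runs on the dispatcher.
    public void onSourceTaskCompleted(int taskIndex) {
        dispatcher.submit(() -> completed.add(taskIndex));
    }

    // Drain the dispatcher, then read the state (safe: dispatcher has stopped).
    public List<Integer> shutdownAndGet() {
        dispatcher.shutdown();
        try {
            dispatcher.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed;
    }
}
```

The safety guarantee holds only as long as no other thread touches `completed` directly, which is exactly the property being asked about in the review comment.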
[jira] [Comment Edited] (TEZ-1934) TestAMRecovery may fail due to the execution order is not determined
[ https://issues.apache.org/jira/browse/TEZ-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278131#comment-14278131 ] Hitesh Shah edited comment on TEZ-1934 at 1/15/15 2:27 AM: --- +1 (pending test-patch results). was (Author: hitesh): +1. > TestAMRecovery may fail due to the execution order is not determined > - > > Key: TEZ-1934 > URL: https://issues.apache.org/jira/browse/TEZ-1934 > Project: Apache Tez > Issue Type: Bug >Reporter: Jeff Zhang >Assignee: Jeff Zhang > Attachments: TEZ-1934-1.patch > > > task_1 is not guaranteed to been scheduled before task_0, so task_1 may > finished before task_0. While in the current TestAMRecovery, the finish of > task_1 is treated as the finished signal of vertex ( only 2 tasks in this > vertex) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1934) TestAMRecovery may fail due to the execution order is not determined
[ https://issues.apache.org/jira/browse/TEZ-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278131#comment-14278131 ] Hitesh Shah commented on TEZ-1934: -- +1. > TestAMRecovery may fail due to the execution order is not determined > - > > Key: TEZ-1934 > URL: https://issues.apache.org/jira/browse/TEZ-1934 > Project: Apache Tez > Issue Type: Bug >Reporter: Jeff Zhang >Assignee: Jeff Zhang > Attachments: TEZ-1934-1.patch > > > task_1 is not guaranteed to been scheduled before task_0, so task_1 may > finished before task_0. While in the current TestAMRecovery, the finish of > task_1 is treated as the finished signal of vertex ( only 2 tasks in this > vertex) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-1963) Fix post memory merge to be > 2 GB
Rajesh Balamohan created TEZ-1963: - Summary: Fix post memory merge to be > 2 GB Key: TEZ-1963 URL: https://issues.apache.org/jira/browse/TEZ-1963 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-1963) Fix post memory merge to be > 2 GB
[ https://issues.apache.org/jira/browse/TEZ-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-1963: -- Attachment: TEZ-1963.1.patch > Fix post memory merge to be > 2 GB > -- > > Key: TEZ-1963 > URL: https://issues.apache.org/jira/browse/TEZ-1963 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan > Attachments: TEZ-1963.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1963) Fix post memory merge to be > 2 GB
[ https://issues.apache.org/jira/browse/TEZ-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278164#comment-14278164 ] Hadoop QA commented on TEZ-1963: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12692421/TEZ-1963.1.patch against master revision 61bb0f8. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 findbugs{color}. The patch appears to introduce 260 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests in : org.apache.tez.dag.app.rm.TestContainerReuse Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/28//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/28//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-internals.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/28//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/28//artifact/patchprocess/newPatchFindbugsWarningstez-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/28//artifact/patchprocess/newPatchFindbugsWarningstez-tests.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/28//artifact/patchprocess/newPatchFindbugsWarningstez-mapreduce.html Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/28//console This message is automatically generated. > Fix post memory merge to be > 2 GB > -- > > Key: TEZ-1963 > URL: https://issues.apache.org/jira/browse/TEZ-1963 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan > Attachments: TEZ-1963.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (TEZ-1962) Running out of threads in tez local mode
[ https://issues.apache.org/jira/browse/TEZ-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth reassigned TEZ-1962: --- Assignee: Siddharth Seth > Running out of threads in tez local mode > > > Key: TEZ-1962 > URL: https://issues.apache.org/jira/browse/TEZ-1962 > Project: Apache Tez > Issue Type: Bug >Reporter: Gunther Hagleitner >Assignee: Siddharth Seth >Priority: Critical > Attachments: stack5.txt > > > I've been trying to port the hive ut to tez local mode. However, local mode > seems to leak threads which causes tests to crash after a while (oom). See > attached stack trace - there are a lot of "TezChild" threads still hanging > around. > ([~sseth] as discussed offline) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (TEZ-1964) `
[ https://issues.apache.org/jira/browse/TEZ-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth resolved TEZ-1964. - Resolution: Invalid > ` > - > > Key: TEZ-1964 > URL: https://issues.apache.org/jira/browse/TEZ-1964 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Siddharth Seth > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-1964) `
[ https://issues.apache.org/jira/browse/TEZ-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-1964: Issue Type: Bug (was: Sub-task) Parent: (was: TEZ-1962) > ` > - > > Key: TEZ-1964 > URL: https://issues.apache.org/jira/browse/TEZ-1964 > Project: Apache Tez > Issue Type: Bug >Reporter: Siddharth Seth > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-1964) `
Siddharth Seth created TEZ-1964: --- Summary: ` Key: TEZ-1964 URL: https://issues.apache.org/jira/browse/TEZ-1964 Project: Apache Tez Issue Type: Sub-task Reporter: Siddharth Seth -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-1962) Running out of threads in tez local mode
[ https://issues.apache.org/jira/browse/TEZ-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-1962: Target Version/s: 0.7.0 > Running out of threads in tez local mode > > > Key: TEZ-1962 > URL: https://issues.apache.org/jira/browse/TEZ-1962 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Gunther Hagleitner >Assignee: Siddharth Seth >Priority: Critical > Attachments: stack5.txt > > > I've been trying to port the hive ut to tez local mode. However, local mode > seems to leak threads which causes tests to crash after a while (oom). See > attached stack trace - there are a lot of "TezChild" threads still hanging > around. > ([~sseth] as discussed offline) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1934) TestAMRecovery may fail due to the execution order is not determined
[ https://issues.apache.org/jira/browse/TEZ-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278182#comment-14278182 ] Hadoop QA commented on TEZ-1934: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12691636/TEZ-1934-1.patch against master revision 61bb0f8. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 findbugs{color}. The patch appears to introduce 260 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/26//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/26//artifact/patchprocess/newPatchFindbugsWarningstez-mapreduce.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/26//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/26//artifact/patchprocess/newPatchFindbugsWarningstez-tests.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/26//artifact/patchprocess/newPatchFindbugsWarningstez-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/26//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-internals.html Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/26//console This message is automatically generated. 
> TestAMRecovery may fail due to the execution order is not determined > - > > Key: TEZ-1934 > URL: https://issues.apache.org/jira/browse/TEZ-1934 > Project: Apache Tez > Issue Type: Bug >Reporter: Jeff Zhang >Assignee: Jeff Zhang > Attachments: TEZ-1934-1.patch > > > task_1 is not guaranteed to been scheduled before task_0, so task_1 may > finished before task_0. While in the current TestAMRecovery, the finish of > task_1 is treated as the finished signal of vertex ( only 2 tasks in this > vertex) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-1962) Running out of threads in tez local mode
[ https://issues.apache.org/jira/browse/TEZ-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-1962: Issue Type: Sub-task (was: Bug) Parent: TEZ-1876 > Running out of threads in tez local mode > > > Key: TEZ-1962 > URL: https://issues.apache.org/jira/browse/TEZ-1962 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Gunther Hagleitner >Assignee: Siddharth Seth >Priority: Critical > Attachments: stack5.txt > > > I've been trying to port the hive ut to tez local mode. However, local mode > seems to leak threads which causes tests to crash after a while (oom). See > attached stack trace - there are a lot of "TezChild" threads still hanging > around. > ([~sseth] as discussed offline) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1934) TestAMRecovery may fail due to the execution order is not determined
[ https://issues.apache.org/jira/browse/TEZ-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278185#comment-14278185 ] Hadoop QA commented on TEZ-1934: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12691636/TEZ-1934-1.patch against master revision 61bb0f8. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 179 javac compiler warnings (more than the master's current 171 warnings). {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. See https://builds.apache.org/job/PreCommit-TEZ-Build/27//artifact/patchprocess/diffJavadocWarnings.txt for details. {color:red}-1 findbugs{color}. The patch appears to introduce 260 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/27//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/27//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-internals.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/27//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/27//artifact/patchprocess/newPatchFindbugsWarningstez-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/27//artifact/patchprocess/newPatchFindbugsWarningstez-tests.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/27//artifact/patchprocess/newPatchFindbugsWarningstez-mapreduce.html Javac warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/27//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/27//console This message is automatically generated. > TestAMRecovery may fail because the execution order is not deterministic > - > > Key: TEZ-1934 > URL: https://issues.apache.org/jira/browse/TEZ-1934 > Project: Apache Tez > Issue Type: Bug >Reporter: Jeff Zhang >Assignee: Jeff Zhang > Attachments: TEZ-1934-1.patch > > > task_1 is not guaranteed to be scheduled before task_0, so task_1 may > finish before task_0. However, in the current TestAMRecovery, the finish of > task_1 is treated as the finish signal of the vertex (only 2 tasks in this > vertex) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1934) TestAMRecovery may fail because the execution order is not deterministic
[ https://issues.apache.org/jira/browse/TEZ-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278194#comment-14278194 ] Jeff Zhang commented on TEZ-1934: - [~hitesh], New javac compiler warnings and javadoc warnings are generated; where can I see these warnings? The link https://builds.apache.org/job/PreCommit-TEZ-Build/27//artifact/patchprocess/diffJavadocWarnings.txt looks broken. > TestAMRecovery may fail because the execution order is not deterministic > - > > Key: TEZ-1934 > URL: https://issues.apache.org/jira/browse/TEZ-1934 > Project: Apache Tez > Issue Type: Bug >Reporter: Jeff Zhang >Assignee: Jeff Zhang > Attachments: TEZ-1934-1.patch > > > task_1 is not guaranteed to be scheduled before task_0, so task_1 may > finish before task_0. However, in the current TestAMRecovery, the finish of > task_1 is treated as the finish signal of the vertex (only 2 tasks in this > vertex) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
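The race in TEZ-1934 comes from keying the vertex-finish signal off one particular task's completion. A sketch of the order-independent alternative, counting completions of all tasks with a latch (the class and method names below are illustrative, not Tez's actual test API):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: wait for ALL tasks of a vertex to finish instead of
// treating one particular task's finish as the vertex-finish signal.
public class VertexFinishWaiter {
    private final CountDownLatch remaining;

    public VertexFinishWaiter(int numTasks) {
        this.remaining = new CountDownLatch(numTasks);
    }

    // Called once per finished task, in any order
    // (task_1 may legitimately finish before task_0).
    public void onTaskFinished(String taskId) {
        remaining.countDown();
    }

    // Returns true only once every task has reported completion.
    public boolean awaitVertexFinished(long timeout, TimeUnit unit)
            throws InterruptedException {
        return remaining.await(timeout, unit);
    }
}
```

Because the latch only releases after all tasks report in, the test no longer depends on which of the two tasks happens to be scheduled first.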
[jira] [Updated] (TEZ-1942) Number of tasks shown in Tez UI with auto-reduce parallelism is misleading
[ https://issues.apache.org/jira/browse/TEZ-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Ramachandran updated TEZ-1942: -- Attachment: TEZ-1942.1.patch [~hitesh] review please > Number of tasks shown in Tez UI with auto-reduce parallelism is misleading > - > > Key: TEZ-1942 > URL: https://issues.apache.org/jira/browse/TEZ-1942 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.5.2 >Reporter: Rajesh Balamohan >Assignee: Prakash Ramachandran > Attachments: Screen Shot 2015-01-14 at 9.18.21 AM.png, Screen Shot > 2015-01-14 at 9.18.54 AM.png, TEZ-1942.1.patch, output.json, > result_with_direct_vertex.png, result_with_primary_filter.png > > > Ran a simple hive query (with tez) and "--hiveconf > hive.tez.auto.reducer.parallelism=true". This internally turns on tez's > auto reduce parallelism. > - Job started off with 1009 reduce tasks > - Tez reduces the number of reducers to 253 > - Job completes successfully, but Tez UI shows 1009 as the number of reducers > (and 253 tasks as successful tasks). This can be a little misleading. > I will attach the screenshots soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
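The reduction reported above (1009 configured reducers shrinking to 253 actual) comes from Tez's auto-reduce parallelism, which sizes the reducer count from observed shuffle data rather than the configured value. A hedged sketch of that kind of computation; the formula and names below are illustrative, not the exact ShuffleVertexManager logic:

```java
// Illustrative sketch of auto-reduce parallelism: pick the reducer count
// from observed shuffle bytes, capped at the configured parallelism.
// Not Tez's exact algorithm.
public class AutoReduceSketch {
    public static int desiredParallelism(long observedBytes,
                                         long bytesPerReducer,
                                         int configuredParallelism) {
        // Ceiling division: enough reducers so each handles at most
        // bytesPerReducer, but never fewer than one.
        long needed = Math.max(1, (observedBytes + bytesPerReducer - 1) / bytesPerReducer);
        return (int) Math.min(needed, configuredParallelism);
    }
}
```

With numbers like those in the report (for example, shuffle data amounting to 253 reducers' worth at the configured bytes-per-reducer), the computed parallelism lands well below the 1009 originally configured, which is why the UI's task count and the successful-task count diverge.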