[jira] [Updated] (TEZ-1637) Improved shuffle error handling across NM restarts
     [ https://issues.apache.org/jira/browse/TEZ-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated TEZ-1637:
----------------------------------
    Attachment: TEZ-1637.1.patch

ready for review

> Improved shuffle error handling across NM restarts
> --------------------------------------------------
>
>                 Key: TEZ-1637
>                 URL: https://issues.apache.org/jira/browse/TEZ-1637
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-1637.1.patch, TEZ-1637.WIP.patch
>
>
> Similar to MAPREDUCE-5891 :- need to make sure the Tez shufflehandler can
> handle NM restarts correctly. This is required for rolling upgrades

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (TEZ-1635) Dag gets stuck intermittently
    [ https://issues.apache.org/jira/browse/TEZ-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158601#comment-14158601 ]

Gunther Hagleitner commented on TEZ-1635:
-----------------------------------------

{noformat}
"TezChild" daemon prio=5 tid=7fc9ad1a6000 nid=0x112c82000 waiting on condition [112c8]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <7f3b53b68> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
    at org.apache.tez.runtime.InputReadyTracker$InputReadyMonitor.awaitCondition(InputReadyTracker.java:120)
    at org.apache.tez.runtime.InputReadyTracker.waitForAnyInputReady(InputReadyTracker.java:83)
    at org.apache.tez.runtime.api.impl.TezProcessorContextImpl.waitForAnyInputReady(TezProcessorContextImpl.java:104)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:161)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:163)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:142)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:394)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:172)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:167)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:695)
{noformat}

> Dag gets stuck intermittently
> -----------------------------
>
>                 Key: TEZ-1635
>                 URL: https://issues.apache.org/jira/browse/TEZ-1635
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.5.0
>            Reporter: Vikram Dixit K
>            Priority: Blocker
>         Attachments: syslog_dag_1412109415326_0002_10.gz
>
>
> Attaching logs for the dag.
[jira] [Comment Edited] (TEZ-1344) Combiner counters reported by Tez look wrong
    [ https://issues.apache.org/jira/browse/TEZ-1344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158858#comment-14158858 ]

Alexander Pivovarov edited comment on TEZ-1344 at 10/4/14 1:45 AM:
-------------------------------------------------------------------

An MR API program (e.g. org.apache.tez.mapreduce.examples.MapredWordCount) run by yarn-tez always returns Counters: 0.
{code}
hadoop jar tez-tests/target/tez-tests-0.6.0-SNAPSHOT.jar wordcount -D mapreduce.framework.name=yarn-tez in out
14/10/03 18:22:59 INFO mapreduce.Job: map 100% reduce 100%
14/10/03 18:22:59 INFO mapreduce.Job: Job job_1412382361327_0008 completed successfully
14/10/03 18:22:59 INFO mapreduce.Job: Counters: 0
{code}

A Tez API program (e.g. org.apache.tez.examples.WordCount), modified as Jeff suggested, returns
{code}
$ hadoop jar tez-examples/target/tez-examples-0.6.0-SNAPSHOT.jar wordcount in out
...
org.apache.tez.common.counters.TaskCounter
  REDUCE_INPUT_GROUPS=35518
  REDUCE_INPUT_RECORDS=284742
  COMBINE_INPUT_RECORDS=0
{code}

The comment in the org.apache.tez.common.counters.TaskCounter code says
{code}
COMBINE_OUTPUT_RECORDS, // Not used at the moment.
{code}

I noticed that [~cheolsoo] mentioned the class org.apache.hadoop.mapreduce.TaskCounter (defined in the Hadoop jars), but a Tez API program returns counters from a different class (defined in the Tez jars), org.apache.tez.common.counters.TaskCounter.

I'm confused. What should I run with Tez, and how, to get the org.apache.hadoop.mapreduce.TaskCounter COMBINE_OUTPUT_RECORDS and COMBINE_INPUT_RECORDS counters?


was (Author: apivovarov):
MR API programm (e.g. org.apache.tez.mapreduce.examples.MapredWordCount) run by yarn-tez always return Counters: 0.
{code}
hadoop jar tez-tests/target/tez-tests-0.6.0-SNAPSHOT.jar wordcount -D mapreduce.framework.name=yarn-tez in out
{code}

Tez API programm (e.g. org.apache.tez.examples.WordCount) modified as Jeff sugested returns
{code}
$ hadoop jar tez-examples/target/tez-examples-0.6.0-SNAPSHOT.jar wordcount in out
...
org.apache.tez.common.counters.TaskCounter
  REDUCE_INPUT_GROUPS=35518
  REDUCE_INPUT_RECORDS=284742
  COMBINE_INPUT_RECORDS=0
{code}

comments in org.apache.tez.common.counters.TaskCounte code says
{code}
COMBINE_OUTPUT_RECORDS, // Not used at the moment.
{code}

I notieced that [~cheolsoo] mentioned class org.apache.hadoop.mapreduce.TaskCounter (defined in hadoop jars)
but tez api programm returns counters from different class (defined in tez jars) org.apache.tez.common.counters.TaskCounter

I'm confused. How and what shoud I run by tez to get org.apache.hadoop.mapreduce.TaskCounter COMBINE_OUTPUT_RECORDS and COMBINE_INPUT_RECORDS counters?

> Combiner counters reported by Tez look wrong
> --------------------------------------------
>
>                 Key: TEZ-1344
>                 URL: https://issues.apache.org/jira/browse/TEZ-1344
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Cheolsoo Park
>            Priority: Minor
>
> Combiner input/output counters reported by a Tez job seem wrong
> {code}
> org.apache.hadoop.mapreduce.TaskCounter:
>   COMBINE_OUTPUT_RECORDS 35,977,263,353
>   COMBINE_INPUT_RECORDS 1,000,529,333
> {code}
> As can be seen, combiner output records > input records?!
> The same counters from an MR job look as follows:
> {code}
> Map-Reduce Framework:
>   Combine output records 1,000,316,600
>   Combine input records 35,977,049,632
> {code}
> Somehow input and output are swapped?
[jira] [Commented] (TEZ-1344) Combiner counters reported by Tez look wrong
    [ https://issues.apache.org/jira/browse/TEZ-1344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158858#comment-14158858 ]

Alexander Pivovarov commented on TEZ-1344:
------------------------------------------

An MR API program (e.g. org.apache.tez.mapreduce.examples.MapredWordCount) run by yarn-tez always returns Counters: 0.
{code}
hadoop jar tez-tests/target/tez-tests-0.6.0-SNAPSHOT.jar wordcount -D mapreduce.framework.name=yarn-tez in out
{code}

A Tez API program (e.g. org.apache.tez.examples.WordCount), modified as Jeff suggested, returns
{code}
$ hadoop jar tez-examples/target/tez-examples-0.6.0-SNAPSHOT.jar wordcount in out
...
org.apache.tez.common.counters.TaskCounter
  REDUCE_INPUT_GROUPS=35518
  REDUCE_INPUT_RECORDS=284742
  COMBINE_INPUT_RECORDS=0
{code}

The comment in the org.apache.tez.common.counters.TaskCounter code says
{code}
COMBINE_OUTPUT_RECORDS, // Not used at the moment.
{code}

I noticed that [~cheolsoo] mentioned the class org.apache.hadoop.mapreduce.TaskCounter (defined in the Hadoop jars), but a Tez API program returns counters from a different class (defined in the Tez jars), org.apache.tez.common.counters.TaskCounter.

I'm confused. What should I run with Tez, and how, to get the org.apache.hadoop.mapreduce.TaskCounter COMBINE_OUTPUT_RECORDS and COMBINE_INPUT_RECORDS counters?

> Combiner counters reported by Tez look wrong
> --------------------------------------------
>
>                 Key: TEZ-1344
>                 URL: https://issues.apache.org/jira/browse/TEZ-1344
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Cheolsoo Park
>            Priority: Minor
>
> Combiner input/output counters reported by a Tez job seem wrong
> {code}
> org.apache.hadoop.mapreduce.TaskCounter:
>   COMBINE_OUTPUT_RECORDS 35,977,263,353
>   COMBINE_INPUT_RECORDS 1,000,529,333
> {code}
> As can be seen, combiner output records > input records?!
> The same counters from an MR job look as follows:
> {code}
> Map-Reduce Framework:
>   Combine output records 1,000,316,600
>   Combine input records 35,977,049,632
> {code}
> Somehow input and output are swapped?
[jira] [Comment Edited] (TEZ-1344) Combiner counters reported by Tez look wrong
    [ https://issues.apache.org/jira/browse/TEZ-1344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158858#comment-14158858 ]

Alexander Pivovarov edited comment on TEZ-1344 at 10/4/14 1:46 AM:
-------------------------------------------------------------------

An MR API program (e.g. org.apache.tez.mapreduce.examples.MapredWordCount) run by yarn-tez always returns Counters: 0.
{code}
hadoop jar tez-tests/target/tez-tests-0.6.0-SNAPSHOT.jar wordcount -D mapreduce.framework.name=yarn-tez in out
14/10/03 18:22:59 INFO mapreduce.Job: map 100% reduce 100%
14/10/03 18:22:59 INFO mapreduce.Job: Job job_1412382361327_0008 completed successfully
14/10/03 18:22:59 INFO mapreduce.Job: Counters: 0
{code}

A Tez API program (e.g. org.apache.tez.examples.WordCount), modified as Jeff suggested, returns
{code}
$ hadoop jar tez-examples/target/tez-examples-0.6.0-SNAPSHOT.jar wordcount in out
...
org.apache.tez.common.counters.TaskCounter
  REDUCE_INPUT_GROUPS=35518
  REDUCE_INPUT_RECORDS=284742
  COMBINE_INPUT_RECORDS=0
{code}

The comment in the org.apache.tez.common.counters.TaskCounter code says
{code}
COMBINE_OUTPUT_RECORDS, // Not used at the moment.
{code}

I noticed that [~cheolsoo] mentioned the class org.apache.hadoop.mapreduce.TaskCounter (defined in the Hadoop jars), but a Tez API program returns counters from a different class (defined in the Tez jars), org.apache.tez.common.counters.TaskCounter.

I'm confused. What should I run with Tez, and how, to get the org.apache.hadoop.mapreduce.TaskCounter COMBINE_OUTPUT_RECORDS and COMBINE_INPUT_RECORDS counters?


was (Author: apivovarov):
MR API programm (e.g. org.apache.tez.mapreduce.examples.MapredWordCount) run by yarn-tez always return Counters: 0.
{code}
hadoop jar tez-tests/target/tez-tests-0.6.0-SNAPSHOT.jar wordcount -D mapreduce.framework.name=yarn-tez in out
14/10/03 18:22:59 INFO mapreduce.Job: map 100% reduce 100%
14/10/03 18:22:59 INFO mapreduce.Job: Job job_1412382361327_0008 completed successfully
14/10/03 18:22:59 INFO mapreduce.Job: Counters: 0
{code}

Tez API programm (e.g. org.apache.tez.examples.WordCount) modified as Jeff sugested returns
{code}
$ hadoop jar tez-examples/target/tez-examples-0.6.0-SNAPSHOT.jar wordcount in out
...
org.apache.tez.common.counters.TaskCounter
  REDUCE_INPUT_GROUPS=35518
  REDUCE_INPUT_RECORDS=284742
  COMBINE_INPUT_RECORDS=0
{code}

comments in org.apache.tez.common.counters.TaskCounte code says
{code}
COMBINE_OUTPUT_RECORDS, // Not used at the moment.
{code}

I notieced that [~cheolsoo] mentioned class org.apache.hadoop.mapreduce.TaskCounter (defined in hadoop jars)
but tez api programm returns counters from different class (defined in tez jars) org.apache.tez.common.counters.TaskCounter

I'm confused. How and what shoud I run by tez to get org.apache.hadoop.mapreduce.TaskCounter COMBINE_OUTPUT_RECORDS and COMBINE_INPUT_RECORDS counters?

> Combiner counters reported by Tez look wrong
> --------------------------------------------
>
>                 Key: TEZ-1344
>                 URL: https://issues.apache.org/jira/browse/TEZ-1344
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Cheolsoo Park
>            Priority: Minor
>
> Combiner input/output counters reported by a Tez job seem wrong
> {code}
> org.apache.hadoop.mapreduce.TaskCounter:
>   COMBINE_OUTPUT_RECORDS 35,977,263,353
>   COMBINE_INPUT_RECORDS 1,000,529,333
> {code}
> As can be seen, combiner output records > input records?!
> The same counters from an MR job look as follows:
> {code}
> Map-Reduce Framework:
>   Combine output records 1,000,316,600
>   Combine input records 35,977,049,632
> {code}
> Somehow input and output are swapped?
[jira] [Comment Edited] (TEZ-1344) Combiner counters reported by Tez look wrong
    [ https://issues.apache.org/jira/browse/TEZ-1344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158858#comment-14158858 ]

Alexander Pivovarov edited comment on TEZ-1344 at 10/4/14 1:49 AM:
-------------------------------------------------------------------

An MR API program (e.g. org.apache.tez.mapreduce.examples.MapredWordCount) run by yarn-tez always returns Counters: 0.
{code}
hadoop jar tez-tests/target/tez-tests-0.6.0-SNAPSHOT.jar wordcount -D mapreduce.framework.name=yarn-tez in out
14/10/03 18:22:59 INFO mapreduce.Job: map 100% reduce 100%
14/10/03 18:22:59 INFO mapreduce.Job: Job job_1412382361327_0008 completed successfully
14/10/03 18:22:59 INFO mapreduce.Job: Counters: 0
{code}

A Tez API program (e.g. org.apache.tez.examples.WordCount), modified as Jeff suggested, returns
{code}
$ hadoop jar tez-examples/target/tez-examples-0.6.0-SNAPSHOT.jar wordcount in out
...
org.apache.tez.common.counters.TaskCounter
  REDUCE_INPUT_GROUPS=35518
  REDUCE_INPUT_RECORDS=284742
  COMBINE_INPUT_RECORDS=0
{code}

The comment in the org.apache.tez.common.counters.TaskCounter code says
{code}
COMBINE_OUTPUT_RECORDS, // Not used at the moment.
{code}

I noticed that [~cheolsoo] mentioned the class org.apache.hadoop.mapreduce.TaskCounter (defined in the Hadoop jars), but a Tez API program returns counters from a different class (defined in the Tez jars), org.apache.tez.common.counters.TaskCounter.

I'm confused. What should I run with Tez, and how, to get the Hadoop rather than the Tez TaskCounters?
org.apache.hadoop.mapreduce.TaskCounter
COMBINE_OUTPUT_RECORDS
COMBINE_INPUT_RECORDS


was (Author: apivovarov):
MR API programm (e.g. org.apache.tez.mapreduce.examples.MapredWordCount) run by yarn-tez always return Counters: 0.
{code}
hadoop jar tez-tests/target/tez-tests-0.6.0-SNAPSHOT.jar wordcount -D mapreduce.framework.name=yarn-tez in out
14/10/03 18:22:59 INFO mapreduce.Job: map 100% reduce 100%
14/10/03 18:22:59 INFO mapreduce.Job: Job job_1412382361327_0008 completed successfully
14/10/03 18:22:59 INFO mapreduce.Job: Counters: 0
{code}

Tez API programm (e.g. org.apache.tez.examples.WordCount) modified as Jeff suggested returns
{code}
$ hadoop jar tez-examples/target/tez-examples-0.6.0-SNAPSHOT.jar wordcount in out
...
org.apache.tez.common.counters.TaskCounter
  REDUCE_INPUT_GROUPS=35518
  REDUCE_INPUT_RECORDS=284742
  COMBINE_INPUT_RECORDS=0
{code}

comments in org.apache.tez.common.counters.TaskCounter code says
{code}
COMBINE_OUTPUT_RECORDS, // Not used at the moment.
{code}

I notieced that [~cheolsoo] mentioned class org.apache.hadoop.mapreduce.TaskCounter (defined in hadoop jars)
but tez api programm returns counters from different class (defined in tez jars) org.apache.tez.common.counters.TaskCounter

I'm confused. How and what shoud I run by tez to get org.apache.hadoop.mapreduce.TaskCounter COMBINE_OUTPUT_RECORDS and COMBINE_INPUT_RECORDS counters?

> Combiner counters reported by Tez look wrong
> --------------------------------------------
>
>                 Key: TEZ-1344
>                 URL: https://issues.apache.org/jira/browse/TEZ-1344
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Cheolsoo Park
>            Priority: Minor
>
> Combiner input/output counters reported by a Tez job seem wrong
> {code}
> org.apache.hadoop.mapreduce.TaskCounter:
>   COMBINE_OUTPUT_RECORDS 35,977,263,353
>   COMBINE_INPUT_RECORDS 1,000,529,333
> {code}
> As can be seen, combiner output records > input records?!
> The same counters from an MR job look as follows:
> {code}
> Map-Reduce Framework:
>   Combine output records 1,000,316,600
>   Combine input records 35,977,049,632
> {code}
> Somehow input and output are swapped?
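
Editorial note on why the reported numbers look swapped: an aggregating combiner normally emits no more records than it reads, so an output count far above the input count is suspicious. The sketch below is a plain-Java illustration only. It does not use the Tez or Hadoop APIs, and the class `CombinerCounterSketch`, its `Counters` holder, and the field names are hypothetical stand-ins for COMBINE_INPUT_RECORDS and COMBINE_OUTPUT_RECORDS, assuming a word-count style combiner that sums values per key.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CombinerCounterSketch {

    /** Hypothetical holder for the two combiner counters discussed in the thread. */
    static final class Counters {
        long combineInputRecords;   // stands in for COMBINE_INPUT_RECORDS
        long combineOutputRecords;  // stands in for COMBINE_OUTPUT_RECORDS
    }

    /** Sums the values for each key, as a word-count combiner would, and updates the counters. */
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> records, Counters counters) {
        Map<String, Integer> combined = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> record : records) {
            counters.combineInputRecords++;                    // one increment per record read
            combined.merge(record.getKey(), record.getValue(), Integer::sum);
        }
        counters.combineOutputRecords += combined.size();      // one record emitted per distinct key
        return combined;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> records = new ArrayList<>();
        for (String word : "a b a c b a".split(" ")) {
            records.add(new AbstractMap.SimpleEntry<>(word, 1));
        }
        Counters counters = new Counters();
        Map<String, Integer> combined = combine(records, counters);
        System.out.println(combined
                + " in=" + counters.combineInputRecords
                + " out=" + counters.combineOutputRecords);
    }
}
```

With the six input records above, the sketch reads 6 records and emits 3, one per distinct word. Under the same assumption, the MR numbers quoted in the issue (about 36 billion in, about 1 billion out) have the expected shape, while the Tez numbers have the two values the other way round.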
[jira] [Resolved] (TEZ-1615) Skeleton framework for Tez UI
     [ https://issues.apache.org/jira/browse/TEZ-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Eagles resolved TEZ-1615.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 0.6.0

> Skeleton framework for Tez UI
> -----------------------------
>
>                 Key: TEZ-1615
>                 URL: https://issues.apache.org/jira/browse/TEZ-1615
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Jonathan Eagles
>            Assignee: Prakash Ramachandran
>             Fix For: 0.6.0
>
>         Attachments: tez-ui.tgz
>
>
[jira] [Commented] (TEZ-1615) Skeleton framework for Tez UI
    [ https://issues.apache.org/jira/browse/TEZ-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158879#comment-14158879 ]

Jonathan Eagles commented on TEZ-1615:
--------------------------------------

+1. Committed to TEZ-8 branch.

> Skeleton framework for Tez UI
> -----------------------------
>
>                 Key: TEZ-1615
>                 URL: https://issues.apache.org/jira/browse/TEZ-1615
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Jonathan Eagles
>            Assignee: Prakash Ramachandran
>             Fix For: 0.6.0
>
>         Attachments: tez-ui.tgz
>
>
[jira] [Commented] (TEZ-1635) Dag gets stuck intermittently
    [ https://issues.apache.org/jira/browse/TEZ-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158953#comment-14158953 ]

Rajesh Balamohan commented on TEZ-1635:
---------------------------------------

[~hagleitn] - Is there a hive test case for reproducing this which I can try out locally?

> Dag gets stuck intermittently
> -----------------------------
>
>                 Key: TEZ-1635
>                 URL: https://issues.apache.org/jira/browse/TEZ-1635
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.5.0
>            Reporter: Vikram Dixit K
>            Priority: Blocker
>         Attachments: syslog_dag_1412109415326_0002_10.gz
>
>
> Attaching logs for the dag.
[jira] [Comment Edited] (TEZ-1635) Dag gets stuck intermittently
    [ https://issues.apache.org/jira/browse/TEZ-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158953#comment-14158953 ]

Rajesh Balamohan edited comment on TEZ-1635 at 10/4/14 5:18 AM:
----------------------------------------------------------------

[~hagleitn] - Is there a test case for reproducing this which I can try out locally?


was (Author: rajesh.balamohan):
[~hagleitn] - Is there a hive test case for reproducing this which I can try out locally?

> Dag gets stuck intermittently
> -----------------------------
>
>                 Key: TEZ-1635
>                 URL: https://issues.apache.org/jira/browse/TEZ-1635
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.5.0
>            Reporter: Vikram Dixit K
>            Priority: Blocker
>         Attachments: syslog_dag_1412109415326_0002_10.gz
>
>
> Attaching logs for the dag.
[jira] [Commented] (TEZ-1635) Dag gets stuck intermittently
    [ https://issues.apache.org/jira/browse/TEZ-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14159002#comment-14159002 ]

Gunther Hagleitner commented on TEZ-1635:
-----------------------------------------

tez_smb_1.q is consistently failing for me (latest trunk).

> Dag gets stuck intermittently
> -----------------------------
>
>                 Key: TEZ-1635
>                 URL: https://issues.apache.org/jira/browse/TEZ-1635
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.5.0
>            Reporter: Vikram Dixit K
>            Priority: Blocker
>         Attachments: syslog_dag_1412109415326_0002_10.gz
>
>
> Attaching logs for the dag.