[jira] [Commented] (TEZ-3124) Running task hangs due to missing event to initialize input in recovery

2016-02-23 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160058#comment-15160058
 ] 

Bikas Saha commented on TEZ-3124:
-

So in this case the task needed an event to start, and so it hung. If 
initGeneratedEvents is legitimately empty, then the task will not hang and 
overall we will not hang.

> Running task hangs due to missing event to initialize input in recovery
> ---
>
> Key: TEZ-3124
> URL: https://issues.apache.org/jira/browse/TEZ-3124
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.8.2
> Reporter: Jeff Zhang
> Assignee: Jeff Zhang
> Labels: Recovery
> Fix For: 0.8.3
>
> Attachments: TEZ-3124-1.patch, TEZ-3124-2.patch, TEZ-3124-3.patch, 
> TEZ-3124-4.patch, TEZ-3124-5.patch, a.log
>
>
> {noformat}
> 2016-02-09 04:48:42 Starting to run new task attempt: 
> attempt_1454993155302_0001_1_00_61_3
> /attempt_1454993155302_0001_1_00_61
> 2016-02-09 04:48:43,196 [INFO] [I/O Setup 0 Initialize: {MRInput}] 
> |input.MRInput|: MRInput using newmapreduce API=true, split via event=true, 
> numPhysicalInputs=1
> 2016-02-09 04:48:43,200 [INFO] [I/O Setup 0 Initialize: {MRInput}] 
> |input.MRInputLegacy|: MRInput MRInputLegacy deferring initialization
> 2016-02-09 04:48:43,333 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Initialized processor
> 2016-02-09 04:48:43,333 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Waiting for 2 initializers to finish
> 2016-02-09 04:48:43,333 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Waiting for 1 initializers to finish
> 2016-02-09 04:48:43,333 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: All initializers finished
> 2016-02-09 04:48:43,345 [INFO] [TezChild] |resources.MemoryDistributor|: 
> InitialRequests=[MRInput:INPUT:0:org.apache.tez.mapreduce.input.MRInputLegacy],
>  
> [ireduce1:OUTPUT:1802502144:org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput]
> 2016-02-09 04:48:43,559 [INFO] [TezChild] 
> |resources.WeightedScalingMemoryDistributor|: 
> ScaleRatiosUsed=[PARTITIONED_UNSORTED_OUTPUT:1][UNSORTED_OUTPUT:1][UNSORTED_INPUT:1][SORTED_OUTPUT:12][SORTED_MERGED_INPUT:12][PROCESSOR:1][OTHER:1]
> 2016-02-09 04:48:43,563 [INFO] [TezChild] 
> |resources.WeightedScalingMemoryDistributor|: InitialReservationFraction=0.3, 
> AdditionalReservationFractionForIOs=0.03, 
> finalReserveFractionUsed=0.32996
> 2016-02-09 04:48:43,564 [INFO] [TezChild] 
> |resources.WeightedScalingMemoryDistributor|: Scaling Requests. NumRequests: 
> 2, numScaledRequests: 13, TotalRequested: 1802502144, TotalRequestedScaled: 
> 1.663848132923077E9, TotalJVMHeap: 2577399808, TotalAvailable: 1726857871, 
> TotalRequested/TotalJVMHeap:0.70
> 2016-02-09 04:48:43,564 [INFO] [TezChild] |resources.MemoryDistributor|: 
> Allocations=[MRInput:org.apache.tez.mapreduce.input.MRInputLegacy:INPUT:0:0], 
> [ireduce1:org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput:OUTPUT:1802502144:1726857871]
> 2016-02-09 04:48:43,564 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Starting Inputs/Outputs
> 2016-02-09 04:48:43,572 [INFO] [I/O Setup 1 Start: {MRInput}] 
> |runtime.LogicalIOProcessorRuntimeTask|: Started Input with src edge: MRInput
> 2016-02-09 04:48:43,572 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Input: MRInput being auto started by 
> the framework. Subsequent instances will not be auto-started
> 2016-02-09 04:48:43,573 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Num IOs determined for AutoStart: 1
> 2016-02-09 04:48:43,574 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Waiting for 1 IOs to start
> 2016-02-09 04:48:43,574 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: AutoStartComplete
> 2016-02-09 04:48:43,583 [INFO] [TezChild] |task.TaskRunner2Callable|: Running 
> task, taskAttemptId=attempt_1454993155302_0001_1_00_61_3
> 2016-02-09 04:48:43,583 [INFO] [TezChild] |map.MapProcessor|: Running map: 
> attempt_1454993155302_0001_1_00_61_3_10001
> 2016-02-09 04:48:43,675 [INFO] [TezChild] |impl.ExternalSorter|: ireduce1 
> using: memoryMb=1646, keySerializerClass=class 
> org.apache.hadoop.io.IntWritable, 
> valueSerializerClass=org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer@5f143de6,
>  comparator=org.apache.hadoop.io.IntWritable$Comparator@ec52d1f, 
> partitioner=org.apache.tez.mapreduce.partition.MRPartitioner, 
> serialization=org.apache.hadoop.io.serializer.WritableSerialization
> 2016-02-09 04:48:43,686 [INFO] [TezChild] |impl.PipelinedSorter|: Setting up 
> PipelinedSorter for ireduce1: , UsingHashComparator=false
> 2016-02-09 04:48:45,093 [INFO] [TezChild] |impl.PipelinedSorter|: Newly 
> allocated block 

[jira] [Commented] (TEZ-3124) Running task hangs due to missing event to initialize input in recovery

2016-02-23 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160059#comment-15160059
 ] 

Bikas Saha commented on TEZ-3124:
-

lgtm. +1. Thanks!


Failed: TEZ-3102 PreCommit Build #1504

2016-02-23 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3102
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1504/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 3579 lines...]
[ERROR]   mvn  -rf :tez-dag
[INFO] Build failures were ignored.




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12789327/TEZ-3102.002.patch
  against master revision 7fc28f7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.history.TestHistoryParser
  org.apache.tez.dag.app.dag.impl.TestDAGImpl

  The following test timeouts occurred in :
 org.apache.tez.test.TestRecovery

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1504//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1504//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
7f87deeb53dcf5b8970e5e20d43ecd3a38f5af39 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
2 tests failed.
FAILED:  org.apache.tez.dag.app.dag.impl.TestDAGImpl.testCounterLimits

Error Message:
expected: but was:

Stack Trace:
java.lang.AssertionError: expected: but was:
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.tez.dag.app.dag.impl.TestDAGImpl.testCounterLimits(TestDAGImpl.java:2290)


FAILED:  org.apache.tez.history.TestHistoryParser.testParserWithSuccessfulJob

Error Message:
null

Stack Trace:
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.tez.history.TestHistoryParser.verifyJobSpecificInfo(TestHistoryParser.java:252)
at 
org.apache.tez.history.TestHistoryParser.testParserWithSuccessfulJob(TestHistoryParser.java:209)




[jira] [Commented] (TEZ-3102) Fetch failure of a speculated task causes job hang

2016-02-23 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160033#comment-15160033
 ] 

TezQA commented on TEZ-3102:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12789327/TEZ-3102.002.patch
  against master revision 7fc28f7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.history.TestHistoryParser
  org.apache.tez.dag.app.dag.impl.TestDAGImpl

  The following test timeouts occurred in :
 org.apache.tez.test.TestRecovery

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1504//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1504//console

This message is automatically generated.

> Fetch failure of a speculated task causes job hang
> --
>
> Key: TEZ-3102
> URL: https://issues.apache.org/jira/browse/TEZ-3102
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Priority: Critical
> Attachments: TEZ-3102.001.patch, TEZ-3102.002.patch
>
>
> If a task speculates then succeeds, one task will be marked successful and 
> the other killed. Then if the task retroactively fails due to fetch failures 
> the Tez AM will fail to reschedule another task. This results in a hung job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-3124) Running task hangs due to missing event to initialize input in recovery

2016-02-23 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160005#comment-15160005
 ] 

Jeff Zhang commented on TEZ-3124:
-

bq.  Can this happen? Even if no, why add the side effect of initing 
initGeneratedEvent?
This should not happen; restoring initGeneratedEvent was only meant to guard 
against a corner case I might have missed. But you are right, I should not add 
the side effect on initGeneratedEvent. If a hang happens, there must be another 
bug. 
bq. Orthogonally, initGeneratedEvents could be empty even after init. This is 
valid. Will that be a problem?
If initGeneratedEvents is empty, that means the task doesn't need an event to 
initialize its input (if it does need one, then it is a user code bug). 



[jira] [Comment Edited] (TEZ-3124) Running task hangs due to missing event to initialize input in recovery

2016-02-23 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160005#comment-15160005
 ] 

Jeff Zhang edited comment on TEZ-3124 at 2/24/16 1:13 AM:
--

bq.  Can this happen? Even if no, why add the side effect of initing 
initGeneratedEvent?
This should not happen; restoring initGeneratedEvent was only meant to guard 
against a corner case I might have missed. But you are right, I should not add 
the side effect on initGeneratedEvent. If a hang happens again, there must be 
another bug. 
bq. Orthogonally, initGeneratedEvents could be empty even after init. This is 
valid. Will that be a problem?
If initGeneratedEvents is empty, that means the task doesn't need an event to 
initialize its input (if it does need one, then it is a user code bug). 



was (Author: zjffdu):
bq.  Can this happen? Even if no, why add the side effect of initing 
initGeneratedEvent?
This should not happen; restoring initGeneratedEvent was only meant to guard 
against a corner case I might have missed. But you are right, I should not add 
the side effect on initGeneratedEvent. If a hang happens, there must be another 
bug. 
bq. Orthogonally, initGeneratedEvents could be empty even after init. This is 
valid. Will that be a problem?
If initGeneratedEvents is empty, that means the task doesn't need an event to 
initialize its input (if it does need one, then it is a user code bug). 



[jira] [Commented] (TEZ-3124) Running task hangs due to missing event to initialize input in recovery

2016-02-23 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159946#comment-15159946
 ] 

Bikas Saha commented on TEZ-3124:
-

Then the fix should be restricted to not logging VertexInitializedEvent if 
shouldSkipInit is true. Initializing the initGeneratedEvent to the old value 
might have the side effect when shouldSkipInit is false. In that case init will 
run again and initGeneratedEvent could have old recovered events and new init 
generated events. Can this happen? Even if no, why add the side effect of 
initing initGeneratedEvent?

Your explanation makes sense for the fix. My concern is for the change to 
initGeneratedEvents.

Orthogonally, initGeneratedEvents could be empty even after init. This is 
valid. Will that be a problem? Asking because in this case we got hung because 
vertex initialized event had empty initGeneratedEvents.
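
The concern above about the side effect can be illustrated with a small toy 
model. This is only a sketch and not Tez code: the class InitSideEffectSketch, 
the method rerunInit and the event name "split-0" are hypothetical, and only 
the name initGeneratedEvents comes from this discussion. It models just the 
case where shouldSkipInit is false, so init runs again after recovery.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only; none of these types exist in Tez.
class InitSideEffectSketch {

    /**
     * Models the case where shouldSkipInit is false, so init runs again.
     * If the recovered events were restored into the list first, the list
     * ends up holding both the old recovered events and the new ones.
     */
    static List<String> rerunInit(List<String> recoveredEvents,
                                  boolean restoreRecoveredEvents) {
        List<String> initGeneratedEvents = new ArrayList<>();
        if (restoreRecoveredEvents) {
            // Hypothetical side effect: seed the list with the old events.
            initGeneratedEvents.addAll(recoveredEvents);
        }
        // Init runs again and generates its event once more.
        initGeneratedEvents.add("split-0");
        return initGeneratedEvents;
    }

    public static void main(String[] args) {
        List<String> recovered = Collections.singletonList("split-0");
        System.out.println(rerunInit(recovered, true));
        System.out.println(rerunInit(recovered, false));
    }
}
```

In this model, restoring the old value before a re-run of init yields two 
copies of the same event, while leaving the list alone yields one. Whether the 
real code can reach that combination is exactly the question asked above.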


[jira] [Comment Edited] (TEZ-3124) Running task hangs due to missing event to initialize input in recovery

2016-02-23 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159919#comment-15159919
 ] 

Jeff Zhang edited comment on TEZ-3124 at 2/24/16 12:07 AM:
---

RecoveryParser scans the recovery files from the oldest attempt to the latest 
attempt, and picks up the latest VertexInitializedEvent if there are multiple 
VertexInitializedEvents. The root cause is that we still log a 
VertexInitializedEvent if shouldSkipInit() is true, but in this case there are 
zero initGeneratedEvents in the VertexInitializedEvent, which causes the next 
recovery to hang. I can reproduce it in the test case and have verified that 
it is resolved after this patch. 


was (Author: zjffdu):
RecoveryParser scans the recovery files from the oldest attempt to the latest 
attempt, and picks up the latest VertexInitializedEvent if there are multiple 
VertexInitializedEvents. The root cause is that we still log a 
VertexInitializedEvent if shouldSkipInit() is true, and in this case there are 
zero initGeneratedEvents in the VertexInitializedEvent, which causes the next 
recovery to hang. I can reproduce it in the test case and have verified that 
it is resolved after this patch. 
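
The root cause described in this comment can be sketched with a toy model of 
a "latest record wins" recovery log. This is an illustration under stated 
assumptions, not the RecoveryParser implementation: RecoverySketch, 
InitRecord, runAttempt, latest, eventsAfterAttempts and the event name 
"split-0" are all hypothetical, and only VertexInitializedEvent, 
shouldSkipInit and initGeneratedEvents are names taken from the discussion.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only; none of these types exist in Tez.
class RecoverySketch {

    /** Hypothetical stand-in for a logged VertexInitializedEvent. */
    static final class InitRecord {
        final int attempt;
        final List<String> initGeneratedEvents;

        InitRecord(int attempt, List<String> initGeneratedEvents) {
            this.attempt = attempt;
            this.initGeneratedEvents = initGeneratedEvents;
        }
    }

    /** Scan from the oldest to the latest record; the latest one wins. */
    static InitRecord latest(List<InitRecord> log) {
        InitRecord last = null;
        for (InitRecord record : log) {
            last = record;
        }
        return last;
    }

    /** Returns the record this attempt appends to the log, or null. */
    static InitRecord runAttempt(int attempt, InitRecord recovered,
                                 boolean logWhenSkippingInit) {
        boolean shouldSkipInit = recovered != null;
        if (!shouldSkipInit) {
            // Fresh init: the initializer generates the event the task needs.
            return new InitRecord(attempt, Collections.singletonList("split-0"));
        }
        // Init was skipped, so this attempt generated no events itself.
        // Logging a record anyway hides the earlier, non-empty one.
        return logWhenSkippingInit
                ? new InitRecord(attempt, Collections.<String>emptyList())
                : null;
    }

    /** Events the next recovery would see after the given number of attempts. */
    static List<String> eventsAfterAttempts(int attempts,
                                            boolean logWhenSkippingInit) {
        List<InitRecord> log = new ArrayList<>();
        for (int a = 1; a <= attempts; a++) {
            InitRecord record = runAttempt(a, latest(log), logWhenSkippingInit);
            if (record != null) {
                log.add(record);
            }
        }
        return latest(log).initGeneratedEvents;
    }

    public static void main(String[] args) {
        System.out.println(eventsAfterAttempts(2, true));
        System.out.println(eventsAfterAttempts(2, false));
    }
}
```

With logWhenSkippingInit set to true, the second attempt's empty record 
becomes the latest one, so the next recovery sees no events for a task that 
needs one. With it set to false, the first attempt's record stays the latest 
and the event survives.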

> |runtime.LogicalIOProcessorRuntimeTask|: Waiting for 1 IOs to start
> 2016-02-09 04:48:43,574 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: AutoStartComplete
> 2016-02-09 04:48:43,583 [INFO] [TezChild] |task.TaskRunner2Callable|: Running 
> task, 

[jira] [Commented] (TEZ-3124) Running task hangs due to missing event to initialize input in recovery

2016-02-23 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159919#comment-15159919
 ] 

Jeff Zhang commented on TEZ-3124:
-

RecoveryParser scans the recovery files from the oldest attempt to the latest 
attempt, and picks up the latest VertexInitializedEvent if there are multiple 
VertexInitializedEvents. The root cause is that we still log a 
VertexInitializedEvent when shouldSkipInit() is true, and in this case there are 
zero initGeneratedEvents in the VertexInitializedEvent, which causes the next 
recovery to hang. I can reproduce it in the test case and have verified that it 
is resolved after this patch. 
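
For readers following the thread, the sequence described above can be sketched 
roughly as follows. This is only an illustration, not the Tez code: 
InitEventSketch, RecoveryScanSketch, pickLatest and logWhenInitSkipped are 
hypothetical names, and the carryForward flag is an assumption about what a fix 
could look like (copy the recovered events into the newly logged event), not a 
description of the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a logged VertexInitializedEvent (not the real Tez class).
final class InitEventSketch {
    final int amAttempt;
    final List<String> initGeneratedEvents;

    InitEventSketch(int amAttempt, List<String> initGeneratedEvents) {
        this.amAttempt = amAttempt;
        this.initGeneratedEvents = initGeneratedEvents;
    }
}

final class RecoveryScanSketch {
    // Scan the log from the oldest attempt to the latest and keep the last
    // event seen, i.e. "pick up the latest VertexInitializedEvent".
    static InitEventSketch pickLatest(List<InitEventSketch> loggedOldestFirst) {
        InitEventSketch latest = null;
        for (InitEventSketch e : loggedOldestFirst) {
            latest = e;
        }
        return latest;
    }

    // What an AM attempt logs when init is skipped. Nothing is regenerated,
    // so without carryForward the new event has zero initGeneratedEvents.
    // carryForward=true is an assumed remedy, not the actual patch.
    static InitEventSketch logWhenInitSkipped(int amAttempt,
                                              InitEventSketch recovered,
                                              boolean carryForward) {
        List<String> events = new ArrayList<>();
        if (carryForward) {
            events.addAll(recovered.initGeneratedEvents);
        }
        return new InitEventSketch(amAttempt, events);
    }
}
```

With attempt 1 logging two events and attempt 2 skipping init, pickLatest 
returns attempt 2's event. Without carryForward that event is empty, so 
attempt 3 has nothing to send to the task's input; with carryForward the two 
original events survive.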

> Running task hangs due to missing event to initialize input in recovery
> ---
>
> Key: TEZ-3124
> URL: https://issues.apache.org/jira/browse/TEZ-3124
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.8.2
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
>  Labels: Recovery
> Fix For: 0.8.3
>
> Attachments: TEZ-3124-1.patch, TEZ-3124-2.patch, TEZ-3124-3.patch, 
> TEZ-3124-4.patch, a.log
>
>
> {noformat}
> 2016-02-09 04:48:42 Starting to run new task attempt: 
> attempt_1454993155302_0001_1_00_61_3
> /attempt_1454993155302_0001_1_00_61
> 2016-02-09 04:48:43,196 [INFO] [I/O Setup 0 Initialize: {MRInput}] 
> |input.MRInput|: MRInput using newmapreduce API=true, split via event=true, 
> numPhysicalInputs=1
> 2016-02-09 04:48:43,200 [INFO] [I/O Setup 0 Initialize: {MRInput}] 
> |input.MRInputLegacy|: MRInput MRInputLegacy deferring initialization
> 2016-02-09 04:48:43,333 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Initialized processor
> 2016-02-09 04:48:43,333 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Waiting for 2 initializers to finish
> 2016-02-09 04:48:43,333 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Waiting for 1 initializers to finish
> 2016-02-09 04:48:43,333 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: All initializers finished
> 2016-02-09 04:48:43,345 [INFO] [TezChild] |resources.MemoryDistributor|: 
> InitialRequests=[MRInput:INPUT:0:org.apache.tez.mapreduce.input.MRInputLegacy],
>  
> [ireduce1:OUTPUT:1802502144:org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput]
> 2016-02-09 04:48:43,559 [INFO] [TezChild] 
> |resources.WeightedScalingMemoryDistributor|: 
> ScaleRatiosUsed=[PARTITIONED_UNSORTED_OUTPUT:1][UNSORTED_OUTPUT:1][UNSORTED_INPUT:1][SORTED_OUTPUT:12][SORTED_MERGED_INPUT:12][PROCESSOR:1][OTHER:1]
> 2016-02-09 04:48:43,563 [INFO] [TezChild] 
> |resources.WeightedScalingMemoryDistributor|: InitialReservationFraction=0.3, 
> AdditionalReservationFractionForIOs=0.03, 
> finalReserveFractionUsed=0.32996
> 2016-02-09 04:48:43,564 [INFO] [TezChild] 
> |resources.WeightedScalingMemoryDistributor|: Scaling Requests. NumRequests: 
> 2, numScaledRequests: 13, TotalRequested: 1802502144, TotalRequestedScaled: 
> 1.663848132923077E9, TotalJVMHeap: 2577399808, TotalAvailable: 1726857871, 
> TotalRequested/TotalJVMHeap:0.70
> 2016-02-09 04:48:43,564 [INFO] [TezChild] |resources.MemoryDistributor|: 
> Allocations=[MRInput:org.apache.tez.mapreduce.input.MRInputLegacy:INPUT:0:0], 
> [ireduce1:org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput:OUTPUT:1802502144:1726857871]
> 2016-02-09 04:48:43,564 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Starting Inputs/Outputs
> 2016-02-09 04:48:43,572 [INFO] [I/O Setup 1 Start: {MRInput}] 
> |runtime.LogicalIOProcessorRuntimeTask|: Started Input with src edge: MRInput
> 2016-02-09 04:48:43,572 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Input: MRInput being auto started by 
> the framework. Subsequent instances will not be auto-started
> 2016-02-09 04:48:43,573 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Num IOs determined for AutoStart: 1
> 2016-02-09 04:48:43,574 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Waiting for 1 IOs to start
> 2016-02-09 04:48:43,574 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: AutoStartComplete
> 2016-02-09 04:48:43,583 [INFO] [TezChild] |task.TaskRunner2Callable|: Running 
> task, taskAttemptId=attempt_1454993155302_0001_1_00_61_3
> 2016-02-09 04:48:43,583 [INFO] [TezChild] |map.MapProcessor|: Running map: 
> attempt_1454993155302_0001_1_00_61_3_10001
> 2016-02-09 04:48:43,675 [INFO] [TezChild] |impl.ExternalSorter|: ireduce1 
> using: memoryMb=1646, keySerializerClass=class 
> org.apache.hadoop.io.IntWritable, 
> valueSerializerClass=org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer@5f143de6,
>  comparator=org.apache.hadoop.io.IntWritable$Comparator@ec52d1f, 
> partitioner=org.apache.tez.mapreduce.partition.MRPartitioner, 
> 

[jira] [Commented] (TEZ-3102) Fetch failure of a speculated task causes job hang

2016-02-23 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159835#comment-15159835
 ] 

Bikas Saha commented on TEZ-3102:
-

+1.

I think testTaskSucceedAndRetroActiveFailure() should be covering the new code 
changes in the success attempt code path. In the small chance that it's not, 
would you please update the test? Thanks!

> Fetch failure of a speculated task causes job hang
> --
>
> Key: TEZ-3102
> URL: https://issues.apache.org/jira/browse/TEZ-3102
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Critical
> Attachments: TEZ-3102.001.patch, TEZ-3102.002.patch
>
>
> If a task speculates then succeeds, one task will be marked successful and 
> the other killed. Then if the task retroactively fails due to fetch failures 
> the Tez AM will fail to reschedule another task. This results in a hung job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TEZ-3102) Fetch failure of a speculated task causes job hang

2016-02-23 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159835#comment-15159835
 ] 

Bikas Saha edited comment on TEZ-3102 at 2/23/16 11:09 PM:
---

+1.

I think testTaskSucceedAndRetroActiveFailure() should already be covering the 
new code changes in the success attempt code path. In the small chance that it's 
not, would you please update the test? Thanks!


was (Author: bikassaha):
+1.

I think testTaskSucceedAndRetroActiveFailure() should be covering the new code 
changes in the success attempt code path. In the small chance that it's not, 
would you please update the test? Thanks!

> Fetch failure of a speculated task causes job hang
> --
>
> Key: TEZ-3102
> URL: https://issues.apache.org/jira/browse/TEZ-3102
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Critical
> Attachments: TEZ-3102.001.patch, TEZ-3102.002.patch
>
>
> If a task speculates then succeeds, one task will be marked successful and 
> the other killed. Then if the task retroactively fails due to fetch failures 
> the Tez AM will fail to reschedule another task. This results in a hung job.





[jira] [Updated] (TEZ-3102) Fetch failure of a speculated task causes job hang

2016-02-23 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated TEZ-3102:

Attachment: TEZ-3102.002.patch

Sorry for the late reply, I was out on vacation.

Ah, yes, I somehow missed the successfulAttempt check when I looked at it.  I 
updated the patch to reuse the AttemptKilledTransition logic for both the 
successful and unsuccessful attempt paths in the retroactive killed case.
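
As a rough illustration of the scenario in this issue (not the Tez state 
machine and not the attached patch), the sketch below models a task with a 
speculative attempt. SpeculatedTaskSketch and all of its methods are 
hypothetical names; the only behaviour taken from the issue is that a 
retroactive failure of the successful attempt must lead to a new attempt when 
nothing else is running or succeeded.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of one task and its attempts (not a Tez class).
final class SpeculatedTaskSketch {
    enum AttemptState { RUNNING, SUCCEEDED, KILLED, FAILED }

    private final Map<Integer, AttemptState> attempts = new LinkedHashMap<>();
    private int nextAttemptId = 0;

    // Start a new attempt (original, speculative, or rescheduled).
    int launchAttempt() {
        int id = nextAttemptId++;
        attempts.put(id, AttemptState.RUNNING);
        return id;
    }

    // One attempt succeeds; every attempt still running is killed.
    void attemptSucceeded(int id) {
        for (Map.Entry<Integer, AttemptState> e : attempts.entrySet()) {
            if (e.getKey() == id) {
                e.setValue(AttemptState.SUCCEEDED);
            } else if (e.getValue() == AttemptState.RUNNING) {
                e.setValue(AttemptState.KILLED);
            }
        }
    }

    // The previously successful attempt fails later (e.g. fetch failures).
    // If no attempt is left running or succeeded, a new one must be
    // scheduled, otherwise the job hangs. Returns the new attempt id,
    // or -1 if no reschedule was needed.
    int attemptFailedRetroactively(int id) {
        attempts.put(id, AttemptState.FAILED);
        for (AttemptState s : attempts.values()) {
            if (s == AttemptState.RUNNING || s == AttemptState.SUCCEEDED) {
                return -1;
            }
        }
        return launchAttempt();
    }

    AttemptState stateOf(int id) {
        return attempts.get(id);
    }
}
```

In the sequence from the description (two attempts, one succeeds, the other is 
killed, then the successful one fails retroactively), the last call launches a 
third attempt instead of leaving the task with no runnable attempt.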

> Fetch failure of a speculated task causes job hang
> --
>
> Key: TEZ-3102
> URL: https://issues.apache.org/jira/browse/TEZ-3102
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Critical
> Attachments: TEZ-3102.001.patch, TEZ-3102.002.patch
>
>
> If a task speculates then succeeds, one task will be marked successful and 
> the other killed. Then if the task retroactively fails due to fetch failures 
> the Tez AM will fail to reschedule another task. This results in a hung job.





[jira] [Commented] (TEZ-3135) tez-ext-service-tests, tez-plugins/tez-yarn-timeline-history and tez-tools/tez-javadoc-tools missing dependencies

2016-02-23 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159680#comment-15159680
 ] 

Hitesh Shah commented on TEZ-3135:
--

Looks good. Will commit shortly. Thanks [~vijay_k]

> tez-ext-service-tests, tez-plugins/tez-yarn-timeline-history and 
> tez-tools/tez-javadoc-tools missing dependencies
> -
>
> Key: TEZ-3135
> URL: https://issues.apache.org/jira/browse/TEZ-3135
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Vijay Kumar
>Assignee: Vijay Kumar
> Fix For: 0.8.3
>
> Attachments: TEZ-3135.2.patch, TEZ-3135.patch, TEZ-3135.patch.1, 
> TEZ-3135.patch.3, tez-ext-service-tests.log, tez-javadoc-tools.log, 
> tez-yarn-timeline-history.log
>
>
> Tez fails to compile for the following modules: 
> tez-ext-service-tests, 
> tez-plugins/tez-yarn-timeline-history
> tez-tools/tez-javadoc-tools
> Adding the dependencies as per the attached patch resolves the failures.





[jira] [Commented] (TEZ-3135) tez-ext-service-tests, tez-plugins/tez-yarn-timeline-history and tez-tools/tez-javadoc-tools missing dependencies

2016-02-23 Thread Vijay Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159677#comment-15159677
 ] 

Vijay Kumar commented on TEZ-3135:
--

[~hitesh] Uploaded addendum patch TEZ-3135.patch.3.

> tez-ext-service-tests, tez-plugins/tez-yarn-timeline-history and 
> tez-tools/tez-javadoc-tools missing dependencies
> -
>
> Key: TEZ-3135
> URL: https://issues.apache.org/jira/browse/TEZ-3135
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Vijay Kumar
>Assignee: Vijay Kumar
> Fix For: 0.8.3
>
> Attachments: TEZ-3135.2.patch, TEZ-3135.patch, TEZ-3135.patch.1, 
> TEZ-3135.patch.3, tez-ext-service-tests.log, tez-javadoc-tools.log, 
> tez-yarn-timeline-history.log
>
>
> Tez fails to compile for the following modules: 
> tez-ext-service-tests, 
> tez-plugins/tez-yarn-timeline-history
> tez-tools/tez-javadoc-tools
> Adding the dependencies as per the attached patch resolves the failures.





[jira] [Updated] (TEZ-3135) tez-ext-service-tests, tez-plugins/tez-yarn-timeline-history and tez-tools/tez-javadoc-tools missing dependencies

2016-02-23 Thread Vijay Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vijay Kumar updated TEZ-3135:
-
Attachment: TEZ-3135.patch.3

> tez-ext-service-tests, tez-plugins/tez-yarn-timeline-history and 
> tez-tools/tez-javadoc-tools missing dependencies
> -
>
> Key: TEZ-3135
> URL: https://issues.apache.org/jira/browse/TEZ-3135
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Vijay Kumar
>Assignee: Vijay Kumar
> Fix For: 0.8.3
>
> Attachments: TEZ-3135.2.patch, TEZ-3135.patch, TEZ-3135.patch.1, 
> TEZ-3135.patch.3, tez-ext-service-tests.log, tez-javadoc-tools.log, 
> tez-yarn-timeline-history.log
>
>
> Tez fails to compile for the following modules: 
> tez-ext-service-tests, 
> tez-plugins/tez-yarn-timeline-history
> tez-tools/tez-javadoc-tools
> Adding the dependencies as per the attached patch resolves the failures.





[jira] [Commented] (TEZ-3135) tez-ext-service-tests, tez-plugins/tez-yarn-timeline-history and tez-tools/tez-javadoc-tools missing dependencies

2016-02-23 Thread Vijay Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159640#comment-15159640
 ] 

Vijay Kumar commented on TEZ-3135:
--

[~hitesh] After moving the hadoop-hdfs dependency of 
tez-plugins/tez-yarn-timeline-history to test scope, I found some more missing 
hadoop-hdfs dependencies in test-plugins.

Please use the latest patch, TEZ-3135.patch.1, which has all the missing 
dependencies and has been fully verified.

I see you have applied TEZ-3135.patch.2, which is only partial.

I am adding an addendum patch with the remaining changes, i.e. the diff 
between TEZ-3135.patch.2 (partial) and TEZ-3135.patch.1 (complete).

> tez-ext-service-tests, tez-plugins/tez-yarn-timeline-history and 
> tez-tools/tez-javadoc-tools missing dependencies
> -
>
> Key: TEZ-3135
> URL: https://issues.apache.org/jira/browse/TEZ-3135
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Vijay Kumar
>Assignee: Vijay Kumar
> Fix For: 0.8.3
>
> Attachments: TEZ-3135.2.patch, TEZ-3135.patch, TEZ-3135.patch.1, 
> tez-ext-service-tests.log, tez-javadoc-tools.log, 
> tez-yarn-timeline-history.log
>
>
> Tez fails to compile for the following modules: 
> tez-ext-service-tests, 
> tez-plugins/tez-yarn-timeline-history
> tez-tools/tez-javadoc-tools
> Adding the dependencies as per the attached patch resolves the failures.





[jira] [Updated] (TEZ-3135) tez-ext-service-tests, tez-plugins/tez-yarn-timeline-history and tez-tools/tez-javadoc-tools missing dependencies

2016-02-23 Thread Vijay Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vijay Kumar updated TEZ-3135:
-
Attachment: TEZ-3135.patch.1

> tez-ext-service-tests, tez-plugins/tez-yarn-timeline-history and 
> tez-tools/tez-javadoc-tools missing dependencies
> -
>
> Key: TEZ-3135
> URL: https://issues.apache.org/jira/browse/TEZ-3135
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Vijay Kumar
>Assignee: Vijay Kumar
> Fix For: 0.8.3
>
> Attachments: TEZ-3135.2.patch, TEZ-3135.patch, TEZ-3135.patch.1, 
> tez-ext-service-tests.log, tez-javadoc-tools.log, 
> tez-yarn-timeline-history.log
>
>
> Tez fails to compile for the following modules: 
> tez-ext-service-tests, 
> tez-plugins/tez-yarn-timeline-history
> tez-tools/tez-javadoc-tools
> Adding the dependencies as per the attached patch resolves the failures.





[jira] [Updated] (TEZ-3135) tez-ext-service-tests, tez-plugins/tez-yarn-timeline-history and tez-tools/tez-javadoc-tools missing dependencies

2016-02-23 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-3135:
-
Attachment: TEZ-3135.2.patch

> tez-ext-service-tests, tez-plugins/tez-yarn-timeline-history and 
> tez-tools/tez-javadoc-tools missing dependencies
> -
>
> Key: TEZ-3135
> URL: https://issues.apache.org/jira/browse/TEZ-3135
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Vijay Kumar
>Assignee: Vijay Kumar
> Attachments: TEZ-3135.2.patch, TEZ-3135.patch, 
> tez-ext-service-tests.log, tez-javadoc-tools.log, 
> tez-yarn-timeline-history.log
>
>
> Tez fails to compile for the following modules: 
> tez-ext-service-tests, 
> tez-plugins/tez-yarn-timeline-history
> tez-tools/tez-javadoc-tools
> Adding the dependencies as per the attached patch resolves the failures.





[jira] [Commented] (TEZ-3135) tez-ext-service-tests, tez-plugins/tez-yarn-timeline-history and tez-tools/tez-javadoc-tools missing dependencies

2016-02-23 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159587#comment-15159587
 ] 

Hitesh Shah commented on TEZ-3135:
--

Actually, will go ahead and address my review comment in the next patch and 
commit shortly. 

> tez-ext-service-tests, tez-plugins/tez-yarn-timeline-history and 
> tez-tools/tez-javadoc-tools missing dependencies
> -
>
> Key: TEZ-3135
> URL: https://issues.apache.org/jira/browse/TEZ-3135
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Vijay Kumar
>Assignee: Vijay Kumar
> Attachments: TEZ-3135.patch, tez-ext-service-tests.log, 
> tez-javadoc-tools.log, tez-yarn-timeline-history.log
>
>
> Tez fails to compile for the following modules: 
> tez-ext-service-tests, 
> tez-plugins/tez-yarn-timeline-history
> tez-tools/tez-javadoc-tools
> Adding the dependencies as per the attached patch resolves the failures.





[jira] [Commented] (TEZ-3124) Running task hangs due to missing event to initialize input in recovery

2016-02-23 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159548#comment-15159548
 ] 

Bikas Saha commented on TEZ-3124:
-

Let's say shouldSkipInit() is false because VertexInitializedEvent != null but 
ConfigurationDoneEvent == null.
So we will rerun init, and then we will log another VertexInitializedEvent. 
Right? In that case, how will the next AM attempt handle multiple 
VertexInitializedEvents?
If we are doing init again, then that process will add new items to 
initGeneratedEvents. So we should not restore the older initGeneratedEvents 
into the new object, or else the new object will have more items than necessary.

So I am not sure what is broken and how the fix is working. Could you please 
help by pointing out the exact sequence of events that causes the issue? Thanks!
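
To make the concern above concrete, here is a small sketch. It is illustrative 
only: ReInitSketch and its methods are hypothetical, and the shouldSkipInit 
predicate below is merely inferred from this comment (skip only when both 
recovered events are present); it is not copied from the Tez source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the duplicate-event concern (not Tez code).
final class ReInitSketch {
    // Assumed predicate: init is skipped only if both events were recovered.
    static boolean shouldSkipInit(Object vertexInitializedEvent,
                                  Object configurationDoneEvent) {
        return vertexInitializedEvent != null && configurationDoneEvent != null;
    }

    // Builds the initGeneratedEvents of the new vertex object.
    // restoreOld models restoring the recovered events into the new object;
    // rerunInit models init running again and regenerating its events.
    static List<String> buildEvents(List<String> recovered,
                                    List<String> regenerated,
                                    boolean restoreOld,
                                    boolean rerunInit) {
        List<String> events = new ArrayList<>();
        if (restoreOld) {
            events.addAll(recovered);
        }
        if (rerunInit) {
            events.addAll(regenerated);
        }
        return events;
    }
}
```

If init reruns and the old events are also restored, the list holds both sets, 
which is the "more items than necessary" case. Restoring only when init is 
skipped, or regenerating only when it is not, keeps a single set.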

> Running task hangs due to missing event to initialize input in recovery
> ---
>
> Key: TEZ-3124
> URL: https://issues.apache.org/jira/browse/TEZ-3124
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.8.2
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
>  Labels: Recovery
> Fix For: 0.8.3
>
> Attachments: TEZ-3124-1.patch, TEZ-3124-2.patch, TEZ-3124-3.patch, 
> TEZ-3124-4.patch, a.log
>
>
> {noformat}
> 2016-02-09 04:48:42 Starting to run new task attempt: 
> attempt_1454993155302_0001_1_00_61_3
> /attempt_1454993155302_0001_1_00_61
> 2016-02-09 04:48:43,196 [INFO] [I/O Setup 0 Initialize: {MRInput}] 
> |input.MRInput|: MRInput using newmapreduce API=true, split via event=true, 
> numPhysicalInputs=1
> 2016-02-09 04:48:43,200 [INFO] [I/O Setup 0 Initialize: {MRInput}] 
> |input.MRInputLegacy|: MRInput MRInputLegacy deferring initialization
> 2016-02-09 04:48:43,333 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Initialized processor
> 2016-02-09 04:48:43,333 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Waiting for 2 initializers to finish
> 2016-02-09 04:48:43,333 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Waiting for 1 initializers to finish
> 2016-02-09 04:48:43,333 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: All initializers finished
> 2016-02-09 04:48:43,345 [INFO] [TezChild] |resources.MemoryDistributor|: 
> InitialRequests=[MRInput:INPUT:0:org.apache.tez.mapreduce.input.MRInputLegacy],
>  
> [ireduce1:OUTPUT:1802502144:org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput]
> 2016-02-09 04:48:43,559 [INFO] [TezChild] 
> |resources.WeightedScalingMemoryDistributor|: 
> ScaleRatiosUsed=[PARTITIONED_UNSORTED_OUTPUT:1][UNSORTED_OUTPUT:1][UNSORTED_INPUT:1][SORTED_OUTPUT:12][SORTED_MERGED_INPUT:12][PROCESSOR:1][OTHER:1]
> 2016-02-09 04:48:43,563 [INFO] [TezChild] 
> |resources.WeightedScalingMemoryDistributor|: InitialReservationFraction=0.3, 
> AdditionalReservationFractionForIOs=0.03, 
> finalReserveFractionUsed=0.32996
> 2016-02-09 04:48:43,564 [INFO] [TezChild] 
> |resources.WeightedScalingMemoryDistributor|: Scaling Requests. NumRequests: 
> 2, numScaledRequests: 13, TotalRequested: 1802502144, TotalRequestedScaled: 
> 1.663848132923077E9, TotalJVMHeap: 2577399808, TotalAvailable: 1726857871, 
> TotalRequested/TotalJVMHeap:0.70
> 2016-02-09 04:48:43,564 [INFO] [TezChild] |resources.MemoryDistributor|: 
> Allocations=[MRInput:org.apache.tez.mapreduce.input.MRInputLegacy:INPUT:0:0], 
> [ireduce1:org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput:OUTPUT:1802502144:1726857871]
> 2016-02-09 04:48:43,564 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Starting Inputs/Outputs
> 2016-02-09 04:48:43,572 [INFO] [I/O Setup 1 Start: {MRInput}] 
> |runtime.LogicalIOProcessorRuntimeTask|: Started Input with src edge: MRInput
> 2016-02-09 04:48:43,572 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Input: MRInput being auto started by 
> the framework. Subsequent instances will not be auto-started
> 2016-02-09 04:48:43,573 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Num IOs determined for AutoStart: 1
> 2016-02-09 04:48:43,574 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Waiting for 1 IOs to start
> 2016-02-09 04:48:43,574 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: AutoStartComplete
> 2016-02-09 04:48:43,583 [INFO] [TezChild] |task.TaskRunner2Callable|: Running 
> task, taskAttemptId=attempt_1454993155302_0001_1_00_61_3
> 2016-02-09 04:48:43,583 [INFO] [TezChild] |map.MapProcessor|: Running map: 
> attempt_1454993155302_0001_1_00_61_3_10001
> 2016-02-09 04:48:43,675 [INFO] [TezChild] |impl.ExternalSorter|: ireduce1 
> using: memoryMb=1646, keySerializerClass=class 
> org.apache.hadoop.io.IntWritable, 
> 

[jira] [Commented] (TEZ-3135) tez-ext-service-tests, tez-plugins/tez-yarn-timeline-history and tez-tools/tez-javadoc-tools missing dependencies

2016-02-23 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159547#comment-15159547
 ] 

Hitesh Shah commented on TEZ-3135:
--

Thanks for attaching the logs and the fix. The changes mostly look fine except 
for the one for "tez-plugins/tez-yarn-timeline-history". I believe the added 
dep for hdfs should be in test scope. Could you please provide an updated 
patch? 

> tez-ext-service-tests, tez-plugins/tez-yarn-timeline-history and 
> tez-tools/tez-javadoc-tools missing dependencies
> -
>
> Key: TEZ-3135
> URL: https://issues.apache.org/jira/browse/TEZ-3135
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Vijay Kumar
>Assignee: Vijay Kumar
> Attachments: TEZ-3135.patch, tez-ext-service-tests.log, 
> tez-javadoc-tools.log, tez-yarn-timeline-history.log
>
>
> Tez fails to compile for the following modules: 
> tez-ext-service-tests, 
> tez-plugins/tez-yarn-timeline-history
> tez-tools/tez-javadoc-tools
> Adding the dependencies as per the attached patch resolves the failures.





[jira] [Commented] (TEZ-3135) tez-ext-service-tests, tez-plugins/tez-yarn-timeline-history and tez-tools/tez-javadoc-tools missing dependencies

2016-02-23 Thread Vijay Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159546#comment-15159546
 ] 

Vijay Kumar commented on TEZ-3135:
--

[~hitesh] Compile error logs attached herein.

> tez-ext-service-tests, tez-plugins/tez-yarn-timeline-history and 
> tez-tools/tez-javadoc-tools missing dependencies
> -
>
> Key: TEZ-3135
> URL: https://issues.apache.org/jira/browse/TEZ-3135
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Vijay Kumar
>Assignee: Vijay Kumar
> Attachments: TEZ-3135.patch, tez-ext-service-tests.log, 
> tez-javadoc-tools.log, tez-yarn-timeline-history.log
>
>
> Tez fails to compile for the following modules: 
> tez-ext-service-tests, 
> tez-plugins/tez-yarn-timeline-history
> tez-tools/tez-javadoc-tools
> Adding the dependencies as per the attached patch resolves the failures.





[jira] [Updated] (TEZ-3135) tez-ext-service-tests, tez-plugins/tez-yarn-timeline-history and tez-tools/tez-javadoc-tools missing dependencies

2016-02-23 Thread Vijay Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vijay Kumar updated TEZ-3135:
-
Attachment: tez-yarn-timeline-history.log
tez-javadoc-tools.log
tez-ext-service-tests.log

> tez-ext-service-tests, tez-plugins/tez-yarn-timeline-history and 
> tez-tools/tez-javadoc-tools missing dependencies
> -
>
> Key: TEZ-3135
> URL: https://issues.apache.org/jira/browse/TEZ-3135
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Vijay Kumar
>Assignee: Vijay Kumar
> Attachments: TEZ-3135.patch, tez-ext-service-tests.log, 
> tez-javadoc-tools.log, tez-yarn-timeline-history.log
>
>
> Tez fails to compile for the following modules: 
> tez-ext-service-tests, 
> tez-plugins/tez-yarn-timeline-history
> tez-tools/tez-javadoc-tools
> Adding the dependencies as per the attached patch resolves the failures.





[jira] [Updated] (TEZ-3135) tez-ext-service-tests, tez-plugins/tez-yarn-timeline-history and tez-tools/tez-javadoc-tools missing dependencies

2016-02-23 Thread Vijay Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vijay Kumar updated TEZ-3135:
-
Attachment: TEZ-3135.patch

> tez-ext-service-tests, tez-plugins/tez-yarn-timeline-history and 
> tez-tools/tez-javadoc-tools missing dependencies
> -
>
> Key: TEZ-3135
> URL: https://issues.apache.org/jira/browse/TEZ-3135
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Vijay Kumar
>Assignee: Vijay Kumar
> Attachments: TEZ-3135.patch
>
>
> Tez fails to compile for the following modules: 
> tez-ext-service-tests, 
> tez-plugins/tez-yarn-timeline-history
> tez-tools/tez-javadoc-tools
> Adding the dependencies as per the attached patch resolves the failures.





[jira] [Updated] (TEZ-3135) tez-ext-service-tests, tez-plugins/tez-yarn-timeline-history and tez-tools/tez-javadoc-tools missing dependencies

2016-02-23 Thread Vijay Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vijay Kumar updated TEZ-3135:
-
Attachment: (was: TEZ-3135.patch)

> tez-ext-service-tests, tez-plugins/tez-yarn-timeline-history and 
> tez-tools/tez-javadoc-tools missing dependencies
> -
>
> Key: TEZ-3135
> URL: https://issues.apache.org/jira/browse/TEZ-3135
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Vijay Kumar
>Assignee: Vijay Kumar
>
> Tez fails to compile for the following modules: 
> tez-ext-service-tests, 
> tez-plugins/tez-yarn-timeline-history
> tez-tools/tez-javadoc-tools
> Adding the dependencies as per the attached patch resolves the failures.





[jira] [Updated] (TEZ-3135) tez-ext-service-tests, tez-plugins/tez-yarn-timeline-history and tez-tools/tez-javadoc-tools missing dependencies

2016-02-23 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-3135:
-
Assignee: Vijay Kumar

> tez-ext-service-tests, tez-plugins/tez-yarn-timeline-history and 
> tez-tools/tez-javadoc-tools missing dependencies
> -
>
> Key: TEZ-3135
> URL: https://issues.apache.org/jira/browse/TEZ-3135
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Vijay Kumar
>Assignee: Vijay Kumar
> Attachments: TEZ-3135.patch
>
>
> Tez fails to compile for the following modules: 
> tez-ext-service-tests, 
> tez-plugins/tez-yarn-timeline-history
> tez-tools/tez-javadoc-tools
> Adding the dependencies as per the attached patch resolves the failures.





[jira] [Updated] (TEZ-3135) tez-ext-service-tests, tez-plugins/tez-yarn-timeline-history and tez-tools/tez-javadoc-tools missing dependencies

2016-02-23 Thread Vijay Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vijay Kumar updated TEZ-3135:
-
Attachment: TEZ-3135.patch

> tez-ext-service-tests, tez-plugins/tez-yarn-timeline-history and 
> tez-tools/tez-javadoc-tools missing dependencies
> -
>
> Key: TEZ-3135
> URL: https://issues.apache.org/jira/browse/TEZ-3135
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Vijay Kumar
> Attachments: TEZ-3135.patch
>
>
> Tez fails to compile for the following modules: 
> tez-ext-service-tests, 
> tez-plugins/tez-yarn-timeline-history
> tez-tools/tez-javadoc-tools
> Adding the dependencies as per the attached patch resolves the failures.





[jira] [Created] (TEZ-3135) tez-ext-service-tests, tez-plugins/tez-yarn-timeline-history and tez-tools/tez-javadoc-tools missing dependencies

2016-02-23 Thread Vijay Kumar (JIRA)
Vijay Kumar created TEZ-3135:


 Summary: tez-ext-service-tests, 
tez-plugins/tez-yarn-timeline-history and tez-tools/tez-javadoc-tools missing 
dependencies
 Key: TEZ-3135
 URL: https://issues.apache.org/jira/browse/TEZ-3135
 Project: Apache Tez
  Issue Type: Bug
Reporter: Vijay Kumar


Tez fails to compile for the following modules: 
tez-ext-service-tests, 
tez-plugins/tez-yarn-timeline-history
tez-tools/tez-javadoc-tools

Adding the dependencies as per the attached patch resolves the failures.





[jira] [Commented] (TEZ-3119) Add missing AM translations in DeprecatedKeys#populateMRToDagParamMap

2016-02-23 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159486#comment-15159486
 ] 

TezQA commented on TEZ-3119:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12789049/TEZ-3119.001.patch
  against master revision d1dee43.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.test.TestFaultTolerance

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1503//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1503//console

This message is automatically generated.

> Add missing AM translations in DeprecatedKeys#populateMRToDagParamMap
> -
>
> Key: TEZ-3119
> URL: https://issues.apache.org/jira/browse/TEZ-3119
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: TEZ-3119.001.patch
>
>
> MRToDagParamMap is missing some of the relevant configs. Some of them include:
> {code}
> TEZ_CREDENTIALS_PATH
> TEZ_AM_LOG_LEVEL
> TEZ_AM_MAX_APP_ATTEMPTS
> TEZ_AM_RESOURCE_MEMORY_MB
> TEZ_AM_RESOURCE_CPU_VCORES
> TEZ_AM_CLIENT_THREAD_COUNT
> TEZ_AM_CLIENT_AM_PORT_RANGE
> TEZ_AM_RM_HEARTBEAT_INTERVAL_MS_MAX
> TASK_HEARTBEAT_TIMEOUT_MS
> TEZ_TASK_AM_HEARTBEAT_INTERVAL_MS
> TEZ_AM_APPLICATION_PRIORITY
> TEZ_AM_VIEW_ACLS
> TEZ_AM_MODIFY_ACLS
> TEZ_CANCEL_DELEGATION_TOKENS_ON_COMPLETION
> TEZ_AM_CONTAINERLAUNCHER_THREAD_COUNT_LIMIT
> TEZ_AM_CONTAINERLAUNCHER_THREAD_COUNT_LIMIT
> TEZ_AM_LEGACY_SPECULATIVE_SLOWTASK_THRESHOLD
> {code}
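
A translation table like the one described can be pictured as a map from MR keys to Tez keys. The sketch below is hypothetical: the class, the translate helper and the two key pairs are illustrative only and have not been checked against DeprecatedKeys#populateMRToDagParamMap.

```java
import java.util.HashMap;
import java.util.Map;

final class MrToTezKeyTranslation {
    // Copies every MR setting that has a known Tez equivalent into a new map,
    // renaming the key on the way. Settings without a mapping are dropped.
    static Map<String, String> translate(Map<String, String> mrConf,
                                         Map<String, String> mrToTezKeys) {
        Map<String, String> tezConf = new HashMap<>();
        for (Map.Entry<String, String> entry : mrConf.entrySet()) {
            String tezKey = mrToTezKeys.get(entry.getKey());
            if (tezKey != null) {
                tezConf.put(tezKey, entry.getValue());
            }
        }
        return tezConf;
    }

    public static void main(String[] args) {
        // Illustrative key pairs (assumed, not taken from the patch).
        Map<String, String> mrToTezKeys = new HashMap<>();
        mrToTezKeys.put("yarn.app.mapreduce.am.log.level", "tez.am.log.level");
        mrToTezKeys.put("mapreduce.am.max-attempts", "tez.am.max.app.attempts");

        Map<String, String> mrConf = new HashMap<>();
        mrConf.put("yarn.app.mapreduce.am.log.level", "DEBUG");

        System.out.println(translate(mrConf, mrToTezKeys));
    }
}
```

A key missing from the table is silently ignored by this sketch, which is the kind of gap the issue describes.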





Failed: TEZ-3119 PreCommit Build #1503

2016-02-23 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3119
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1503/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 3624 lines...]
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] [Help 2] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :tez-tests
[INFO] Build failures were ignored.




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12789049/TEZ-3119.001.patch
  against master revision d1dee43.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.test.TestFaultTolerance

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1503//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1503//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
9a16033a85b4a3bf2b0cea93ed172cac6287a307 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
7 tests failed.
FAILED:  org.apache.tez.test.TestFaultTolerance.testRandomFailingInputs

Error Message:
expected: but was:

Stack Trace:
java.lang.AssertionError: expected: but was:
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:141)
at 
org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:124)
at 
org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:120)
at 
org.apache.tez.test.TestFaultTolerance.testRandomFailingInputs(TestFaultTolerance.java:763)


FAILED:  org.apache.tez.test.TestFaultTolerance.testBasicInputFailureWithExit

Error Message:
TezSession has already shutdown. No cluster diagnostics found.

Stack Trace:
org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. No 
cluster diagnostics found.
at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:784)
at 
org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:129)
at 
org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:124)
at 
org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:120)
at 
org.apache.tez.test.TestFaultTolerance.testBasicInputFailureWithExit(TestFaultTolerance.java:261)


FAILED:  
org.apache.tez.test.TestFaultTolerance.testInputFailureRerunCanSendOutputToTwoDownstreamVertices

Error Message:
TezSession has already shutdown. No cluster diagnostics found.

Stack Trace:
org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. No 
cluster diagnostics found.
at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:784)
at 
org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:129)
at 

[jira] [Commented] (TEZ-2962) Use per partition stats in shuffle vertex manager auto parallelism

2016-02-23 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159414#comment-15159414
 ] 

Bikas Saha commented on TEZ-2962:
-

The downside of partition stats is that the values are approximate, reported in 
buckets of 1 MB/10 MB/100 MB etc. So a 100 MB stat could imply 900 MB of actual 
data, which makes respecting the max data size per task tricky.
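
To make the approximation concrete, here is a minimal sketch. The bucketing rule (round down to the nearest power-of-ten MB bucket) is an assumption made for illustration, not the actual Tez encoding, and bucketMb is a hypothetical helper.

```java
final class PartitionStatBuckets {
    // Hypothetical bucketing: report the largest power-of-ten MB bucket
    // (1, 10, 100, ...) that does not exceed the actual size.
    static long bucketMb(long actualMb) {
        if (actualMb < 1) {
            return 0;
        }
        long bucket = 1;
        while (bucket * 10 <= actualMb) {
            bucket *= 10;
        }
        return bucket;
    }

    public static void main(String[] args) {
        long[] actualSizesMb = {900, 150, 100};
        for (long actual : actualSizesMb) {
            System.out.println(actual + " MB is reported as " + bucketMb(actual) + " MB");
        }
    }
}
```

Under this assumed rule, partitions of 900 MB, 150 MB and 100 MB all report the same stat, so a grouping decision based on the stat alone cannot tell them apart.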

> Use per partition stats in shuffle vertex manager auto parallelism
> --
>
> Key: TEZ-2962
> URL: https://issues.apache.org/jira/browse/TEZ-2962
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Bikas Saha
>Priority: Critical
>
> The original code used output size sent by completed tasks. Recently per 
> partition stats have been added that provide granular information. Using 
> partition stats may be more accurate and also remove the duplicate counting 
> of data size in partition stats and per task overall.





[jira] [Updated] (TEZ-2962) Use per partition stats in shuffle vertex manager auto parallelism

2016-02-23 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated TEZ-2962:

Priority: Critical  (was: Major)

> Use per partition stats in shuffle vertex manager auto parallelism
> --
>
> Key: TEZ-2962
> URL: https://issues.apache.org/jira/browse/TEZ-2962
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Bikas Saha
>Priority: Critical
>
> The original code used output size sent by completed tasks. Recently per 
> partition stats have been added that provide granular information. Using 
> partition stats may be more accurate and also remove the duplicate counting 
> of data size in partition stats and per task overall.





[jira] [Updated] (TEZ-3126) Log reason for not reducing parallelism

2016-02-23 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated TEZ-3126:

Priority: Minor  (was: Critical)

> Log reason for not reducing parallelism
> ---
>
> Key: TEZ-3126
> URL: https://issues.apache.org/jira/browse/TEZ-3126
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Minor
> Fix For: 0.7.1, 0.8.3
>
> Attachments: TEZ-3126.1.patch, TEZ-3126.2.patch
>
>
> For example, when reducing parallelism from 36 to 22, the basePartitionRange 
> will be 1 and the vertex will not be re-configured.
> {code:java|title=ShuffleVertexManager#determineParallelismAndApply|borderStyle=dashed|bgColor=lightgrey}
> int desiredTaskParallelism = 
> (int)(
> (expectedTotalSourceTasksOutputSize+desiredTaskInputDataSize-1)/
> desiredTaskInputDataSize);
> if(desiredTaskParallelism < minTaskParallelism) {
>   desiredTaskParallelism = minTaskParallelism;
> }
> 
> if(desiredTaskParallelism >= currentParallelism) {
>   return true;
> }
> 
> // most shufflers will be assigned this range
> basePartitionRange = currentParallelism/desiredTaskParallelism;
> 
> if (basePartitionRange <= 1) {
>   // nothing to do if range is equal 1 partition. shuffler does it by default
>   return true;
> }
> {code}
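
The 36-to-22 case comes down to integer division. The sketch below reproduces only the arithmetic from the quoted snippet; the class and method names are made up for illustration and are not part of ShuffleVertexManager.

```java
final class ParallelismArithmetic {
    // Ceiling division of total output size by the desired per-task input size,
    // clamped to the minimum parallelism, as in the quoted snippet.
    static int desiredTaskParallelism(long expectedTotalOutputSize,
                                      long desiredTaskInputDataSize,
                                      int minTaskParallelism) {
        int desired = (int) ((expectedTotalOutputSize + desiredTaskInputDataSize - 1)
                / desiredTaskInputDataSize);
        return Math.max(desired, minTaskParallelism);
    }

    // Integer division: the fractional part is discarded.
    static int basePartitionRange(int currentParallelism, int desiredTaskParallelism) {
        return currentParallelism / desiredTaskParallelism;
    }

    public static void main(String[] args) {
        System.out.println("36 -> 22: range = " + basePartitionRange(36, 22));
        System.out.println("36 -> 18: range = " + basePartitionRange(36, 18));
    }
}
```

With a current parallelism of 36, any desired parallelism from 19 to 35 yields a range of 1, so the early return is taken and the vertex keeps its 36 tasks.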





[jira] [Commented] (TEZ-3126) Log reason for not reducing parallelism

2016-02-23 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159312#comment-15159312
 ] 

Jonathan Eagles commented on TEZ-3126:
--

[~rohini], TEZ-2962 was filed earlier to make the change you are suggesting 
in the above comment. Let's track that work in that JIRA. Linking these two 
together to better track this issue.

> Log reason for not reducing parallelism
> ---
>
> Key: TEZ-3126
> URL: https://issues.apache.org/jira/browse/TEZ-3126
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Critical
> Fix For: 0.7.1, 0.8.3
>
> Attachments: TEZ-3126.1.patch, TEZ-3126.2.patch
>
>
> For example, when reducing parallelism from 36 to 22, the basePartitionRange 
> will be 1 and the vertex will not be re-configured.
> {code:java|title=ShuffleVertexManager#determineParallelismAndApply|borderStyle=dashed|bgColor=lightgrey}
> int desiredTaskParallelism = 
> (int)(
> (expectedTotalSourceTasksOutputSize+desiredTaskInputDataSize-1)/
> desiredTaskInputDataSize);
> if(desiredTaskParallelism < minTaskParallelism) {
>   desiredTaskParallelism = minTaskParallelism;
> }
> 
> if(desiredTaskParallelism >= currentParallelism) {
>   return true;
> }
> 
> // most shufflers will be assigned this range
> basePartitionRange = currentParallelism/desiredTaskParallelism;
> 
> if (basePartitionRange <= 1) {
>   // nothing to do if range is equal 1 partition. shuffler does it by default
>   return true;
> }
> {code}





[jira] [Commented] (TEZ-3105) Tez does not run on IBM JDK 7 or 8

2016-02-23 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159310#comment-15159310
 ] 

Hitesh Shah commented on TEZ-3105:
--

Thanks Greg.

> Tez does not run on IBM JDK 7 or 8
> --
>
> Key: TEZ-3105
> URL: https://issues.apache.org/jira/browse/TEZ-3105
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Greg Senia
>Assignee: Greg Senia
>  Labels: ibm, ibm-jdk
> Attachments: TEZ-3105-2.patch, TEZ-3105.patch
>
>
> When testing Hive on Tez with IBM JDK 7 and 8, the following issue was 
> discovered:
> 2016-02-08 22:25:22,869 [ERROR] [main] |app.DAGAppMaster|: Error starting 
> DAGAppMaster
> java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
>   at 
> org.apache.hadoop.yarn.util.ResourceCalculatorProcessTree.getResourceCalculatorProcessTree(ResourceCalculatorProcessTree.java:225)
>   at 
> org.apache.tez.dag.app.DAGAppMaster.initResourceCalculatorPlugins(DAGAppMaster.java:347)
>   at 
> org.apache.tez.dag.app.DAGAppMaster.serviceInit(DAGAppMaster.java:371)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at org.apache.tez.dag.app.DAGAppMaster$6.run(DAGAppMaster.java:2274)
>   at 
> java.security.AccessController.doPrivileged(AccessController.java:686)
>   at javax.security.auth.Subject.doAs(Subject.java:569)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.tez.dag.app.DAGAppMaster.initAndStartAppMaster(DAGAppMaster.java:2271)
>   at org.apache.tez.dag.app.DAGAppMaster.main(DAGAppMaster.java:2086)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:88)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:57)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:436)
>   at 
> org.apache.hadoop.yarn.util.ResourceCalculatorProcessTree.getResourceCalculatorProcessTree(ResourceCalculatorProcessTree.java:221)
>   ... 9 more
> Caused by: java.lang.ClassCastException: 
> com.ibm.lang.management.ExtendedOperatingSystem incompatible with 
> com.sun.management.OperatingSystemMXBean
>   at 
> org.apache.tez.util.TezMxBeanResourceCalculator.<init>(TezMxBeanResourceCalculator.java:44)
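
The ClassCastException above comes from treating the platform OperatingSystemMXBean as com.sun.management.OperatingSystemMXBean unconditionally. A generic way to avoid that kind of failure is to test the type before casting. The sketch below is not the attached patch; the class name and the -1 fallback are assumptions chosen for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

final class SafeCpuTime {
    // Returns the process CPU time in nanoseconds, or -1 when the platform
    // bean does not implement com.sun.management.OperatingSystemMXBean.
    static long processCpuTimeNanos() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) os).getProcessCpuTime();
        }
        return -1L;
    }

    public static void main(String[] args) {
        System.out.println("process CPU time (ns): " + processCpuTimeNanos());
    }
}
```

On a JVM where the instanceof check fails, the caller gets the fallback value instead of an exception during startup.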





[jira] [Commented] (TEZ-3126) Log reason for not reducing parallelism

2016-02-23 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159291#comment-15159291
 ] 

Rohini Palaniswamy commented on TEZ-3126:
-

bq. Wouldn't that break the max data per reducer limit? Ignoring the min data 
hint may be fine but ignoring the max data limit could result in failure 
because it may break operator assumptions (eg. size of hash table etc.). Say 
the reducer was designed to handle 1G of data and we send it 1.7G instead.
  Currently the code blindly assumes all reducers have equal data and 
combines consecutive basePartitionRange number of reducers into one reducer. 
This already sends more than desiredTaskSize data to some reducers, ignoring 
the max data limit, and empty data to other reducers when filters are used and 
data is skewed, which is very inefficient. The proper fix for this is to 
bucket according to size, as discussed in previous comments, and combine 
reducers based on that. basePartitionRange should be allowed to be a fraction 
to group better, but if we are bucketing and grouping on size, 
basePartitionRange will no longer be required, as partitioning is not based on 
ranges.
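
One possible reading of grouping on size, as opposed to fixed ranges, is a greedy pass that packs consecutive partitions until a size limit would be exceeded. This is a sketch under that assumption only; it is not the ShuffleVertexManager implementation, and the class, method and sample sizes are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class SizeBasedGrouping {
    // Greedily packs consecutive partitions into groups whose total size stays
    // within maxGroupSize. A single partition larger than the limit still gets
    // a group of its own.
    static List<List<Integer>> group(long[] partitionSizes, long maxGroupSize) {
        List<List<Integer>> groups = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long currentSize = 0;
        for (int i = 0; i < partitionSizes.length; i++) {
            if (!current.isEmpty() && currentSize + partitionSizes[i] > maxGroupSize) {
                groups.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(i);
            currentSize += partitionSizes[i];
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Skewed data: several empty partitions and a few large ones.
        long[] sizes = {0, 0, 900, 50, 50, 0, 1000, 0};
        System.out.println(group(sizes, 1000));
    }
}
```

Unlike a fixed basePartitionRange, the number of partitions per group varies with the data, so empty partitions are absorbed into neighbouring groups and no group exceeds the limit unless a single partition already does.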

> Log reason for not reducing parallelism
> ---
>
> Key: TEZ-3126
> URL: https://issues.apache.org/jira/browse/TEZ-3126
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Critical
> Fix For: 0.7.1, 0.8.3
>
> Attachments: TEZ-3126.1.patch, TEZ-3126.2.patch
>
>
> For example, when reducing parallelism from 36 to 22, the basePartitionRange 
> will be 1 and the vertex will not be re-configured.
> {code:java|title=ShuffleVertexManager#determineParallelismAndApply|borderStyle=dashed|bgColor=lightgrey}
> int desiredTaskParallelism = 
> (int)(
> (expectedTotalSourceTasksOutputSize+desiredTaskInputDataSize-1)/
> desiredTaskInputDataSize);
> if(desiredTaskParallelism < minTaskParallelism) {
>   desiredTaskParallelism = minTaskParallelism;
> }
> 
> if(desiredTaskParallelism >= currentParallelism) {
>   return true;
> }
> 
> // most shufflers will be assigned this range
> basePartitionRange = currentParallelism/desiredTaskParallelism;
> 
> if (basePartitionRange <= 1) {
>   // nothing to do if range is equal 1 partition. shuffler does it by default
>   return true;
> }
> {code}





[jira] [Comment Edited] (TEZ-3128) Avoid stopping containers on the AM shutdown thread

2016-02-23 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159263#comment-15159263
 ] 

Hitesh Shah edited comment on TEZ-3128 at 2/23/16 5:49 PM:
---

We do need to release/stop them before shutdown, as there is no guarantee on 
when the AM will be killed (I think the default is less than a few seconds) 
after unregistering if the AM still has pending work (flushing events, etc.). 
We will lose out on history data if we go with that approach. 

My point was whether we can get away with releasing running containers to YARN 
instead of calling stop on each of them via the NM proxy. If we cannot release 
them, then we need to reduce the timeout and use a new NM client proxy with the 
modified timeouts to stop the containers. 

  


was (Author: hitesh):
We do need to release/stop them before shutdown as there is no guarantee on 
when the AM will be killed after unregistering if the AM still has pending work 
( flushing events, etc).

My point was whether we can get away with releasing running containers to YARN 
instead of calling stop on each of them via the NM proxy. If we cannot release 
them, then we need to reduce the timeout and use a new NM client proxy with the 
modified timeouts to stop the containers. 

  

> Avoid stopping containers on the AM shutdown thread
> ---
>
> Key: TEZ-3128
> URL: https://issues.apache.org/jira/browse/TEZ-3128
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.8.0-alpha
>Reporter: Siddharth Seth
>Assignee: Tsuyoshi Ozawa
>  Labels: newbie
> Attachments: TEZ-3128.001.patch, amJstack
>
>
> During an AM shutdown, the TaskCommunicator is also shutdown and it tries to 
> stop containers in the shutdown thread itself. This can cause the AM shutdown 
> to block if NMs are not available.
> This likely affects 0.7 as well.
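
One generic way to keep a shutdown thread from blocking indefinitely is to run the stop calls on helper threads and wait for them with a bounded timeout. The sketch below illustrates only that pattern; it is not the TEZ-3128 patch, and the class, the stopAll method and the timeout handling are assumptions.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class BoundedContainerStop {
    // Runs every stop action on a daemon helper thread and waits at most
    // timeoutMs for all of them. Returns true if all finished in time.
    static boolean stopAll(List<Runnable> stopActions, long timeoutMs) {
        ExecutorService pool = Executors.newCachedThreadPool(runnable -> {
            Thread thread = new Thread(runnable, "container-stop");
            thread.setDaemon(true); // never keeps the JVM alive on its own
            return thread;
        });
        for (Runnable action : stopActions) {
            pool.execute(action);
        }
        pool.shutdown();
        try {
            boolean finished = pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
            if (!finished) {
                pool.shutdownNow(); // interrupt actions that are still blocked
            }
            return finished;
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The caller decides what to do when stopAll returns false, for example logging the containers that could not be stopped and continuing with the rest of the shutdown.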





[jira] [Commented] (TEZ-3128) Avoid stopping containers on the AM shutdown thread

2016-02-23 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159263#comment-15159263
 ] 

Hitesh Shah commented on TEZ-3128:
--

We do need to release/stop them before shutdown as there is no guarantee on 
when the AM will be killed after unregistering if the AM still has pending work 
( flushing events, etc).

My point was whether we can get away with releasing running containers to YARN 
instead of calling stop on each of them via the NM proxy. If we cannot release 
them, then we need to reduce the timeout and use a new NM client proxy with the 
modified timeouts to stop the containers. 

  

> Avoid stopping containers on the AM shutdown thread
> ---
>
> Key: TEZ-3128
> URL: https://issues.apache.org/jira/browse/TEZ-3128
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.8.0-alpha
>Reporter: Siddharth Seth
>Assignee: Tsuyoshi Ozawa
>  Labels: newbie
> Attachments: TEZ-3128.001.patch, amJstack
>
>
> During an AM shutdown, the TaskCommunicator is also shutdown and it tries to 
> stop containers in the shutdown thread itself. This can cause the AM shutdown 
> to block if NMs are not available.
> This likely affects 0.7 as well.





[jira] [Commented] (TEZ-3132) Flaky test: TestContainerReuse.testReuseConflictLocalResources

2016-02-23 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159231#comment-15159231
 ] 

Siddharth Seth commented on TEZ-3132:
-

There's another jira for TestContainerReuse - almost all of the tests are flaky 
IIRC - caused by some timing races.

> Flaky test: TestContainerReuse.testReuseConflictLocalResources
> --
>
> Key: TEZ-3132
> URL: https://issues.apache.org/jira/browse/TEZ-3132
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
>






[jira] [Updated] (TEZ-3134) tez-dag should depend on commons-collections4

2016-02-23 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-3134:
-
Summary: tez-dag should depend on commons-collections4  (was: tez-dag 
should depend on commons-collections4 )

> tez-dag should depend on commons-collections4
> -
>
> Key: TEZ-3134
> URL: https://issues.apache.org/jira/browse/TEZ-3134
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
>Priority: Trivial
> Fix For: 0.8.3
>
> Attachments: TEZ-3134.1.patch
>
>
> Missing dependency on commons-collections4 given that 
> tez-dag/src/main/java/org/apache/tez/dag/app/TaskCommunicatorManager.java 
> uses org.apache.commons.collections4.ListUtils 





[jira] [Updated] (TEZ-3128) Avoid stopping containers on the AM shutdown thread

2016-02-23 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-3128:

Attachment: amJstack

Here's the stack trace. TezContainerLauncher invokes shutdownAllContainers - 
which seems to try and kill the containers - one at a time - before shutdown.

Releasing containers is sufficient. [~hitesh] - last I spoke to you about this, 
I got the impression that a stop container was required for some reason. If 
that understanding was incorrect, we should be able to avoid the container 
stop. Do we even need to release containers? If the app is shutting down, YARN 
should take care of this on its own once the application unregisters? (Or 
is that where the problem is: the unregistration happens at the end and hence 
we should release the containers early?)
IIRC, even for unregistration, YARN allows an app to unregister early - and 
will not kill the AM for a certain amount of time after that.

> Avoid stopping containers on the AM shutdown thread
> ---
>
> Key: TEZ-3128
> URL: https://issues.apache.org/jira/browse/TEZ-3128
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.8.0-alpha
>Reporter: Siddharth Seth
>Assignee: Tsuyoshi Ozawa
>  Labels: newbie
> Attachments: TEZ-3128.001.patch, amJstack
>
>
> During an AM shutdown, the TaskCommunicator is also shutdown and it tries to 
> stop containers in the shutdown thread itself. This can cause the AM shutdown 
> to block if NMs are not available.
> This likely affects 0.7 as well.





[jira] [Updated] (TEZ-3128) Avoid stopping containers on the AM shutdown thread

2016-02-23 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-3128:

Target Version/s: 0.8.3

> Avoid stopping containers on the AM shutdown thread
> ---
>
> Key: TEZ-3128
> URL: https://issues.apache.org/jira/browse/TEZ-3128
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.8.0-alpha
>Reporter: Siddharth Seth
>Assignee: Tsuyoshi Ozawa
>  Labels: newbie
> Attachments: TEZ-3128.001.patch
>
>
> During an AM shutdown, the TaskCommunicator is also shutdown and it tries to 
> stop containers in the shutdown thread itself. This can cause the AM shutdown 
> to block if NMs are not available.
> This likely affects 0.7 as well.





[jira] [Resolved] (TEZ-3134) tez-dag should depend on commons-collections4

2016-02-23 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah resolved TEZ-3134.
--
   Resolution: Fixed
Fix Version/s: 0.8.3

Committing trivial fix. 

> tez-dag should depend on commons-collections4 
> --
>
> Key: TEZ-3134
> URL: https://issues.apache.org/jira/browse/TEZ-3134
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
>Priority: Trivial
> Fix For: 0.8.3
>
> Attachments: TEZ-3134.1.patch
>
>
> Missing dependency on commons-collections4 given that 
> tez-dag/src/main/java/org/apache/tez/dag/app/TaskCommunicatorManager.java 
> uses org.apache.commons.collections4.ListUtils 





[jira] [Updated] (TEZ-3134) tez-dag should depend on commons-collections4

2016-02-23 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-3134:
-
Attachment: TEZ-3134.1.patch

> tez-dag should depend on commons-collections4 
> --
>
> Key: TEZ-3134
> URL: https://issues.apache.org/jira/browse/TEZ-3134
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
>Priority: Trivial
> Attachments: TEZ-3134.1.patch
>
>
> Missing dependency on commons-collections4 given that 
> tez-dag/src/main/java/org/apache/tez/dag/app/TaskCommunicatorManager.java 
> uses org.apache.commons.collections4.ListUtils 





[jira] [Assigned] (TEZ-3134) tez-dag should depend on commons-collections4

2016-02-23 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah reassigned TEZ-3134:


Assignee: Hitesh Shah

> tez-dag should depend on commons-collections4 
> --
>
> Key: TEZ-3134
> URL: https://issues.apache.org/jira/browse/TEZ-3134
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Hitesh Shah
>Priority: Trivial
> Attachments: TEZ-3134.1.patch
>
>
> Missing dependency on commons-collections4 given that 
> tez-dag/src/main/java/org/apache/tez/dag/app/TaskCommunicatorManager.java 
> uses org.apache.commons.collections4.ListUtils 





[jira] [Created] (TEZ-3134) tez-dag should depend on commons-collections4

2016-02-23 Thread Hitesh Shah (JIRA)
Hitesh Shah created TEZ-3134:


 Summary: tez-dag should depend on commons-collections4 
 Key: TEZ-3134
 URL: https://issues.apache.org/jira/browse/TEZ-3134
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Priority: Trivial


Missing dependency on commons-collections4 given that 
tez-dag/src/main/java/org/apache/tez/dag/app/TaskCommunicatorManager.java uses 
org.apache.commons.collections4.ListUtils 





[jira] [Commented] (TEZ-3105) Tez does not run on IBM JDK 7 or 8

2016-02-23 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159043#comment-15159043
 ] 

Greg Senia commented on TEZ-3105:
-

I will finish the patch this evening

> Tez does not run on IBM JDK 7 or 8
> --
>
> Key: TEZ-3105
> URL: https://issues.apache.org/jira/browse/TEZ-3105
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Greg Senia
>Assignee: Greg Senia
>  Labels: ibm, ibm-jdk
> Attachments: TEZ-3105-2.patch, TEZ-3105.patch
>
>
> When testing Hive on Tez with IBM JDK 7 and 8, the following issue was 
> discovered:
> 2016-02-08 22:25:22,869 [ERROR] [main] |app.DAGAppMaster|: Error starting 
> DAGAppMaster
> java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
>   at 
> org.apache.hadoop.yarn.util.ResourceCalculatorProcessTree.getResourceCalculatorProcessTree(ResourceCalculatorProcessTree.java:225)
>   at 
> org.apache.tez.dag.app.DAGAppMaster.initResourceCalculatorPlugins(DAGAppMaster.java:347)
>   at 
> org.apache.tez.dag.app.DAGAppMaster.serviceInit(DAGAppMaster.java:371)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at org.apache.tez.dag.app.DAGAppMaster$6.run(DAGAppMaster.java:2274)
>   at 
> java.security.AccessController.doPrivileged(AccessController.java:686)
>   at javax.security.auth.Subject.doAs(Subject.java:569)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.tez.dag.app.DAGAppMaster.initAndStartAppMaster(DAGAppMaster.java:2271)
>   at org.apache.tez.dag.app.DAGAppMaster.main(DAGAppMaster.java:2086)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:88)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:57)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:436)
>   at 
> org.apache.hadoop.yarn.util.ResourceCalculatorProcessTree.getResourceCalculatorProcessTree(ResourceCalculatorProcessTree.java:221)
>   ... 9 more
> Caused by: java.lang.ClassCastException: 
> com.ibm.lang.management.ExtendedOperatingSystem incompatible with 
> com.sun.management.OperatingSystemMXBean
>   at 
> org.apache.tez.util.TezMxBeanResourceCalculator.<init>(TezMxBeanResourceCalculator.java:44)





[jira] [Commented] (TEZ-3133) many tez tasks only run on one node in cluster

2016-02-23 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158904#comment-15158904
 ] 

Hitesh Shah commented on TEZ-3133:
--

Feel free to attach the yarn application logs for your job to this jira which 
will help in the discussion on the mailing list. 

> many tez tasks only run on one node in cluster
> --
>
> Key: TEZ-3133
> URL: https://issues.apache.org/jira/browse/TEZ-3133
> Project: Apache Tez
>  Issue Type: Test
> Environment: hadoop2.6.0+tez0.7.0
>Reporter: Lvpenglin
>
> I tested the Tez example OrderedWordCount on my cluster. My input file is 
> more than 200 MB, and it generates 4 tasks. But I found that these 4 tasks 
> all run on one node, which is different every time. My cluster contains one 
> master and two slaves. My expectation is that the tasks should run on 
> different nodes; for example, the two Tokenizer tasks should run on two 
> different nodes, which would show the distribution across the cluster. I 
> don't know how to explain this behavior, and I wonder whether I need to 
> modify some configuration or set some information to get the distribution I 
> expect. Can someone explain why, or tell me whether some configuration is 
> needed? Thank you very much.





[jira] [Commented] (TEZ-3124) Running task hangs due to missing event to initialize input in recovery

2016-02-23 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158781#comment-15158781
 ] 

Jeff Zhang commented on TEZ-3124:
-

[~bikassaha] [~hitesh] The timeout is caused by too many test scenarios, which 
take a long time to run. The new patch runs only a random subset of the tests. 
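
As an illustration of running only a random subset of test scenarios, here is a minimal sketch. It is not taken from the TEZ-3124 patch: the class RandomTestSubset, the method pick, and the scenario names are all hypothetical, and it assumes that logging the seed is enough to reproduce a failing selection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper for illustration; not part of the TEZ-3124 patch. */
final class RandomTestSubset {

    /** Returns up to 'limit' scenarios, chosen by a shuffle driven by 'seed'. */
    static <T> List<T> pick(List<T> scenarios, int limit, long seed) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must be >= 0");
        }
        // Shuffle a copy so the caller's list is left untouched.
        List<T> shuffled = new ArrayList<>(scenarios);
        Collections.shuffle(shuffled, new Random(seed));
        int count = Math.min(limit, shuffled.size());
        return new ArrayList<>(shuffled.subList(0, count));
    }

    public static void main(String[] args) {
        // Scenario names are made up for this example.
        List<String> all = Arrays.asList("s1", "s2", "s3", "s4", "s5", "s6");
        long seed = System.currentTimeMillis();
        // Print the seed so the same subset can be selected again later.
        System.out.println("seed=" + seed + " subset=" + pick(all, 3, seed));
    }
}
```

Because the selection depends only on the seed, rerunning with the logged seed selects the same scenarios, which keeps a randomly sampled test run debuggable.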

> Running task hangs due to missing event to initialize input in recovery
> ---
>
> Key: TEZ-3124
> URL: https://issues.apache.org/jira/browse/TEZ-3124
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.8.2
> Reporter: Jeff Zhang
> Assignee: Jeff Zhang
> Labels: Recovery
> Fix For: 0.8.3
>
> Attachments: TEZ-3124-1.patch, TEZ-3124-2.patch, TEZ-3124-3.patch, 
> TEZ-3124-4.patch, a.log
>
>
> {noformat}
> 2016-02-09 04:48:42 Starting to run new task attempt: 
> attempt_1454993155302_0001_1_00_61_3
> /attempt_1454993155302_0001_1_00_61
> 2016-02-09 04:48:43,196 [INFO] [I/O Setup 0 Initialize: {MRInput}] 
> |input.MRInput|: MRInput using newmapreduce API=true, split via event=true, 
> numPhysicalInputs=1
> 2016-02-09 04:48:43,200 [INFO] [I/O Setup 0 Initialize: {MRInput}] 
> |input.MRInputLegacy|: MRInput MRInputLegacy deferring initialization
> 2016-02-09 04:48:43,333 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Initialized processor
> 2016-02-09 04:48:43,333 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Waiting for 2 initializers to finish
> 2016-02-09 04:48:43,333 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Waiting for 1 initializers to finish
> 2016-02-09 04:48:43,333 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: All initializers finished
> 2016-02-09 04:48:43,345 [INFO] [TezChild] |resources.MemoryDistributor|: 
> InitialRequests=[MRInput:INPUT:0:org.apache.tez.mapreduce.input.MRInputLegacy],
>  
> [ireduce1:OUTPUT:1802502144:org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput]
> 2016-02-09 04:48:43,559 [INFO] [TezChild] 
> |resources.WeightedScalingMemoryDistributor|: 
> ScaleRatiosUsed=[PARTITIONED_UNSORTED_OUTPUT:1][UNSORTED_OUTPUT:1][UNSORTED_INPUT:1][SORTED_OUTPUT:12][SORTED_MERGED_INPUT:12][PROCESSOR:1][OTHER:1]
> 2016-02-09 04:48:43,563 [INFO] [TezChild] 
> |resources.WeightedScalingMemoryDistributor|: InitialReservationFraction=0.3, 
> AdditionalReservationFractionForIOs=0.03, 
> finalReserveFractionUsed=0.32996
> 2016-02-09 04:48:43,564 [INFO] [TezChild] 
> |resources.WeightedScalingMemoryDistributor|: Scaling Requests. NumRequests: 
> 2, numScaledRequests: 13, TotalRequested: 1802502144, TotalRequestedScaled: 
> 1.663848132923077E9, TotalJVMHeap: 2577399808, TotalAvailable: 1726857871, 
> TotalRequested/TotalJVMHeap:0.70
> 2016-02-09 04:48:43,564 [INFO] [TezChild] |resources.MemoryDistributor|: 
> Allocations=[MRInput:org.apache.tez.mapreduce.input.MRInputLegacy:INPUT:0:0], 
> [ireduce1:org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput:OUTPUT:1802502144:1726857871]
> 2016-02-09 04:48:43,564 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Starting Inputs/Outputs
> 2016-02-09 04:48:43,572 [INFO] [I/O Setup 1 Start: {MRInput}] 
> |runtime.LogicalIOProcessorRuntimeTask|: Started Input with src edge: MRInput
> 2016-02-09 04:48:43,572 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Input: MRInput being auto started by 
> the framework. Subsequent instances will not be auto-started
> 2016-02-09 04:48:43,573 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Num IOs determined for AutoStart: 1
> 2016-02-09 04:48:43,574 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: Waiting for 1 IOs to start
> 2016-02-09 04:48:43,574 [INFO] [TezChild] 
> |runtime.LogicalIOProcessorRuntimeTask|: AutoStartComplete
> 2016-02-09 04:48:43,583 [INFO] [TezChild] |task.TaskRunner2Callable|: Running 
> task, taskAttemptId=attempt_1454993155302_0001_1_00_61_3
> 2016-02-09 04:48:43,583 [INFO] [TezChild] |map.MapProcessor|: Running map: 
> attempt_1454993155302_0001_1_00_61_3_10001
> 2016-02-09 04:48:43,675 [INFO] [TezChild] |impl.ExternalSorter|: ireduce1 
> using: memoryMb=1646, keySerializerClass=class 
> org.apache.hadoop.io.IntWritable, 
> valueSerializerClass=org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer@5f143de6,
>  comparator=org.apache.hadoop.io.IntWritable$Comparator@ec52d1f, 
> partitioner=org.apache.tez.mapreduce.partition.MRPartitioner, 
> serialization=org.apache.hadoop.io.serializer.WritableSerialization
> 2016-02-09 04:48:43,686 [INFO] [TezChild] |impl.PipelinedSorter|: Setting up 
> PipelinedSorter for ireduce1: , UsingHashComparator=false
> 2016-02-09 04:48:45,093 [INFO] [TezChild] |impl.PipelinedSorter|: Newly 
> allocated block size=1725956096, index=0, 

[jira] [Created] (TEZ-3133) many tez tasks only run on one node in cluster

2016-02-23 Thread Lvpenglin (JIRA)
Lvpenglin created TEZ-3133:
--

Summary: many tez tasks only run on one node in cluster
Key: TEZ-3133
URL: https://issues.apache.org/jira/browse/TEZ-3133
Project: Apache Tez
Issue Type: Test
Environment: hadoop2.6.0+tez0.7.0
Reporter: Lvpenglin


I tested the Tez example OrderedWordCount on my cluster. My input file is 
larger than 200 MB, and the job generates 4 tasks. However, all 4 tasks run on 
a single node, and which node it is differs from run to run. My cluster has 
one master and two slaves. I expected the tasks to be spread across nodes; for 
example, the two Tokenizer tasks should run on two different nodes, which 
would show the work being distributed across the cluster. I cannot explain 
this behavior, and I wonder whether I need to change some configuration or set 
some option to get the distribution I expect. Can anyone explain why this 
happens, or tell me which configuration is needed? Thank you very much. 


