[ 
https://issues.apache.org/jira/browse/TEZ-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249305#comment-14249305
 ] 

Jeff Zhang commented on TEZ-1852:
---------------------------------

[~hitesh], it is the same if TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH is not set to true

> WordCount in local mode hangs in shuffle phase
> ----------------------------------------------
>
>                 Key: TEZ-1852
>                 URL: https://issues.apache.org/jira/browse/TEZ-1852
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jeff Zhang
>            Priority: Critical
>
> WordCount hangs at the progress of 50%, it looks like something wrong in the 
> shuffle phase. [~pramachandran], could you take at look at this ? I remember 
> you had fixed this issue before. But now it comes out again, I verify that it 
> works in 0.5.1, but it doesn't work starting from 0.5.2.
> here's the log in when I run wordcount
> {code}
> 2014-12-15 12:19:33,874 INFO  [Initializer 1] 
> runtime.LogicalIOProcessorRuntimeTask 
> (LogicalIOProcessorRuntimeTask.java:call(434)) - Initialized Output with dest 
> edge: Output
> 2014-12-15 12:19:33,874 INFO  [TezChild] 
> runtime.LogicalIOProcessorRuntimeTask 
> (LogicalIOProcessorRuntimeTask.java:initialize(223)) - All initializers 
> finished
> 2014-12-15 12:19:33,874 INFO  [TezChild] resources.MemoryDistributor 
> (MemoryDistributor.java:makeInitialAllocations(124)) - Using Allocator class: 
> org.apache.tez.runtime.library.resources.WeightedScalingMemoryDistributor
> 2014-12-15 12:19:33,874 INFO  [TezChild] 
> resources.WeightedScalingMemoryDistributor 
> (WeightedScalingMemoryDistributor.java:getRequestTypeForClass(197)) - Falling 
> back to RequestType.OTHER for class: org.apache.tez.mapreduce.output.MROutput
> 2014-12-15 12:19:33,874 INFO  [TezChild] 
> resources.WeightedScalingMemoryDistributor 
> (WeightedScalingMemoryDistributor.java:initialProcessMemoryRequestContext(171))
>  - ScaleFactor: 1, for type: OTHER
> 2014-12-15 12:19:33,874 INFO  [TezChild] 
> resources.WeightedScalingMemoryDistributor 
> (WeightedScalingMemoryDistributor.java:initialProcessMemoryRequestContext(171))
>  - ScaleFactor: 12, for type: SORTED_MERGED_INPUT
> 2014-12-15 12:19:33,875 INFO  [TezChild] 
> resources.WeightedScalingMemoryDistributor 
> (WeightedScalingMemoryDistributor.java:computeReservedFraction(262)) - 
> InitialReservationFraction=0.3, AdditionalReservationFractionForIOs=0.03, 
> finalReserveFractionUsed=0.32999999999999996
> 2014-12-15 12:19:33,875 INFO  [TezChild] 
> resources.WeightedScalingMemoryDistributor 
> (WeightedScalingMemoryDistributor.java:assignMemory(126)) - Scaling Requests. 
> NumRequests: 2, numScaledRequests: 13, TotalRequested: 859255552, 
> TotalRequestedScaled: 7.931589710769231E8, TotalJVMHeap: 954728448, 
> TotalAvailable: 639668060, TotalRequested/TotalJVMHeap:0.90
> 2014-12-15 12:19:33,875 INFO  [TezChild] resources.MemoryDistributor 
> (MemoryDistributor.java:makeInitialAllocations(138)) - Informing: OUTPUT, 
> Output, org.apache.tez.mapreduce.output.MROutput: requested=0, allocated=0
> 2014-12-15 12:19:33,875 INFO  [TezChild] resources.MemoryDistributor 
> (MemoryDistributor.java:makeInitialAllocations(138)) - Informing: INPUT, 
> Tokenizer, org.apache.tez.runtime.library.input.OrderedGroupedKVInput: 
> requested=859255552, allocated=639668060
> 2014-12-15 12:19:33,875 INFO  [TezChild] 
> runtime.LogicalIOProcessorRuntimeTask 
> (LogicalIOProcessorRuntimeTask.java:initialize(245)) - Starting Inputs/Outputs
> 2014-12-15 12:19:33,875 INFO  [TezChild] 
> runtime.LogicalIOProcessorRuntimeTask 
> (LogicalIOProcessorRuntimeTask.java:initialize(258)) - Input: Tokenizer being 
> auto started by the framework. Subsequent instances will not be auto-started
> 2014-12-15 12:19:33,875 INFO  [Initializer 0] 
> runtime.LogicalIOProcessorRuntimeTask 
> (LogicalIOProcessorRuntimeTask.java:call(402)) - Starting Input with src 
> edge: Tokenizer
> 2014-12-15 12:19:33,875 INFO  [TezChild] 
> runtime.LogicalIOProcessorRuntimeTask 
> (LogicalIOProcessorRuntimeTask.java:initialize(279)) - Num IOs determined for 
> AutoStart: 1
> 2014-12-15 12:19:33,875 INFO  [TezChild] 
> runtime.LogicalIOProcessorRuntimeTask 
> (LogicalIOProcessorRuntimeTask.java:initialize(281)) - Waiting for 1 IOs to 
> start
> 2014-12-15 12:19:33,882 INFO  [Initializer 0] common.TezRuntimeUtils 
> (TezRuntimeUtils.java:instantiateCombiner(65)) - No combiner specified via 
> tez.runtime.combiner.class. Combiner will not be used
> 2014-12-15 12:19:33,883 INFO  [Initializer 0] orderedgrouped.Shuffle 
> (Shuffle.java:<init>(179)) - Shuffle assigned with 1 inputs, codec: 
> NoneifileReadAhead: true
> 2014-12-15 12:19:33,886 INFO  [Initializer 0] orderedgrouped.ShuffleScheduler 
> (ShuffleScheduler.java:<init>(150)) - ShuffleScheduler running for 
> sourceVertex: Tokenizer with configuration: 
> maxFetchFailuresBeforeReporting=5, reportReadErrorImmediately=true, 
> maxFailedUniqueFetches=1, abortFailureLimit=30, maxMapRuntime=0
> 2014-12-15 12:19:33,887 INFO  [Initializer 0] orderedgrouped.MergeManager 
> (MergeManager.java:<init>(221)) - InitialRequest: ShuffleMem=859255552, 
> postMergeMem=0, RuntimeTotalAvailable=639668060. Updated to: 
> ShuffleMem=639668060, postMergeMem: 0
> 2014-12-15 12:19:33,887 INFO  [Initializer 0] orderedgrouped.MergeManager 
> (MergeManager.java:<init>(252)) - MergerManager: memoryLimit=639668060, 
> maxSingleShuffleLimit=159917008, mergeThreshold=575701184, ioSortFactor=100, 
> memToMemMergeOutputsThreshold=100
> 2014-12-15 12:19:33,890 INFO  [Initializer 0] orderedgrouped.Shuffle 
> (Shuffle.java:<init>(226)) - Num fetchers being started: 1
> 2014-12-15 12:19:33,891 INFO  [Initializer 0] 
> runtime.LogicalIOProcessorRuntimeTask 
> (LogicalIOProcessorRuntimeTask.java:call(404)) - Started Input with src edge: 
> Tokenizer
> 2014-12-15 12:19:33,891 INFO  [TezChild] 
> runtime.LogicalIOProcessorRuntimeTask 
> (LogicalIOProcessorRuntimeTask.java:initialize(294)) - AutoStartComplete
> 2014-12-15 12:19:33,891 INFO  [TezChild] task.TezTaskRunner 
> (TezTaskRunner.java:run(175)) - Running task, 
> taskAttemptId=attempt_1418617172198_0001_1_01_000000_0
> 2014-12-15 12:19:33,891 INFO  
> [TezTaskEventRouter[attempt_1418617172198_0001_1_01_000000_0]] 
> orderedgrouped.ShuffleInputEventHandlerOrderedGrouped 
> (ShuffleInputEventHandlerOrderedGrouped.java:processDataMovementEvent(83)) - 
> DME srcIdx: 0, targetIdx: 0, attemptNum: 0, payload: [host: jzhangMBPr.local, 
> port: 0, pathComponent: attempt_1418617172198_0001_1_00_000000_0_10003, 
> runDuration: 327458, ]
> 2014-12-15 12:19:34,311 INFO  [main] client.DAGClientImpl 
> (DAGClientImpl.java:log(463)) - DAG: State: RUNNING Progress: 50% TotalTasks: 
> 2 Succeeded: 1 Running: 1 Failed: 0 Killed: 0
> 2014-12-15 12:19:34,312 INFO  [main] client.DAGClientImpl 
> (DAGClientImpl.java:log(463)) -     VertexStatus: VertexName: Tokenizer 
> Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
> 2014-12-15 12:19:34,313 INFO  [main] client.DAGClientImpl 
> (DAGClientImpl.java:log(463)) -     VertexStatus: VertexName: Summation 
> Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
> 2014-12-15 12:19:39,330 INFO  [main] client.DAGClientImpl 
> (DAGClientImpl.java:log(463)) - DAG: State: RUNNING Progress: 50% TotalTasks: 
> 2 Succeeded: 1 Running: 1 Failed: 0 Killed: 0
> 2014-12-15 12:19:39,331 INFO  [main] client.DAGClientImpl 
> (DAGClientImpl.java:log(463)) -     VertexStatus: VertexName: Tokenizer 
> Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
> 2014-12-15 12:19:39,331 INFO  [main] client.DAGClientImpl 
> (DAGClientImpl.java:log(463)) -     VertexStatus: VertexName: Summation 
> Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
> 2014-12-15 12:19:44,348 INFO  [main] client.DAGClientImpl 
> (DAGClientImpl.java:log(463)) - DAG: State: RUNNING Progress: 50% TotalTasks: 
> 2 Succeeded: 1 Running: 1 Failed: 0 Killed: 0
> 2014-12-15 12:19:44,349 INFO  [main] client.DAGClientImpl 
> (DAGClientImpl.java:log(463)) -     VertexStatus: VertexName: Tokenizer 
> Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
> 2014-12-15 12:19:44,350 INFO  [main] client.DAGClientImpl 
> (DAGClientImpl.java:log(463)) -     VertexStatus: VertexName: Summation 
> Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
> 2014-12-15 12:19:49,365 INFO  [main] client.DAGClientImpl 
> (DAGClientImpl.java:log(463)) - DAG: State: RUNNING Progress: 50% TotalTasks: 
> 2 Succeeded: 1 Running: 1 Failed: 0 Killed: 0
> 2014-12-15 12:19:49,367 INFO  [main] client.DAGClientImpl 
> (DAGClientImpl.java:log(463)) -     VertexStatus: VertexName: Tokenizer 
> Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
> 2014-12-15 12:19:49,369 INFO  [main] client.DAGClientImpl 
> (DAGClientImpl.java:log(463)) -     VertexStatus: VertexName: Summation 
> Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
> 2014-12-15 12:19:54,388 INFO  [main] client.DAGClientImpl 
> (DAGClientImpl.java:log(463)) - DAG: State: RUNNING Progress: 50% TotalTasks: 
> 2 Succeeded: 1 Running: 1 Failed: 0 Killed: 0
> 2014-12-15 12:19:54,390 INFO  [main] client.DAGClientImpl 
> (DAGClientImpl.java:log(463)) -     VertexStatus: VertexName: Tokenizer 
> Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
> 2014-12-15 12:19:54,392 INFO  [main] client.DAGClientImpl 
> (DAGClientImpl.java:log(463)) -     VertexStatus: VertexName: Summation 
> Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
> 2014-12-15 12:19:59,406 INFO  [main] client.DAGClientImpl 
> (DAGClientImpl.java:log(463)) - DAG: State: RUNNING Progress: 50% TotalTasks: 
> 2 Succeeded: 1 Running: 1 Failed: 0 Killed: 0
> {code}
> You can reproduce this by adding these 3 lines of code in WordCount
> {code}
>     tezConf.setBoolean(TezConfiguration.TEZ_LOCAL_MODE, true);
>     tezConf.set("fs.defaultFS", "file:///");
>     
> tezConf.setBoolean(TezRuntimeConfiguration.TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH, 
> true);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to