[ https://issues.apache.org/jira/browse/TEZ-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696636#comment-14696636 ]
TezQA commented on TEZ-2719:
----------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12750460/TEZ-2719.branch-0.7.patch
  against master revision 6b67b0b.

    {color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/990//console

This message is automatically generated.

> Consider reducing logs in unordered fetcher with shared-fetch option
> --------------------------------------------------------------------
>
>                 Key: TEZ-2719
>                 URL: https://issues.apache.org/jira/browse/TEZ-2719
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-2719.1.patch, TEZ-2719.2.patch, TEZ-2719.branch-0.7.patch
>
>
> For large broadcast, this can be a problem
> e.g
> In one of the jobs (query_17 @ 10 TB scale), Map 7 generates around 1.1 GB of
> data which is given to 330 tasks in downstream Map 1.
> Map 1 uses all slots in cluster (~ 224 per wave). Until data is downloaded,
> shared fetch would end up re-queuing fetches. As a part of it, it would end
> up printing 3 logs per attempt.
> E.g
> {noformat}
> 2015-08-14 02:09:11,761 INFO [Fetcher [Map_7] #0] shuffle.Fetcher: Requeuing machine1:13562 downloads because we didn't get a lock
> 2015-08-14 02:09:11,761 INFO [Fetcher [Map_7] #0] shuffle.Fetcher: Shared fetch failed to return 1 inputs on this try
> 2015-08-14 02:09:11,761 INFO [ShuffleRunner [Map_7]] impl.ShuffleManager: Scheduling fetch for inputHost: machine1:13562
> 2015-08-14 02:09:11,761 INFO [ShuffleRunner [Map_7]] impl.ShuffleManager: Created Fetcher for host: machine1 with inputs: [InputAttemptIdentifier [inputIdentifier=InputIdentifier [inputIndex=0], attemptNumber=0, pathComponent=attempt_1439264591968_0058_1_04_000000_0_10029, fetchTypeInfo=FINAL_MERGE_ENABLED, spillEventId=-1]]
> {noformat}
> Based on disk / network, it might take time for fetcher to finish
> downloading and release the lock. Since there was only one task in Map-1, it
> ended up in a sort of tight loop generating relatively larger logs.
> Looks like 260-290 MB task log files are created in this case per attempt.
> That would be around 2.3 GB to 3 GB (depending on number of slots waiting) in
> machine with 8-10 slots.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
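
One generic way to keep a tight requeue loop like the one described in the issue from flooding the task log is to rate-limit repeated messages per host. The following is only a minimal sketch of that idea, not the approach taken in the attached patches: the class LogThrottle, its method tryAcquire, and the interval value are hypothetical names chosen for illustration and are not part of Tez.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not a Tez class): permits at most one log line per key
 * in each interval and counts the messages it suppresses in between.
 */
final class LogThrottle {
    private final long intervalMillis;
    private final Map<String, Long> lastLogged = new HashMap<>();
    private final Map<String, Long> suppressed = new HashMap<>();

    LogThrottle(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    /**
     * Returns -1 if the caller should skip logging for this key; otherwise
     * returns how many messages were suppressed since the last emitted one.
     */
    synchronized long tryAcquire(String key, long nowMillis) {
        Long last = lastLogged.get(key);
        if (last != null && nowMillis - last < intervalMillis) {
            // Still inside the quiet interval: count the message and drop it.
            suppressed.merge(key, 1L, Long::sum);
            return -1L;
        }
        // Interval elapsed (or first message for this key): allow one line.
        lastLogged.put(key, nowMillis);
        Long skipped = suppressed.remove(key);
        return skipped == null ? 0L : skipped;
    }
}
```

A caller in the requeue path could pass the host string (for example machine1:13562) and System.currentTimeMillis(), and emit the "Requeuing ... downloads" line only when the result is non-negative, appending the suppressed count so that no information about the retry volume is lost.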