[ https://issues.apache.org/jira/browse/TEZ-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519864#comment-14519864 ]
Hitesh Shah commented on TEZ-2378:
----------------------------------

bq. These are normal exceptions when fetcher is unable to get the data from local system

I don't understand. Why is the fetcher failing to read off the local filesystem? That should be a cause for concern, I would assume. A distraction, yes, when you know that something else is wrong, but a problem regardless if this is happening in the first place.

Also, if the local fetch fails, do we error out, given that falling back to the HTTP fetch would likely hit the same error?

bq. 2015-04-28 05:41:45,487 WARN [Fetcher [Map_5] #15] shuffle.Fetcher: Failed to shuffle output of InputAttemptIdentifier [inputIdentifier=InputIdentifier [inputIndex=81], attemptNumber=0, pathComponent=attempt_1429683757595_0485_1_03_000081_0_10003, fetchTypeInfo=FINAL_MERGE_ENABLED, spillEventId=-1] from cn047-10.l42scl.hortonworks.com(local fetch) org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find output/attempt_1429683757595_0485_1_03_000081_0_10003/file.out.index in any of the configured local directories

Also, I am assuming that we do not call the local fetcher when the data is remote or when the output size is 0? So the above error should only be seen when the map output has somehow disappeared off the local disk?

> In case Fetcher (unordered) fails to do local fetch, log in debug mode to
> reduce log size
> -----------------------------------------------------------------------------------------
>
>                 Key: TEZ-2378
>                 URL: https://issues.apache.org/jira/browse/TEZ-2378
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>
> The following can be logged at DEBUG level instead of WARN. Maybe
> counters can be added later to track the number of times the local
> fetch failed.
> {noformat}
> 2015-04-28 05:41:45,487 WARN [Fetcher [Map_5] #15] shuffle.Fetcher: Failed to shuffle output of InputAttemptIdentifier [inputIdentifier=InputIdentifier [inputIndex=81], attemptNumber=0, pathComponent=attempt_1429683757595_0485_1_03_000081_0_10003, fetchTypeInfo=FINAL_MERGE_ENABLED, spillEventId=-1] from cn047-10.l42scl.hortonworks.com(local fetch)
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find output/attempt_1429683757595_0485_1_03_000081_0_10003/file.out.index in any of the configured local directories
>         at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:449)
>         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:164)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.getShuffleInputFileName(Fetcher.java:612)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.getTezIndexRecord(Fetcher.java:592)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.doLocalDiskFetch(Fetcher.java:537)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.doSharedFetch(Fetcher.java:353)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:192)
>         at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:72)
>         at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
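
For illustration only, the proposal in the issue (log the local-fetch failure at debug level and track it with a counter) could look roughly like the sketch below. This is not the Tez Fetcher code: the class LocalFetchLogSketch, the LocalReader interface, and the counter are hypothetical names, and java.util.logging (with FINE standing in for debug) is used only to keep the example self-contained. The counter is there so that failures remain visible even when the log line is suppressed, which is the concern raised in the comment.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch; names and structure are assumptions, not Tez APIs. */
final class LocalFetchLogSketch {
    private static final Logger LOG =
        Logger.getLogger(LocalFetchLogSketch.class.getName());

    // Hypothetical counter: number of local fetches that failed.
    private final AtomicLong localFetchFailures = new AtomicLong();

    /** Stand-in for reading one map output from the local disk. */
    interface LocalReader {
        void read(String pathComponent) throws IOException;
    }

    /**
     * Attempts a local fetch. On failure the error is counted and logged at
     * FINE (debug) level instead of WARNING, and false is returned so that
     * the caller can decide whether to retry, fall back, or fail.
     */
    boolean tryLocalFetch(String pathComponent, LocalReader reader) {
        try {
            reader.read(pathComponent);
            return true;
        } catch (IOException e) {
            localFetchFailures.incrementAndGet();
            if (LOG.isLoggable(Level.FINE)) {
                LOG.log(Level.FINE,
                    "Failed to shuffle output of " + pathComponent
                        + " (local fetch)", e);
            }
            return false;
        }
    }

    long getLocalFetchFailures() {
        return localFetchFailures.get();
    }
}
```

Whether the caller should fall back to an HTTP fetch or report the input as failed when tryLocalFetch returns false is exactly the open question in the comment above; the sketch deliberately leaves that decision to the caller.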