[ https://issues.apache.org/jira/browse/TEZ-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520264#comment-14520264 ]
Siddharth Seth commented on TEZ-2366:
-------------------------------------

Shuffle port should be sufficient though, right? You won't have two Shuffle listeners on the same port if there are multiple NodeManagers. The NodeId would be host + RPC port? Unless YARN has added an actual identifier which is unique across restarts of the same service.

> Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
> --------------------------------------------------------------------
>
>                 Key: TEZ-2366
>                 URL: https://issues.apache.org/jira/browse/TEZ-2366
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Daniel Dai
>            Priority: Critical
>         Attachments: TEZ-2366.test.txt, TEZ-2366.wip.1.patch
>
>
> Around 20 unit tests (out of around 2000) fail intermittently after
> TEZ-2333. Here is a stack trace:
> {code}
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find output/attempt_1429899954360_0001_1_01_000000_1_10003/file.out.index in any of the configured local directories
>     at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:449)
>     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:164)
>     at org.apache.tez.runtime.library.common.shuffle.Fetcher.getShuffleInputFileName(Fetcher.java:611)
>     at org.apache.tez.runtime.library.common.shuffle.Fetcher.getTezIndexRecord(Fetcher.java:591)
>     at org.apache.tez.runtime.library.common.shuffle.Fetcher.doLocalDiskFetch(Fetcher.java:536)
>     at org.apache.tez.runtime.library.common.shuffle.Fetcher.setupLocalDiskFetch(Fetcher.java:517)
>     at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:190)
>     at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:72)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
> To reproduce this in the Pig tests, use the following commands:
> svn co http://svn.apache.org/repos/asf/pig/trunk
> ant -Dhadoopversion=23 -Dtest.exec.type=tez -Dtestcase=TestTezAutoParallelism test
> Note that in the Pig codebase we already set TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to
> "true"
> (http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java?view=markup).
> I tried changing TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to "false" in Pig and it does
> not help.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)