[jira] [Updated] (TEZ-2366) Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
[ https://issues.apache.org/jira/browse/TEZ-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prakash Ramachandran updated TEZ-2366:
--
Attachment: TEZ-2366.4.patch

Fixed UT.

> Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
>
> Key: TEZ-2366
> URL: https://issues.apache.org/jira/browse/TEZ-2366
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Daniel Dai
> Assignee: Prakash Ramachandran
> Priority: Critical
> Attachments: TEZ-2366.1.patch, TEZ-2366.2.patch, TEZ-2366.3.patch, TEZ-2366.4.patch, TEZ-2366.test.txt, TEZ-2366.wip.1.patch
>
> There are around 20 unit tests (out of around 2000) that fail intermittently after TEZ-2333. Here is a stack trace:
> {code}
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find output/attempt_1429899954360_0001_1_01_00_1_10003/file.out.index in any of the configured local directories
>     at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:449)
>     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:164)
>     at org.apache.tez.runtime.library.common.shuffle.Fetcher.getShuffleInputFileName(Fetcher.java:611)
>     at org.apache.tez.runtime.library.common.shuffle.Fetcher.getTezIndexRecord(Fetcher.java:591)
>     at org.apache.tez.runtime.library.common.shuffle.Fetcher.doLocalDiskFetch(Fetcher.java:536)
>     at org.apache.tez.runtime.library.common.shuffle.Fetcher.setupLocalDiskFetch(Fetcher.java:517)
>     at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:190)
>     at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:72)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
> To reproduce it in the Pig tests, use the following commands:
> svn co http://svn.apache.org/repos/asf/pig/trunk
> ant -Dhadoopversion=23 -Dtest.exec.type=tez -Dtestcase=TestTezAutoParallelism test
> Note that in the Pig codebase we already set TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to "true" (http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java?view=markup). I tried changing TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to "false" in Pig, and it does not help.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
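The error message in the stack trace suggests how the lookup fails: the reader resolves the relative output path against each of its own configured local directories and gives up when none of them contains it. The sketch below mimics only that observable behavior with plain `java.nio.file` directories. It is not Hadoop's `LocalDirAllocator`; the class `LocalDirLookup` and the method `findInLocalDirs` are hypothetical names, and the exception is a plain `IOException` standing in for `DiskErrorException`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch: resolve a relative path against a list of local
 * directories, failing when no directory holds it. Not LocalDirAllocator.
 */
final class LocalDirLookup {

    private LocalDirLookup() {}

    /** Returns the first existing copy of relativePath among localDirs. */
    static Path findInLocalDirs(Path relativePath, List<Path> localDirs) throws IOException {
        for (Path dir : localDirs) {
            Path candidate = dir.resolve(relativePath);
            if (Files.exists(candidate)) {
                return candidate;
            }
        }
        // Only the caller's own directories are searched, so a file written
        // under another NodeManager's directories is never found here.
        throw new IOException("Could not find " + relativePath
                + " in any of the configured local directories");
    }

    public static void main(String[] args) throws IOException {
        // Two simulated NodeManagers, each with its own local directory.
        Path producerDir = Files.createTempDirectory("nm0-local");
        Path consumerDir = Files.createTempDirectory("nm1-local");
        Path index = Paths.get("output", "attempt_0", "file.out.index");
        Files.createDirectories(producerDir.resolve(index).getParent());
        Files.createFile(producerDir.resolve(index));

        // Found when searching the producer's directory.
        System.out.println(findInLocalDirs(index, Collections.singletonList(producerDir)));
        try {
            // Fails when only the consumer's directory is searched.
            findInLocalDirs(index, Collections.singletonList(consumerDir));
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The same relative path succeeds or fails depending only on which directory list is searched, which matches the shape of the reported exception.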
[jira] [Updated] (TEZ-2366) Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
[ https://issues.apache.org/jira/browse/TEZ-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prakash Ramachandran updated TEZ-2366:
--
Attachment: TEZ-2366.3.patch

Addressed comments by [~sseth].
[jira] [Updated] (TEZ-2366) Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
[ https://issues.apache.org/jira/browse/TEZ-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prakash Ramachandran updated TEZ-2366:
--
Attachment: TEZ-2366.2.patch

Patch 2:
* Added a check for the shuffle metadata being null.

[~hitesh] The LocalContainerLauncher sets the shuffle port to 0:
{code}
AuxiliaryServiceHelper.setServiceDataIntoEnv(
    ShuffleUtils.SHUFFLE_HANDLER_SERVICE_ID,
    ByteBuffer.allocate(4).putInt(0), localEnv);
{code}
The unit test failure does not seem related; I ran it locally.
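The null check in patch 2 can be illustrated with a helper that reads a shuffle port serialized as a 4-byte int, as in the quoted `ByteBuffer.allocate(4).putInt(0)`, and falls back to a caller-supplied default when no metadata was published. This is a sketch under assumptions: `ShufflePortMetadata.portOrDefault` and the fallback value are hypothetical, the port 13562 is just an example value, and the patch's actual handling may differ.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical helper: reads a shuffle port stored as a 4-byte int and
 * tolerates missing metadata. Not a Tez API.
 */
final class ShufflePortMetadata {

    private ShufflePortMetadata() {}

    /** Returns the encoded port, or fallbackPort when metadata is absent or too short. */
    static int portOrDefault(ByteBuffer metadata, int fallbackPort) {
        if (metadata == null || metadata.limit() < Integer.BYTES) {
            return fallbackPort;
        }
        // Absolute read at index 0: independent of the buffer's position,
        // which putInt(...) leaves at 4 after writing.
        return metadata.getInt(0);
    }

    public static void main(String[] args) {
        // Same encoding as the LocalContainerLauncher snippet above: port 0.
        ByteBuffer localMode = ByteBuffer.allocate(4).putInt(0);
        ByteBuffer cluster = ByteBuffer.allocate(4).putInt(13562); // example port
        System.out.println(portOrDefault(localMode, -1)); // 0
        System.out.println(portOrDefault(cluster, -1));   // 13562
        System.out.println(portOrDefault(null, -1));      // -1
    }
}
```

The absolute `getInt(0)` is deliberate: after `putInt` the position is 4, so a relative `getInt()` on the same buffer without `rewind()` or `flip()` would throw `BufferUnderflowException`.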
[jira] [Updated] (TEZ-2366) Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
[ https://issues.apache.org/jira/browse/TEZ-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prakash Ramachandran updated TEZ-2366:
--
Attachment: TEZ-2366.1.patch
[jira] [Updated] (TEZ-2366) Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
[ https://issues.apache.org/jira/browse/TEZ-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prakash Ramachandran updated TEZ-2366:
--
Attachment: TEZ-2366.wip.1.patch

[~sseth] Attaching a patch that checks the port along with the host. One quick question, though: mapreduce.shuffle.port is not exposed by YARN. Is it fine to rely on that configuration and its default value? If the patch looks OK, I can add the tests.
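A minimal sketch of the host-plus-port idea, assuming each NodeManager's shuffle service listens on a distinct port even when all NodeManagers share one hostname. The class `LocalFetchCheck`, its methods, and the port numbers are hypothetical illustrations, not the code in TEZ-2366.wip.1.patch.

```java
import java.util.Objects;

/**
 * Hypothetical sketch: decide whether a fetch may read from local disk.
 * Not the actual patch code.
 */
final class LocalFetchCheck {

    private LocalFetchCheck() {}

    /** Host-only comparison: every NodeManager on one machine looks "local". */
    static boolean isLocalByHost(String localHost, String srcHost) {
        return Objects.equals(localHost, srcHost);
    }

    /** Host and port comparison: distinguishes NodeManagers sharing a hostname. */
    static boolean isLocalByHostAndPort(String localHost, int localPort,
                                        String srcHost, int srcPort) {
        return isLocalByHost(localHost, srcHost) && localPort == srcPort;
    }

    public static void main(String[] args) {
        // Two NodeManagers on one machine with different shuffle ports (example values).
        System.out.println(isLocalByHost("localhost", "localhost"));                      // true
        System.out.println(isLocalByHostAndPort("localhost", 50001, "localhost", 50002)); // false
        System.out.println(isLocalByHostAndPort("localhost", 50001, "localhost", 50001)); // true
    }
}
```

Adding the port makes the check stricter only when hostnames collide; on a real multi-host cluster the host comparison alone already differs. How the local port is obtained (the `mapreduce.shuffle.port` question above) is left open here.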
[jira] [Updated] (TEZ-2366) Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
[ https://issues.apache.org/jira/browse/TEZ-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated TEZ-2366:
Attachment: TEZ-2366.test.txt

This is what I believe causes this. Pig runs the MiniCluster with multiple NodeManager instances, which is good. Once there are multiple instances, however, the hostname on all of them matches, so each task attempts a local fetch even though the data may have been generated on a different NodeManager (implying a different local dir). Tez never runs into this because all of its tests run with a single NodeManager instance.

There are two possible fixes. One is to disable local fetch for the MiniCluster, for which I'm uploading a temporary patch. The other is to use the port information to decide whether to do a local fetch.

[~daijy] - please try out this patch. Alternatively, disable local fetch in the Pig config after setting up the cluster and calling getConfig on the cluster instance.
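The diagnosis above, and the first proposed fix, can be modeled with a small routing function. Assumptions: `FetchRouting`, `route`, and the `Route` values are hypothetical and only mirror the decision described in the comment (matching hostnames plus the local-fetch optimization being enabled); they are not Tez code.

```java
/**
 * Hypothetical model: NodeManagers in a single-machine MiniCluster share a
 * hostname, so a host-only check routes every fetch to local disk.
 */
final class FetchRouting {

    /** LOCAL_DISK: search the consumer's own local dirs; HTTP: ask the producer's shuffle service. */
    enum Route { LOCAL_DISK, HTTP }

    private FetchRouting() {}

    static Route route(boolean optimizeLocalFetch, String consumerHost, String producerHost) {
        if (optimizeLocalFetch && consumerHost.equals(producerHost)) {
            return Route.LOCAL_DISK;
        }
        return Route.HTTP;
    }

    public static void main(String[] args) {
        // Producer ran on NodeManager 0, consumer on NodeManager 1; both report "localhost".
        String producerHost = "localhost";
        String consumerHost = "localhost";

        // Local fetch enabled: LOCAL_DISK, although the file lives in NodeManager 0's dirs.
        System.out.println(route(true, consumerHost, producerHost));  // LOCAL_DISK
        // Local fetch disabled for the MiniCluster: the fetch goes over HTTP.
        System.out.println(route(false, consumerHost, producerHost)); // HTTP
    }
}
```

Disabling the optimization sidesteps the hostname collision at the cost of always using the network path, which is why it is framed as a test-only workaround alongside the port-based check.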
[jira] [Updated] (TEZ-2366) Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
[ https://issues.apache.org/jira/browse/TEZ-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bikas Saha updated TEZ-2366:
Priority: Critical (was: Blocker)
[jira] [Updated] (TEZ-2366) Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
[ https://issues.apache.org/jira/browse/TEZ-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bikas Saha updated TEZ-2366:
Priority: Blocker (was: Major)
[jira] [Updated] (TEZ-2366) Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
[ https://issues.apache.org/jira/browse/TEZ-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated TEZ-2366:
Summary: Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333 (was: Pig tez local mode unit tests fail intermittently after TEZ-2333)