[jira] [Created] (TEZ-1619) Upgrade Hadoop dependency to 2.5
Bikas Saha created TEZ-1619:

Summary: Upgrade Hadoop dependency to 2.5
Key: TEZ-1619
URL: https://issues.apache.org/jira/browse/TEZ-1619
Project: Apache Tez
Issue Type: Task
Reporter: Bikas Saha
Assignee: Bikas Saha

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-1620) Wait for application finish before stopping MiniTezCluster
Jeff Zhang created TEZ-1620:

Summary: Wait for application finish before stopping MiniTezCluster
Key: TEZ-1620
URL: https://issues.apache.org/jira/browse/TEZ-1620
Project: Apache Tez
Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang

Currently, we sleep for 10 seconds to wait for DAGAppMaster to finish; otherwise DAGAppMaster hangs while trying to connect to the RM to unregister. We should wait for all applications to finish before stopping MiniTezCluster.
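As a rough illustration of the idea in TEZ-1620 (replace the fixed 10-second sleep with a wait on an actual completion condition), a polling helper could look like the sketch below. This is not code from any Tez patch: the class name `WaitForFinish`, the method `waitUntil`, and the `BooleanSupplier` standing in for "all applications have finished" are assumptions made for illustration only.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Tez or MiniTezCluster.
class WaitForFinish {

    /**
     * Polls the given condition until it returns true or the timeout elapses.
     * Returns true if the condition was met, false if the wait timed out.
     */
    static boolean waitUntil(BooleanSupplier finished, long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (finished.getAsBoolean()) {
                return true;                      // condition met, safe to stop the cluster
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;                     // gave up; caller decides what to do
            }
            Thread.sleep(pollMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Stand-in condition: pretend every application finishes after about 50 ms.
        boolean ok = waitUntil(() -> System.currentTimeMillis() - start >= 50, 2000, 10);
        System.out.println(ok ? "all applications finished" : "timed out waiting");
    }
}
```

In a real cluster shutdown path the condition would presumably query the ResourceManager for application states; that part is left out here because it depends on APIs the message does not name.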
[jira] [Updated] (TEZ-1568) Add system test for propagation of diagnostics for errors
[ https://issues.apache.org/jira/browse/TEZ-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zhang updated TEZ-1568:
Attachment: Tez-1568-8.patch

Add system test for propagation of diagnostics for errors
Key: TEZ-1568
URL: https://issues.apache.org/jira/browse/TEZ-1568
Project: Apache Tez
Issue Type: Sub-task
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Attachments: TEZ-1568.patch, Tez-1568-2.patch, Tez-1568-3.patch, Tez-1568-4.patch, Tez-1568-5.patch, Tez-1568-6.patch, Tez-1568-7.patch, Tez-1568-8.patch

Design a system test where exceptions come from Input, Output, Processor, InputInitializer and VertexManagerPlugin.
[jira] [Commented] (TEZ-1568) Add system test for propagation of diagnostics for errors
[ https://issues.apache.org/jira/browse/TEZ-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145986#comment-14145986 ]

Jeff Zhang commented on TEZ-1568:

[~bikassaha] Attached the new patch with the comments refined. Regarding the 10-second sleep, I have kept it, because removing it causes DAGAppMaster to hang. I created [TEZ-1620|https://issues.apache.org/jira/browse/TEZ-1620] for this issue.

Add system test for propagation of diagnostics for errors
Key: TEZ-1568
URL: https://issues.apache.org/jira/browse/TEZ-1568
Project: Apache Tez
Issue Type: Sub-task
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Attachments: TEZ-1568.patch, Tez-1568-2.patch, Tez-1568-3.patch, Tez-1568-4.patch, Tez-1568-5.patch, Tez-1568-6.patch, Tez-1568-7.patch, Tez-1568-8.patch

Design a system test where exceptions come from Input, Output, Processor, InputInitializer and VertexManagerPlugin.
[jira] [Commented] (TEZ-1608) TopK example
[ https://issues.apache.org/jira/browse/TEZ-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146038#comment-14146038 ]

Krisztian Horvath commented on TEZ-1608:

I received a few suggested improvements, which I'm going to apply:

Also, for the topK, can the sum task maintain a local top K and output only that much, so that the writer can pick the global topK from the local topKs? That would reduce the data transfer quite a bit. Then we may be able to use an UnorderedKVEdge instead of an OrderedPartitionedKVEdge? That will avoid the need to sort at the output and merge sort at the input.

TopK example
Key: TEZ-1608
URL: https://issues.apache.org/jira/browse/TEZ-1608
Project: Apache Tez
Issue Type: Sub-task
Affects Versions: 0.5.0
Reporter: Janos Matyas
Attachments: TEZ-1608-1.patch

The goal of this sample is to find the topK elements of a dataset, while guiding through the basics of Tez (DAG creation, tokenizers, custom comparators and parallelism). An example use case for top K: given a large data set in CSV format of user comments on a site, listed as userid,postid,commentid,comment,timestamp, we are looking for the top K commenters or the posts with the most comments.
[jira] [Commented] (TEZ-1612) Pig on tez unit test intermittent hang
[ https://issues.apache.org/jira/browse/TEZ-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146265#comment-14146265 ]

Rajesh Balamohan commented on TEZ-1612:

Missed the fact that both the vertex managers are ShuffleVertexManagers in this case. Yes, the fix in master wasn't made for ShuffleVertexManager. Daniel, can you please post the logs from the latest run?

Pig on tez unit test intermittent hang
Key: TEZ-1612
URL: https://issues.apache.org/jira/browse/TEZ-1612
Project: Apache Tez
Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Daniel Dai
Assignee: Bikas Saha
Attachments: DAG1.png, syslog_dag_1411413615885_0001_1, testfail1.log.tar.gz

Several Pig unit tests hang intermittently. For example, TestNewPlanImplicitSplit.testImplicitSplitInCoGroup, which is a DAG of 4 nodes:
!DAG1.png!
It uses auto-parallelism; vertex 106 changes parallelism from 2 to 1, and vertex 107 from 21 to 1. Log attached.
[jira] [Updated] (TEZ-1608) TopK example
[ https://issues.apache.org/jira/browse/TEZ-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Horvath updated TEZ-1608:
Attachment: TEZ-1608-2.patch

TopK example
Key: TEZ-1608
URL: https://issues.apache.org/jira/browse/TEZ-1608
Project: Apache Tez
Issue Type: Sub-task
Affects Versions: 0.5.0
Reporter: Janos Matyas
Attachments: TEZ-1608-1.patch, TEZ-1608-2.patch

The goal of this sample is to find the topK elements of a dataset, while guiding through the basics of Tez (DAG creation, tokenizers, custom comparators and parallelism). An example use case for top K: given a large data set in CSV format of user comments on a site, listed as userid,postid,commentid,comment,timestamp, we are looking for the top K commenters or the posts with the most comments.
[jira] [Commented] (TEZ-1608) TopK example
[ https://issues.apache.org/jira/browse/TEZ-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146322#comment-14146322 ]

Krisztian Horvath commented on TEZ-1608:

In the second version every Sum processor maintains a local top K and only writes these values when the task finishes, so the writer only has to select the top K from the local top Ks. The implementation is a bit more complex than the previous one.

TopK example
Key: TEZ-1608
URL: https://issues.apache.org/jira/browse/TEZ-1608
Project: Apache Tez
Issue Type: Sub-task
Affects Versions: 0.5.0
Reporter: Janos Matyas
Attachments: TEZ-1608-1.patch, TEZ-1608-2.patch

The goal of this sample is to find the topK elements of a dataset, while guiding through the basics of Tez (DAG creation, tokenizers, custom comparators and parallelism). An example use case for top K: given a large data set in CSV format of user comments on a site, listed as userid,postid,commentid,comment,timestamp, we are looking for the top K commenters or the posts with the most comments.
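The local/global top K scheme described in the TEZ-1608 comments can be sketched in plain Java, independent of the Tez API. This is only an illustration, not the code in TEZ-1608-2.patch: the class `LocalTopK` and its methods are hypothetical, and the sketch assumes that each key's final count is produced by exactly one sum task (otherwise a key split across tasks could be dropped from a local top K incorrectly).

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical illustration of "local top K, then global top K".
class LocalTopK {

    static Map.Entry<String, Integer> entry(String key, int count) {
        return new AbstractMap.SimpleEntry<>(key, count);
    }

    /** Returns the k entries with the largest counts, in descending order of count. */
    static List<Map.Entry<String, Integer>> topK(Iterable<Map.Entry<String, Integer>> entries, int k) {
        // Min-heap on count: the smallest of the current candidates sits at the head.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>(Map.Entry.<String, Integer>comparingByValue());
        for (Map.Entry<String, Integer> e : entries) {
            heap.offer(e);
            if (heap.size() > k) {
                heap.poll(); // evict the smallest so at most k candidates remain
            }
        }
        List<Map.Entry<String, Integer>> result = new ArrayList<>(heap);
        result.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return result;
    }

    public static void main(String[] args) {
        // Each list stands in for the (key, sum) pairs seen by one sum task.
        List<Map.Entry<String, Integer>> task1 = Arrays.asList(entry("a", 5), entry("b", 1), entry("c", 9));
        List<Map.Entry<String, Integer>> task2 = Arrays.asList(entry("d", 7), entry("e", 2));

        // Each task emits only its local top K; the writer merges those candidates.
        List<Map.Entry<String, Integer>> candidates = new ArrayList<>(topK(task1, 2));
        candidates.addAll(topK(task2, 2));
        System.out.println(topK(candidates, 2));
    }
}
```

With this shape each task ships at most K records downstream, which is where the reduced data transfer mentioned in the comments would come from, and the final selection needs no globally sorted input.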
[jira] [Commented] (TEZ-1568) Add system test for propagation of diagnostics for errors
[ https://issues.apache.org/jira/browse/TEZ-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146864#comment-14146864 ]

Bikas Saha commented on TEZ-1568:

Ok. I will commit this and we should remove it in TEZ-1620.

Add system test for propagation of diagnostics for errors
Key: TEZ-1568
URL: https://issues.apache.org/jira/browse/TEZ-1568
Project: Apache Tez
Issue Type: Sub-task
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Attachments: TEZ-1568.patch, Tez-1568-2.patch, Tez-1568-3.patch, Tez-1568-4.patch, Tez-1568-5.patch, Tez-1568-6.patch, Tez-1568-7.patch, Tez-1568-8.patch

Design a system test where exceptions come from Input, Output, Processor, InputInitializer and VertexManagerPlugin.
[jira] [Commented] (TEZ-1619) Upgrade Hadoop dependency to 2.5
[ https://issues.apache.org/jira/browse/TEZ-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146964#comment-14146964 ]

Jonathan Eagles commented on TEZ-1619:

+1. This change looks good. Let's ping the pig and hive teams so we can make sure we are in alignment.

Upgrade Hadoop dependency to 2.5
Key: TEZ-1619
URL: https://issues.apache.org/jira/browse/TEZ-1619
Project: Apache Tez
Issue Type: Task
Reporter: Bikas Saha
Assignee: Bikas Saha
Attachments: TEZ-1619.1.patch
[jira] [Commented] (TEZ-1612) Pig on tez unit test intermittent hang
[ https://issues.apache.org/jira/browse/TEZ-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147016#comment-14147016 ]

Daniel Dai commented on TEZ-1612:

What do you mean by the latest run? With master?

Pig on tez unit test intermittent hang
Key: TEZ-1612
URL: https://issues.apache.org/jira/browse/TEZ-1612
Project: Apache Tez
Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Daniel Dai
Assignee: Bikas Saha
Attachments: DAG1.png, syslog_dag_1411413615885_0001_1, testfail1.log.tar.gz

Several Pig unit tests hang intermittently. For example, TestNewPlanImplicitSplit.testImplicitSplitInCoGroup, which is a DAG of 4 nodes:
!DAG1.png!
It uses auto-parallelism; vertex 106 changes parallelism from 2 to 1, and vertex 107 from 21 to 1. Log attached.
[jira] [Commented] (TEZ-1612) Pig on tez unit test intermittent hang
[ https://issues.apache.org/jira/browse/TEZ-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147072#comment-14147072 ]

Bikas Saha commented on TEZ-1612:

Yes. For the same test that you attached the DAG picture for above.

Pig on tez unit test intermittent hang
Key: TEZ-1612
URL: https://issues.apache.org/jira/browse/TEZ-1612
Project: Apache Tez
Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Daniel Dai
Assignee: Bikas Saha
Attachments: DAG1.png, syslog_dag_1411413615885_0001_1, testfail1.log.tar.gz

Several Pig unit tests hang intermittently. For example, TestNewPlanImplicitSplit.testImplicitSplitInCoGroup, which is a DAG of 4 nodes:
!DAG1.png!
It uses auto-parallelism; vertex 106 changes parallelism from 2 to 1, and vertex 107 from 21 to 1. Log attached.
[jira] [Created] (TEZ-1621) Actual error message not thrown on console, does appear in the YARN application log
Deepesh Khandelwal created TEZ-1621:

Summary: Actual error message not thrown on console, does appear in the YARN application log
Key: TEZ-1621
URL: https://issues.apache.org/jira/browse/TEZ-1621
Project: Apache Tez
Issue Type: Bug
Reporter: Deepesh Khandelwal

While running an in-session testorderedwordcount example, the DAG failed with the following error on the console:
{noformat}
14/09/25 01:55:53 INFO examples.TestOrderedWordCount: DAG 1 diagnostics: [Vertex failed, vertexName=initialmap, vertexId=vertex_1411586515507_0110_1_00, diagnostics=[Task failed, taskId=task_1411586515507_0110_1_00_00, diagnostics=[TaskAttempt 0 failed, info=[Container container_1411586515507_0110_01_02 finished with diagnostics set to [Container failed. Exception from container-launch.
Container id: container_1411586515507_0110_01_02
Exit code: 255
Stack trace: ExitCodeException exitCode=255:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:290)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
{noformat}
This wasn't very helpful; the root cause is in the application log:
{noformat}
2014-09-25 01:55:41,246 ERROR [TezChild] org.apache.tez.runtime.task.TezTaskRunner: Exception of type Error. Exiting now
java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCrc32.nativeVerifyChunkedSums(IILjava/nio/ByteBuffer;ILjava/nio/ByteBuffer;IILjava/lang/String;J)V
    at org.apache.hadoop.util.NativeCrc32.nativeVerifyChunkedSums(Native Method)
    at org.apache.hadoop.util.NativeCrc32.verifyChunkedSums(NativeCrc32.java:57)
    at org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:291)
    at org.apache.hadoop.hdfs.BlockReaderLocal.fillBuffer(BlockReaderLocal.java:344)
    at org.apache.hadoop.hdfs.BlockReaderLocal.fillDataBuf(BlockReaderLocal.java:444)
    at org.apache.hadoop.hdfs.BlockReaderLocal.readWithBounceBuffer(BlockReaderLocal.java:575)
    at org.apache.hadoop.hdfs.BlockReaderLocal.read(BlockReaderLocal.java:539)
    at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:683)
    at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:739)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:796)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:837)
    at java.io.DataInputStream.read(DataInputStream.java:100)
    at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
    at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:149)
    at org.apache.hadoop.mapreduce.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.nextKeyValue(TezGroupedSplitsInputFormat.java:167)
    at org.apache.tez.mapreduce.lib.MRReaderMapReduce.next(MRReaderMapReduce.java:116)
    at org.apache.tez.mapreduce.processor.map.MapProcessor$NewRecordReader.nextKeyValue(MapProcessor.java:266)
    at org.apache.tez.mapreduce.hadoop.mapreduce.MapContextImpl.nextKeyValue(MapContextImpl.java:81)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.tez.mapreduce.processor.map.MapProcessor.runNewMapper(MapProcessor.java:237)
    at org.apache.tez.mapreduce.processor.map.MapProcessor.run(MapProcessor.java:124)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at
[jira] [Updated] (TEZ-1621) Actual error message not thrown on console, does appear in the YARN application log
[ https://issues.apache.org/jira/browse/TEZ-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deepesh Khandelwal updated TEZ-1621:
Attachment: console.txt, app_logs.txt

Actual error message not thrown on console, does appear in the YARN application log
Key: TEZ-1621
URL: https://issues.apache.org/jira/browse/TEZ-1621
Project: Apache Tez
Issue Type: Bug
Reporter: Deepesh Khandelwal
Attachments: app_logs.txt, console.txt
[jira] [Updated] (TEZ-1621) Actual error message not thrown on console, does appear in the YARN application log
[ https://issues.apache.org/jira/browse/TEZ-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bikas Saha updated TEZ-1621:
Issue Type: Sub-task (was: Bug)
Parent: TEZ-1240

Actual error message not thrown on console, does appear in the YARN application log
Key: TEZ-1621
URL: https://issues.apache.org/jira/browse/TEZ-1621
Project: Apache Tez
Issue Type: Sub-task
Reporter: Deepesh Khandelwal
Attachments: app_logs.txt, console.txt
[jira] [Commented] (TEZ-1621) Actual error message not thrown on console, does appear in the YARN application log
[ https://issues.apache.org/jira/browse/TEZ-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147364#comment-14147364 ]

Bikas Saha commented on TEZ-1621:

Not sure why we treat that separately and shut down.

Actual error message not thrown on console, does appear in the YARN application log
Key: TEZ-1621
URL: https://issues.apache.org/jira/browse/TEZ-1621
Project: Apache Tez
Issue Type: Sub-task
Reporter: Deepesh Khandelwal
Attachments: app_logs.txt, console.txt
[jira] [Updated] (TEZ-1612) Pig on tez unit test intermittent hang
[ https://issues.apache.org/jira/browse/TEZ-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated TEZ-1612:
Attachment: runwithmaster.tar.gz

Attached the log directory from a run with master. The test completes successfully.

Pig on tez unit test intermittent hang
Key: TEZ-1612
URL: https://issues.apache.org/jira/browse/TEZ-1612
Project: Apache Tez
Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Daniel Dai
Assignee: Bikas Saha
Attachments: DAG1.png, runwithmaster.tar.gz, syslog_dag_1411413615885_0001_1, testfail1.log.tar.gz

Several Pig unit tests hang intermittently. For example, TestNewPlanImplicitSplit.testImplicitSplitInCoGroup, which is a DAG of 4 nodes:
!DAG1.png!
It uses auto-parallelism; vertex 106 changes parallelism from 2 to 1, and vertex 107 from 21 to 1. Log attached.
[jira] [Comment Edited] (TEZ-1621) Actual error message not thrown on console, does appear in the YARN application log
[ https://issues.apache.org/jira/browse/TEZ-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147296#comment-14147296 ]

Jeff Zhang edited comment on TEZ-1621 at 9/25/14 5:38 AM:

It is one kind of Exception that causes the TezChild container to shut down. We should report the error to the AM before shutting down TezChild.
{code}
} else if (cause instanceof Error) {
  LOG.error("Exception of type Error. Exiting now", cause);
  ExitUtil.terminate(-1, cause);
} else {
{code}

was (Author: zjffdu):
It is one kind of Exception that causes the TezChild container to shut down. We should report the error to the task before shutting down TezChild.
{code}
} else if (cause instanceof Error) {
  LOG.error("Exception of type Error. Exiting now", cause);
  ExitUtil.terminate(-1, cause);
} else {
{code}

Actual error message not thrown on console, does appear in the YARN application log
Key: TEZ-1621
URL: https://issues.apache.org/jira/browse/TEZ-1621
Project: Apache Tez
Issue Type: Sub-task
Reporter: Deepesh Khandelwal
Attachments: app_logs.txt, console.txt
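The suggestion in the TEZ-1621 comment above (report the error to the AM before the TezChild process exits) might be sketched roughly as follows. This is not the eventual Tez fix: `FailureReporter` and `Exiter` are hypothetical interfaces introduced here so the control flow can be shown and tested without Tez or Hadoop on the classpath. In the real code the exit step corresponds to the `ExitUtil.terminate(-1, cause)` call quoted in the comment.

```java
// Hypothetical sketch; none of these types exist in Tez.
class ReportBeforeExit {

    /** Stand-in for whatever channel the task uses to talk to the AM. */
    interface FailureReporter {
        void reportFatal(Throwable cause) throws Exception;
    }

    /** Stand-in for process termination, so the flow can be exercised in a test. */
    interface Exiter {
        void exit(int status, Throwable cause);
    }

    /** On an Error, makes a best-effort report first and then always terminates. */
    static void handle(Throwable cause, FailureReporter reporter, Exiter exiter) {
        if (cause instanceof Error) {
            try {
                reporter.reportFatal(cause);         // let the AM see the real root cause
            } catch (Exception reportFailure) {
                cause.addSuppressed(reportFailure);  // keep the original Error as the primary failure
            } finally {
                exiter.exit(-1, cause);              // terminate even if reporting failed
            }
        }
        // Non-Error throwables would follow the existing, unchanged code path.
    }

    public static void main(String[] args) {
        handle(new UnsatisfiedLinkError("nativeVerifyChunkedSums"),
                c -> System.out.println("reported to AM: " + c),
                (status, c) -> System.out.println("would exit with status " + status));
    }
}
```

The point of the ordering is that the diagnostics shown on the client console could then carry the `UnsatisfiedLinkError` rather than only the container's exit code.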
[jira] [Created] (TEZ-1622) Implement a tez jar equivalent script to avoid the complexities of hadoop jar
Gopal V created TEZ-1622:

Summary: Implement a tez jar equivalent script to avoid the complexities of hadoop jar
Key: TEZ-1622
URL: https://issues.apache.org/jira/browse/TEZ-1622
Project: Apache Tez
Issue Type: Bug
Reporter: Gopal V

Currently, the only way to run a tez job by hand is to set up multiple parameters like HADOOP_CLASSPATH and then do hadoop jar {{main-class}}. This is inconvenient and complex - find an easier way.
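To make the manual steps described in TEZ-1622 concrete, the sketch below assembles the `HADOOP_CLASSPATH` environment variable and a `hadoop jar` command line in one place. It is only an illustration of what a wrapper might automate, not a proposed implementation: the class `TezJarLauncher`, the directory layout (a conf directory plus a jar directory with a `lib` subdirectory), and the example paths and class name are all assumptions.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper around the manual "set HADOOP_CLASSPATH, then hadoop jar" steps.
class TezJarLauncher {

    /** Builds a classpath from an assumed layout: conf dir, jar dir, and jar dir's lib/. */
    static String hadoopClasspath(String tezConfDir, String tezJarsDir) {
        return String.join(File.pathSeparator,
                tezConfDir, tezJarsDir + "/*", tezJarsDir + "/lib/*");
    }

    /** Builds the argument list: hadoop jar <jar> <main-class> <args...>. */
    static List<String> command(String jar, String mainClass, List<String> args) {
        List<String> cmd = new ArrayList<>(Arrays.asList("hadoop", "jar", jar, mainClass));
        cmd.addAll(args);
        return cmd;
    }

    /** Prepares, but does not start, the process with HADOOP_CLASSPATH set. */
    static ProcessBuilder build(String tezConfDir, String tezJarsDir,
                                String jar, String mainClass, List<String> args) {
        ProcessBuilder pb = new ProcessBuilder(command(jar, mainClass, args));
        String existing = pb.environment().get("HADOOP_CLASSPATH");
        String tezPath = hadoopClasspath(tezConfDir, tezJarsDir);
        // Keep anything the user already exported, with the Tez entries first.
        pb.environment().put("HADOOP_CLASSPATH",
                existing == null || existing.isEmpty() ? tezPath : tezPath + File.pathSeparator + existing);
        pb.inheritIO();
        return pb;
    }

    public static void main(String[] args) {
        // Example values only; nothing is executed here.
        ProcessBuilder pb = build("/etc/tez/conf", "/opt/tez",
                "my-job.jar", "com.example.MyTezJob", Arrays.asList("in", "out"));
        System.out.println("HADOOP_CLASSPATH=" + pb.environment().get("HADOOP_CLASSPATH"));
        System.out.println(String.join(" ", pb.command()));
    }
}
```

Calling `start()` on the returned `ProcessBuilder` would run the job, provided `hadoop` is on the `PATH`; the sketch stops short of that so it can be run anywhere.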