[ https://issues.apache.org/jira/browse/TEZ-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568764#comment-14568764 ]
Rajesh Balamohan edited comment on TEZ-2505 at 6/2/15 8:44 AM: --------------------------------------------------------------- Thanks [~hitesh], [~gopalv], [~cchepelov] . Committed to master & branch 0.7 >> commit 237d1895d224fc07466e0c4e494960b32d7a69aa (master) commit feca51a3b22bcf6cf8f8f18de2c17d44a39e6f17 (branch 0.7) >> [~cchepelov] - Created TEZ-2515 to track the other issue related to BTSE. was (Author: rajesh.balamohan): Committed to master & branch 0.7 >> commit 237d1895d224fc07466e0c4e494960b32d7a69aa (master) commit feca51a3b22bcf6cf8f8f18de2c17d44a39e6f17 (branch 0.7) >> [~cchepelov] - Created TEZ-2515 to track the other issue related to BTSE. > PipelinedSorter uses Comparator objects concurrently from multiple threads > --------------------------------------------------------------------------- > > Key: TEZ-2505 > URL: https://issues.apache.org/jira/browse/TEZ-2505 > Project: Apache Tez > Issue Type: Bug > Affects Versions: 0.7.0 > Environment: Scalding 0.13.1+PR1220 ; Cascading-3.0.0-wip-118 ; scala > 2.11.6 ; java openjdk 1.8.0_45-internal ; Debian linux 8 (stable); Intel(R) > Core(TM) i7-3770 (amd64) > Reporter: Cyrille Chépélov > Assignee: Rajesh Balamohan > Fix For: 0.7.1 > > Attachments: TEZ-2505.1.patch, TEZ-2505.2.patch, > TEZ-2505.branch-0.7.0.patch, TEZ-2506.1.patch, log-7142-x.log, > verbose-MBTSE.patch > > > When attempting to run the same multi-DAG application (that worked fine under > the same environment except Cascading-3.0.0-wip-115 and tez 0.6.1), one of > the early, and simplest DAG crashes on the PipelinedSorter. > The stack at the crash site looks like: > {code} > 2015-05-28 11:52:47,120 ERROR [TezChild] element.TrapHandler: caught > Throwable, no trap available, rethrowing > cascading.CascadingException: unable to compare stream elements in position: 0 > at > cascading.tuple.hadoop.util.DeserializerComparator.compareTuples(DeserializerComparator.java:164) > at > cascading.tuple.hadoop.util.TupleComparator.compare(TupleComparator.java:38) > at > org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.compareKeys(PipelinedSorter.java:669) > at > org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.compare(PipelinedSorter.java:684) > at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:99) > at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:63) > at > org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.sort(PipelinedSorter.java:631) > at > org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.sort(PipelinedSorter.java:230) > at > org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.collect(PipelinedSorter.java:311) > at > org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.write(PipelinedSorter.java:272) > at > org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput$1.write(OrderedPartitionedKVOutput.java:164) > at > cascading.flow.tez.stream.element.OldOutputCollector.collect(OldOutputCollector.java:51) > at > cascading.tap.hadoop.util.MeasuredOutputCollector.collect(MeasuredOutputCollector.java:69) > at > cascading.flow.tez.stream.element.TezCoGroupGate.wrapGroupingAndCollect(TezCoGroupGate.java:193) > at > cascading.flow.hadoop.stream.HadoopGroupGate.receive(HadoopGroupGate.java:103) > at > cascading.flow.hadoop.stream.HadoopGroupGate.receive(HadoopGroupGate.java:45) > at > cascading.flow.stream.element.FunctionEachStage$1.collect(FunctionEachStage.java:81) > at > cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:145) > at > cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:133) > at > com.twitter.scalding.FlatMapFunction$$anonfun$operate$2.apply(Operations.scala:49) > at > com.twitter.scalding.FlatMapFunction$$anonfun$operate$2.apply(Operations.scala:47) > at scala.collection.Iterator$class.foreach(Iterator.scala:750) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1202) > at com.twitter.scalding.FlatMapFunction.operate(Operations.scala:47) > at > cascading.flow.stream.element.FunctionEachStage.receive(FunctionEachStage.java:100) > at > cascading.flow.stream.element.FunctionEachStage.receive(FunctionEachStage.java:40) > at > cascading.flow.stream.element.FunctionEachStage$1.collect(FunctionEachStage.java:81) > at > cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:145) > at > cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:133) > at com.twitter.scalding.MapFunction.operate(Operations.scala:60) > at > cascading.flow.stream.element.FunctionEachStage.receive(FunctionEachStage.java:100) > at > cascading.flow.stream.element.FunctionEachStage.receive(FunctionEachStage.java:40) > at > cascading.flow.stream.element.FunctionEachStage$1.collect(FunctionEachStage.java:81) > at > cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:145) > at > cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:133) > at cascading.operation.Identity$1.operate(Identity.java:124) > at cascading.operation.Identity.operate(Identity.java:150) > at > cascading.flow.stream.element.FunctionEachStage.receive(FunctionEachStage.java:100) > at > cascading.flow.stream.element.FunctionEachStage.receive(FunctionEachStage.java:40) > at cascading.flow.stream.element.SourceStage.map(SourceStage.java:110) > at cascading.flow.stream.element.SourceStage.run(SourceStage.java:66) > at > cascading.flow.tez.stream.element.TezSourceStage.run(TezSourceStage.java:95) > at cascading.flow.tez.FlowProcessor.run(FlowProcessor.java:165) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: cascading.CascadingException: unable to read element from > underlying stream > at > cascading.tuple.hadoop.util.TupleElementComparator.compare(TupleElementComparator.java:82) > at > cascading.tuple.hadoop.util.TupleElementComparator.compare(TupleElementComparator.java:33) > at > cascading.tuple.hadoop.util.DeserializerComparator.compareTuples(DeserializerComparator.java:160) > ... 55 more > Caused by: java.io.EOFException > at java.io.DataInputStream.readFully(DataInputStream.java:197) > at java.io.DataInputStream.readFully(DataInputStream.java:169) > at > org.apache.hadoop.io.WritableUtils.readString(WritableUtils.java:125) > at > cascading.tuple.hadoop.io.HadoopTupleInputStream.readString(HadoopTupleInputStream.java:75) > at > cascading.tuple.hadoop.io.HadoopTupleInputStream.readType(HadoopTupleInputStream.java:85) > at > cascading.tuple.hadoop.io.HadoopTupleInputStream.getNextElement(HadoopTupleInputStream.java:52) > at > cascading.tuple.hadoop.util.TupleElementComparator.compare(TupleElementComparator.java:77) > ... 57 more > {code} > with an apparently random variation at the top of stack which is > {code} > 2015-05-28 13:10:13,459 ERROR [TezChild] element.TrapHandler: caught > Throwable, no trap available, rethrowing > cascading.CascadingException: java.io.EOFException > at > cascading.tuple.hadoop.util.TupleComparator.compare(TupleComparator.java:42) > at > org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.compareKeys(PipelinedSorter.java:669) > at > org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.compare(PipelinedSorter.java:684) > at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:99) > at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:63) > at > org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.sort(PipelinedSorter.java:631) > {code} > all running TezChildren fail with the same stacks (either variant) at the > same time on the same node, which is straight off a HDFS-backed CSV file. > The cascading.tuple.hadoop.util.TupleComparator#compare code at the top of > stack has been in use in a MAPREDUCE context for over 2.5 years; first > analysis with [~cwensel] (who successfully reproduced the issue without > scalding) points towards an issue on tez side. > as a workaround, it is possible to run with > {code:scala}"tez.runtime.sorter.class" -> "LEGACY"{code}, but this is > impractical in the long run. -- This message was sent by Atlassian JIRA (v6.3.4#6332)