[ https://issues.apache.org/jira/browse/TEZ-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cyrille Chépélov updated TEZ-2505:
----------------------------------
    Attachment: verbose-MBTSE.patch

A quick patch so that the MapBufferTooSmallException (MBTSE) reports the original BufferOverflowException (BOE) as its cause.

The stack now looks like:
{code}
Caused by: org.apache.tez.runtime.library.common.sort.impl.ExternalSorter$MapBufferTooSmallException: Record too large for in-memory buffer. Exceeded buffer overflo
        at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.collect(PipelinedSorter.java:317)
        at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.collect(PipelinedSorter.java:320)
        at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.write(PipelinedSorter.java:272)
        at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput$1.write(OrderedPartitionedKVOutput.java:164)
        at cascading.flow.tez.stream.element.OldOutputCollector.collect(OldOutputCollector.java:51)
        at cascading.tap.hadoop.util.MeasuredOutputCollector.collect(MeasuredOutputCollector.java:69)
        at cascading.flow.tez.stream.element.TezGroupByGate.wrapGroupingAndCollect(TezGroupByGate.java:109)
        at cascading.flow.hadoop.stream.HadoopGroupGate.receive(HadoopGroupGate.java:103)
        ... 54 more
Caused by: java.nio.BufferOverflowException
        at java.nio.Buffer.nextPutIndex(Buffer.java:521)
        at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:169)
        at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$BufferStreamWrapper.write(PipelinedSorter.java:550)
        at java.io.DataOutputStream.write(DataOutputStream.java:88)
        at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
        at com.twitter.chill.hadoop.KryoSerializer.serialize(KryoSerializer.java:50)
        at cascading.tuple.hadoop.TupleSerialization$SerializationElementWriter.write(TupleSerialization.java:705)
        at cascading.tuple.io.TupleOutputStream.writeElement(TupleOutputStream.java:114)
        at cascading.tuple.io.TupleOutputStream.write(TupleOutputStream.java:89)
        at cascading.tuple.io.TupleOutputStream.writeTuple(TupleOutputStream.java:64)
        at cascading.tuple.hadoop.io.TupleSerializer.serialize(TupleSerializer.java:37)
        at cascading.tuple.hadoop.io.TupleSerializer.serialize(TupleSerializer.java:28)
        at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.collect(PipelinedSorter.java:306)
{code}
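
For readers unfamiliar with the pattern, the second "Caused by" entry in a trace like this one comes from exception chaining. The following is only a minimal sketch and not the contents of verbose-MBTSE.patch: `RecordTooLargeException` and `collect` are hypothetical stand-ins for the Tez classes, and the buffer size is an arbitrary assumption.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

public class ChainedCauseSketch {

    // Hypothetical stand-in for the sorter's "buffer too small" exception.
    static class RecordTooLargeException extends RuntimeException {
        RecordTooLargeException(String message, Throwable cause) {
            super(message, cause); // keeps the original exception as the cause
        }
    }

    // Hypothetical collect(): copies one record into a fixed-size buffer.
    static void collect(ByteBuffer buffer, byte[] record) {
        try {
            buffer.put(record);
        } catch (BufferOverflowException e) {
            // Passing e along is what makes "Caused by: ...BufferOverflowException"
            // show up below the wrapper in the printed stack trace.
            throw new RecordTooLargeException(
                "Record too large for in-memory buffer (" + record.length
                    + " bytes, " + buffer.remaining() + " remaining)", e);
        }
    }

    public static void main(String[] args) {
        ByteBuffer small = ByteBuffer.allocate(8); // assumed size, for illustration
        try {
            collect(small, new byte[16]);
        } catch (RecordTooLargeException e) {
            e.printStackTrace(); // prints the wrapper followed by its cause
        }
    }
}
```

Without the cause argument, only the wrapper's frames would be printed and the serializer frames that show where the overflow happened would be lost.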


> PipelinedSorter uses Comparator objects concurrently from multiple threads 
> ---------------------------------------------------------------------------
>
>                 Key: TEZ-2505
>                 URL: https://issues.apache.org/jira/browse/TEZ-2505
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>         Environment: Scalding 0.13.1+PR1220 ; Cascading-3.0.0-wip-118 ; scala 
> 2.11.6 ; java openjdk 1.8.0_45-internal ; Debian linux 8 (stable); Intel(R) 
> Core(TM) i7-3770 (amd64)
>            Reporter: Cyrille Chépélov
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-2505.1.patch, TEZ-2505.2.patch, 
> TEZ-2505.branch-0.7.0.patch, TEZ-2506.1.patch, log-7142-x.log, 
> verbose-MBTSE.patch
>
>
> When attempting to run the same multi-DAG application (that worked fine under 
> the same environment except Cascading-3.0.0-wip-115 and Tez 0.6.1), one of 
> the earliest and simplest DAGs crashes in the PipelinedSorter.
> The stack at the crash site looks like:
> {code}
> 2015-05-28 11:52:47,120 ERROR [TezChild] element.TrapHandler: caught Throwable, no trap available, rethrowing
> cascading.CascadingException: unable to compare stream elements in position: 0
>         at cascading.tuple.hadoop.util.DeserializerComparator.compareTuples(DeserializerComparator.java:164)
>         at cascading.tuple.hadoop.util.TupleComparator.compare(TupleComparator.java:38)
>         at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.compareKeys(PipelinedSorter.java:669)
>         at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.compare(PipelinedSorter.java:684)
>         at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:99)
>         at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:63)
>         at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.sort(PipelinedSorter.java:631)
>         at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.sort(PipelinedSorter.java:230)
>         at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.collect(PipelinedSorter.java:311)
>         at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.write(PipelinedSorter.java:272)
>         at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput$1.write(OrderedPartitionedKVOutput.java:164)
>         at cascading.flow.tez.stream.element.OldOutputCollector.collect(OldOutputCollector.java:51)
>         at cascading.tap.hadoop.util.MeasuredOutputCollector.collect(MeasuredOutputCollector.java:69)
>         at cascading.flow.tez.stream.element.TezCoGroupGate.wrapGroupingAndCollect(TezCoGroupGate.java:193)
>         at cascading.flow.hadoop.stream.HadoopGroupGate.receive(HadoopGroupGate.java:103)
>         at cascading.flow.hadoop.stream.HadoopGroupGate.receive(HadoopGroupGate.java:45)
>         at cascading.flow.stream.element.FunctionEachStage$1.collect(FunctionEachStage.java:81)
>         at cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:145)
>         at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:133)
>         at com.twitter.scalding.FlatMapFunction$$anonfun$operate$2.apply(Operations.scala:49)
>         at com.twitter.scalding.FlatMapFunction$$anonfun$operate$2.apply(Operations.scala:47)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:750)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1202)
>         at com.twitter.scalding.FlatMapFunction.operate(Operations.scala:47)
>         at cascading.flow.stream.element.FunctionEachStage.receive(FunctionEachStage.java:100)
>         at cascading.flow.stream.element.FunctionEachStage.receive(FunctionEachStage.java:40)
>         at cascading.flow.stream.element.FunctionEachStage$1.collect(FunctionEachStage.java:81)
>         at cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:145)
>         at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:133)
>         at com.twitter.scalding.MapFunction.operate(Operations.scala:60)
>         at cascading.flow.stream.element.FunctionEachStage.receive(FunctionEachStage.java:100)
>         at cascading.flow.stream.element.FunctionEachStage.receive(FunctionEachStage.java:40)
>         at cascading.flow.stream.element.FunctionEachStage$1.collect(FunctionEachStage.java:81)
>         at cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:145)
>         at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:133)
>         at cascading.operation.Identity$1.operate(Identity.java:124)
>         at cascading.operation.Identity.operate(Identity.java:150)
>         at cascading.flow.stream.element.FunctionEachStage.receive(FunctionEachStage.java:100)
>         at cascading.flow.stream.element.FunctionEachStage.receive(FunctionEachStage.java:40)
>         at cascading.flow.stream.element.SourceStage.map(SourceStage.java:110)
>         at cascading.flow.stream.element.SourceStage.run(SourceStage.java:66)
>         at cascading.flow.tez.stream.element.TezSourceStage.run(TezSourceStage.java:95)
>         at cascading.flow.tez.FlowProcessor.run(FlowProcessor.java:165)
>         at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
>         at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: cascading.CascadingException: unable to read element from underlying stream
>         at cascading.tuple.hadoop.util.TupleElementComparator.compare(TupleElementComparator.java:82)
>         at cascading.tuple.hadoop.util.TupleElementComparator.compare(TupleElementComparator.java:33)
>         at cascading.tuple.hadoop.util.DeserializerComparator.compareTuples(DeserializerComparator.java:160)
>         ... 55 more
> Caused by: java.io.EOFException
>         at java.io.DataInputStream.readFully(DataInputStream.java:197)
>         at java.io.DataInputStream.readFully(DataInputStream.java:169)
>         at org.apache.hadoop.io.WritableUtils.readString(WritableUtils.java:125)
>         at cascading.tuple.hadoop.io.HadoopTupleInputStream.readString(HadoopTupleInputStream.java:75)
>         at cascading.tuple.hadoop.io.HadoopTupleInputStream.readType(HadoopTupleInputStream.java:85)
>         at cascading.tuple.hadoop.io.HadoopTupleInputStream.getNextElement(HadoopTupleInputStream.java:52)
>         at cascading.tuple.hadoop.util.TupleElementComparator.compare(TupleElementComparator.java:77)
>         ... 57 more
> {code}
> with an apparently random variation at the top of the stack:
> {code}
> 2015-05-28 13:10:13,459 ERROR [TezChild] element.TrapHandler: caught Throwable, no trap available, rethrowing
> cascading.CascadingException: java.io.EOFException
>         at cascading.tuple.hadoop.util.TupleComparator.compare(TupleComparator.java:42)
>         at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.compareKeys(PipelinedSorter.java:669)
>         at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.compare(PipelinedSorter.java:684)
>         at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:99)
>         at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:63)
>         at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.sort(PipelinedSorter.java:631)
> {code}
> All running TezChildren fail with the same stacks (either variant) at the 
> same time on the same node, which reads straight from an HDFS-backed CSV file.
> The cascading.tuple.hadoop.util.TupleComparator#compare code at the top of 
> the stack has been in use in a MapReduce context for over 2.5 years; a first 
> analysis with [~cwensel] (who successfully reproduced the issue without 
> Scalding) points towards an issue on the Tez side.
> As a workaround, it is possible to run with 
> {code:scala}"tez.runtime.sorter.class" -> "LEGACY"{code}, but this is 
> impractical in the long run.
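
The issue title describes Comparator objects being used concurrently from multiple threads. As an illustration only, and not the change made in the attached TEZ-2505 patches, one common way to keep a stateful comparator safe is to give each thread its own instance. `ScratchComparator` below is a hypothetical stand-in for a comparator that reuses internal buffers or streams between calls.

```java
import java.util.Arrays;
import java.util.Comparator;

public class PerThreadComparatorSketch {

    // Hypothetical comparator with mutable scratch state, standing in for a
    // deserializing comparator that reuses internal streams between calls.
    static class ScratchComparator implements Comparator<String> {
        private final StringBuilder left = new StringBuilder();
        private final StringBuilder right = new StringBuilder();

        @Override
        public int compare(String a, String b) {
            // Reusing these buffers is only safe while one thread owns this instance.
            left.setLength(0);
            right.setLength(0);
            left.append(a.trim());
            right.append(b.trim());
            return left.toString().compareTo(right.toString());
        }
    }

    // Each thread lazily gets its own instance, so scratch state is never shared.
    static final ThreadLocal<ScratchComparator> COMPARATOR =
        ThreadLocal.withInitial(ScratchComparator::new);

    // Sorts a copy of the keys with the calling thread's own comparator.
    static String[] sortedCopy(String[] keys) {
        String[] copy = keys.clone();
        Arrays.sort(copy, COMPARATOR.get());
        return copy;
    }

    public static void main(String[] args) throws InterruptedException {
        String[] keys = {" pear", "apple ", "fig"};
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() ->
                System.out.println(Arrays.toString(sortedCopy(keys))));
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
    }
}
```

Sharing a single `ScratchComparator` across the worker threads instead would let one thread reset the buffers while another is still reading them, which is the kind of interleaving that can surface as spurious read errors inside `compare`.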



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
