[ https://issues.apache.org/jira/browse/SPARK-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113741#comment-14113741 ]

Mridul Muralidharan commented on SPARK-3277:
--------------------------------------------

This looks like an unrelated change pushed to BlockObjectWriter as part of 
the introduction of ShuffleWriteMetrics.
I had introduced checks, and documented, that we must not infer size from 
the position of the stream after flush - since close can still write data 
to the streams (and a flush can generate more data that does not itself 
need to be flushed to the streams).

Apparently this logic was modified subsequently, causing this bug.
The solution would be to revert the change that updates shuffleBytesWritten 
before the stream is closed: it must be done after close, and based on 
file.length.
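To make the ordering concrete, here is a minimal sketch of the fix described 
above. DiskWriterSketch, wrapForCompression, and close() returning the byte 
count are hypothetical names for illustration, not the actual 
BlockObjectWriter code:

{code:scala}
import java.io.{File, FileOutputStream, OutputStream}

// Sketch only: the byte count used for write metrics is taken from the
// file length after close(), never from the stream position after flush(),
// because close() on a compression stream such as LZ4 can still emit
// buffered blocks and a footer.
class DiskWriterSketch(file: File,
                       wrapForCompression: OutputStream => OutputStream) {
  private val fileOut = new FileOutputStream(file)
  private val out = wrapForCompression(fileOut) // e.g. an LZ4 stream
  private var bytesWritten = 0L // stands in for shuffleBytesWritten

  def write(b: Array[Byte]): Unit = out.write(b)

  // BUGGY ordering (per the comment above): measuring after flush()
  // undercounts whatever close() appends, so the on-disk file is longer
  // than the recorded size and readers hit the assertion:
  //   out.flush(); bytesWritten = fileOut.getChannel.position()

  // FIXED ordering: close everything first, then trust the file itself.
  def close(): Long = {
    out.close() // may write a compression footer / final block
    bytesWritten = file.length()
    bytesWritten
  }
}
{code}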

> LZ4 compression causes the ExternalSort exception
> -------------------------------------------------
>
>                 Key: SPARK-3277
>                 URL: https://issues.apache.org/jira/browse/SPARK-3277
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.2
>            Reporter: hzw
>             Fix For: 1.1.0
>
>
> I tested LZ4 compression, and it ran into this problem (with wordcount).
> I also tested Snappy and LZF, and they were OK.
> Finally I set "spark.shuffle.spill" to false to avoid the exception (a 
> sketch of that workaround follows the stack trace below), but once this 
> "switch" is turned back on, the error comes back.
> Exception info as follows:
> java.lang.AssertionError: assertion failed
>         at scala.Predef$.assert(Predef.scala:165)
>         at org.apache.spark.util.collection.ExternalAppendOnlyMap$DiskMapIterator.<init>(ExternalAppendOnlyMap.scala:416)
>         at org.apache.spark.util.collection.ExternalAppendOnlyMap.spill(ExternalAppendOnlyMap.scala:235)
>         at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:150)
>         at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:58)
>         at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:55)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:54)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:722)
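For reference, a minimal sketch of the reporter's workaround, assuming a 
Spark 1.0-style configuration (the codec class name is the standard Spark 
one; the application name is made up):

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

// Reproduces the reporter's setup: LZ4 shuffle compression, with spilling
// disabled as a stopgap. This only masks the bug (spills are what trip the
// assertion) and keeps all map-side combine data in memory.
val conf = new SparkConf()
  .setAppName("wordcount") // made-up app name
  .set("spark.io.compression.codec", "org.apache.spark.io.LZ4CompressionCodec")
  .set("spark.shuffle.spill", "false") // the workaround from the report
val sc = new SparkContext(conf)
{code}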


