[jira] [Commented] (SPARK-3277) LZ4 compression causes the ExternalSort exception
[ https://issues.apache.org/jira/browse/SPARK-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14113741#comment-14113741 ] Mridul Muralidharan commented on SPARK-3277:

This looks like unrelated changes pushed to BlockObjectWriter as part of the introduction of ShuffleWriteMetrics. I had introduced checks and also documented that we must not infer size based on the position of the stream after flush, since close can write data to the streams (and one flush can result in more data getting generated, which need not be flushed to the streams). Apparently this logic was modified subsequently, causing this bug.

The solution would be to revert the changes that update shuffleBytesWritten before the close of the stream. It must be done after close, based on file.length.

LZ4 compression causes the ExternalSort exception
Key: SPARK-3277
URL: https://issues.apache.org/jira/browse/SPARK-3277
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.0.2
Reporter: hzw
Fix For: 1.1.0

I tested the LZ4 compression and it came up with this problem (with wordcount). I also tested Snappy and LZF, and they were OK. In the end I set spark.shuffle.spill to false to avoid the exception, but once this switch is enabled, the error comes back. It seems that if the number of words is small, wordcount will go through, but with a complex text the problem shows up.

Exception info as follows:
{code}
java.lang.AssertionError: assertion failed
    at scala.Predef$.assert(Predef.scala:165)
    at org.apache.spark.util.collection.ExternalAppendOnlyMap$DiskMapIterator.init(ExternalAppendOnlyMap.scala:416)
    at org.apache.spark.util.collection.ExternalAppendOnlyMap.spill(ExternalAppendOnlyMap.scala:235)
    at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:150)
    at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:58)
    at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:55)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:54)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
{code}

--
This message was sent by Atlassian JIRA (v6.2#6252)
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
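Mridul's observation above — that a compression stream can emit more bytes at close() than were visible after flush(), so a size inferred from the stream position after flush undercounts — can be reproduced with the JDK's DeflaterOutputStream. This is a stand-in for the LZ4 stream (lz4-java is not in the standard library), but it shows the same block-buffering behavior:

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DeflaterOutputStream;

public class FlushVsClose {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // With the default constructor (syncFlush = false), flush() does not
        // force buffered, not-yet-compressed data out of the Deflater.
        DeflaterOutputStream out = new DeflaterOutputStream(sink);
        out.write("a few shuffle records".getBytes("UTF-8"));
        out.flush();
        int afterFlush = sink.size();  // position a caller would observe after flush()
        out.close();                   // finish() emits the remaining compressed data
        int afterClose = sink.size();  // what file.length would report on disk
        System.out.println("after flush: " + afterFlush + ", after close: " + afterClose);
        // For small inputs, afterFlush can be much smaller than afterClose,
        // so recording shuffleBytesWritten before close undercounts.
    }
}
```

Here the gap between the two sizes is exactly the data the codec only writes on close, which is why the comment argues the byte count must come from file.length after the stream is closed.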
[ https://issues.apache.org/jira/browse/SPARK-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14113789#comment-14113789 ] hzw commented on SPARK-3277:

Sorry, I cannot understand it clearly since I'm not familiar with the code of this class. Can you point to the line of code where it goes wrong, or make a PR to fix this problem?
[ https://issues.apache.org/jira/browse/SPARK-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14114014#comment-14114014 ] Mridul Muralidharan commented on SPARK-3277:

[~matei] Attaching a patch which reproduces the bug consistently. I suspect the issue is more serious than what I detailed above: spill to disk seems completely broken, if I understood the assertion message correctly. Unfortunately, this is based on a few minutes of free time I could grab, so a more principled debugging session is definitely warranted!
[ https://issues.apache.org/jira/browse/SPARK-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14114026#comment-14114026 ] Mridul Muralidharan commented on SPARK-3277:

[~hzw] Did you notice this against 1.0.2? I did not think the changes for consolidated shuffle were backported to that branch; [~mateiz] can comment more, though.
[ https://issues.apache.org/jira/browse/SPARK-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14114247#comment-14114247 ] Matei Zaharia commented on SPARK-3277:

Thanks Mridul -- I think Andrew and Patrick have figured this out.
[ https://issues.apache.org/jira/browse/SPARK-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14114484#comment-14114484 ] Mridul Muralidharan commented on SPARK-3277:

Sounds great, thanks! I suspect it is because for LZO we configure the codec to write a block on flush (a partial block if there is insufficient data to fill one), but for LZ4 either such a config option does not exist or we don't use it. As a result, flush becomes a no-op when the data in the current block is insufficient to produce a compressed block, while close forces the partial block to be written out. That is why the assertion lists all sizes as 0.
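The fix direction proposed in the first comment — update shuffleBytesWritten only after close, from the spill file's length — can be sketched as follows. The names here are illustrative, not Spark's actual BlockObjectWriter code, and DeflaterOutputStream again stands in for the LZ4 codec:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.util.zip.DeflaterOutputStream;

public class SpillSizeAfterClose {
    public static void main(String[] args) throws Exception {
        File spill = File.createTempFile("spill", ".bin");
        spill.deleteOnExit();
        DeflaterOutputStream out =
            new DeflaterOutputStream(new FileOutputStream(spill));
        out.write("key1,1\nkey2,1\nkey3,1\n".getBytes("UTF-8"));
        out.flush();
        // Unsafe measurement: may still be 0 because the codec has not
        // emitted a compressed block yet -- the all-zero sizes the failed
        // assertion in DiskMapIterator complained about.
        long sizeAfterFlush = spill.length();
        out.close();  // forces the partial compressed block to disk
        // Safe measurement: the file length after close is the only
        // reliable number to record as bytes written.
        long bytesWritten = spill.length();
        System.out.println(sizeAfterFlush + " -> " + bytesWritten);
    }
}
```

Measuring after close works regardless of whether the codec's flush emits partial blocks (LZO-style) or buffers until a block is full (the LZ4 behavior suspected above).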