[ https://issues.apache.org/jira/browse/SPARK-34680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298759#comment-17298759 ]
Hyukjin Kwon commented on SPARK-34680:
--------------------------------------

What code did you run?

> Spark hangs when out of disk space
> ----------------------------------
>
>                 Key: SPARK-34680
>                 URL: https://issues.apache.org/jira/browse/SPARK-34680
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.0.1, 3.1.1
>         Environment: Running Spark and PySpark 3.1.1 with Hadoop 3.2.2 and Koalas 1.6.0.
>                      Some environment variables:
>                      |Java Home|/usr/lib/jvm/java-11-openjdk-11.0.3.7-0.el7_6.x86_64|
>                      |Java Version|11.0.3 (Oracle Corporation)|
>                      |Scala Version|version 2.12.10|
>            Reporter: Laurens
>            Priority: Major
>
> In a workflow using Koalas, I noticed a stage had already been hanging for 8 hours. I checked the logs and the last output is:
> {code:java}
> 21/03/09 13:50:31 ERROR TaskMemoryManager: error while calling spill() on org.apache.spark.shuffle.sort.ShuffleExternalSorter@4127a515
> java.io.IOException: No space left on device
> 	at java.base/java.io.FileOutputStream.writeBytes(Native Method)
> 	at java.base/java.io.FileOutputStream.write(FileOutputStream.java:354)
> 	at org.apache.spark.storage.TimeTrackingOutputStream.write(TimeTrackingOutputStream.java:59)
> 	at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
> 	at java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:127)
> 	at net.jpountz.lz4.LZ4BlockOutputStream.flushBufferedData(LZ4BlockOutputStream.java:223)
> 	at net.jpountz.lz4.LZ4BlockOutputStream.write(LZ4BlockOutputStream.java:176)
> 	at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:260)
> 	at org.apache.spark.shuffle.sort.ShuffleExternalSorter.writeSortedFile(ShuffleExternalSorter.java:218)
> 	at org.apache.spark.shuffle.sort.ShuffleExternalSorter.spill(ShuffleExternalSorter.java:276)
> 	at org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:208)
> 	at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:289)
> 	at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:116)
> 	at org.apache.spark.shuffle.sort.ShuffleExternalSorter.acquireNewPageIfNecessary(ShuffleExternalSorter.java:385)
> 	at org.apache.spark.shuffle.sort.ShuffleExternalSorter.insertRecord(ShuffleExternalSorter.java:409)
> 	at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.insertRecordIntoSorter(UnsafeShuffleWriter.java:249)
> 	at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:178)
> 	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:131)
> 	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
> 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
> 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> 	at java.base/java.lang.Thread.run(Thread.java:834)
> 	Suppressed: java.io.IOException: No space left on device
> 		at java.base/java.io.FileOutputStream.writeBytes(Native Method)
> 		at java.base/java.io.FileOutputStream.write(FileOutputStream.java:354)
> 		at org.apache.spark.storage.TimeTrackingOutputStream.write(TimeTrackingOutputStream.java:59)
> 		at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
> 		at java.base/java.io.BufferedOutputStream.flush(BufferedOutputStream.java:142)
> 		at net.jpountz.lz4.LZ4BlockOutputStream.flush(LZ4BlockOutputStream.java:243)
> 		at org.apache.spark.serializer.DummySerializerInstance$1.flush(DummySerializerInstance.java:50)
> 		at org.apache.spark.storage.DiskBlockObjectWriter.commitAndGet(DiskBlockObjectWriter.scala:173)
> 		at org.apache.spark.storage.DiskBlockObjectWriter.$anonfun$close$1(DiskBlockObjectWriter.scala:156)
> 		at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
> 		at org.apache.spark.storage.DiskBlockObjectWriter.close(DiskBlockObjectWriter.scala:158)
> 		at org.apache.spark.shuffle.sort.ShuffleExternalSorter.writeSortedFile(ShuffleExternalSorter.java:226)
> 		... 18 more
> 		Suppressed: java.io.IOException: No space left on device
> 			at java.base/java.io.FileOutputStream.writeBytes(Native Method)
> 			at java.base/java.io.FileOutputStream.write(FileOutputStream.java:354)
> 			at org.apache.spark.storage.TimeTrackingOutputStream.write(TimeTrackingOutputStream.java:59)
> 			at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
> 			at java.base/java.io.BufferedOutputStream.flush(BufferedOutputStream.java:142)
> 			at java.base/java.io.FilterOutputStream.close(FilterOutputStream.java:182)
> 			at org.apache.spark.storage.DiskBlockObjectWriter$ManualCloseBufferedOutputStream$1.org$apache$spark$storage$DiskBlockObjectWriter$ManualCloseOutputStream$$super$close(DiskBlockObjectWriter.scala:108)
> 			at org.apache.spark.storage.DiskBlockObjectWriter$ManualCloseOutputStream.manualClose(DiskBlockObjectWriter.scala:65)
> 			at org.apache.spark.storage.DiskBlockObjectWriter$ManualCloseOutputStream.manualClose$(DiskBlockObjectWriter.scala:64)
> 			at org.apache.spark.storage.DiskBlockObjectWriter$ManualCloseBufferedOutputStream$1.manualClose(DiskBlockObjectWriter.scala:108)
> 			at org.apache.spark.storage.DiskBlockObjectWriter.$anonfun$closeResources$1(DiskBlockObjectWriter.scala:135)
> 			at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> 			at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
> 			at org.apache.spark.storage.DiskBlockObjectWriter.closeResources(DiskBlockObjectWriter.scala:136)
> 			at org.apache.spark.storage.DiskBlockObjectWriter.$anonfun$close$2(DiskBlockObjectWriter.scala:158)
> 			at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1448)
> 			... 20 more
> 21/03/09 13:50:31 INFO TaskMemoryManager: Memory used in task 1255
> 21/03/09 13:50:31 INFO TaskMemoryManager: Acquired by HybridRowQueue(org.apache.spark.memory.TaskMemoryManager@394bad48,/local/anonymized/spark/spark-4b70492b-8f2e-4108-b6a0-6ed423a98bd9/executor-b88a6782-4592-45c0-a484-73a2f642cb3e/spark-c20b49eb-83d4-4145-b07a-fe6fddef7ffe,7,org.apache.spark.serializer.SerializerManager@59dd92e8): 105.5 MiB
> 21/03/09 13:50:31 INFO TaskMemoryManager: Acquired by org.apache.spark.shuffle.sort.ShuffleExternalSorter@4127a515: 14.4 GiB
> 21/03/09 13:50:31 INFO TaskMemoryManager: Acquired by org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@34a4b163: 15.1 GiB
> 21/03/09 13:50:31 INFO TaskMemoryManager: 67108864 bytes of memory were used by task 1255 but are not associated with specific consumers
> 21/03/09 13:50:31 INFO TaskMemoryManager: 31853114929 bytes of memory are used for execution and 526799 bytes of memory are used for storage{code}
> The local time is 21/03/09 21:13:00, so it appears the worker is stuck and the stage never terminates, even though the task has already failed.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org