[jira] [Updated] (SPARK-31496) Exception in thread "dispatcher-event-loop-1" java.lang.OutOfMemoryError

2020-04-20 Thread Tomas Shestakov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomas Shestakov updated SPARK-31496:

Description: 
Local Spark with one core (local[1]) throws an OutOfMemoryError while trying to save a Dataset to a local parquet file.
{code:java}
SparkSession sparkSession = SparkSession.builder()
        .appName("Loader impl test")
        .master("local[1]")
        .config("spark.ui.enabled", false)
        .config("spark.sql.datetime.java8API.enabled", true)
        .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .config("spark.kryoserializer.buffer.max", "1g")
        .config("spark.executor.memory", "4g")
        .config("spark.driver.memory", "8g")
        .getOrCreate();
{code}
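The report does not show the code around LoaderImpl.java:305, but the ParallelCollectionPartition frames in the stack trace below suggest the Dataset is built from a local in-memory collection. A minimal reproduction sketch under that assumption; the Payload class, row count, and output path are hypothetical, not taken from the report:
{code:java}
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

public class OomRepro {

    // Hypothetical bean; the real record type used by LoaderImpl is not shown in the report.
    public static class Payload implements Serializable {
        private String value;
        public Payload() { }
        public Payload(String value) { this.value = value; }
        public String getValue() { return value; }
        public void setValue(String value) { this.value = value; }
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("Loader impl test")
                .master("local[1]")
                .config("spark.ui.enabled", false)
                .config("spark.sql.datetime.java8API.enabled", true)
                .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                .config("spark.kryoserializer.buffer.max", "1g")
                .getOrCreate();

        // A large local collection; with local[1] it becomes a single
        // ParallelCollectionRDD partition that the driver serializes in one piece.
        List<Payload> rows = new ArrayList<>();
        for (int i = 0; i < 2_000_000; i++) {
            rows.add(new Payload("some reasonably long payload string " + i));
        }

        Dataset<Payload> ds = spark.createDataset(rows, Encoders.bean(Payload.class));
        // Analogous to the failing save at LoaderImpl.java:305.
        ds.write().mode("overwrite").parquet("target/oom-repro.parquet");
    }
}
{code}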
{noformat}
[20-Apr-2020 11:42:27.877]  INFO [boundedElastic-2 o.a.s.s.e.datasources.parquet.ParquetFileFormat:57] q: - Using default output committer for Parquet: org.apache.parquet.hadoop.ParquetOutputCommitter
[20-Apr-2020 11:42:27.877]  INFO [boundedElastic-2 o.a.s.s.e.datasources.parquet.ParquetFileFormat:57] q: - Using default output committer for Parquet: org.apache.parquet.hadoop.ParquetOutputCommitter
[20-Apr-2020 11:42:27.967]  INFO [boundedElastic-2 o.a.h.mapreduce.lib.output.FileOutputCommitter:108] q: - File Output Committer Algorithm version is 1
[20-Apr-2020 11:42:27.969]  INFO [boundedElastic-2 o.a.s.s.e.d.SQLHadoopMapReduceCommitProtocol:57] q: - Using user defined output committer class org.apache.parquet.hadoop.ParquetOutputCommitter
[20-Apr-2020 11:42:27.970]  INFO [boundedElastic-2 o.a.h.mapreduce.lib.output.FileOutputCommitter:108] q: - File Output Committer Algorithm version is 1
[20-Apr-2020 11:42:27.973]  INFO [boundedElastic-2 o.a.s.s.e.d.SQLHadoopMapReduceCommitProtocol:57] q: - Using output committer class org.apache.parquet.hadoop.ParquetOutputCommitter
[20-Apr-2020 11:42:34.371]  INFO [boundedElastic-2 org.apache.spark.SparkContext:57] q: - Starting job: save at LoaderImpl.java:305
[20-Apr-2020 11:42:34.389]  INFO [dag-scheduler-event-loop org.apache.spark.scheduler.DAGScheduler:57] q: - Got job 0 (save at LoaderImpl.java:305) with 1 output partitions
[20-Apr-2020 11:42:34.390]  INFO [dag-scheduler-event-loop org.apache.spark.scheduler.DAGScheduler:57] q: - Final stage: ResultStage 0 (save at LoaderImpl.java:305)
[20-Apr-2020 11:42:34.390]  INFO [dag-scheduler-event-loop org.apache.spark.scheduler.DAGScheduler:57] q: - Parents of final stage: List()
[20-Apr-2020 11:42:34.392]  INFO [dag-scheduler-event-loop org.apache.spark.scheduler.DAGScheduler:57] q: - Missing parents: List()
[20-Apr-2020 11:42:34.398]  INFO [dag-scheduler-event-loop org.apache.spark.scheduler.DAGScheduler:57] q: - Submitting ResultStage 0 (MapPartitionsRDD[6] at save at LoaderImpl.java:305), which has no missing parents
[20-Apr-2020 11:42:34.634]  INFO [dag-scheduler-event-loop org.apache.spark.storage.memory.MemoryStore:57] q: - Block broadcast_0 stored as values in memory (estimated size 166.1 KiB, free 18.4 GiB)
[20-Apr-2020 11:42:34.945]  INFO [dag-scheduler-event-loop org.apache.spark.storage.memory.MemoryStore:57] q: - Block broadcast_0_piece0 stored as bytes in memory (estimated size 58.0 KiB, free 18.4 GiB)
[20-Apr-2020 11:42:34.949]  INFO [dispatcher-BlockManagerMaster org.apache.spark.storage.BlockManagerInfo:57] q: - Added broadcast_0_piece0 in memory on DESKTOP-A1:58276 (size: 58.0 KiB, free: 18.4 GiB)
[20-Apr-2020 11:42:34.953]  INFO [dag-scheduler-event-loop org.apache.spark.SparkContext:57] q: - Created broadcast 0 from broadcast at DAGScheduler.scala:1206
[20-Apr-2020 11:42:34.980]  INFO [dag-scheduler-event-loop org.apache.spark.scheduler.DAGScheduler:57] q: - Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[6] at save at LoaderImpl.java:305) (first 15 tasks are for partitions Vector(0))
[20-Apr-2020 11:42:34.981]  INFO [dag-scheduler-event-loop org.apache.spark.scheduler.TaskSchedulerImpl:57] q: - Adding task set 0.0 with 1 tasks
Exception in thread "dispatcher-event-loop-1" java.lang.OutOfMemoryError
    at java.base/java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:125)
    at java.base/java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:119)
    at java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:95)
    at java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:156)
    at org.apache.spark.util.ByteBufferOutputStream.write(ByteBufferOutputStream.scala:41)
    at java.base/java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1859)
    at java.base/java.io.ObjectOutputStream.write(ObjectOutputStream.java:712)
    at org.apache.spark.util.Utils$$anon$2.write(Utils.scala:153)
    at com.esotericsoftware.kryo.io.Output.flush(Output.java:185)
    at com.esotericsoftware.kryo.io.Output.close(Output.java:196)
    at org.apache.spark.serializer.KryoSerializationStream.close(KryoSerializer.scala:273)
    at org.apache.spark.util.Utils$.serializeViaNestedStream(Utils.scala:158)
    at org.apache.spark.rdd.ParallelCollectionPartition.$anonfun$writeObject$1(ParallelCollectionRDD.scala:65)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1343)
    at org.apache.spark.rdd.ParallelCollectionPartition.writeObject(ParallelCollectionRDD.scala:51)
    at ...
{noformat}

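The trace shows the OOM being raised while the dispatcher serializes a ParallelCollectionPartition through Kryo into a ByteArrayOutputStream; failing in hugeCapacity suggests the serialized partition outgrew the JVM's maximum byte[] size (about 2 GiB). A possible mitigation sketch, purely an assumption here rather than anything from the report: parallelize the local collection into more slices so no single partition has to be serialized into one giant buffer. Names and the slice count below are illustrative, reusing the hypothetical OomRepro sketch above:
{code:java}
// Hypothetical mitigation (assumption, not from the report): split the local
// collection into many slices so each ParallelCollectionPartition serializes
// to a small fraction of the data instead of one ~2 GiB buffer.
// Reuses spark, rows, and Payload from the OomRepro sketch above; needs
// org.apache.spark.api.java.JavaSparkContext and org.apache.spark.api.java.JavaRDD.
JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
JavaRDD<Payload> sliced = jsc.parallelize(rows, 64); // 64 slices, illustrative
Dataset<Payload> ds = spark.createDataset(sliced.rdd(), Encoders.bean(Payload.class));
ds.write().mode("overwrite").parquet("target/oom-repro.parquet");
{code}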
[jira] [Created] (SPARK-31496) Exception in thread "dispatcher-event-loop-1" java.lang.OutOfMemoryError

2020-04-20 Thread Tomas Shestakov (Jira)
Tomas Shestakov created SPARK-31496:
---

             Summary: Exception in thread "dispatcher-event-loop-1" java.lang.OutOfMemoryError
                 Key: SPARK-31496
                 URL: https://issues.apache.org/jira/browse/SPARK-31496
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.0
         Environment: Windows 10 (1909)
                      JDK 11.0.6
                      spark-3.0.0-preview2-bin-hadoop3.2
                      local[1]
            Reporter: Tomas Shestakov


Local Spark with one core (local[1]) throws an OutOfMemoryError while trying to save a Dataset to a local parquet file. (The log and stack trace are quoted in the updated description above.)