Hi,

I am running a Spark job.

Master (driver) memory: 5G
Executor memory: 10G (running on 4 nodes)
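
A rough sketch of how the memory settings are applied (the app name below is just a placeholder; note that driver memory is given to spark-submit rather than set in code, since spark.driver.memory has no effect once the driver JVM is already running in client mode):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

SparkConf conf = new SparkConf()
        .setAppName("solr-docs-to-kafka")       // placeholder app name
        .set("spark.executor.memory", "10g");   // per executor, 4 executor nodes
JavaSparkContext sc = new JavaSparkContext(conf);
// Driver ("master") memory is passed on submit, e.g. --driver-memory 5g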

My job is getting killed as the number of partitions increases to 20K.

16/07/18 14:53:13 INFO DAGScheduler: Got job 17 (foreachPartition at WriteToKafka.java:45) with 13524 output partitions (allowLocal=false)
16/07/18 14:53:13 INFO DAGScheduler: Final stage: ResultStage 640(foreachPartition at WriteToKafka.java:45)
16/07/18 14:53:13 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 518, ShuffleMapStage 639)
16/07/18 14:53:23 INFO DAGScheduler: Missing parents: List()
16/07/18 14:53:23 INFO DAGScheduler: Submitting ResultStage 640 (MapPartitionsRDD[271] at map at BuildSolrDocs.java:209), which has no missing parents
16/07/18 14:53:23 INFO MemoryStore: ensureFreeSpace(8248) called with curMem=41923262, maxMem=2778778828
16/07/18 14:53:23 INFO MemoryStore: Block broadcast_90 stored as values in memory (estimated size 8.1 KB, free 2.5 GB)
Exception in thread "dag-scheduler-event-loop" java.lang.OutOfMemoryError: Java heap space
        at org.apache.spark.util.io.ByteArrayChunkOutputStream.allocateNewChunkIfNeeded(ByteArrayChunkOutputStream.scala:66)
        at org.apache.spark.util.io.ByteArrayChunkOutputStream.write(ByteArrayChunkOutputStream.scala:55)
        at org.xerial.snappy.SnappyOutputStream.dumpOutput(SnappyOutputStream.java:294)
        at org.xerial.snappy.SnappyOutputStream.flush(SnappyOutputStream.java:273)
        at org.apache.spark.io.SnappyOutputStreamWrapper.flush(CompressionCodec.scala:197)
        at java.io.ObjectOutputStream$BlockDataOutputStream.flush(ObjectOutputStream.java:1822)
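
The relevant part of the job looks roughly like this (only the two call sites named in the log are real; MyRecord, buildSolrDoc() and writePartitionToKafka() stand in for my actual classes and helpers):

import org.apache.solr.common.SolrInputDocument;
import org.apache.spark.api.java.JavaRDD;

public void writeDocs(JavaRDD<MyRecord> inputRdd) {
    // map at BuildSolrDocs.java:209 -- builds one Solr document per input record
    JavaRDD<SolrInputDocument> docs = inputRdd.map(r -> buildSolrDoc(r));

    // foreachPartition at WriteToKafka.java:45 -- the result stage with 13524
    // (growing towards 20K) partitions; each partition opens a Kafka producer
    // and sends its documents
    docs.foreachPartition(partIter -> writePartitionToKafka(partIter));
}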


Help needed. Any pointers on why the driver (dag-scheduler-event-loop) runs out of heap as the partition count grows, and how to avoid it, would be appreciated.

-- 
Thanks and Regards,

Saurav Sinha

Contact: 9742879062
