GitHub user brkyvz opened a pull request: https://github.com/apache/spark/pull/20673
[SPARK-23515] Use input/output streams for large events in JsonProtocol.sparkEventToJson

## What changes were proposed in this pull request?

`def sparkEventToJson(event: SparkListenerEvent)` has a fallback path that creates a JSON object by serializing an unrecognized event to a JSON string and then parsing that string again. This materializes the whole string in memory just to parse the JSON record, which is unnecessary and can cause OOMs, as seen in the stack trace below:

```
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:3664)
    at java.lang.String.<init>(String.java:207)
    at java.lang.StringBuilder.toString(StringBuilder.java:407)
    at com.fasterxml.jackson.core.util.TextBuffer.contentsAsString(TextBuffer.java:356)
    at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.getText(ReaderBasedJsonParser.java:235)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:20)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:42)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:35)
    at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3736)
    at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2726)
    at org.json4s.jackson.JsonMethods$class.parse(JsonMethods.scala:20)
    at org.json4s.jackson.JsonMethods$.parse(JsonMethods.scala:50)
    at org.apache.spark.util.JsonProtocol$.sparkEventToJson(JsonProtocol.scala:103)
```

We should use stream parsing instead to avoid such OOMs.

## How was this patch tested?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/brkyvz/spark eventLoggingJson

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20673.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20673

----
commit 774188003c5b1c1a000d69f5996dce580c7a1432
Author: Burak Yavuz <brkyvz@...>
Date:   2018-02-25T20:07:22Z

    use streams for large events

----