GitHub user brkyvz opened a pull request:

    https://github.com/apache/spark/pull/20673

    [SPARK-23515] Use input/output streams for large events in JsonProtocol.sparkEventToJson

    ## What changes were proposed in this pull request?
    
    `def sparkEventToJson(event: SparkListenerEvent)`
    
    has a fallback path that serializes an unrecognized event to a JSON string and then parses that string back into a JSON object. Materializing the whole string just to parse it again is unnecessary and can cause OOMs, as seen in the stack trace below:
    
    ```
    java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:3664)
    at java.lang.String.<init>(String.java:207)
    at java.lang.StringBuilder.toString(StringBuilder.java:407)
    at com.fasterxml.jackson.core.util.TextBuffer.contentsAsString(TextBuffer.java:356)
    at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.getText(ReaderBasedJsonParser.java:235)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:20)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:42)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:35)
    at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3736)
    at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2726)
    at org.json4s.jackson.JsonMethods$class.parse(JsonMethods.scala:20)
    at org.json4s.jackson.JsonMethods$.parse(JsonMethods.scala:50)
    at org.apache.spark.util.JsonProtocol$.sparkEventToJson(JsonProtocol.scala:103)
    ```
    
    We should parse from a stream instead, which avoids such OOMs.
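    A minimal sketch of what stream-based conversion could look like, assuming jackson-module-scala and json4s-jackson are on the classpath. It is not the code in this patch: `StreamingEventToJson`, `MyCustomEvent`, `viaString`, `viaStreams` and the 64 KB pipe buffer are hypothetical names and values chosen for illustration. `viaString` mirrors the pattern the stack trace points to (serialize the event into one `String`, then parse it), while `viaStreams` pipes Jackson's output bytes straight into the json4s parser, so the full document never exists as a single `String`.

    ```scala
    import java.io.{PipedInputStream, PipedOutputStream}
    import java.util.concurrent.atomic.AtomicReference

    import scala.util.{Failure, Success, Try}

    import com.fasterxml.jackson.databind.ObjectMapper
    import com.fasterxml.jackson.module.scala.DefaultScalaModule
    import org.json4s._
    import org.json4s.jackson.JsonMethods

    object StreamingEventToJson {
      // Hypothetical stand-in for an event type with no dedicated serializer,
      // i.e. one that would hit the generic fallback path.
      case class MyCustomEvent(name: String, payload: Seq[Int])

      // Jackson mapper with Scala support, used to serialize arbitrary events.
      private val mapper = new ObjectMapper().registerModule(DefaultScalaModule)

      // Materializing pattern: the whole event is rendered into one String,
      // which is then parsed back into a json4s AST.
      def viaString(event: AnyRef): JValue =
        JsonMethods.parse(StringInput(mapper.writeValueAsString(event)))

      // Streaming pattern: a background thread writes JSON bytes into a pipe
      // while the calling thread parses them. Only the pipe's fixed-size buffer
      // sits between the writer and the parser.
      def viaStreams(event: AnyRef, pipeBufferBytes: Int = 64 * 1024): JValue = {
        val in = new PipedInputStream(pipeBufferBytes)
        val out = new PipedOutputStream(in)
        val writeFailure = new AtomicReference[Throwable]()

        val writer = new Thread(new Runnable {
          override def run(): Unit =
            try mapper.writeValue(out, event)
            catch { case t: Throwable => writeFailure.set(t) }
            finally out.close() // signals end-of-stream to the parser
        }, "event-json-writer")
        writer.setDaemon(true)
        writer.start()

        val parsed = Try(JsonMethods.parse(StreamInput(in)))
        in.close()    // unblocks the writer if parsing stopped early
        writer.join()

        parsed match {
          case Success(json) =>
            // Report a serializer error even if the parser did not notice it.
            Option(writeFailure.get()).foreach(t => throw t)
            json
          case Failure(e) =>
            // A writer-side error often explains a truncated stream; keep it.
            Option(writeFailure.get()).foreach(t => e.addSuppressed(t))
            throw e
        }
      }
    }
    ```

    The trade-off in this sketch is one extra thread per converted event; buffering Jackson tokens in memory instead of text (for example via Jackson's `TokenBuffer`) is another option that avoids the intermediate `String`.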
    
    ## How was this patch tested?
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/brkyvz/spark eventLoggingJson

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20673.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20673
    
----
commit 774188003c5b1c1a000d69f5996dce580c7a1432
Author: Burak Yavuz <brkyvz@...>
Date:   2018-02-25T20:07:22Z

    use streams for large events

----

