[ https://issues.apache.org/jira/browse/SPARK-47150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17949835#comment-17949835 ]

Parag Kesar commented on SPARK-47150:
-------------------------------------

[~steven.aerts], thanks for addressing the issue in
https://issues.apache.org/jira/browse/SPARK-49872 via
https://github.com/apache/spark/pull/49163. Will it apply to Spark 3.5 as well?

> String length (...) exceeds the maximum length (20000000)
> ---------------------------------------------------------
>
>                 Key: SPARK-47150
>                 URL: https://issues.apache.org/jira/browse/SPARK-47150
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 3.5.0
>            Reporter: Sergii Mikhtoniuk
>            Priority: Minor
>
> Upgrading to Spark 3.5.0 introduced a regression for us where our query 
> gateway (Livy) fails with an error:
> {code:java}
> com.fasterxml.jackson.core.exc.StreamConstraintsException: String length (20054016) exceeds the maximum length (20000000)
> (sorry, unable to provide full stack trace){code}
> The root of this problem is a breaking change in {{jackson}} that (in the 
> name of "safety") introduced default JSON size limits; see 
> [https://github.com/FasterXML/jackson-core/issues/1014]
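> To illustrate (a sketch of the jackson-core 2.15+ API, not Spark's actual 
> code): the limit lives in {{StreamReadConstraints}} and can be raised per 
> {{JsonFactory}}:
> {code:java}
> import com.fasterxml.jackson.core.JsonFactory;
> import com.fasterxml.jackson.core.StreamReadConstraints;
>
> public class RelaxedJsonFactory {
>     public static JsonFactory create() {
>         // Recent jackson-core caps string values at 20,000,000 chars by
>         // default; raise the cap for factories built here.
>         return JsonFactory.builder()
>             .streamReadConstraints(StreamReadConstraints.builder()
>                 .maxStringLength(Integer.MAX_VALUE)
>                 .build())
>             .build();
>     }
> }
> {code}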
> Looks like {{JSONOptions}} in Spark already [supports configuring this 
> limit|https://github.com/apache/spark/blob/c2dbb6d04bc9c781fb4a7673e5acf2c67b99c203/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala#L55-L58],
>  but there seems to be no way to set it globally or pass it down to 
> [{{DataFrame::toJSON()}}|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.toJSON.html],
>  which our Apache Livy server uses when transmitting data.
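> As a stopgap where the factory cannot be reached (our Livy case), recent 
> jackson-core versions also expose a process-wide default override; a sketch, 
> assuming it runs before the first {{JsonFactory}} in the JVM is created:
> {code:java}
> import com.fasterxml.jackson.core.StreamReadConstraints;
>
> public class JacksonLimitWorkaround {
>     public static void relaxDefaults() {
>         // Replaces the JVM-wide default constraints; only affects
>         // JsonFactory instances constructed after this call.
>         StreamReadConstraints.overrideDefaultStreamReadConstraints(
>             StreamReadConstraints.builder()
>                 .maxStringLength(Integer.MAX_VALUE)
>                 .build());
>     }
> }
> {code}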
> Livy is an old project, and transferring dataframes via JSON is very 
> inefficient; we really should move to something like Spark Connect. But I 
> believe this issue can affect many people working with basic GeoJSON data.
> Spark can handle very large strings, and this arbitrary limit just gets in 
> the way of output serialization for no good reason.


