[ https://issues.apache.org/jira/browse/SPARK-47150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17949835#comment-17949835 ]
Parag Kesar commented on SPARK-47150:
-------------------------------------

[~steven.aerts], thanks for addressing the issue in https://issues.apache.org/jira/browse/SPARK-49872 via [https://github.com/apache/spark/pull/49163]. Will the fix apply to Spark 3.5 as well?

> String length (...) exceeds the maximum length (20000000)
> ---------------------------------------------------------
>
>                 Key: SPARK-47150
>                 URL: https://issues.apache.org/jira/browse/SPARK-47150
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 3.5.0
>            Reporter: Sergii Mikhtoniuk
>            Priority: Minor
>
> Upgrading to Spark 3.5.0 introduced a regression for us where our query gateway (Livy) fails with this error:
> {code:java}
> com.fasterxml.jackson.core.exc.StreamConstraintsException: String length (20054016) exceeds the maximum length (20000000)
> (sorry, unable to provide the full stack trace){code}
> The root of this problem is a breaking change in {{jackson}} that (in the name of "safety") introduced JSON size limits, see: [https://github.com/FasterXML/jackson-core/issues/1014]
> {{JSONOptions}} in Spark already [supports configuring this limit|https://github.com/apache/spark/blob/c2dbb6d04bc9c781fb4a7673e5acf2c67b99c203/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala#L55-L58], but there seems to be no way to set it globally or to pass it down to [{{DataFrame::toJSON()}}|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.toJSON.html], which our Apache Livy server uses when transmitting data.
> Livy is an old project, and transferring dataframes via JSON is very inefficient, so we really should move to something like Spark Connect. Still, I believe this issue can hit many people working with basic GeoJSON data: Spark can handle very large strings, and this arbitrary limit just gets in the way of output serialization for no good reason.
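For readers hitting the same {{StreamConstraintsException}}, a minimal sketch of raising Jackson's process-wide default limit, assuming jackson-core 2.15.1+ is on the classpath. Note this only affects mappers and factories created after the override, and it may not reach code paths where Spark or Livy build a factory with explicit constraints:

{code:scala}
import com.fasterxml.jackson.core.StreamReadConstraints

// Raise the default maximum JSON string length from Jackson's 20 MB
// default to 100 MB. Must run before any ObjectMapper/JsonFactory is
// created, because factories capture the defaults at construction time.
StreamReadConstraints.overrideDefaultStreamReadConstraints(
  StreamReadConstraints.builder()
    .maxStringLength(100 * 1000 * 1000)
    .build())
{code}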
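On the Spark side, the {{JSONOptions}} limit mentioned in the quoted description can be set per read. A sketch, assuming the option key is {{maxStringLen}} (an assumption; verify the exact key against the JSONOptions.scala lines linked above for your Spark version). As the reporter notes, this does not help {{DataFrame::toJSON()}}, which accepts no options:

{code:scala}
// Per-datasource workaround: raise the string-length constraint when
// reading JSON files. "maxStringLen" is an assumed option name taken
// from the JSONOptions code linked above, not a documented API.
val df = spark.read
  .option("maxStringLen", (100 * 1000 * 1000).toString)
  .json("/path/to/data.json")
{code}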