[ https://issues.apache.org/jira/browse/SPARK-49872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17905197#comment-17905197 ]

Rhys Jones commented on SPARK-49872:
------------------------------------

I'd +1 the approach in [~steven.aerts]'s PR here. We run a single SHS cluster 
in HA on Spark 3.5.3, reading from a very large Spark event log distribution 
(~75-80k log files). We allow our clients to run any Spark version from 2.3 
through 3.5.3 on our cluster, so this bug has hit us particularly hard on logs 
from the more recent Spark versions, where AQE and physical plans generate 
some very large JSON blobs. Being backwards- and forwards-compatible by 
disabling the check, rather than picking an ever-larger integer to configure 
it with, seems the more accommodating option.
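For anyone else hitting this in the meantime, here is a rough sketch of what 
disabling the check means at the jackson-core level (assuming jackson-core 
2.15+; illustration only, not necessarily the mechanism the PR uses):
{code:java}
import com.fasterxml.jackson.core.{JsonFactory, StreamReadConstraints}
import com.fasterxml.jackson.databind.ObjectMapper

// Illustration only: an ObjectMapper whose string-length check is effectively
// disabled by raising the limit to Int.MaxValue (jackson-core 2.15+ API).
val constraints = StreamReadConstraints.builder()
  .maxStringLength(Int.MaxValue)
  .build()
val factory = JsonFactory.builder()
  .streamReadConstraints(constraints)
  .build()
val mapper = new ObjectMapper(factory)
{code}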

> Spark History UI -- StreamConstraintsException: String length (20054016) exceeds the maximum length (20000000)
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-49872
>                 URL: https://issues.apache.org/jira/browse/SPARK-49872
>             Project: Spark
>          Issue Type: Bug
>          Components: UI
>    Affects Versions: 3.5.3
>            Reporter: Anthony Sgro
>            Priority: Major
>              Labels: pull-request-available
>
> There is an issue with the Spark History UI when event logs are large.
> The root of this problem is the breaking change in Jackson that (in the name 
> of "safety") introduced some JSON size limits, see: 
> [https://github.com/FasterXML/jackson-core/issues/1014]
> It looks like {{JSONOptions}} in Spark already [supports configuring this 
> limit|https://github.com/apache/spark/blob/c2dbb6d04bc9c781fb4a7673e5acf2c67b99c203/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala#L55-L58],
>  but there seems to be no way to set it globally.
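> (As a hedged aside: jackson-core 2.15.1+ does expose a process-wide default 
> override, though Spark currently surfaces no configuration for it; a minimal 
> sketch:)
> {code:java}
> import com.fasterxml.jackson.core.StreamReadConstraints
> 
> // Hypothetical sketch: raise the process-wide default limit before any
> // ObjectMapper is created. Nothing in Spark sets or exposes this today.
> StreamReadConstraints.overrideDefaultStreamReadConstraints(
>   StreamReadConstraints.builder().maxStringLength(100000000).build())
> {code}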
> Spark should be able to handle strings of arbitrary length. I have tried 
> configuring rolling event logs, pruning event logs, etc., but these either 
> do not fix the issue or cause so much data loss that the Spark History UI is 
> completely useless.
> Perhaps a solution could be to add a config like:
> {code:java}
> spark.history.server.jsonStreamReadConstraints.maxStringLength=<new_value> 
> {code}
> There is a workaround for reading JSON within your application:
> {code:java}
> spark.read.option("maxStringLen", 100000000).json(path)
> {code}
> But this is not an option for accessing the Spark History UI. Here is the 
> full stack trace:
> {code:java}
> HTTP ERROR 500 com.fasterxml.jackson.core.exc.StreamConstraintsException: String length (20054016) exceeds the maximum length (20000000)
> URI:/history/application_1728009195451_0002/1/jobs/
> STATUS:500
> MESSAGE:com.fasterxml.jackson.core.exc.StreamConstraintsException: String length (20054016) exceeds the maximum length (20000000)
> SERVLET:org.apache.spark.deploy.history.HistoryServer$$anon$1-582a764a
> CAUSED BY:com.fasterxml.jackson.core.exc.StreamConstraintsException: String length (20054016) exceeds the maximum length (20000000)
> com.fasterxml.jackson.core.exc.StreamConstraintsException: String length (20054016) exceeds the maximum length (20000000)
>       at com.fasterxml.jackson.core.StreamReadConstraints.validateStringLength(StreamReadConstraints.java:324)
>       at com.fasterxml.jackson.core.util.ReadConstrainedTextBuffer.validateStringLength(ReadConstrainedTextBuffer.java:27)
>       at com.fasterxml.jackson.core.util.TextBuffer.finishCurrentSegment(TextBuffer.java:939)
>       at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString2(ReaderBasedJsonParser.java:2240)
>       at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString(ReaderBasedJsonParser.java:2206)
>       at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.getText(ReaderBasedJsonParser.java:323)
>       at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer._deserializeContainerNoRecursion(JsonNodeDeserializer.java:572)
>       at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:100)
>       at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:25)
>       at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323)
>       at com.fasterxml.jackson.databind.ObjectMapper._readTreeAndClose(ObjectMapper.java:4867)
>       at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:3219)
>       at org.apache.spark.util.JsonProtocol$.sparkEventFromJson(JsonProtocol.scala:927)
>       at org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:88)
>       at org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:59)
>       at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$parseAppEventLogs$3(FsHistoryProvider.scala:1143)
>       at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$parseAppEventLogs$3$adapted(FsHistoryProvider.scala:1141)
>       at org.apache.spark.util.SparkErrorUtils.tryWithResource(SparkErrorUtils.scala:48)
>       at org.apache.spark.util.SparkErrorUtils.tryWithResource$(SparkErrorUtils.scala:46)
>       at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:95)
>       at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$parseAppEventLogs$1(FsHistoryProvider.scala:1141)
>       at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$parseAppEventLogs$1$adapted(FsHistoryProvider.scala:1139)
>       at scala.collection.immutable.List.foreach(List.scala:431)
>       at org.apache.spark.deploy.history.FsHistoryProvider.parseAppEventLogs(FsHistoryProvider.scala:1139)
>       at org.apache.spark.deploy.history.FsHistoryProvider.rebuildAppStore(FsHistoryProvider.scala:1120)
>       at org.apache.spark.deploy.history.FsHistoryProvider.createInMemoryStore(FsHistoryProvider.scala:1358)
>       at org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:347)
>       at org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:199)
>       at org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:163)
>       at org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:134)
>       at org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:161)
>       at org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:55)
>       at org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:51)
>       at org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>       at org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>       at org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>       at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
>       at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000)
>       at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
>       at org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>       at org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:88)
>       at org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:100)
>       at org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$loadAppUi(HistoryServer.scala:256)
>       at org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:104)
>       at javax.servlet.http.HttpServlet.service(HttpServlet.java:503)
>       at javax.servlet.http.HttpServlet.service(HttpServlet.java:590)
>       at org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
>       at org.sparkproject.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656)
>       at org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
>       at org.sparkproject.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
>       at org.sparkproject.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
>       at org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552)
>       at org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
>       at org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)
>       at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
>       at org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505)
>       at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
>       at org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)
>       at org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>       at org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:772)
>       at org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234)
>       at org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>       at org.apache.spark.ui.ProxyRedirectHandler.handle(JettyUtils.scala:582)
>       at org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>       at org.sparkproject.jetty.server.Server.handle(Server.java:516)
>       at org.sparkproject.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)
>       at org.sparkproject.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)
>       at org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:479)
>       at org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
>       at org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
>       at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:105)
>       at org.sparkproject.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
>       at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
>       at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
>       at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
>       at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
>       at org.sparkproject.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
>       at org.sparkproject.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
>       at org.sparkproject.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
>       at java.lang.Thread.run(Thread.java:750)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
