[ https://issues.apache.org/jira/browse/SPARK-46330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kent Yao resolved SPARK-46330.
------------------------------
    Fix Version/s: 3.4.3
                   3.5.1
                   4.0.0
       Resolution: Fixed

Issue resolved by pull request 44260
[https://github.com/apache/spark/pull/44260]

> Loading of Spark UI blocks for a long time when HybridStore enabled
> -------------------------------------------------------------------
>
>                 Key: SPARK-46330
>                 URL: https://issues.apache.org/jira/browse/SPARK-46330
>             Project: Spark
>          Issue Type: Bug
>          Components: UI
>    Affects Versions: 3.1.2, 3.3.1
>            Reporter: Zhou Yifan
>            Assignee: Zhou Yifan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.3, 3.5.1, 4.0.0
>
>
> In our SparkHistoryServer, we used these two properties to speed up Spark UI
> loading:
> {code:java}
> spark.history.store.hybridStore.enabled true
> spark.history.store.hybridStore.maxMemoryUsage 16g
> {code}
> Occasionally, it took minutes to load a small event log that usually loaded
> in seconds.
> The jstack output of SparkHistoryServer showed 4 threads blocked waiting to
> lock the *org.apache.spark.deploy.history.FsHistoryProvider* object monitor.
> That monitor was held by thread "spark-history-task-0" while it closed a
> HybridStore. Both threads below reference the same monitor,
> <0x00000004c64433f0>.
> {code:java}
> "qtp791499503-2688947" #2688947 daemon prio=5 os_prio=0 tid=0x00007f4044042800 nid=0x8d98 waiting for monitor entry [0x00007f3f64760000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:386)
>     - waiting to lock <0x00000004c64433f0> (a org.apache.spark.deploy.history.FsHistoryProvider)
>     at org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:194)
>     at org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:182)
>     at org.apache.spark.deploy.history.ApplicationCache$$Lambda$805/90086258.apply(Unknown Source)
>     at org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:154)
>     at org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:180)
>     at org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:71)
>     at org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:58)
>     at org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>     at org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>     at org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>     - locked <0x000000066effc3e8> (a org.sparkproject.guava.cache.LocalCache$StrongAccessEntry)
>     at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
>     at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000)
>     at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
>     at org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>     at org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:108)
>     at org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:120)
>     at org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$loadAppUi(HistoryServer.scala:251)
>     at org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:99)
>
> "spark-history-task-0" #49 daemon prio=5 os_prio=0 tid=0x00007f431e55b800 nid=0x1ac6 in Object.wait() [0x00007f41b2cc9000]
>    java.lang.Thread.State: WAITING (on object monitor)
>     at java.lang.Object.wait(Native Method)
>     at java.lang.Thread.join(Thread.java:1252)
>     - locked <0x000000063ccbc9f0> (a java.lang.Thread)
>     at java.lang.Thread.join(Thread.java:1326)
>     at org.apache.spark.deploy.history.HybridStore.close(HybridStore.scala:106)
>     at org.apache.spark.status.AppStatusStore.close(AppStatusStore.scala:553)
>     at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$invalidateUI$1(FsHistoryProvider.scala:913)
>     at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$invalidateUI$1$adapted(FsHistoryProvider.scala:911)
>     at org.apache.spark.deploy.history.FsHistoryProvider$$Lambda$416/229723341.apply(Unknown Source)
>     at scala.Option.foreach(Option.scala:407)
>     at org.apache.spark.deploy.history.FsHistoryProvider.invalidateUI(FsHistoryProvider.scala:911)
>     - locked <0x00000004c64433f0> (a org.apache.spark.deploy.history.FsHistoryProvider)
>     at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$checkForLogs$7(FsHistoryProvider.scala:541)
>     at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$checkForLogs$7$adapted(FsHistoryProvider.scala:498)
> {code}
>
> *HybridStore#close* may take a long time if a lot of data is still waiting
> to be written to disk when the store is closed. As the trace shows, close()
> blocks in Thread.join, and the FsHistoryProvider monitor stays locked for
> that whole time.
> I tried a 1.64 GB event log. It took 93944 ms to write all data to disk.
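The following is a minimal, hypothetical Java model of the blocking pattern in the two traces. One thread holds the provider monitor while a slow close() joins a background writer thread, and a UI request needs the same monitor. All names here are illustrative assumptions and not Spark APIs: ProviderSketch, SlowStore, invalidate, and uiWaitMillis. The "close outside the lock" variant is one common way to avoid this kind of contention. It is not necessarily what pull request 44260 changed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified model of the contention reported in SPARK-46330.
// ProviderSketch, SlowStore, invalidate and uiWaitMillis are illustrative names;
// this is not Spark's FsHistoryProvider or HybridStore code.
public class ProviderSketch {

    /** Stand-in for HybridStore: close() waits for a background "disk writer" thread. */
    static final class SlowStore {
        private final Thread writer;

        SlowStore(long flushMillis) {
            writer = new Thread(() -> {
                try {
                    Thread.sleep(flushMillis); // pretend to flush buffered data to disk
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "writer-sketch");
            writer.start();
        }

        void close() throws InterruptedException {
            writer.join(); // blocks until the writer finishes, like Thread.join in the trace
        }
    }

    private SlowStore store;

    ProviderSketch(SlowStore store) {
        this.store = store;
    }

    /** Stand-in for getAppUI: needs the provider monitor only briefly. */
    synchronized boolean getAppUI() {
        return store != null;
    }

    /**
     * Stand-in for invalidateUI. With closeUnderLock = true the slow close runs while the
     * provider monitor is held; otherwise the store is detached under the lock and closed
     * after the lock has been released.
     */
    void invalidate(boolean closeUnderLock, CountDownLatch lockTaken) throws InterruptedException {
        SlowStore detached;
        synchronized (this) {
            lockTaken.countDown(); // signal that the monitor is now held
            detached = store;
            store = null;
            if (closeUnderLock && detached != null) {
                detached.close(); // every other synchronized caller waits here
                detached = null;  // already closed
            }
        }
        if (detached != null) {
            detached.close(); // still slow, but the monitor has been released
        }
    }

    /** Returns how long a getAppUI() call waits while another thread invalidates the store. */
    static long uiWaitMillis(boolean closeUnderLock, long flushMillis) throws Exception {
        ProviderSketch provider = new ProviderSketch(new SlowStore(flushMillis));
        CountDownLatch lockTaken = new CountDownLatch(1);
        Thread invalidator = new Thread(() -> {
            try {
                provider.invalidate(closeUnderLock, lockTaken);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "history-task-sketch");
        invalidator.start();
        lockTaken.await(); // the invalidator has entered the provider monitor
        long start = System.nanoTime();
        provider.getAppUI(); // the calling thread plays the role of the UI request thread
        long waited = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        invalidator.join();
        return waited;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("close under lock:   UI waited " + uiWaitMillis(true, 1000) + " ms");
        System.out.println("close outside lock: UI waited " + uiWaitMillis(false, 1000) + " ms");
    }
}
```

With the simulated 1000 ms flush, the first call should wait for close to finish (a little under 1000 ms, because the flush starts before the UI request does). The second call should return almost immediately. The reporter measured 1.64 GB written in 93944 ms, roughly 17.5 MB/s. At that scale, closing under the lock would stall every UI request for about a minute and a half. The sketch only limits the critical section to the state change. Whether that is safe inside Spark depends on details it does not model.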
--
This message was sent by Atlassian Jira
(v8.20.10#820010)