[jira] [Issue Comment Deleted] (SPARK-29322) History server is stuck reading incomplete event log file compressed with zstd
     [ https://issues.apache.org/jira/browse/SPARK-29322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim updated SPARK-29322:
---------------------------------
    Comment: was deleted

(was: For the event log, we still seem to use "com.github.luben:zstd-jni" unless specified manually.

https://github.com/apache/spark/blob/3b1674cb1f244598463e879477d89632b0817578/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala#L74-L79
https://github.com/apache/spark/blob/3b1674cb1f244598463e879477d89632b0817578/core/src/main/scala/org/apache/spark/io/CompressionCodec.scala#L54-L67
https://github.com/apache/spark/blob/3b1674cb1f244598463e879477d89632b0817578/core/src/main/scala/org/apache/spark/io/CompressionCodec.scala#L200-L236)

> History server is stuck reading incomplete event log file compressed with zstd
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-29322
>                 URL: https://issues.apache.org/jira/browse/SPARK-29322
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Jungtaek Lim
>            Priority: Major
>         Attachments: history-server-1.jstack, history-server-2.jstack,
>                      history-server-3.jstack, history-server-4.jstack
>
>
> While working on SPARK-28869, I discovered that reading an in-progress
> event log file compressed with zstd can leave the reading thread stuck.
> I just experimented with the Spark History Server and observed the same
> issue. I'll attach the jstack files.
> This is very easy to reproduce: set the configuration as below
> - spark.eventLog.enabled=true
> - spark.eventLog.compress=true
> - spark.eventLog.compression.codec=zstd
> and start a Spark application. While the application is running, load the
> application in the SHS web page. Replaying the event log may succeed, but
> most likely it will get stuck, and the page load will hang along with it.
> Listing only the stack trace of the thread that is stuck across the jstack
> files:
> {code}
> 2019-10-02 11:32:36
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.191-b12 mixed mode):
> ...
> "qtp2072313080-30" #30 daemon prio=5 os_prio=31 tid=0x7ff5b90e7800 nid=0x9703 runnable [0x7f22]
>    java.lang.Thread.State: RUNNABLE
>       at java.io.FileInputStream.readBytes(Native Method)
>       at java.io.FileInputStream.read(FileInputStream.java:255)
>       at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.read(RawLocalFileSystem.java:156)
>       at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
>       at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>       - locked <0x0007b5f97c60> (a org.apache.hadoop.fs.BufferedFSInputStream)
>       at java.io.DataInputStream.read(DataInputStream.java:149)
>       at org.apache.hadoop.fs.FSInputChecker.readFully(FSInputChecker.java:436)
>       at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.readChunk(ChecksumFileSystem.java:257)
>       at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:276)
>       at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:228)
>       at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:196)
>       - locked <0x0007b5f97b58> (a org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker)
>       at java.io.DataInputStream.read(DataInputStream.java:149)
>       at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
>       at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>       - locked <0x0007b5f97af8> (a java.io.BufferedInputStream)
>       at com.github.luben.zstd.ZstdInputStream.readInternal(ZstdInputStream.java:129)
>       at com.github.luben.zstd.ZstdInputStream.read(ZstdInputStream.java:107)
>       - locked <0x0007b5f97ac0> (a com.github.luben.zstd.ZstdInputStream)
>       at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>       at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
>       at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>       - locked <0x0007b5cd3bd0> (a java.io.BufferedInputStream)
>       at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
>       at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
>       at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>       - locked <0x0007b5f94a00> (a java.io.InputStreamReader)
>       at java.io.InputStreamReader.read(InputStreamReader.java:184)
>       at java.io.BufferedReader.fill(BufferedReader.java:161)
>       at java.io.BufferedReader.readLine(BufferedReader.java:324)
>       - locked <0x0007b5f94a00> (a java.io.InputStreamReader)
>       at java.io.BufferedReader.readLine(BufferedReader.java:389)
>       at scala.io.BufferedSource$BufferedLineIterator.hasNext(BufferedSource.scala:74)
>
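
For reference, the three reproduction settings named in the issue description above could be written as a spark-defaults.conf fragment. This is only a sketch of one way to apply them; placing them in conf/spark-defaults.conf (rather than passing them per application) is an assumption, and the issue itself does not say how the settings were supplied.

```properties
# Sketch only: the three settings listed in the SPARK-29322 description.
# Assumption: they are applied through conf/spark-defaults.conf.
spark.eventLog.enabled            true
spark.eventLog.compress           true
spark.eventLog.compression.codec  zstd
```

With these in place, the description's remaining steps are to start a Spark application and open it in the History Server page while it is still running.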
[jira] [Issue Comment Deleted] (SPARK-29322) History server is stuck reading incomplete event log file compressed with zstd
     [ https://issues.apache.org/jira/browse/SPARK-29322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim updated SPARK-29322:
---------------------------------
    Comment: was deleted

(was: I'll work on a PR proposing to remove zstd from the supported compression codecs for the event log. We may want to take another approach; we can discuss further in the PR.)

> History server is stuck reading incomplete event log file compressed with zstd
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-29322
>                 URL: https://issues.apache.org/jira/browse/SPARK-29322
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Jungtaek Lim
>            Priority: Major
>         Attachments: history-server-1.jstack, history-server-2.jstack,
>                      history-server-3.jstack, history-server-4.jstack
>
>
> While working on SPARK-28869, I discovered that reading an in-progress
> event log file compressed with zstd can leave the reading thread stuck.
> I just experimented with the Spark History Server and observed the same
> issue. I'll attach the jstack files.
> Listing only the stack trace of the thread that is stuck across the jstack
> files:
> {code}
> 2019-10-02 11:32:36
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.191-b12 mixed mode):
> ...
> "qtp2072313080-30" #30 daemon prio=5 os_prio=31 tid=0x7ff5b90e7800 nid=0x9703 runnable [0x7f22]
>    java.lang.Thread.State: RUNNABLE
>       at java.io.FileInputStream.readBytes(Native Method)
>       at java.io.FileInputStream.read(FileInputStream.java:255)
>       at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.read(RawLocalFileSystem.java:156)
>       at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
>       at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>       - locked <0x0007b5f97c60> (a org.apache.hadoop.fs.BufferedFSInputStream)
>       at java.io.DataInputStream.read(DataInputStream.java:149)
>       at org.apache.hadoop.fs.FSInputChecker.readFully(FSInputChecker.java:436)
>       at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.readChunk(ChecksumFileSystem.java:257)
>       at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:276)
>       at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:228)
>       at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:196)
>       - locked <0x0007b5f97b58> (a org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker)
>       at java.io.DataInputStream.read(DataInputStream.java:149)
>       at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
>       at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>       - locked <0x0007b5f97af8> (a java.io.BufferedInputStream)
>       at com.github.luben.zstd.ZstdInputStream.readInternal(ZstdInputStream.java:129)
>       at com.github.luben.zstd.ZstdInputStream.read(ZstdInputStream.java:107)
>       - locked <0x0007b5f97ac0> (a com.github.luben.zstd.ZstdInputStream)
>       at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>       at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
>       at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>       - locked <0x0007b5cd3bd0> (a java.io.BufferedInputStream)
>       at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
>       at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
>       at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>       - locked <0x0007b5f94a00> (a java.io.InputStreamReader)
>       at java.io.InputStreamReader.read(InputStreamReader.java:184)
>       at java.io.BufferedReader.fill(BufferedReader.java:161)
>       at java.io.BufferedReader.readLine(BufferedReader.java:324)
>       - locked <0x0007b5f94a00> (a java.io.InputStreamReader)
>       at java.io.BufferedReader.readLine(BufferedReader.java:389)
>       at scala.io.BufferedSource$BufferedLineIterator.hasNext(BufferedSource.scala:74)
>       at scala.collection.Iterator$$anon$20.hasNext(Iterator.scala:884)
>       at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:511)
>       at org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:80)
>       at org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:58)
>       at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$rebuildAppStore$5(FsHistoryProvider.scala:976)
>       at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$rebuildAppStore$5$adapted(FsHistoryProvider.scala:975)
>       at org.apache.spark.deploy.history.FsHistoryProvider$$Lambda$662/1267867461.apply(Unknown Source)
>       at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2567)
>       at org.apac