[jira] [Issue Comment Deleted] (SPARK-29322) History server is stuck reading incomplete event log file compressed with zstd

2019-10-01 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim updated SPARK-29322:
-
Comment: was deleted

(was: For the event log, we seem to still use "com.github.luben:zstd-jni" unless 
specified manually.

https://github.com/apache/spark/blob/3b1674cb1f244598463e879477d89632b0817578/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala#L74-L79

https://github.com/apache/spark/blob/3b1674cb1f244598463e879477d89632b0817578/core/src/main/scala/org/apache/spark/io/CompressionCodec.scala#L54-L67

https://github.com/apache/spark/blob/3b1674cb1f244598463e879477d89632b0817578/core/src/main/scala/org/apache/spark/io/CompressionCodec.scala#L200-L236)

> History server is stuck reading incomplete event log file compressed with zstd
> --
>
> Key: SPARK-29322
> URL: https://issues.apache.org/jira/browse/SPARK-29322
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Priority: Major
> Attachments: history-server-1.jstack, history-server-2.jstack, 
> history-server-3.jstack, history-server-4.jstack
>
>
> While working on SPARK-28869, I've discovered an issue where reading an 
> in-progress event log file compressed with zstd can leave the reading thread 
> stuck. I experimented with the Spark History Server and observed the same 
> issue. I'll attach the jstack files.
> This is very easy to reproduce: set the configuration as below
> - spark.eventLog.enabled=true
> - spark.eventLog.compress=true
> - spark.eventLog.compression.codec=zstd
> and start a Spark application. While the application is running, load the 
> application in the SHS web page. Replaying the event log may succeed, but more 
> likely the replay will get stuck and the loading page will hang as well.
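> A minimal reproduction sketch (the app name is a placeholder; any application 
> that emits events should do):
> {code}
> import org.apache.spark.sql.SparkSession
>
> val spark = SparkSession.builder()
>   .appName("zstd-eventlog-repro")
>   .config("spark.eventLog.enabled", "true")
>   .config("spark.eventLog.compress", "true")
>   .config("spark.eventLog.compression.codec", "zstd")
>   .getOrCreate()
> // While this application is still running, open it in the SHS web page;
> // replaying the in-progress zstd-compressed event log is likely to hang.
> {code}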
> Listing only the stuck thread's stack trace, which is the same across the jstack files:
> {code}
> 2019-10-02 11:32:36
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.191-b12 mixed mode):
> ...
> "qtp2072313080-30" #30 daemon prio=5 os_prio=31 tid=0x7ff5b90e7800 
> nid=0x9703 runnable [0x7f22]
>java.lang.Thread.State: RUNNABLE
>   at java.io.FileInputStream.readBytes(Native Method)
>   at java.io.FileInputStream.read(FileInputStream.java:255)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.read(RawLocalFileSystem.java:156)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>   - locked <0x0007b5f97c60> (a 
> org.apache.hadoop.fs.BufferedFSInputStream)
>   at java.io.DataInputStream.read(DataInputStream.java:149)
>   at 
> org.apache.hadoop.fs.FSInputChecker.readFully(FSInputChecker.java:436)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.readChunk(ChecksumFileSystem.java:257)
>   at 
> org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:276)
>   at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:228)
>   at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:196)
>   - locked <0x0007b5f97b58> (a 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker)
>   at java.io.DataInputStream.read(DataInputStream.java:149)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>   - locked <0x0007b5f97af8> (a java.io.BufferedInputStream)
>   at 
> com.github.luben.zstd.ZstdInputStream.readInternal(ZstdInputStream.java:129)
>   at com.github.luben.zstd.ZstdInputStream.read(ZstdInputStream.java:107)
>   - locked <0x0007b5f97ac0> (a com.github.luben.zstd.ZstdInputStream)
>   at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>   - locked <0x0007b5cd3bd0> (a java.io.BufferedInputStream)
>   at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
>   at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
>   at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>   - locked <0x0007b5f94a00> (a java.io.InputStreamReader)
>   at java.io.InputStreamReader.read(InputStreamReader.java:184)
>   at java.io.BufferedReader.fill(BufferedReader.java:161)
>   at java.io.BufferedReader.readLine(BufferedReader.java:324)
>   - locked <0x0007b5f94a00> (a java.io.InputStreamReader)
>   at java.io.BufferedReader.readLine(BufferedReader.java:389)
>   at 
> scala.io.BufferedSource$BufferedLineIterator.hasNext(BufferedSource.scala:74)
> 

[jira] [Issue Comment Deleted] (SPARK-29322) History server is stuck reading incomplete event log file compressed with zstd

2019-10-01 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim updated SPARK-29322:
-
Comment: was deleted

(was: I'll work on a PR proposing to remove zstd from the supported compression 
codecs for the event log. We may want to take another approach; we can discuss 
further in the PR.)

> History server is stuck reading incomplete event log file compressed with zstd
> --
>
> Key: SPARK-29322
> URL: https://issues.apache.org/jira/browse/SPARK-29322
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Priority: Major
> Attachments: history-server-1.jstack, history-server-2.jstack, 
> history-server-3.jstack, history-server-4.jstack
>
>
> While working on SPARK-28869, I've discovered an issue where reading an 
> in-progress event log file compressed with zstd can leave the reading thread 
> stuck. I experimented with the Spark History Server and observed the same 
> issue. I'll attach the jstack files.
> Listing only the stuck thread's stack trace, which is the same across the jstack files:
> {code}
> 2019-10-02 11:32:36
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.191-b12 mixed mode):
> ...
> "qtp2072313080-30" #30 daemon prio=5 os_prio=31 tid=0x7ff5b90e7800 
> nid=0x9703 runnable [0x7f22]
>java.lang.Thread.State: RUNNABLE
>   at java.io.FileInputStream.readBytes(Native Method)
>   at java.io.FileInputStream.read(FileInputStream.java:255)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.read(RawLocalFileSystem.java:156)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>   - locked <0x0007b5f97c60> (a 
> org.apache.hadoop.fs.BufferedFSInputStream)
>   at java.io.DataInputStream.read(DataInputStream.java:149)
>   at 
> org.apache.hadoop.fs.FSInputChecker.readFully(FSInputChecker.java:436)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.readChunk(ChecksumFileSystem.java:257)
>   at 
> org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:276)
>   at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:228)
>   at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:196)
>   - locked <0x0007b5f97b58> (a 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker)
>   at java.io.DataInputStream.read(DataInputStream.java:149)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>   - locked <0x0007b5f97af8> (a java.io.BufferedInputStream)
>   at 
> com.github.luben.zstd.ZstdInputStream.readInternal(ZstdInputStream.java:129)
>   at com.github.luben.zstd.ZstdInputStream.read(ZstdInputStream.java:107)
>   - locked <0x0007b5f97ac0> (a com.github.luben.zstd.ZstdInputStream)
>   at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>   - locked <0x0007b5cd3bd0> (a java.io.BufferedInputStream)
>   at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
>   at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
>   at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>   - locked <0x0007b5f94a00> (a java.io.InputStreamReader)
>   at java.io.InputStreamReader.read(InputStreamReader.java:184)
>   at java.io.BufferedReader.fill(BufferedReader.java:161)
>   at java.io.BufferedReader.readLine(BufferedReader.java:324)
>   - locked <0x0007b5f94a00> (a java.io.InputStreamReader)
>   at java.io.BufferedReader.readLine(BufferedReader.java:389)
>   at 
> scala.io.BufferedSource$BufferedLineIterator.hasNext(BufferedSource.scala:74)
>   at scala.collection.Iterator$$anon$20.hasNext(Iterator.scala:884)
>   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:511)
>   at 
> org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:80)
>   at 
> org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:58)
>   at 
> org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$rebuildAppStore$5(FsHistoryProvider.scala:976)
>   at 
> org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$rebuildAppStore$5$adapted(FsHistoryProvider.scala:975)
>   at 
> org.apache.spark.deploy.history.FsHistoryProvider$$Lambda$662/1267867461.apply(Unknown
>  Source)
>   at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2567)
>   at 
> org.apac