Hi devs,

I've discovered an issue with the event logger: when reading an incomplete
event log file compressed with 'zstd', the reader thread gets stuck on
reading that file.

This is very easy to reproduce: set the configuration as below

- spark.eventLog.enabled=true
- spark.eventLog.compress=true
- spark.eventLog.compression.codec=zstd

and start a Spark application. While the application is running, load the
application in the SHS (Spark History Server) web page. It may succeed in
replaying the event log, but it is highly likely to get stuck, and the
loading page will be stuck as well.
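For reference, a minimal reproduction might look like the following (the application jar and any other arguments are placeholders, not taken from the report):

```shell
# Hypothetical repro command: only the three event-log configs matter here.
spark-submit \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.compress=true \
  --conf spark.eventLog.compression.codec=zstd \
  your-app.jar
```

Then open the running application in the SHS UI while the event log is still incomplete.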

Please refer to SPARK-29322 for more details.

As the issue only occurs with 'zstd', the simplest approach is to drop
support for 'zstd' for event logs. A more general approach would be to
introduce a timeout on reading the event log file, but it would have to
differentiate between a thread being genuinely stuck and a thread that is
merely busy reading a huge event log file.
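To make the timeout idea concrete, here is a rough sketch (plain Python, just to illustrate the idea; not Spark code, and all names are hypothetical) of a progress-based watchdog: the reader is declared stuck only if it makes no progress for a full timeout window, so a reader that is slowly working through a huge file is not killed.

```python
import threading

def read_with_progress_watchdog(chunks, window_secs):
    """Consume `chunks` (an iterator of byte blocks) on a reader thread.

    Return ("done", total_bytes) if the reader finishes, or
    ("stuck", total_bytes) if it makes zero progress for a full
    `window_secs` window. A busy-but-progressing reader is never
    declared stuck, which is the differentiation the timeout needs.
    """
    progress = {"bytes": 0}
    done = threading.Event()

    def reader():
        for chunk in chunks:
            progress["bytes"] += len(chunk)  # progress heartbeat
        done.set()

    threading.Thread(target=reader, daemon=True).start()

    last = -1
    while not done.wait(window_secs):
        if progress["bytes"] == last:
            # No bytes consumed for an entire window: treat as stuck.
            return ("stuck", progress["bytes"])
        last = progress["bytes"]
    return ("done", progress["bytes"])
```

The same shape would apply inside the SHS replay path: track bytes consumed from the compressed stream and only abort when the counter stops moving, rather than after a fixed wall-clock deadline.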

Which approach would be preferred in the Spark community, or does someone
have a better idea for handling this?

Thanks,
Jungtaek Lim (HeartSaVioR)
