[ https://issues.apache.org/jira/browse/SPARK-25855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16665690#comment-16665690 ]
Thomas Graves commented on SPARK-25855:
---------------------------------------

It seems like it depends on whether you care to see the event logs before the application has finished. If you are using the driver UI, then generally people would use it while the application is running, and once it has finished it sounds like the logs would show up and you could see them from the history server. So probably not a problem there. But if you are using the history server to view all UIs and expect the logs to be there, it would be a big problem. So it does sound like it's better to have it off by default so as not to confuse users. Were you going to make it configurable?

> Don't use Erasure Coding for event log files
> --------------------------------------------
>
>                 Key: SPARK-25855
>                 URL: https://issues.apache.org/jira/browse/SPARK-25855
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Imran Rashid
>            Priority: Major
>
> While testing Spark with HDFS erasure coding (new in Hadoop 3), we ran into a
> bug with the event logs. The main issue was a bug in HDFS (HDFS-14027), but
> it did make us wonder whether Spark should be using EC for event log files in
> general. It's a poor choice because EC currently implements {{hflush()}} or
> {{hsync()}} as no-ops, which means you won't see anything in your event logs
> until the app is complete. That isn't necessarily a bug, but isn't really
> great. So I think we should ensure EC is always off for event logs.
> IIUC there is *not* a problem with applications which die without properly
> closing the outputstream. It'll take a while for the NN to realize the
> client is gone and finish the block, but the data should get there eventually.
> Also related are SPARK-24787 & SPARK-19531.
> The space savings from EC would be nice as the event logs can get somewhat
> large, but I think other factors outweigh this.
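
To make the visibility concern above concrete, here is a toy model in Python. It is only a sketch under stated assumptions: `ToyStream` and `visible_while_running` are hypothetical names invented for illustration, nothing here is an HDFS or Spark API, and the only behaviour modelled is the one described in the issue (a flush that either publishes buffered data to readers or does nothing).

```python
class ToyStream:
    """Toy stand-in for an output stream; NOT an HDFS API."""

    def __init__(self, flush_is_noop):
        self._buffer = []      # events written but not yet published
        self.visible = []      # events a concurrent reader could see
        self.flush_is_noop = flush_is_noop

    def write(self, event):
        self._buffer.append(event)

    def hflush(self):
        # Assumption: with EC the flush does nothing, as the issue describes.
        if self.flush_is_noop:
            return
        self._publish()

    def close(self):
        # Closing the stream always publishes whatever is still buffered.
        self._publish()

    def _publish(self):
        self.visible.extend(self._buffer)
        self._buffer.clear()


def visible_while_running(flush_is_noop, events):
    """Return (events visible before close, events visible after close)."""
    stream = ToyStream(flush_is_noop)
    for event in events:
        stream.write(event)
        stream.hflush()
    seen_while_running = list(stream.visible)
    stream.close()
    return seen_while_running, list(stream.visible)
```

With a working flush, a reader such as the history server would see each event as it is written. With a no-op flush, the reader sees nothing until the writer closes the stream, which is the case that matters for anyone watching a running application through the history server.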
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org