[ https://issues.apache.org/jira/browse/SPARK-24787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579854#comment-16579854 ]
Thomas Graves commented on SPARK-24787:
---------------------------------------

Yes, it was caused by hsync. hsync has to go to the namenode in addition to the datanode, whereas hflush is a datanode-only operation. We saw a huge increase in dropped events with this change on large jobs; after we reverted it and went back to calling only hflush, the drops stopped. I talked to one of our HDFS experts, who said hsync is expensive. The impact might depend on how loaded your HDFS cluster is.

> Events being dropped at an alarming rate due to hsync being slow for eventLogging
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-24787
>                 URL: https://issues.apache.org/jira/browse/SPARK-24787
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Web UI
>    Affects Versions: 2.3.0, 2.3.1
>            Reporter: Sanket Reddy
>            Priority: Minor
>
> [https://github.com/apache/spark/pull/16924/files] updates the length of the
> inprogress files, allowing the history server to be responsive.
> However, we have a production job that has 60000 tasks per stage and, because
> hsync is slow, it starts dropping events, so the history server shows wrong
> stats.
> A viable solution is to not sync so frequently, or to make the sync
> configurable.
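
As a rough illustration of the mitigation suggested in the issue description (sync less often, or make the frequency configurable), here is a minimal Java sketch. It is not taken from any Spark patch. SyncTarget, ThrottledSyncer, CountingTarget and the syncEveryN parameter are hypothetical names made up for this example; only the hflush()/hsync() method names mirror the Hadoop Syncable interface. The idea is to keep the cheap hflush after every event and pay for hsync only once every N events.

```java
// Hypothetical stand-in for the two calls discussed above. A real HDFS output
// stream exposes hflush() and hsync() via org.apache.hadoop.fs.Syncable
// (where both may throw IOException; omitted here to keep the sketch small).
interface SyncTarget {
    void hflush();
    void hsync();
}

// Calls hflush() after every event and hsync() only once every
// `syncEveryN` events. A value of 0 disables hsync() entirely.
final class ThrottledSyncer {
    private final SyncTarget target;
    private final int syncEveryN;   // assumed to come from a config setting
    private long eventsSinceSync = 0;

    ThrottledSyncer(SyncTarget target, int syncEveryN) {
        if (syncEveryN < 0) {
            throw new IllegalArgumentException("syncEveryN must be >= 0");
        }
        this.target = target;
        this.syncEveryN = syncEveryN;
    }

    void afterEvent() {
        target.hflush();            // cheap path, taken for every event
        if (syncEveryN == 0) {
            return;                 // hsync disabled
        }
        eventsSinceSync++;
        if (eventsSinceSync >= syncEveryN) {
            target.hsync();         // expensive path, taken rarely
            eventsSinceSync = 0;
        }
    }
}

// Test double that only counts calls; it performs no I/O.
final class CountingTarget implements SyncTarget {
    int flushes = 0;
    int syncs = 0;
    @Override public void hflush() { flushes++; }
    @Override public void hsync() { syncs++; }
}

final class EventLogSyncSketch {
    public static void main(String[] args) {
        CountingTarget target = new CountingTarget();
        ThrottledSyncer syncer = new ThrottledSyncer(target, 1000);
        for (int i = 0; i < 60000; i++) {
            syncer.afterEvent();
        }
        System.out.println("hflush calls: " + target.flushes
                + ", hsync calls: " + target.syncs);
    }
}
```

A count-based threshold is used here only because it is easy to reason about; a time-based interval would serve the same purpose. Either way, the trade-off is that the file length seen by the history server is refreshed less often.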