[ https://issues.apache.org/jira/browse/SPARK-30294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jungtaek Lim updated SPARK-30294:
---------------------------------
    Issue Type: Improvement  (was: Bug)

> Read-only state store unnecessarily creates and deletes the temp file for
> delta file every batch
> ------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-30294
>                 URL: https://issues.apache.org/jira/browse/SPARK-30294
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.0.0
>            Reporter: Jungtaek Lim
>            Priority: Minor
>
> [https://github.com/apache/spark/blob/d38f8167483d4d79e8360f24a8c0bffd51460659/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala#L143-L155]
> {code:java}
> /** Abort all the updates made on this store. This store will not be usable any more. */
> override def abort(): Unit = {
>   // This if statement is to ensure that files are deleted only if there are changes to the
>   // StateStore. We have two StateStores for each task, one which is used only for reading, and
>   // the other used for read+write. We don't want the read-only to delete state files.
>   if (state == UPDATING) {
>     state = ABORTED
>     cancelDeltaFile(compressedStream, deltaFileStream)
>   } else {
>     state = ABORTED
>   }
>   logInfo(s"Aborted version $newVersion for $this")
> }
> {code}
> Despite the comment, the read-only state store still performs the same write preparation: it creates the temporary file, initializes output streams for the file, closes those streams, and then deletes the temporary file. That work is unnecessary, and it is confusing because the log messages make two different instances appear to write to the same delta file.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
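To illustrate the improvement the issue suggests, here is a minimal, self-contained sketch (not Spark's actual API; `StateStoreSketch`, `ensureDeltaFile`, and `createdTempFile` are hypothetical names) of creating the temp delta file lazily, so a store that is only ever read never touches the filesystem and its `abort()` has nothing to clean up:

```scala
import java.io.File

object LazyDeltaStoreSketch {
  // Simplified stand-in for a per-batch state store. The temp delta file is
  // created on the first write instead of eagerly at construction time.
  final class StateStoreSketch(tempDir: File, version: Long) {
    private var deltaFile: Option[File] = None // created lazily, on first put()

    private def ensureDeltaFile(): File = deltaFile.getOrElse {
      val f = File.createTempFile(s"temp-$version-", ".delta", tempDir)
      deltaFile = Some(f)
      f
    }

    def put(key: String, value: String): Unit = {
      val f = ensureDeltaFile() // real code would append the update to a stream on f
    }

    def abort(): Unit = {
      // Only a store that actually wrote something has a temp file to delete;
      // a read-only store skips all file I/O here.
      deltaFile.foreach(_.delete())
      deltaFile = None
    }

    def createdTempFile: Boolean = deltaFile.isDefined
  }
}
```

With this shape, a read-only store's `abort()` is a no-op on the filesystem, while a read+write store still cleans up its temp delta file, which is the behavior the code comment in `abort()` describes.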