[ https://issues.apache.org/jira/browse/SPARK-30294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jungtaek Lim updated SPARK-30294:
---------------------------------
    Issue Type: Improvement  (was: Bug)

> Read-only state store unnecessarily creates and deletes the temp file for
> delta file every batch
> ------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-30294
>                 URL: https://issues.apache.org/jira/browse/SPARK-30294
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.0.0
>            Reporter: Jungtaek Lim
>            Priority: Minor
>
> [https://github.com/apache/spark/blob/d38f8167483d4d79e8360f24a8c0bffd51460659/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala#L143-L155]
> {code:java}
> /** Abort all the updates made on this store. This store will not be usable any more. */
> override def abort(): Unit = {
>   // This if statement is to ensure that files are deleted only if there are changes to the
>   // StateStore. We have two StateStores for each task, one which is used only for reading, and
>   // the other used for read+write. We don't want the read-only to delete state files.
>   if (state == UPDATING) {
>     state = ABORTED
>     cancelDeltaFile(compressedStream, deltaFileStream)
>   } else {
>     state = ABORTED
>   }
>   logInfo(s"Aborted version $newVersion for $this")
> }
> {code}
> Despite the comment, the read-only state store still performs the same write preparation: it creates the temporary file, initializes output streams for the file, closes those streams, and then deletes the temporary file. That work is unnecessary, and it is confusing because the log messages make two different instances appear to write to the same delta file.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
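To illustrate the improvement the issue suggests, here is a minimal, self-contained sketch (not Spark's actual API; `StateStoreSketch`, `ensureDeltaFile`, and `createdTempFile` are hypothetical names) of creating the temp delta file lazily, so a store that is only ever read never touches the filesystem and its `abort()` has nothing to clean up:

```scala
import java.io.File

object LazyDeltaStoreSketch {
  // Simplified stand-in for a per-batch state store. The temp delta file is
  // created on the first write instead of eagerly at construction time.
  final class StateStoreSketch(tempDir: File, version: Long) {
    private var deltaFile: Option[File] = None // created lazily, on first put()

    private def ensureDeltaFile(): File = deltaFile.getOrElse {
      val f = File.createTempFile(s"temp-$version-", ".delta", tempDir)
      deltaFile = Some(f)
      f
    }

    def put(key: String, value: String): Unit = {
      val f = ensureDeltaFile() // real code would append the update to a stream on f
    }

    def abort(): Unit = {
      // Only a store that actually wrote something has a temp file to delete;
      // a read-only store skips all file I/O here.
      deltaFile.foreach(_.delete())
      deltaFile = None
    }

    def createdTempFile: Boolean = deltaFile.isDefined
  }
}
```

With this shape, a read-only store's `abort()` is a no-op on the filesystem, while a read+write store still cleans up its temp delta file, which is the behavior the code comment in `abort()` describes.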