[jira] [Updated] (SPARK-21370) Avoid doing anything on HDFSBackedStateStore.abort() when there are no updates to commit
[ https://issues.apache.org/jira/browse/SPARK-21370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Burak Yavuz updated SPARK-21370:

Description:
During Streaming Aggregation, we have two StateStores per task: one used as read-only in `StateStoreRestoreExec`, and one used as read-write in `StateStoreSaveExec`. `StateStore.abort` will be called for these StateStores if they haven't committed their results. We need to make sure that an abort in the read-only store after a commit in the read-write store doesn't accidentally lead to the deletion of state.

This JIRA proposes a test for this case. StateStore implementations should successfully handle this use case.

was:
Currently the HDFSBackedStateStore sets its state to UPDATING when it is initialized.

For every trigger, we create two state stores, one used by the "StateStoreRestore" operator to only read data and one by the "StateStoreSave" operator to write updates. So, the "Restore" StateStore is read-only. This state store gets "aborted" after a task is completed, and this abort attempts to delete files.

This can be avoided if there is an INITIALIZED state and abort deletes files only when there is an update to the state store using "put" or "remove".

> Avoid doing anything on HDFSBackedStateStore.abort() when there are no
> updates to commit
>
> Key: SPARK-21370
> URL: https://issues.apache.org/jira/browse/SPARK-21370
> Project: Spark
> Issue Type: Test
> Components: Structured Streaming
> Affects Versions: 2.1.1
> Reporter: Burak Yavuz
> Assignee: Burak Yavuz
> Priority: Minor
>
> During Streaming Aggregation, we have two StateStores per task: one used as
> read-only in `StateStoreRestoreExec`, and one used as read-write in
> `StateStoreSaveExec`. `StateStore.abort` will be called for these StateStores
> if they haven't committed their results. We need to make sure that an abort
> in the read-only store after a commit in the read-write store doesn't
> accidentally lead to the deletion of state.
>
> This JIRA proposes a test for this case. StateStore implementations should
> successfully handle this use case.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
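The scenario this update asks to be tested can be sketched roughly as follows. This is a toy in-memory model, not Spark's StateStore API: `ToyStateStoreProvider`, `ToyStateStore`, and `run_aggregation_task` are hypothetical names invented for illustration, and a dictionary of snapshots stands in for the committed state files.

```python
class ToyStateStoreProvider:
    """Hypothetical in-memory stand-in for a state store provider (not Spark's API)."""

    def __init__(self):
        # version -> snapshot of key/value state; plays the role of committed files
        self.committed = {0: {}}

    def get_store(self, version):
        return ToyStateStore(self, version)


class ToyStateStore:
    """One handle on the state at `version`; a commit produces `version + 1`."""

    def __init__(self, provider, version):
        self.provider = provider
        self.target_version = version + 1
        self.data = dict(provider.committed[version])
        self.has_updates = False
        self.has_committed = False

    def get(self, key):
        return self.data.get(key)

    def put(self, key, value):
        self.data[key] = value
        self.has_updates = True

    def commit(self):
        self.provider.committed[self.target_version] = dict(self.data)
        self.has_committed = True
        return self.target_version

    def abort(self):
        # A handle that never wrote anything owns no output, so it has nothing
        # to clean up. In particular it must not touch the version that another
        # handle may already have committed.
        if not self.has_updates or self.has_committed:
            return
        self.data.clear()  # discard this handle's uncommitted changes only


def run_aggregation_task(provider, version):
    """Mimics one task: a read-only 'restore' handle and a read-write 'save' handle."""
    restore_store = provider.get_store(version)
    save_store = provider.get_store(version)
    restore_store.get("count")      # read-only access
    save_store.put("count", 1)      # write access
    new_version = save_store.commit()
    # Task-completion cleanup: abort every store that has not committed.
    for store in (restore_store, save_store):
        if not store.has_committed:
            store.abort()
    return new_version
```

The property under test is that the state written by the read-write handle is still readable at the new version after the read-only handle has been aborted.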
[jira] [Updated] (SPARK-21370) Avoid doing anything on HDFSBackedStateStore.abort() when there are no updates to commit
[ https://issues.apache.org/jira/browse/SPARK-21370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Burak Yavuz updated SPARK-21370:

Issue Type: Test (was: Improvement)

> Avoid doing anything on HDFSBackedStateStore.abort() when there are no
> updates to commit
>
> Key: SPARK-21370
> URL: https://issues.apache.org/jira/browse/SPARK-21370
> Project: Spark
> Issue Type: Test
> Components: Structured Streaming
> Affects Versions: 2.1.1
> Reporter: Burak Yavuz
> Assignee: Burak Yavuz
> Priority: Minor
>
> Currently the HDFSBackedStateStore sets its state to UPDATING when it is
> initialized.
>
> For every trigger, we create two state stores, one used by the
> "StateStoreRestore" operator to only read data and one by the
> "StateStoreSave" operator to write updates. So, the "Restore" StateStore is
> read-only. This state store gets "aborted" after a task is completed, and
> this abort attempts to delete files.
>
> This can be avoided if there is an INITIALIZED state and abort deletes files
> only when there is an update to the state store using "put" or "remove".
[jira] [Updated] (SPARK-21370) Avoid doing anything on HDFSBackedStateStore.abort() when there are no updates to commit
[ https://issues.apache.org/jira/browse/SPARK-21370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tathagata Das updated SPARK-21370:

Priority: Minor (was: Major)

> Avoid doing anything on HDFSBackedStateStore.abort() when there are no
> updates to commit
>
> Key: SPARK-21370
> URL: https://issues.apache.org/jira/browse/SPARK-21370
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 2.1.1
> Reporter: Burak Yavuz
> Assignee: Burak Yavuz
> Priority: Minor
>
> Currently the HDFSBackedStateStore sets its state to UPDATING when it is
> initialized.
>
> For every trigger, we create two state stores, one used by the
> "StateStoreRestore" operator to only read data and one by the
> "StateStoreSave" operator to write updates. So, the "Restore" StateStore is
> read-only. This state store gets "aborted" after a task is completed, and
> this abort attempts to delete files.
>
> This can be avoided if there is an INITIALIZED state and abort deletes files
> only when there is an update to the state store using "put" or "remove".
[jira] [Updated] (SPARK-21370) Avoid doing anything on HDFSBackedStateStore.abort() when there are no updates to commit
[ https://issues.apache.org/jira/browse/SPARK-21370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tathagata Das updated SPARK-21370:

Description:
Currently the HDFSBackedStateStore sets its state to UPDATING when it is initialized.

For every trigger, we create two state stores, one used by the "StateStoreRestore" operator to only read data and one by the "StateStoreSave" operator to write updates. So, the "Restore" StateStore is read-only. This state store gets "aborted" after a task is completed, and this abort attempts to delete files.

This can be avoided if there is an INITIALIZED state and abort deletes files only when there is an update to the state store using "put" or "remove".

was:
Currently the HDFSBackedStateStore sets its state to UPDATING when it is initialized.

For every trigger, we create two state stores, one used during "Restore" and one during "Save". The "Restore" StateStore is read-only. This state store gets "aborted" after a task is completed, which results in a file being created and immediately deleted.

This can be avoided if there is an INITIALIZED state and abort deletes files only when there is an update to the state store using "put" or "remove".

> Avoid doing anything on HDFSBackedStateStore.abort() when there are no
> updates to commit
>
> Key: SPARK-21370
> URL: https://issues.apache.org/jira/browse/SPARK-21370
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 2.1.1
> Reporter: Burak Yavuz
> Assignee: Burak Yavuz
>
> Currently the HDFSBackedStateStore sets its state to UPDATING when it is
> initialized.
>
> For every trigger, we create two state stores, one used by the
> "StateStoreRestore" operator to only read data and one by the
> "StateStoreSave" operator to write updates. So, the "Restore" StateStore is
> read-only. This state store gets "aborted" after a task is completed, and
> this abort attempts to delete files.
>
> This can be avoided if there is an INITIALIZED state and abort deletes files
> only when there is an update to the state store using "put" or "remove".
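The INITIALIZED-state idea in this description could look roughly like the sketch below. It is an assumption-laden illustration, not the HDFSBackedStateStore code: `SketchStore` and its `deleted_temp_files` counter are hypothetical, the counter merely simulates file deletion, and the COMMITTED and ABORTED state names are assumed (the description names only UPDATING and INITIALIZED).

```python
from enum import Enum


class State(Enum):
    INITIALIZED = "initialized"   # created, nothing written yet
    UPDATING = "updating"         # at least one put/remove has happened
    COMMITTED = "committed"
    ABORTED = "aborted"


class SketchStore:
    """Hypothetical store that starts in INITIALIZED instead of UPDATING."""

    def __init__(self):
        self.state = State.INITIALIZED
        self.pending = {}
        self.deleted_temp_files = 0  # counts simulated file deletions

    def _start_update(self):
        if self.state not in (State.INITIALIZED, State.UPDATING):
            raise RuntimeError(f"cannot update a store in state {self.state.name}")
        self.state = State.UPDATING

    def put(self, key, value):
        self._start_update()
        self.pending[key] = value

    def remove(self, key):
        self._start_update()
        self.pending[key] = None  # tombstone marking the key as removed

    def commit(self):
        if self.state not in (State.INITIALIZED, State.UPDATING):
            raise RuntimeError(f"cannot commit a store in state {self.state.name}")
        self.state = State.COMMITTED

    def abort(self):
        # Only a store that actually wrote something has files to delete.
        if self.state is State.UPDATING:
            self.pending.clear()
            self.deleted_temp_files += 1
        if self.state is not State.COMMITTED:
            self.state = State.ABORTED
```

With this shape, aborting the read-only "Restore" store is a no-op on storage, while aborting a store that has seen a "put" or "remove" still cleans up after itself.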
[jira] [Updated] (SPARK-21370) Avoid doing anything on HDFSBackedStateStore.abort() when there are no updates to commit
[ https://issues.apache.org/jira/browse/SPARK-21370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tathagata Das updated SPARK-21370:

Summary: Avoid doing anything on HDFSBackedStateStore.abort() when there are no updates to commit (was: Clarify In-Memory State Store purpose (read-only, read-write) with an additional state)

> Avoid doing anything on HDFSBackedStateStore.abort() when there are no
> updates to commit
>
> Key: SPARK-21370
> URL: https://issues.apache.org/jira/browse/SPARK-21370
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 2.1.1
> Reporter: Burak Yavuz
> Assignee: Burak Yavuz
>
> Currently the HDFSBackedStateStore sets its state to UPDATING when it is
> initialized.
>
> For every trigger, we create two state stores, one used during "Restore" and
> one during "Save". The "Restore" StateStore is read-only. This state store
> gets "aborted" after a task is completed, which results in a file being
> created and immediately deleted.
>
> This can be avoided if there is an INITIALIZED state and abort deletes files
> only when there is an update to the state store using "put" or "remove".