[jira] [Updated] (FLINK-12692) Support disk spilling in HeapKeyedStateBackend

2024-07-10 Thread Weijie Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-12692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weijie Guo updated FLINK-12692:
---
Fix Version/s: 2.0.0
   (was: 1.20.0)

> Support disk spilling in HeapKeyedStateBackend
> --
>
> Key: FLINK-12692
> URL: https://issues.apache.org/jira/browse/FLINK-12692
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / State Backends
>Reporter: Yu Li
>Priority: Major
>  Labels: auto-unassigned, pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{HeapKeyedStateBackend}} is one of the two {{KeyedStateBackends}} in Flink. 
> Because state lives as Java objects on the heap and de/serialization only 
> happens during state snapshot and restore, it outperforms 
> {{RocksDBKeyedStateBackend}} when all data can reside in memory.
> However, {{HeapKeyedStateBackend}} also has its shortcomings. The most 
> painful one is the difficulty of estimating the maximum heap size (Xmx) to 
> set, and jobs suffer from GC pressure once the heap is too small to hold 
> all state data. There are several (inevitable) causes of such a scenario, 
> including (but not limited to):
> * Memory overhead of the Java object representation (tens of times the 
> serialized data size).
> * Data floods caused by burst traffic.
> * Data accumulation caused by source malfunction.
> To resolve this problem, we propose spilling state data to disk before heap 
> memory is exhausted. We will monitor heap usage, choose the coldest data to 
> spill, and automatically reload it once heap memory is regained through 
> data removal or TTL expiration.
> For more details, please refer to the design doc and the mailing list discussion.
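
The spill-and-reload policy described above can be illustrated with a minimal Java sketch. This is not the implementation proposed in FLINK-12692 and does not use any Flink API: the class name SpillingStateSketch and all of its methods are hypothetical, a fixed entry count stands in for real heap-usage monitoring, and a plain HashMap stands in for on-disk storage. "Coldest" is approximated here as least recently accessed.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of spill/reload; not Flink's HeapKeyedStateBackend. */
class SpillingStateSketch<K, V> {
    // Access-ordered map: iteration starts at the least recently used ("coldest") entry.
    private final LinkedHashMap<K, V> onHeap = new LinkedHashMap<>(16, 0.75f, true);
    // Stand-in for disk (assumption): a real backend would serialize entries to files.
    private final Map<K, V> spilled = new HashMap<>();
    // Stand-in for a heap-usage threshold (assumption): a simple entry count.
    private final int maxHeapEntries;

    SpillingStateSketch(int maxHeapEntries) {
        this.maxHeapEntries = maxHeapEntries;
    }

    void put(K key, V value) {
        spilled.remove(key);      // the fresh value supersedes any spilled copy
        onHeap.put(key, value);   // marks the key as most recently used
        spillColdest();
    }

    V get(K key) {
        V value = onHeap.get(key);  // a hit refreshes the key's recency
        // On a miss, read the spilled copy without forcing a reload.
        return value != null ? value : spilled.get(key);
    }

    void remove(K key) {
        onHeap.remove(key);
        spilled.remove(key);
        reloadWhileRoom();        // heap space was regained, so pull data back
    }

    // Move the coldest entries to "disk" until the heap is within its budget.
    private void spillColdest() {
        Iterator<Map.Entry<K, V>> it = onHeap.entrySet().iterator();
        while (onHeap.size() > maxHeapEntries && it.hasNext()) {
            Map.Entry<K, V> coldest = it.next();
            spilled.put(coldest.getKey(), coldest.getValue());
            it.remove();
        }
    }

    // Reload spilled entries while the heap budget has room.
    private void reloadWhileRoom() {
        Iterator<Map.Entry<K, V>> it = spilled.entrySet().iterator();
        while (onHeap.size() < maxHeapEntries && it.hasNext()) {
            Map.Entry<K, V> entry = it.next();
            onHeap.put(entry.getKey(), entry.getValue());
            it.remove();
        }
    }

    boolean isOnHeap(K key) { return onHeap.containsKey(key); }

    boolean isSpilled(K key) { return spilled.containsKey(key); }
}
```

With a budget of two entries, putting "a" and "b", reading "a", and then putting "c" spills "b" (the coldest key); removing "a" afterwards frees room and "b" is reloaded onto the heap. A real backend would presumably trigger on measured heap usage rather than an entry count, and would also reload after TTL expiration, which this sketch leaves out.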



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-12692) Support disk spilling in HeapKeyedStateBackend

2024-03-11 Thread lincoln lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-12692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lincoln lee updated FLINK-12692:

Fix Version/s: (was: 1.19.0)

> Support disk spilling in HeapKeyedStateBackend
> --
>
> Key: FLINK-12692
> URL: https://issues.apache.org/jira/browse/FLINK-12692
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / State Backends
>Reporter: Yu Li
>Priority: Major
>  Labels: auto-unassigned, pull-request-available
> Fix For: 1.20.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>


[jira] [Updated] (FLINK-12692) Support disk spilling in HeapKeyedStateBackend

2024-03-11 Thread lincoln lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-12692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lincoln lee updated FLINK-12692:

Fix Version/s: 1.20.0

> Support disk spilling in HeapKeyedStateBackend
> --
>
> Key: FLINK-12692
> URL: https://issues.apache.org/jira/browse/FLINK-12692
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / State Backends
>Reporter: Yu Li
>Priority: Major
>  Labels: auto-unassigned, pull-request-available
> Fix For: 1.19.0, 1.20.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>


[jira] [Updated] (FLINK-12692) Support disk spilling in HeapKeyedStateBackend

2023-10-13 Thread Jing Ge (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-12692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Ge updated FLINK-12692:

Fix Version/s: 1.19.0
   (was: 1.18.0)

> Support disk spilling in HeapKeyedStateBackend
> --
>
> Key: FLINK-12692
> URL: https://issues.apache.org/jira/browse/FLINK-12692
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / State Backends
>Reporter: Yu Li
>Priority: Major
>  Labels: auto-unassigned, pull-request-available
> Fix For: 1.19.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>


[jira] [Updated] (FLINK-12692) Support disk spilling in HeapKeyedStateBackend

2023-03-23 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-12692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song updated FLINK-12692:
-
Fix Version/s: 1.18.0
   (was: 1.17.0)

> Support disk spilling in HeapKeyedStateBackend
> --
>
> Key: FLINK-12692
> URL: https://issues.apache.org/jira/browse/FLINK-12692
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / State Backends
>Reporter: Yu Li
>Priority: Major
>  Labels: auto-unassigned, pull-request-available
> Fix For: 1.18.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>


[jira] [Updated] (FLINK-12692) Support disk spilling in HeapKeyedStateBackend

2022-08-22 Thread Xingbo Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-12692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xingbo Huang updated FLINK-12692:
-
Fix Version/s: 1.17.0
   (was: 1.16.0)

> Support disk spilling in HeapKeyedStateBackend
> --
>
> Key: FLINK-12692
> URL: https://issues.apache.org/jira/browse/FLINK-12692
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / State Backends
>Reporter: Yu Li
>Priority: Major
>  Labels: auto-unassigned, pull-request-available
> Fix For: 1.17.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>


[jira] [Updated] (FLINK-12692) Support disk spilling in HeapKeyedStateBackend

2022-04-12 Thread Yun Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-12692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Gao updated FLINK-12692:

Fix Version/s: 1.16.0

> Support disk spilling in HeapKeyedStateBackend
> --
>
> Key: FLINK-12692
> URL: https://issues.apache.org/jira/browse/FLINK-12692
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / State Backends
>Reporter: Yu Li
>Priority: Major
>  Labels: auto-unassigned, pull-request-available
> Fix For: 1.15.0, 1.16.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (FLINK-12692) Support disk spilling in HeapKeyedStateBackend

2021-09-28 Thread Xintong Song (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-12692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song updated FLINK-12692:
-
Fix Version/s: (was: 1.14.0)
   1.15.0

> Support disk spilling in HeapKeyedStateBackend
> --
>
> Key: FLINK-12692
> URL: https://issues.apache.org/jira/browse/FLINK-12692
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / State Backends
>Reporter: Yu Li
>Priority: Major
>  Labels: auto-unassigned, pull-request-available
> Fix For: 1.15.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-12692) Support disk spilling in HeapKeyedStateBackend

2021-06-18 Thread Flink Jira Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-12692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-12692:
---
Labels: pull-request-available stale-assigned  (was: pull-request-available)

I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help 
the community manage its development. I see this issue is assigned but has not 
received an update in 14 days, so it has been labeled "stale-assigned".
If you are still working on the issue, please remove the label and add a 
comment updating the community on your progress.  If this issue is waiting on 
feedback, please consider this a reminder to the committer/reviewer. Flink is a 
very active project, and so we appreciate your patience.
If you are no longer working on the issue, please unassign yourself so someone 
else may work on it. If the "stale-assigned" label is not removed in 7 days, the 
issue will be automatically unassigned.


> Support disk spilling in HeapKeyedStateBackend
> --
>
> Key: FLINK-12692
> URL: https://issues.apache.org/jira/browse/FLINK-12692
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / State Backends
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Major
>  Labels: pull-request-available, stale-assigned
> Fix For: 1.14.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>


[jira] [Updated] (FLINK-12692) Support disk spilling in HeapKeyedStateBackend

2021-04-29 Thread Dawid Wysakowicz (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-12692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Wysakowicz updated FLINK-12692:
-
Fix Version/s: (was: 1.13.0)
   1.14.0

> Support disk spilling in HeapKeyedStateBackend
> --
>
> Key: FLINK-12692
> URL: https://issues.apache.org/jira/browse/FLINK-12692
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / State Backends
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.14.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>


[jira] [Updated] (FLINK-12692) Support disk spilling in HeapKeyedStateBackend

2020-12-07 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-12692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-12692:
---
Fix Version/s: (was: 1.12.0)
   1.13.0

> Support disk spilling in HeapKeyedStateBackend
> --
>
> Key: FLINK-12692
> URL: https://issues.apache.org/jira/browse/FLINK-12692
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / State Backends
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.13.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>


[jira] [Updated] (FLINK-12692) Support disk spilling in HeapKeyedStateBackend

2020-05-18 Thread Yu Li (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-12692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated FLINK-12692:
--
Fix Version/s: (was: 1.11.0)
   1.12.0

> Support disk spilling in HeapKeyedStateBackend
> --
>
> Key: FLINK-12692
> URL: https://issues.apache.org/jira/browse/FLINK-12692
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / State Backends
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>


[jira] [Updated] (FLINK-12692) Support disk spilling in HeapKeyedStateBackend

2019-12-08 Thread Yu Li (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-12692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated FLINK-12692:
--
Fix Version/s: (was: 1.10.0)
   1.11.0

Sorry, but we have to postpone this work to 1.11.0 due to comparatively limited 
review resources. We will try to supply a trial version in 
[flink-packages|https://flink-packages.org] for those who'd like to try this 
out in production, and will post a note here once the trial version is ready.

> Support disk spilling in HeapKeyedStateBackend
> --
>
> Key: FLINK-12692
> URL: https://issues.apache.org/jira/browse/FLINK-12692
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / State Backends
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>


[jira] [Updated] (FLINK-12692) Support disk spilling in HeapKeyedStateBackend

2019-07-16 Thread Chesnay Schepler (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-12692:
-
Fix Version/s: (was: 1.9.0)
   1.10.0

> Support disk spilling in HeapKeyedStateBackend
> --
>
> Key: FLINK-12692
> URL: https://issues.apache.org/jira/browse/FLINK-12692
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / State Backends
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (FLINK-12692) Support disk spilling in HeapKeyedStateBackend

2019-06-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-12692:
---
Labels: pull-request-available  (was: )

> Support disk spilling in HeapKeyedStateBackend
> --
>
> Key: FLINK-12692
> URL: https://issues.apache.org/jira/browse/FLINK-12692
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / State Backends
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.9.0
>
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-12692) Support disk spilling in HeapKeyedStateBackend

2019-05-31 Thread Yu Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated FLINK-12692:
--
Fix Version/s: 1.9.0

We aim to complete this work before the 1.9.0 release.

> Support disk spilling in HeapKeyedStateBackend
> --
>
> Key: FLINK-12692
> URL: https://issues.apache.org/jira/browse/FLINK-12692
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / State Backends
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Major
> Fix For: 1.9.0
>
>