[ https://issues.apache.org/jira/browse/FLINK-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308081#comment-16308081 ]
ASF GitHub Bot commented on FLINK-8297:
---------------------------------------

Github user aljoscha commented on the issue:

    https://github.com/apache/flink/pull/5185

    The concept of this looks good. However, if we want to merge it, I think we have to make this a standalone thing that does not depend on other states, because that dependency can have unforeseen consequences for future developments.

    I can think of several cases where the current approach would lead to surprising problems:
     - a user inspects a savepoint and finds a `MapState` and a `ValueState` instead of the `ListState` they're expecting (this becomes a problem when we have tools for inspecting savepoints and is also problematic for compatibility of the savepoint format between different state backends)
     - (related to the above) the "binary format" of the savepoint is different between the two list implementations. This leads to problems if you want to change the list implementation when restoring from a savepoint, and if you want to switch backends (which we currently don't support).
     - if/when we have metrics for user states, this would export metrics for a `MapState` and a `ValueState` and not for the one expected `ListState`


> RocksDBListState stores whole list in single byte[]
> ---------------------------------------------------
>
>                 Key: FLINK-8297
>                 URL: https://issues.apache.org/jira/browse/FLINK-8297
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.4.0, 1.3.2
>            Reporter: Jan Lukavský
>
> RocksDBListState currently keeps the whole list of data in a single RocksDB
> key-value pair, which implies that the list must fit into memory. Larger
> lists are not supported and end with an OOME or another error. The
> RocksDBListState could be modified so that individual items in the list are
> stored under separate keys in RocksDB and can then be iterated over. A simple
> implementation could reuse the existing RocksDBMapState, with the list index
> as the key, and a single RocksDBValueState keeping track of how many items
> have already been added to the list. Because this implementation might be
> less efficient in some cases, it would be good to make it opt-in via a
> construct like
> {{new RocksDBStateBackend().enableLargeListsPerKey()}}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
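
For readers of the archive, the layout proposed in the issue description (one entry per list item, keyed by its index, plus a counter of items added so far) can be sketched roughly as below. This is only an in-memory illustration under stated assumptions: `IndexedListSketch` is a hypothetical name, a `HashMap` stands in for RocksDBMapState and a `long` field for RocksDBValueState, and nothing here is taken from the pull request or from Flink's API.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;

/**
 * Hypothetical in-memory model of a list stored as "index -> item" entries
 * plus a counter. Not Flink code; the map and the counter only stand in for
 * the map state and value state mentioned in the issue description.
 */
class IndexedListSketch<T> implements Iterable<T> {
    // Stand-in for the map state: one entry per list item, keyed by its index.
    private final Map<Long, T> elements = new HashMap<>();
    // Stand-in for the value state: how many items have been added so far.
    private long count = 0L;

    /** Appends an item by writing one new entry; existing entries are not rewritten. */
    void add(T value) {
        elements.put(count, value);
        count++;
    }

    long size() {
        return count;
    }

    /** Removes all items and resets the counter. */
    void clear() {
        elements.clear();
        count = 0L;
    }

    /** Walks the indices 0..count-1 lazily, fetching one item per step. */
    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private long cursor = 0L;

            @Override
            public boolean hasNext() {
                return cursor < count;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return elements.get(cursor++);
            }
        };
    }

    public static void main(String[] args) {
        IndexedListSketch<String> list = new IndexedListSketch<>();
        list.add("a");
        list.add("b");
        for (String item : list) {
            System.out.println(item);
        }
    }
}
```

The sketch also makes the review concern visible: what is physically stored is a map and a counter, not a single list value, which is the representation mismatch raised in the comment above.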