[ https://issues.apache.org/jira/browse/FLINK-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297476#comment-16297476 ]
ASF GitHub Bot commented on FLINK-8297:
---------------------------------------

GitHub user je-ik opened a pull request:

    https://github.com/apache/flink/pull/5185

    [FLINK-8297] [flink-rocksdb] optionally use RocksDBMapState internally for storing lists

    ## What is the purpose of the change

    Enable storing per-key lists that do not fit into memory.

    ## Brief change log

    ## Verifying this change

    This change added tests and can be verified as follows: it passes the additional tests for RocksDBStateBackend.enableLargeListsPerKey()

    ## Does this pull request potentially affect one of the following parts:

    - Dependencies (does it add or upgrade a dependency): no
    - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: yes
    - The serializers: no
    - The runtime per-record code paths (performance sensitive): no, backward compatible
    - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
    - The S3 file system connector: no

    ## Documentation

    - Does this pull request introduce a new feature? yes
    - If yes, how is the feature documented? JavaDocs

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/datadrivencz/flink rocksdb-backend-memory-optimization

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5185.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5185

----
commit f1bbaa30901ba8a54b02908fd3eb3615301b4400
Author: Jan Lukavsky <je...@seznam.cz>
Date:   2017-12-14T20:42:06Z

    [FLINK-8297] [flink-rocksdb] optionally use RocksDBMapState internally for storing lists

----

> RocksDBListState stores whole list in single byte[]
> ---------------------------------------------------
>
>                 Key: FLINK-8297
>                 URL: https://issues.apache.org/jira/browse/FLINK-8297
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.4.0, 1.3.2
>            Reporter: Jan Lukavský
>
> RocksDBListState currently keeps the whole list in a single RocksDB
> key-value pair, which implies that the list must fit into memory.
> Larger lists are not supported and end in an OOME or another error. The
> RocksDBListState could be modified so that individual items in the list are
> stored under separate keys in RocksDB and can then be iterated over. A simple
> implementation could reuse the existing RocksDBMapState, with the list index
> as the key, and a single RocksDBValueState keeping track of how many items have
> already been added to the list. Because this implementation might be less
> efficient in some cases, it would be good to make it opt-in by a construct
> like
> {{new RocksDBStateBackend().enableLargeListsPerKey()}}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
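
The layout proposed in the issue description (one entry per list element, keyed by its index, plus a separate counter) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the pull request: the class name IndexedListSketch is hypothetical, a TreeMap stands in for RocksDBMapState, and a plain int field stands in for the RocksDBValueState counter. No Flink or RocksDB API is used.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;

/**
 * Hypothetical sketch: a list stored as one map entry per element.
 * The TreeMap is a stand-in for RocksDBMapState (assumption), and the
 * int field is a stand-in for the RocksDBValueState counter (assumption).
 */
class IndexedListSketch<T> implements Iterable<T> {
    // index -> element; each element lives under its own key
    private final Map<Integer, T> entries = new TreeMap<>();
    // number of elements added so far; also the next free index
    private int size = 0;

    /** Appends by writing a single new entry; existing entries are not rewritten. */
    public void add(T value) {
        entries.put(size, value);
        size++;
    }

    public int size() {
        return size;
    }

    /** Removes every entry and resets the counter. */
    public void clear() {
        entries.clear();
        size = 0;
    }

    /** Walks indices 0..size-1, looking up one element at a time. */
    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < size;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return entries.get(next++);
            }
        };
    }

    public static void main(String[] args) {
        IndexedListSketch<String> list = new IndexedListSketch<>();
        list.add("a");
        list.add("b");
        list.add("c");
        for (String item : list) {
            System.out.println(item);
        }
    }
}
```

In this sketch an append touches one entry and the counter, and iteration reads one element per step, which is the property the issue asks for. The price is an extra lookup per element, which is in line with the issue's remark that the approach might be less efficient in some cases and should be opt-in.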