[ 
https://issues.apache.org/jira/browse/FLINK-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297476#comment-16297476
 ] 

ASF GitHub Bot commented on FLINK-8297:
---------------------------------------

GitHub user je-ik opened a pull request:

    https://github.com/apache/flink/pull/5185

    [FLINK-8297] [flink-rocksdb] optionally use RocksDBMapState internally for storing lists

    ## What is the purpose of the change
    
    Enable storing per-key lists that do not fit into memory.
    
    ## Brief change log
    
    ## Verifying this change
    
    This change adds tests and can be verified as follows:
      run the additional tests for RocksDBStateBackend.enableLargeListsPerKey()
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): no
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: yes
      - The serializers: no
      - The runtime per-record code paths (performance sensitive): no, backward compatible
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
      - The S3 file system connector: no
    
    ## Documentation
    
      - Does this pull request introduce a new feature? yes
      - If yes, how is the feature documented? JavaDocs


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/datadrivencz/flink rocksdb-backend-memory-optimization

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5185.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5185
    
----
commit f1bbaa30901ba8a54b02908fd3eb3615301b4400
Author: Jan Lukavsky <je...@seznam.cz>
Date:   2017-12-14T20:42:06Z

    [FLINK-8297] [flink-rocksdb] optionally use RocksDBMapState internally for storing lists

----


> RocksDBListState stores whole list in single byte[]
> ---------------------------------------------------
>
>                 Key: FLINK-8297
>                 URL: https://issues.apache.org/jira/browse/FLINK-8297
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.4.0, 1.3.2
>            Reporter: Jan Lukavský
>
> RocksDBListState currently keeps the whole list in a single RocksDB 
> key-value pair, which implies that the list must fit into memory. 
> Larger lists are not supported and end in an OutOfMemoryError or another 
> error. The RocksDBListState could be modified so that individual list items 
> are stored under separate keys in RocksDB and can then be iterated over. A 
> simple implementation could reuse the existing RocksDBMapState, with the list 
> index as the key, plus a single RocksDBValueState tracking how many items 
> have already been added to the list. Because this implementation might be 
> less efficient in some cases, it would be good to make it opt-in via a 
> construct like
> {{new RocksDBStateBackend().enableLargeListsPerKey()}}
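
The idea in the issue description (one map entry per list item, keyed by its index, plus a separate counter) can be sketched in plain Java. This is only an illustration under stated assumptions, not the code from the pull request: the class name MapBackedList is hypothetical, a TreeMap stands in for RocksDBMapState, and a long field stands in for the RocksDBValueState counter.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.TreeMap;

/** Hypothetical sketch: a list stored as index -> item entries plus a size counter. */
class MapBackedList<T> implements Iterable<T> {
    // Stand-in for a map state: one entry per list item, keyed by list index.
    private final TreeMap<Long, T> entries = new TreeMap<>();
    // Stand-in for a value state: how many items have been added so far.
    private long count = 0L;

    /** Appends an item by writing one new entry and bumping the counter. */
    void add(T value) {
        entries.put(count, value);
        count++;
    }

    long size() {
        return count;
    }

    /** Removes all items and resets the counter. */
    void clear() {
        entries.clear();
        count = 0L;
    }

    /** Iterates items in insertion order, looking up one entry at a time. */
    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private long cursor = 0L;

            @Override
            public boolean hasNext() {
                return cursor < count;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return entries.get(cursor++);
            }
        };
    }
}
```

In this sketch every add touches two pieces of state (the new entry and the counter) and iteration performs one lookup per item, so the whole list never has to be materialized at once. That extra per-item work is one plausible reason the issue suggests making the behavior opt-in.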



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
