Yue Ma created FLINK-24597:
------------------------------

             Summary: RocksdbStateBackend getKeysAndNamespaces would return duplicate data when using MapState
                 Key: FLINK-24597
                 URL: https://issues.apache.org/jira/browse/FLINK-24597
             Project: Flink
          Issue Type: Bug
          Components: API / State Processor, Runtime / State Backends
    Affects Versions: 1.13.3, 1.12.4, 1.14.0
            Reporter: Yue Ma
         Attachments: image-2021-10-20-14-23-20-333.png

For example, in RocksdbStateBackend, suppose we work in VoidNamespace and use a ValueState as below.

{code:java}
// insert record
for (int i = 0; i < 3; ++i) {
    keyedStateBackend.setCurrentKey(i);
    testValueState.update(String.valueOf(i));
}
{code}

Then we get all the keys and namespaces with the method RocksDBKeyedStateBackend#getKeysAndNamespaces(). The result of the traversal is <1,VoidNamespace>,<2,VoidNamespace>,<3,VoidNamespace>, which is as expected.

However, if we use MapState and update it with different user keys, getKeysAndNamespaces returns duplicate data with the same keyAndNamespace.

{code:java}
// insert record
for (int i = 0; i < 3; ++i) {
    keyedStateBackend.setCurrentKey(i);
    mapState.put("userKeyA_" + i, "userValue");
    mapState.put("userKeyB_" + i, "userValue");
}
{code}

The result of the traversal is <1,VoidNamespace>,<1,VoidNamespace>,<2,VoidNamespace>,<2,VoidNamespace>,<3,VoidNamespace>,<3,VoidNamespace>.

By reading the code, I found that the main cause of this problem is in the implementation of _RocksStateKeysAndNamespaceIterator_. In its _hasNext_ method, when a new keyAndNamespace is created, it is not compared with the previous keyAndNamespace, so every MapState user key stored under the same key and namespace yields its own result. Implementing the same logic as RocksStateKeysIterator should solve this problem.
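To illustrate the proposed comparison with the previous keyAndNamespace, here is a minimal standalone sketch. It is not Flink code: the class KeysAndNamespacesDedupSketch, the RawEntry type, and the collect helper are hypothetical names made up for this example. It also assumes that entries sharing a key and namespace are adjacent in iteration order, which is what makes a single "previous" comparison sufficient.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

class KeysAndNamespacesDedupSketch {

    /** Hypothetical stand-in for one stored MapState entry: (key, namespace, userKey). */
    static final class RawEntry {
        final int key;
        final String namespace;
        final String userKey;

        RawEntry(int key, String namespace, String userKey) {
            this.key = key;
            this.namespace = namespace;
            this.userKey = userKey;
        }
    }

    /** Returns each (key, namespace) pair once, skipping pairs equal to the previous one. */
    static final class DedupIterator implements Iterator<Map.Entry<Integer, String>> {
        private final Iterator<RawEntry> entries;
        private Map.Entry<Integer, String> previous; // last pair that was accepted
        private Map.Entry<Integer, String> next;     // pair buffered by hasNext()

        DedupIterator(Iterator<RawEntry> entries) {
            this.entries = entries;
        }

        @Override
        public boolean hasNext() {
            while (next == null && entries.hasNext()) {
                RawEntry e = entries.next();
                Map.Entry<Integer, String> candidate =
                        new AbstractMap.SimpleImmutableEntry<>(e.key, e.namespace);
                // The comparison the report says is missing: only accept a pair
                // that differs from the previously accepted one.
                if (!candidate.equals(previous)) {
                    previous = candidate;
                    next = candidate;
                }
            }
            return next != null;
        }

        @Override
        public Map.Entry<Integer, String> next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            Map.Entry<Integer, String> result = next;
            next = null;
            return result;
        }
    }

    /** Drains the deduplicating iterator into a list. */
    static List<Map.Entry<Integer, String>> collect(List<RawEntry> sortedEntries) {
        List<Map.Entry<Integer, String>> result = new ArrayList<>();
        Iterator<Map.Entry<Integer, String>> it = new DedupIterator(sortedEntries.iterator());
        while (it.hasNext()) {
            result.add(it.next());
        }
        return result;
    }

    public static void main(String[] args) {
        // Two user keys per key, mirroring the MapState example in the report.
        List<RawEntry> entries = new ArrayList<>();
        for (int i = 1; i <= 3; i++) {
            entries.add(new RawEntry(i, "VoidNamespace", "userKeyA_" + i));
            entries.add(new RawEntry(i, "VoidNamespace", "userKeyB_" + i));
        }
        System.out.println(collect(entries));
    }
}
```

With six stored entries (two user keys for each of three keys), the sketch returns three pairs instead of six. If the same pair could reappear non-adjacently, a single previous-value comparison would not be enough.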