Prashant Wason created HUDI-667:
-----------------------------------

             Summary: HoodieTestDataGenerator does not delete keys correctly
                 Key: HUDI-667
                 URL: https://issues.apache.org/jira/browse/HUDI-667
             Project: Apache Hudi (incubating)
          Issue Type: Bug
            Reporter: Prashant Wason

HoodieTestDataGenerator is used to generate sample data for unit tests. It allows generating HoodieRecords for insert/update/delete. It maintains the record keys in a HashMap:

    private final Map<Integer, KeyPartition> existingKeys;

There are two issues in the implementation:
# Delete from existingKeys uses KeyPartition rather than Integer keys
# Inserting records after deletes is not correctly handled

The implementation uses the Integer key so that values can be looked up randomly. Assume three values were inserted; the HashMap will then hold:

0 -> KeyPartition1
1 -> KeyPartition2
2 -> KeyPartition3

Now if we delete KeyPartition2 (generate a random record for deletion), the HashMap will be:

0 -> KeyPartition1
2 -> KeyPartition3

Now if we issue an insertBatch(), the insert is existingKeys.put(existingKeys.size(), <new KeyPartition>). Since existingKeys.size() is now 2, this overwrites the KeyPartition3 already stored under key 2 rather than actually inserting a new entry in the map.
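
Both issues can be reproduced outside of Hudi with a small standalone sketch. It is an illustration only, not the HoodieTestDataGenerator code: String stands in for KeyPartition, and the names ExistingKeysSketch, buggyInsert and deleteAt are hypothetical. The deleteAt helper shows one possible way to keep the indices dense (move the last entry into the freed slot); the actual fix in Hudi may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Standalone sketch; String is an assumed stand-in for KeyPartition.
final class ExistingKeysSketch {

    // Insert pattern described in the report: the next slot is always map.size().
    static void buggyInsert(Map<Integer, String> existingKeys, String keyPartition) {
        existingKeys.put(existingKeys.size(), keyPartition);
    }

    // Hypothetical fix: delete by Integer index and move the last entry into the
    // freed slot, so the keys always stay the dense range 0..size-1.
    static String deleteAt(Map<Integer, String> existingKeys, int index) {
        if (!existingKeys.containsKey(index)) {
            throw new IllegalArgumentException("No entry at index " + index);
        }
        int last = existingKeys.size() - 1;
        String removed = existingKeys.get(index);
        existingKeys.put(index, existingKeys.get(last)); // no-op when index == last
        existingKeys.remove(last);
        return removed;
    }

    public static void main(String[] args) {
        Map<Integer, String> keys = new HashMap<>();
        buggyInsert(keys, "KeyPartition1");
        buggyInsert(keys, "KeyPartition2");
        buggyInsert(keys, "KeyPartition3");

        // Issue 1: Map.remove(Object) accepts any object, so passing the value
        // instead of the Integer key compiles but removes nothing.
        String wrongKeyResult = keys.remove("KeyPartition2");
        System.out.println(wrongKeyResult + ", size=" + keys.size()); // prints "null, size=3"

        // Issue 2: deleting by the Integer key leaves a hole at index 1 ...
        keys.remove(Integer.valueOf(1));
        // ... so the next insert goes to size() == 2 and overwrites KeyPartition3.
        buggyInsert(keys, "KeyPartition4");
        System.out.println(keys.get(2) + ", size=" + keys.size()); // prints "KeyPartition4, size=2"

        // With the hypothetical dense delete, the same sequence keeps all live keys.
        Map<Integer, String> dense = new HashMap<>();
        buggyInsert(dense, "KeyPartition1");
        buggyInsert(dense, "KeyPartition2");
        buggyInsert(dense, "KeyPartition3");
        deleteAt(dense, 1);
        buggyInsert(dense, "KeyPartition4");
        System.out.println(dense.size()); // prints "3"
    }
}
```

Keeping the keys dense means the existing put(existingKeys.size(), ...) insert and random lookup by index in 0..size-1 both continue to work unchanged after a delete.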