Prashant Wason created HUDI-667:
-----------------------------------

             Summary: HoodieTestDataGenerator does not delete keys correctly
                 Key: HUDI-667
                 URL: https://issues.apache.org/jira/browse/HUDI-667
             Project: Apache Hudi (incubating)
          Issue Type: Bug
            Reporter: Prashant Wason


HoodieTestDataGenerator is used to generate sample data for unit tests. It 
can generate HoodieRecords for insert, update and delete operations. It keeps 
the record keys in a HashMap:

private final Map<Integer, KeyPartition> existingKeys;

There are two issues in the implementation:
 1. Delete from existingKeys uses KeyPartition rather than Integer keys
 2. Inserting records after deletes is not correctly handled
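
The first issue can be illustrated with a minimal, self-contained sketch. It 
is not the actual HoodieTestDataGenerator code: the class and method names are 
hypothetical, and String stands in for KeyPartition. Map.remove(Object) 
accepts any object, so passing the value instead of the Integer key compiles 
but removes nothing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; String is a stand-in for KeyPartition.
class RemoveByValueSketch {

    // Builds the two-entry map used by both variants below.
    static Map<Integer, String> sampleKeys() {
        Map<Integer, String> existingKeys = new HashMap<>();
        existingKeys.put(0, "KeyPartition1");
        existingKeys.put(1, "KeyPartition2");
        return existingKeys;
    }

    // Passing the value where the key is expected: no Integer key equals
    // the String, so nothing is removed.
    static int sizeAfterRemoveByValue() {
        Map<Integer, String> existingKeys = sampleKeys();
        existingKeys.remove("KeyPartition2");
        return existingKeys.size();
    }

    // Passing the Integer key actually removes the entry.
    static int sizeAfterRemoveByIndex() {
        Map<Integer, String> existingKeys = sampleKeys();
        existingKeys.remove(1);
        return existingKeys.size();
    }

    public static void main(String[] args) {
        System.out.println("remove by value, size = " + sizeAfterRemoveByValue());
        System.out.println("remove by index, size = " + sizeAfterRemoveByIndex());
    }
}
```

In this sketch the map still has two entries after the remove-by-value call, 
and one entry after the remove-by-index call.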

The implementation uses the Integer key so that a random entry can be looked 
up by index. Assume three values were inserted; the HashMap will then hold:

0 -> KeyPartition1
1 -> KeyPartition2
2 -> KeyPartition3

Now if we delete KeyPartition2 (a random record picked for deletion), the 
HashMap will be:

0 -> KeyPartition1
2 -> KeyPartition3

 

Now if we issue an insertBatch(), the insert is 
existingKeys.put(existingKeys.size(), KeyPartition4). Since size() is now 2, 
this overwrites the KeyPartition3 already in the map rather than actually 
inserting a new entry in the map.
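
The overwrite described above can be reproduced with a small sketch. As 
before, this is an illustration and not the project's code: the class and 
method names are hypothetical and String stands in for KeyPartition. The 
second method shows one possible way to avoid the problem (moving the last 
entry into the deleted slot so the indexes stay dense); it is an assumption, 
not necessarily the fix adopted for this issue.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; String is a stand-in for KeyPartition.
class InsertAfterDeleteSketch {

    // Three inserts keyed by the current map size: indexes 0, 1, 2.
    static Map<Integer, String> threeKeys() {
        Map<Integer, String> existingKeys = new HashMap<>();
        existingKeys.put(existingKeys.size(), "KeyPartition1");
        existingKeys.put(existingKeys.size(), "KeyPartition2");
        existingKeys.put(existingKeys.size(), "KeyPartition3");
        return existingKeys;
    }

    // Problem: deleting index 1 leaves indexes {0, 2}; size() is then 2,
    // so the next insert lands on index 2 and replaces KeyPartition3.
    static Map<Integer, String> insertAfterPlainDelete() {
        Map<Integer, String> existingKeys = threeKeys();
        existingKeys.remove(1);
        existingKeys.put(existingKeys.size(), "KeyPartition4");
        return existingKeys;
    }

    // Possible remedy (assumption): on delete, move the last entry into
    // the hole so that indexes always cover 0..size()-1.
    static Map<Integer, String> insertAfterCompactingDelete() {
        Map<Integer, String> existingKeys = threeKeys();
        int deleted = 1;
        int last = existingKeys.size() - 1;
        existingKeys.put(deleted, existingKeys.get(last));
        existingKeys.remove(last);
        existingKeys.put(existingKeys.size(), "KeyPartition4");
        return existingKeys;
    }

    public static void main(String[] args) {
        System.out.println("plain delete:      " + insertAfterPlainDelete());
        System.out.println("compacting delete: " + insertAfterCompactingDelete());
    }
}
```

With the plain delete, the map ends up with only two entries and 
KeyPartition3 is lost. With the compacting delete, all three surviving keys 
remain reachable by index.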



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
