kpurella opened a new issue #2062: URL: https://github.com/apache/hudi/issues/2062
**_Tips before filing an issue_**

- Have you gone through our [FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)? yes
- Join the mailing list to engage in conversations and get faster support at dev-subscr...@hudi.apache.org. not yet
- If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.

**Describe the problem you faced**

Seeing Duplicate records in _ro and _rt table after 2 incremental runs.

A clear and concise description of the problem.

**To Reproduce**

Steps to reproduce the behavior:

1. Ingest first partition with partitionpath=year=2020/month=08/day=01
2. Ingest first partition once more with same partitionpath
3. Ingest second partition with partitionpath=year=2020/month=08/day=02

**Expected behavior**

HUDI should merge Deltas to avoid duplicates.

- Hudi version : 0.5.2-incubating
- Spark version : 2.4.5
- Hive version : 2.3.6
- Hadoop version : 2.8.5
- Storage (HDFS/S3/GCS..) : s3
- Running on Docker? (yes/no) : no
- EMR : 5.30.1

Configuration

hoodie.index.type=GLOBAL_BLOOM
hoodie.compact.inline.max.delta.commits=10
hoodie.datasource.write.table.type=MOR
hoodie.datasource.write.operation=upsert

Test Data:

partition year=2020/month=08/day=01

388128891|13511|1|N|2014-10-17
328587935|13109|7|A|2015-02-02
329770530|13113|1|N|2013-07-26
388128892|13553|7|A|2014-10-17
388128893|24886|1|N|2014-10-17
388128894|24887|7|A|2014-10-17
388128895|24888|1|N|2014-10-17
388128896|24968|7|A|2014-10-17
328587936|13110|1|N|2015-02-01
328587937|13116|7|A|2015-02-01
328587938|13122|1|A|2015-02-01
328587939|13248|1|A|2015-02-01
328587940|13118|3|A|2015-02-01
388128896|25110|3|A|2020-08-01
328587935|13119|2|A|2020-08-01
328587941|13115|2|A|2020-08-01

partition year=2020/month=08/day=02

388128896|25110|6|N|2020-08-02
328587935|13119|4|N|2020-08-02
328587941|13115|7|N|2020-08-02
328587938|13122|7|N|2015-02-02
328587939|13248|5|A|2015-02-02
128587939|33248|6|A|2015-02-02

0: jdbc:hive2://xx.xxx.xx.xx:10000> select `_hoodie_commit_time`,`_hoodie_record_key`,col1,col2,moddate,partitionpath from test_ro where col1='388128896' and col2=25110;

+----------------------+----------------------------------+------------+------------+-------------------+----------------------------+
| _hoodie_commit_time  | _hoodie_record_key               | col1       | col2       | moddate           | partitionpath              |
+----------------------+----------------------------------+------------+------------+-------------------+----------------------------+
| 20200901003201       | col1:388128896,col2:25110        | 388128896  | 25110      | 1596240000000000  | year=2020/month=08/day=01  |
| 20200901003731       | col1:388128896,col2:25110        | 388128896  | 25110      | 1596326400000000  | year=2020/month=08/day=02  |
+----------------------+----------------------------------+------------+------------+-------------------+----------------------------+

Add any other context about the problem here.

**Stacktrace**

```Add the stacktrace of the error.```

Thank you for looking into this, please let me know if you need any more details.

----------------------------------------------------------------

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org