stym06 opened a new issue #4318:
URL: https://github.com/apache/hudi/issues/4318


   **Describe the problem you faced**
   We ingest IoT data from Kafka into Azure Blob Storage using the 
DeltaStreamer utility in continuous mode, and we query the table through 
Presto. We are seeing duplicate records that share the same 
`_hoodie_record_key` but have different commit times and commit sequence 
numbers, and that sit in different Parquet files within the same partition path.
   
   ```
    _hoodie_commit_time |   _hoodie_commit_seqno   |                           _hoodie_record_key                            | _hoodie_partition_path |                             _hoodie_file_name                             | master_timestamp |   timest
   ---------------------+--------------------------+-------------------------------------------------------------------------+------------------------+---------------------------------------------------------------------------+------------------+---------
    20211206002458      | 20211206002458_1_4116796 | vehicle_identification_number:P53ACDCB2AKA00081,timestamp:1638708846929 | dt=2021-12-05          | 5885895a-78d1-468b-9e7b-045d77644d1c-0_1-1706-2959_20211206002458.parquet |    1638708851906 | 16387088
    20211206120116      | 20211206120116_1_1745292 | vehicle_identification_number:P53ACDCB2AKA00081,timestamp:1638708846929 | dt=2021-12-05          | df619ce7-cd21-41fc-9e6b-68386748bde4-0_1-470-1174_20211206120116.parquet  |    1638708851906 | 1638708
   ```
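
To gauge how widespread the duplication is, rows exported from the query engine could be grouped by partition path and record key, flagging any key that appears in more than one Parquet file. The following is only a minimal sketch in plain Python: the helper name `find_duplicate_keys` is hypothetical, the rows are assumed to be available as dictionaries keyed by the Hudi metadata column names, and the sample data is copied from the output above.

```python
from collections import defaultdict


def find_duplicate_keys(rows):
    """Group rows by (partition path, record key) and return the keys
    that are present in more than one file.

    `rows` is assumed to be an iterable of dicts keyed by the Hudi
    metadata column names (hypothetical input shape for this sketch).
    """
    copies = defaultdict(set)
    for row in rows:
        key = (row["_hoodie_partition_path"], row["_hoodie_record_key"])
        copies[key].add((row["_hoodie_commit_time"], row["_hoodie_file_name"]))
    # A key is reported only when it shows up in at least two distinct files.
    return {
        key: sorted(found)
        for key, found in copies.items()
        if len({file_name for _, file_name in found}) > 1
    }


# Sample rows taken from the query output in this issue.
RECORD_KEY = "vehicle_identification_number:P53ACDCB2AKA00081,timestamp:1638708846929"
rows = [
    {
        "_hoodie_commit_time": "20211206002458",
        "_hoodie_record_key": RECORD_KEY,
        "_hoodie_partition_path": "dt=2021-12-05",
        "_hoodie_file_name": "5885895a-78d1-468b-9e7b-045d77644d1c-0_1-1706-2959_20211206002458.parquet",
    },
    {
        "_hoodie_commit_time": "20211206120116",
        "_hoodie_record_key": RECORD_KEY,
        "_hoodie_partition_path": "dt=2021-12-05",
        "_hoodie_file_name": "df619ce7-cd21-41fc-9e6b-68386748bde4-0_1-470-1174_20211206120116.parquet",
    },
]

duplicates = find_duplicate_keys(rows)
for (partition, record_key), found in duplicates.items():
    print(partition, record_key)
    for commit_time, file_name in found:
        print("   ", commit_time, file_name)
```

Grouping on the pair of partition path and record key, rather than on the record key alone, matches what is reported here: both copies sit in the same partition (`dt=2021-12-05`) but in different files.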
   
   **Environment Description**
   
   * Hudi version : 0.9.0
   
   * Spark version : 2.4.4
   
   * Hive version : 3.1.2
   
   * Hadoop version : 3.1.2
    
   * Storage (HDFS/S3/GCS..) : Azure Blob
   
   * Running on Docker? (yes/no) : K8s
   
   
   

