Re: [I] duplicated records when use insert overwrite [hudi]

2024-06-07 Thread via GitHub


ad1happy2go commented on issue #11358:
URL: https://github.com/apache/hudi/issues/11358#issuecomment-2155135667

   @njalan If the data you are inserting contains duplicates, insert overwrite 
will write those duplicates to the table.
   
   Can you please share the timeline so we can look into this further?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] duplicated records when use insert overwrite [hudi]

2024-06-04 Thread via GitHub


njalan commented on issue #11358:
URL: https://github.com/apache/hudi/issues/11358#issuecomment-2147567547

   @ad1happy2go I don't think I am using multiple writers. Is there a parameter 
for multi-writer mode? We checked afterwards and there are duplicate records. My 
understanding is that the final table should have only one commit time when I 
use insert_overwrite. Why do I see multiple commit times in the final table, 
one of which is from the target table before this overwrite?
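
   One way to make the commit-time observation concrete is to count rows per 
partition and per commit time. The sketch below is plain Python over rows 
represented as dicts, not the reporter's actual query. `_hoodie_commit_time` and 
`_hoodie_partition_path` are Hudi metadata columns; the sample rows, partition 
values, and the helper name `commit_times_by_partition` are hypothetical.

```python
from collections import Counter, defaultdict


def commit_times_by_partition(rows):
    """Count rows per (partition path, commit time) using Hudi metadata columns."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["_hoodie_partition_path"]][row["_hoodie_commit_time"]] += 1
    # Convert to plain dicts so the result is easy to print and compare.
    return {partition: dict(times) for partition, times in counts.items()}


# Hypothetical rows read back from the target table after the overwrite.
table_rows = [
    {"_hoodie_partition_path": "dt=2024-05-28", "_hoodie_commit_time": "20240528093000", "id": 1},
    {"_hoodie_partition_path": "dt=2024-05-29", "_hoodie_commit_time": "20240529101500", "id": 2},
    {"_hoodie_partition_path": "dt=2024-05-29", "_hoodie_commit_time": "20240529101500", "id": 3},
]

summary = commit_times_by_partition(table_rows)
print(summary)
```

   Grouping by partition shows whether an older commit time sits in a partition 
that the latest write touched or in one it did not touch.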





Re: [I] duplicated records when use insert overwrite [hudi]

2024-05-31 Thread via GitHub


ad1happy2go commented on issue #11358:
URL: https://github.com/apache/hudi/issues/11358#issuecomment-2142473345

   @njalan Also, as I understand it, the data you are writing is the output of a 
join across 10 tables. So when you run insert_overwrite, does the source 
DataFrame contain duplicates?
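
   A minimal sketch of that duplicate check, written in plain Python over rows 
as dicts so it runs without a cluster. The key field `id`, the sample rows, and 
the helper name `find_duplicate_keys` are hypothetical; on a Spark DataFrame the 
same idea is usually expressed as a `groupBy` on the record key followed by 
`count()` and a filter on `count > 1`.

```python
from collections import Counter


def find_duplicate_keys(rows, key_fields):
    """Return {key_tuple: occurrences} for every record key seen more than once."""
    counts = Counter(tuple(row[field] for field in key_fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical join output: id=2 appears twice, as a one-to-many join can produce.
source_rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 2, "name": "b"},
]

duplicates = find_duplicate_keys(source_rows, ["id"])
print(duplicates)
```

   An empty result means the source has no repeated record keys, so any 
duplicates in the table would have to come from somewhere else.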





Re: [I] duplicated records when use insert overwrite [hudi]

2024-05-31 Thread via GitHub


ad1happy2go commented on issue #11358:
URL: https://github.com/apache/hudi/issues/11358#issuecomment-2141806002

   @njalan Are you using multiple writers? Can you share a reproducible 
script? Note that you are using a very old Hudi version.





[I] duplicated records when use insert overwrite [hudi]

2024-05-29 Thread via GitHub


njalan opened a new issue, #11358:
URL: https://github.com/apache/hudi/issues/11358

   Multiple commit times exist in the Hudi table, and duplicate records appear 
when I use insert overwrite into the target table. The query joins about 
10 tables.
   
   **Environment Description**
   
   * Hudi version : 0.9
   
   * Spark version : 3.0.1
   
   * Hive version : 3.2
   
   * Hadoop version : 3.2
   
   * Storage (HDFS/S3/GCS..) : s3
   
   * Running on Docker? (yes/no) : no
   


