Re: [I] duplicated records when use insert overwrite [hudi]
ad1happy2go commented on issue #11358:
URL: https://github.com/apache/hudi/issues/11358#issuecomment-2155135667

@njalan If the data you are inserting has dups, then insert overwrite will create dups in the table. Can you please share the timeline with us so we can look further?

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] duplicated records when use insert overwrite [hudi]
njalan commented on issue #11358:
URL: https://github.com/apache/hudi/issues/11358#issuecomment-2147567547

@ad1happy2go I don't think I am using multi writers. Is there any parameter for multi writers? We checked afterwards and there are dup records. My understanding is that there should be only one commit time in the final table when I use insert_overwrite. Why can I see multiple commit times in the final table, one of which was already in the target table before this overwrite?
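To make the "multiple commit times" observation concrete, one option is to count rows per value of Hudi's `_hoodie_commit_time` metadata column. The sketch below is a minimal plain-Python illustration, not the reporter's actual check: the rows, the `order_id` field, and the timestamps are hypothetical stand-ins for records read back from the target table. Against a real table, the same idea would presumably be a `GROUP BY _hoodie_commit_time` with a row count in Spark SQL. Rows carrying an older commit time may, for example, sit in partitions the new write did not touch; that is an assumption to verify, not something this thread confirms.

```python
from collections import Counter


def rows_per_commit(rows):
    """Count rows for each value of the _hoodie_commit_time metadata column."""
    counts = Counter(row["_hoodie_commit_time"] for row in rows)
    # Sort by commit time so that older commits are listed first.
    return dict(sorted(counts.items()))


# Hypothetical rows standing in for records read back from the target table.
table_rows = [
    {"_hoodie_commit_time": "20240601120000", "order_id": 1},
    {"_hoodie_commit_time": "20240601120000", "order_id": 2},
    {"_hoodie_commit_time": "20240520093000", "order_id": 1},
]

for commit_time, n in rows_per_commit(table_rows).items():
    print(commit_time, n)
```

More than one key in the result means the table still holds rows written by more than one commit.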
Re: [I] duplicated records when use insert overwrite [hudi]
ad1happy2go commented on issue #11358:
URL: https://github.com/apache/hudi/issues/11358#issuecomment-2142473345

@njalan Also, as I understand it, the data you are writing is the output of a query over 10 tables. So when you do insert_overwrite, does the source data frame contain dups?
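The question about dups in the source can be answered before writing by counting how often each record key occurs in the joined result. The following is a minimal plain-Python sketch of that check, under stated assumptions: `order_id` is a hypothetical record key and the rows are invented for illustration. On a Spark data frame, the equivalent would likely be a `groupBy` on the record key columns followed by `count()` and a filter on `count > 1`.

```python
from collections import Counter


def duplicate_keys(rows, key_fields):
    """Return {key_tuple: count} for every record key that occurs more than once."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical rows standing in for the output of the multi-table join.
source_rows = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": 5.0},
    {"order_id": 1, "amount": 10.0},  # a join fan-out can repeat a key like this
]

print(duplicate_keys(source_rows, ["order_id"]))
```

An empty result means the source has no repeated record keys, so any dups in the table would have to come from somewhere other than the incoming data.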
Re: [I] duplicated records when use insert overwrite [hudi]
ad1happy2go commented on issue #11358:
URL: https://github.com/apache/hudi/issues/11358#issuecomment-2141806002

@njalan Are you using multi writers? Can you come up with a reproducible script? You are using a very old Hudi version, though.
[I] duplicated records when use insert overwrite [hudi]
njalan opened a new issue, #11358:
URL: https://github.com/apache/hudi/issues/11358

There are multiple commit times in the hoodie table, and duplicated records also exist, after using insert overwrite into the target table. There are about 10 tables joined in the query.

**Environment Description**

* Hudi version : 0.9
* Spark version : 3.0.1
* Hive version : 3.2
* Hadoop version : 3.2
* Storage (HDFS/S3/GCS..) : s3
* Running on Docker? (yes/no) : no