hudi-bot opened a new issue, #14607: URL: https://github.com/apache/hudi/issues/14607
Version used: Hudi 0.5.3 + S3 + EMR. When bulk importing a large amount of data (400 GB), Hudi fails with the exception:

_HoodieCommitException: Failed to complete commit 20200619190257 due to finalize errors._
_Caused by: org.apache.hudi.exception.HoodieIOException: Consistency check failed to ensure all files APPEAR_

The log line right before the exception says: _Removing duplicate data files created due to spark retries before committing Paths=[list of files]._

When checking the S3 location, I can verify that the files are not there. When checking the .hoodie/.temp/commitId/partition location, I can verify that files with the same names, but with the ".marker" extension, are present. The exception occurs most of the time we try to import a large amount of data. Attached are the stack trace of the exception as well as the code snippet that does the bulk import. [^stackTrace.txt][^codeSnppet.txt]

## JIRA info

- Link: https://issues.apache.org/jira/browse/HUDI-1030
- Type: Bug
- Attachment(s):
  - 20/Jun/20 01:00; zuyanton; codeSnppet.txt; https://issues.apache.org/jira/secure/attachment/13006094/codeSnppet.txt
  - 20/Jun/20 01:00; zuyanton; stackTrace.txt; https://issues.apache.org/jira/secure/attachment/13006093/stackTrace.txt

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
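To make the reported failure mode concrete: the consistency check fails because each `.marker` file under `.hoodie/.temp/<commitId>/<partition>/` implies a data file that should appear in the table path, and here those data files are missing. The following is a minimal, hypothetical sketch (not Hudi's actual API; function and variable names are invented) of the marker-to-data-file comparison being described:

```python
# Hypothetical diagnostic helper illustrating the mismatch in this report:
# every ".marker" file in .hoodie/.temp/<commitId>/<partition>/ implies a
# data file of the same name (minus the ".marker" suffix) that must exist
# in the table path for the consistency check to pass.
def missing_data_files(marker_paths, data_paths):
    """Return data-file names implied by markers but absent from data_paths."""
    # Strip the ".marker" suffix to get the expected data-file names.
    expected = {p[: -len(".marker")] for p in marker_paths if p.endswith(".marker")}
    return sorted(expected - set(data_paths))

# Illustrative file names only; real Hudi file names encode file id,
# write token, and commit time.
markers = [
    "2020/06/19/f1_0-1-1_20200619190257.parquet.marker",
    "2020/06/19/f2_0-2-2_20200619190257.parquet.marker",
]
present = ["2020/06/19/f1_0-1-1_20200619190257.parquet"]
print(missing_data_files(markers, present))
# → ['2020/06/19/f2_0-2-2_20200619190257.parquet']
```

A non-empty result corresponds to the "Consistency check failed to ensure all files APPEAR" situation described above.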
