hudi-bot opened a new issue, #14607:
URL: https://github.com/apache/hudi/issues/14607

   Version used: Hudi 0.5.3 + S3 + EMR. When bulk importing a large amount of data 
(400 GB), Hudi fails with the exception:
   
    _HoodieCommitException: Failed to complete commit 20200619190257 due to 
finalize errors._
   _Caused by: org.apache.hudi.exception.HoodieIOException: Consistency check 
failed to ensure all files APPEAR_
   
   The log line right before the exception says: _Removing duplicate data files 
created due to spark retries before committing Paths=[list of files]._ When 
checking the S3 location I can verify that the files are not there. When checking 
the .hoodie/.temp/commitId/partition location, I can verify that files with the 
same name but with the ".marker" extension are present. The exception occurs most 
of the time when we try to import a large amount of data. Attached are the stack 
trace of the exception as well as the code snippet that does the bulk_import. 
[^stackTrace.txt][^codeSnppet.txt]
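
   For reference, a minimal sketch of the bulk_insert write options that interact 
with this consistency check. The option keys below are taken from the Hudi 
documentation; the table name, field names, retry values, and S3 path are 
hypothetical placeholders, not from the attached snippet:

   ```python
   # Sketch of Hudi bulk_insert options relevant to the consistency-check failure.
   # Option keys are per Hudi docs; all concrete values here are placeholders.
   hudi_options = {
       "hoodie.table.name": "my_table",                      # hypothetical
       "hoodie.datasource.write.operation": "bulk_insert",
       "hoodie.datasource.write.recordkey.field": "id",      # hypothetical
       "hoodie.datasource.write.partitionpath.field": "dt",  # hypothetical
       # S3 listings were eventually consistent at the time; this makes Hudi
       # re-list until newly written data files appear before finalizing.
       "hoodie.consistency.check.enabled": "true",
       # Raising the retry budget may help large imports where S3 listings
       # lag behind writes (values are illustrative).
       "hoodie.consistency.check.max_checks": "10",
       "hoodie.consistency.check.initial_interval_ms": "2000",
       "hoodie.consistency.check.max_interval_ms": "300000",
   }

   # With a live SparkSession this would be applied as:
   # df.write.format("org.apache.hudi").options(**hudi_options) \
   #     .mode("append").save("s3://bucket/path")  # hypothetical path
   ```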
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-1030
   - Type: Bug
   - Attachment(s):
     - 20/Jun/20 01:00; zuyanton; codeSnppet.txt; https://issues.apache.org/jira/secure/attachment/13006094/codeSnppet.txt
     - 20/Jun/20 01:00; zuyanton; stackTrace.txt; https://issues.apache.org/jira/secure/attachment/13006093/stackTrace.txt


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
