[ https://issues.apache.org/jira/browse/NIFI-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862479#comment-16862479 ]
Kefevs Pirkibo commented on NIFI-6367:
--------------------------------------

From the NiFi logs; I had to dig this up from the archives. It is not cut-and-paste, so beware of typos.

...
org.apache.nifi.processor.exception.FlowFileAccessException: Failed to import data from com.amazonaws.services.s3.model.S3ObjectInputStream@.... for ... due to com.amazonaws.SdkClientException: Unable to verify integrity of data download. Client calculated content hash didn't match hash calculated by Amazon S3. The data may be corrupt.
...
    at org.apache.nifi.controller.repository.StandardProcessSession.importFrom(StandardProcessSession.java:..)
    at org.apache.nifi.processors.aws.s3.FetchS3Object.onTrigger(FetchS3Object.java:..)
    ...
    at java.lang.Thread.run(Thread.java:..)
Caused by: com.amazonaws.SdkClientException: Unable to verify integrity of data download. Client calculated content hash didn't match hash calculated by Amazon S3. The data may be corrupt.
    at com.amazonaws.services.s3.internal.DigestValidationInputStream.validateMD5Digest(DigestValidationInputStream.java:..)
    at ....
...

For reference, s3cmd's output for the same files would be:

WARNING: MD5 signatures do not match: computed= .... , received= ....
...

If there is further interest in log samples, I can see what I can do.

> FetchS3Processor responds to md5 error on download by doing download again, again, and again
> --------------------------------------------------------------------------------------------
>
>                 Key: NIFI-6367
>                 URL: https://issues.apache.org/jira/browse/NIFI-6367
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.7.1
>         Environment: NiFi (CentOS 7.2) with FetchS3Object running against a non-public S3 environment. The environment/S3 had errors that introduced MD5 errors on under 0.5% of downloads. Downloads with MD5 errors accumulated in the input queue of the processor.
>            Reporter: Kefevs Pirkibo
>            Assignee: Evan Reynolds
>            Priority: Critical
>
> (6 months old, but I don't see changes in the relevant parts of the code, though I might be mistaken. This might be hard to replicate, so I suggest a code wizard check whether this is still a problem.)
>
> Case: NiFi running with FetchS3Object processor(s) against a non-public S3 environment. The combination of the environment and S3 had hardware errors that resulted in sporadic MD5 errors on the same files over and over again. The MD5 errors resulted in an unhandled AmazonClientException, and the file was downloaded yet again (reverted to the input queue, first in line). In our case this was identified only after a number of days, with substantial bandwidth usage. It did not help that FetchS3Object was running with multiple instances, which over those days accumulated the bad-checksum files for continuous download.
>
> Suggestion: someone code-savvy should check what happens to files that are downloaded with a bad MD5. If they are reverted to the queue due to an uncaught exception or other means, then this is still a potential problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
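The behaviour the report asks about (a bad-MD5 download going straight back to the input queue) can be illustrated with a small, self-contained Java sketch. It is hypothetical and not taken from NiFi or the AWS SDK: `IntegrityRoutingSketch`, `Route`, `route` and `maxAttempts` are invented names, and the expected MD5 is simply passed in as an assumption rather than obtained from S3. The sketch recomputes the MD5 of the downloaded bytes, the same kind of client-side comparison that `DigestValidationInputStream` and s3cmd perform, and instead of re-queueing a mismatching object indefinitely it retries a bounded number of times and then routes it to failure.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * Hypothetical sketch, not NiFi or AWS SDK code: decides where a fetched object
 * goes after a client-side MD5 check, so that a persistently corrupt object is
 * not re-queued forever.
 */
public class IntegrityRoutingSketch {

    /** Hypothetical outcomes: keep the content, put it back for another try, or give up. */
    enum Route { SUCCESS, RETRY, FAILURE }

    /** Hex-encoded MD5 of the downloaded bytes (what s3cmd reports as "computed="). */
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Success if the hashes match; otherwise retry only while attempts remain,
     * then route to failure instead of back to the input queue.
     * attempt is 1-based; maxAttempts is hypothetical, not a FetchS3Object property.
     */
    static Route route(byte[] data, String expectedMd5Hex, int attempt, int maxAttempts) {
        if (md5Hex(data).equalsIgnoreCase(expectedMd5Hex)) {
            return Route.SUCCESS;
        }
        return attempt < maxAttempts ? Route.RETRY : Route.FAILURE;
    }

    public static void main(String[] args) {
        byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
        String good = md5Hex(body);
        String bad = "00000000000000000000000000000000";
        System.out.println(route(body, good, 1, 3)); // SUCCESS
        System.out.println(route(body, bad, 1, 3));  // RETRY
        System.out.println(route(body, bad, 3, 3));  // FAILURE
    }
}
```

With `maxAttempts = 3`, a persistently corrupt object would be fetched at most three times before being routed to failure, which bounds the repeated bandwidth use described in the report.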