[ 
https://issues.apache.org/jira/browse/NIFI-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862479#comment-16862479
 ] 

Kefevs Pirkibo commented on NIFI-6367:
--------------------------------------

From the NiFi logs; I had to dig this up from the archives. Not cut'n'paste, so 
beware of typos.
...
org.apache.nifi.processor.exception.FlowFileAccessException: Failed to import 
data from com.amazonaws.services.s3.model.S3ObjectInputStream@.... for
...
due to com.amazonaws.SdkClientException: Unable to verify integrity of data 
download. Client calculated content hash didn't match hash calculated by Amazon 
S3. The data may be corrupt.
...
at org.apache.nifi.controller.repository.StandardProcessSession.importFrom(StandardProcessSession.java:..)
at org.apache.nifi.processors.aws.s3.FetchS3Object.onTrigger(FetchS3Object.java:..)
...
at java.lang.Thread.run(Thread.java:..)
Caused by: com.amazonaws.SdkClientException: Unable to verify integrity of data 
download. Client calculated content hash didn't match hash calculated by Amazon 
S3. The data may be corrupt.
at com.amazonaws.services.s3.internal.DigestValidationInputStream.validateMD5Digest(DigestValidationInputStream.java:..)
at....
...

For reference, s3cmd reports the following for the same files:
WARNING: MD5 signatures do not match: computed= .... , received= ....
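
Both messages come from the same kind of check: the client hashes the bytes it received and compares the result with the digest reported by the server. A minimal sketch of that comparison, using only the JDK; the class and method names (Md5Check, md5Hex, matches) are made up for illustration and are not the AWS SDK's or s3cmd's actual code:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustration only: hypothetical names, not the AWS SDK's DigestValidationInputStream.
final class Md5Check {

    // Returns the lowercase hex MD5 digest of the given bytes.
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    // True when the locally computed digest equals the digest the server reported.
    static boolean matches(byte[] downloaded, String receivedMd5Hex) {
        return md5Hex(downloaded).equalsIgnoreCase(receivedMd5Hex);
    }

    public static void main(String[] args) {
        byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
        String received = md5Hex(body); // stand-in for the value the server would report
        System.out.println("intact:    " + matches(body, received));
        body[0] ^= 1; // simulate a single corrupted byte
        System.out.println("corrupted: " + matches(body, received));
    }
}
```

The point is only that a mismatch is detected after the bytes have already been transferred, so every repeated attempt costs a full download.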

...
If there is further interest in a log sample, I can try to see what I can do.

> FetchS3Processor responds to md5 error on download by doing download again, 
> again, and again
> --------------------------------------------------------------------------------------------
>
>                 Key: NIFI-6367
>                 URL: https://issues.apache.org/jira/browse/NIFI-6367
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.7.1
>         Environment: NiFi (CentOS 7.2) with FetchS3Object running against a 
> non-public S3 environment. The environment / S3 had errors that introduced 
> md5 errors on fewer than 0.5% of downloads. Downloads with md5 errors 
> accumulated in the input queue of the processor.
>            Reporter: Kefevs Pirkibo
>            Assignee: Evan Reynolds
>            Priority: Critical
>
> (This is 6 months old, but I don't see changes in the relevant parts of the 
> code, though I might be mistaken. It might be hard to replicate, so I suggest 
> that someone familiar with the code check whether this is still a problem.)
> Case: NiFi running with FetchS3Object processor(s) against a non-public S3 
> environment. The environment and S3 in combination had hardware errors that 
> resulted in sporadic md5 errors on the same files over and over again. Each 
> md5 error resulted in an unhandled AmazonClientException, and the file was 
> downloaded yet again. (It was reverted to the input queue, first in line.) In 
> our case this was identified after a number of days, with substantial 
> bandwidth usage. It did not help that the FetchS3Object processors were 
> running with multiple instances and, over those days, accumulated the files 
> with bad md5 checksums for continuous download.
> Suggestion: Someone code-savvy should check what happens to files that are 
> downloaded with a bad md5. If they are reverted to the queue due to an 
> uncaught exception or by other means, then this is still a potential problem.
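
The loop described in the issue can be modelled without NiFi at all. The sketch below is a toy simulation under stated assumptions: a queue of object keys, a fixed set of keys whose download always fails the md5 check, and two handling strategies. All names (RetryLoopSketch, requeueOnError, routeToFailure) are hypothetical and are not NiFi or AWS SDK APIs; whether the processor behaves this way in a given NiFi version is exactly what the reporter asks to have checked.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;

// Toy model only: hypothetical names, no NiFi or AWS SDK classes involved.
final class RetryLoopSketch {

    // Counts downloads when every failed item goes back to the head of the queue.
    // maxDownloads caps the simulation, because this loop would otherwise never end.
    static int requeueOnError(List<String> keys, Set<String> corrupt, int maxDownloads) {
        Deque<String> queue = new ArrayDeque<>(keys);
        int downloads = 0;
        while (!queue.isEmpty() && downloads < maxDownloads) {
            String key = queue.pollFirst();
            downloads++; // the full object is transferred before the check fails
            if (corrupt.contains(key)) {
                queue.addFirst(key); // reverted to the input queue, first in line
            }
        }
        return downloads;
    }

    // Counts downloads when a failed item is moved to a failure list instead.
    static int routeToFailure(List<String> keys, Set<String> corrupt, List<String> failed) {
        Deque<String> queue = new ArrayDeque<>(keys);
        int downloads = 0;
        while (!queue.isEmpty()) {
            String key = queue.pollFirst();
            downloads++;
            if (corrupt.contains(key)) {
                failed.add(key); // handled once; not retried automatically
            }
        }
        return downloads;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("a", "b", "c");
        Set<String> corrupt = Set.of("b");
        System.out.println("requeue downloads: " + requeueOnError(keys, corrupt, 1000));
        List<String> failed = new ArrayList<>();
        System.out.println("failure-route downloads: " + routeToFailure(keys, corrupt, failed)
                + ", failed=" + failed);
    }
}
```

With one permanently bad key out of three, the requeue strategy uses the entire download budget on that key and never reaches the keys behind it, while the failure-route strategy downloads each key once and sets the bad one aside.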



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
