[ https://issues.apache.org/jira/browse/NIFI-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862419#comment-16862419 ]
Evan Reynolds commented on NIFI-6367:
-------------------------------------

[~ste...@apache.org] - thank you for finding the Hadoop logic - it does match. The GET code you linked calls ChangeTracker, which checks for null:

https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/ChangeTracker.java#L174

Mimicking that, I'm thinking we should add a null check here:

https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-aws-bundle/nifi-aws-processors/src/main/java/org/apache/nifi/processors/aws/s3/FetchS3Object.java#L107

If the returned object is null, we would log a message and throw an IOException so the processor's existing error handling takes over. A minimal sketch follows after the quoted issue below.

> FetchS3Processor responds to md5 error on download by doing download again, again, and again
> ---------------------------------------------------------------------------------------------
>
>                 Key: NIFI-6367
>                 URL: https://issues.apache.org/jira/browse/NIFI-6367
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.7.1
>         Environment: NiFi (CentOS 7.2) with FetchS3Object running against a non-public S3 environment. The environment/S3 had errors that introduced MD5 errors on under 0.5% of downloads. Downloads with MD5 errors accumulated in the input queue of the processor.
>            Reporter: Kefevs Pirkibo
>            Assignee: Evan Reynolds
>            Priority: Critical
>
> (This report is six months old, but I don't see changes in the relevant parts of the code, though I might be mistaken. This might be hard to replicate, so I suggest someone familiar with the code check whether it is still a problem.)
> Case: NiFi running with FetchS3Object processor(s) against a non-public S3 environment. The environment and S3 had, in combination, hardware errors that resulted in sporadic MD5 errors on the same files over and over again. The MD5 errors resulted in an unhandled AmazonClientException, and the file was downloaded yet again (reverted to the input queue, first in line). In our case this was identified after a number of days, with substantial bandwidth usage. It did not help that the FetchS3Object processors were running with multiple instances; after days they had accumulated the bad-MD5 files for continuous download.
> Suggest: Someone familiar with the code should check what happens to files that are downloaded with a bad MD5. If they are reverted to the queue due to an uncaught exception or other means, this is still a potential problem.
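For reference, here is a minimal sketch of the proposed check. It assumes the variable names (client, request, flowFile) and the try-with-resources block already present in FetchS3Object#onTrigger, and it is untested - it only illustrates the idea:

{code:java}
import java.io.IOException;
import com.amazonaws.services.s3.model.S3Object;

// Inside FetchS3Object#onTrigger, where the GET is issued today.
// The AWS SDK can return null from getObject() instead of throwing (the case
// Hadoop's ChangeTracker guards against), so check before reading the content.
try (final S3Object s3Object = client.getObject(request)) {
    if (s3Object == null) {
        getLogger().error("AWS returned no object for {}; failing the fetch", new Object[]{flowFile});
        // Throwing IOException reuses the current error handling: the existing
        // catch block routes the FlowFile to the failure relationship instead
        // of letting it roll back to the input queue and retry indefinitely.
        throw new IOException("AWS refused to execute this GetObject request");
    }
    // ... existing code that streams s3Object's content into the FlowFile ...
}
{code}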