[ 
https://issues.apache.org/jira/browse/NIFI-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861457#comment-16861457
 ] 

Evan Reynolds commented on NIFI-6367:
-------------------------------------

I can confirm the behavior, and I replicated it this way:

In FetchS3Object.java's onTrigger, I called 
request.withMatchingETagConstraint("bad value") to force the failure and see 
what would happen. (Modifying the file while downloading it did not trigger 
the problem, so I was simply trying to reproduce the behavior and work out 
what to replicate in a unit test.)

Doing that caused S3 to refuse to download the file - but it signaled this not 
by throwing an exception or anything visible, but by returning null to s3Object. 
The code then dereferences s3Object, which causes a NullPointerException. That 
exception is not handled, so the flow file is penalized and retried the next 
time. (I checked Amazon's sample code to see how they were handling it, but 
they were not checking for nulls either!)
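To make the failure mode concrete, here is a minimal, self-contained stand-in 
(not the actual NiFi or AWS SDK code) that models the contract described above 
- the client returning null, rather than throwing, when an ETag constraint is 
not met - and the null check that would avoid the NullPointerException:

```java
import java.util.Map;

public class ConstrainedFetch {

    // Stand-in for AmazonS3#getObject(GetObjectRequest): when the supplied
    // matching-ETag constraint is not satisfied, it returns null instead of
    // throwing, mirroring the behavior observed with withMatchingETagConstraint.
    static String getObject(Map<String, String> bucket, String key, String matchingETag) {
        String storedETag = Integer.toHexString(bucket.getOrDefault(key, "").hashCode());
        if (matchingETag != null && !matchingETag.equals(storedETag)) {
            return null; // constraint not met: null, no exception
        }
        return bucket.get(key);
    }

    public static void main(String[] args) {
        Map<String, String> bucket = Map.of("file.txt", "contents");
        String goodETag = Integer.toHexString("contents".hashCode());

        String ok = getObject(bucket, "file.txt", goodETag);
        System.out.println(ok);

        String refused = getObject(bucket, "file.txt", "bad value");
        // Without this guard, refused.length() (or any dereference) would
        // throw a NullPointerException - the unhandled failure described above.
        if (refused == null) {
            System.out.println("constraint not met");
        }
    }
}
```

The names here (ConstrainedFetch, the Map-backed bucket) are illustrative only; 
the point is that a null result must be treated as a distinct outcome before 
the object is dereferenced.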

It seems like there is a callback that might work for detecting an error, or 
we can just check for the null value - but one question I have to ask is: what 
is the desired behavior? If the file was corrupted on download, then retrying 
might actually be the right thing to do - yet in this case the retries were 
the problem. I worry that fixing it for this case will break things for the 
more common cases.

Right now I'm leaning towards either doing nothing, or else seeing if there is 
a callback that will tell us what the error actually was so we can do a better 
job of logging what happened, and leaving it at that.

Thoughts? [~ste...@apache.org] ? 

> FetchS3Processor responds to md5 error on download by doing download again, 
> again, and again
> --------------------------------------------------------------------------------------------
>
>                 Key: NIFI-6367
>                 URL: https://issues.apache.org/jira/browse/NIFI-6367
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.7.1
>         Environment: NiFi (CentOS 7.2) with FetchS3Object running against an 
> S3 environment (non-public). The environment / S3 had errors that introduced 
> md5 errors on under 0.5% of downloads. Downloads with md5 errors accumulated 
> in the input queue of the processor.
>            Reporter: Kefevs Pirkibo
>            Assignee: Evan Reynolds
>            Priority: Critical
>
> (6 months old, but I don't see changes in the relevant parts of the code, 
> though I might be mistaken. This might be hard to replicate, so I suggest a 
> code wizard check whether this is still a problem.)
> Case: NiFi running with FetchS3Object processor(s) against an S3 environment 
> (non-public). The environment and S3 in combination had hardware errors that 
> resulted in sporadic md5 errors on the same files over and over again. Md5 
> errors resulted in an unhandled AmazonClientException, and the file was 
> downloaded yet again (reverted to the input queue, first in line). In our 
> case this was identified only after a number of days, with substantial 
> bandwidth usage. It did not help that the FetchS3Objects were running with 
> multiple instances; after days they had accumulated the bad-md5-checksum 
> files for continuous download.
> Suggest: Someone code-savvy check what happens to files that are downloaded 
> with a bad md5; if they are reverted to the queue due to an uncaught 
> exception or other means, then this is still a potential problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
