[ 
https://issues.apache.org/jira/browse/NIFI-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867575#comment-16867575
 ] 

Kefevs Pirkibo commented on NIFI-6367:
--------------------------------------

[~evanthx] - Here is the whole thing. Hope this helps.

org.apache.nifi.processor.exception.FlowFileAccessException: Failed to import data from com.amazonaws.services.s3.model.S3ObjectInputStream@.... for StandardFlowFileRecord[uuid=..., claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=...., container=default, section=...], offset=..., length=...], offset=..., name=..., size=...] due to com.amazonaws.SdkClientException: Unable to verify integrity of data download. Client calculated content hash didn't match hash calculated by Amazon S3. The data may be corrupt.
 at org.apache.nifi.controller.repository.StandardProcessSession.importFrom(StandardProcessSession.java:3004)
 at org.apache.nifi.processors.aws.s3.FetchS3Object.onTrigger(FetchS3Object.java:108)
 at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
 at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
 at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
 at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Caused by: com.amazonaws.SdkClientException: Unable to verify integrity of data download. Client calculated content hash didn't match hash calculated by Amazon S3. The data may be corrupt.
 at com.amazonaws.services.s3.internal.DigestValidationInputStream.validateMD5Digest(DigestValidationInputStream.java:79)
 at com.amazonaws.services.s3.internal.DigestValidationInputStream.read(DigestValidationInputStream.java:61)
 at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
 at java.io.FilterInputStream.read(FilterInputStream.java:107)
 at org.apache.nifi.controller.repository.io.TaskTerminationInputStream.read(TaskTerminationInputStream.java:62)
 at org.apache.nifi.stream.io.StreamUtils.copy(StreamUtils.java:35)
 at org.apache.nifi.controller.repository.FileSystemRepository.importFrom(FileSystemRepository.java:744)
 at org.apache.nifi.controller.repository.StandardProcessSession.importFrom(StandardProcessSession.java:2994)
 ... 12 common frames omitted

> FetchS3Processor responds to md5 error on download by doing download again, 
> again, and again
> --------------------------------------------------------------------------------------------
>
>                 Key: NIFI-6367
>                 URL: https://issues.apache.org/jira/browse/NIFI-6367
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.7.1
>         Environment: NiFi (CentOS 7.2) with FetchS3Object running towards an 
> S3 environment (non-public). The environment / S3 had errors that introduced 
> md5 errors on under 0.5% of downloads. Downloads with md5 errors accumulated 
> in the input queue of the processor.
>            Reporter: Kefevs Pirkibo
>            Assignee: Evan Reynolds
>            Priority: Critical
>
> (6 months old, but I don't see changes in the relevant parts of the code, 
> though I might be mistaken. This might be hard to replicate, so I suggest a 
> code wizard check whether this is still a problem.)
> Case: NiFi running with FetchS3Object processor(s) towards an S3 environment 
> (non-public). The environment and S3 in combination had hardware errors that 
> resulted in sporadic md5 errors on the same files over and over again. The 
> md5 errors resulted in an unhandled AmazonClientException, and the file was 
> downloaded yet again. (Reverted to the input queue, first in line.) In our 
> case this was identified after a number of days, with substantial bandwidth 
> usage. It did not help that the FetchS3Objects were running with multiple 
> instances, and after days they had accumulated the bad-md5-checksum files for 
> continuous download.
> Suggest: Someone code savvy should check what happens to files that are 
> downloaded with a bad md5; if they are reverted to the queue due to an 
> uncaught exception or other means, then this is still a potential problem.
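The loop described in the report can be sketched without any NiFi or AWS dependencies. This is a minimal, hypothetical simulation (none of these names are NiFi API): an uncaught integrity exception puts the item back at the head of the input queue, so the same bad object is fetched again and again, while catching the exception and routing the item to a failure queue lets processing move on.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class RetryForeverDemo {

    // Hypothetical stand-in for the S3 download; a "bad" object always fails
    // its MD5 check, mimicking the sporadic-but-sticky hardware error.
    static void fetch(String item) {
        if (item.startsWith("bad")) {
            throw new RuntimeException("content hash didn't match");
        }
    }

    // Buggy pattern: an uncaught exception reverts the item to the head of
    // the input queue, so the same bad object is fetched over and over.
    // The attempt cap exists only so the demo terminates.
    static int fetchWithRevert(Deque<String> inputQueue, int maxAttempts) {
        int attemptsOnBadData = 0;
        for (int i = 0; i < maxAttempts && !inputQueue.isEmpty(); i++) {
            String item = inputQueue.poll();
            try {
                fetch(item);
            } catch (RuntimeException e) {
                attemptsOnBadData++;
                inputQueue.addFirst(item); // revert: first in line again
            }
        }
        return attemptsOnBadData;
    }

    // Safer pattern: catch the integrity error and route the item to a
    // failure queue instead of reverting, so bad objects don't loop forever.
    static Deque<String> fetchWithFailureRoute(Deque<String> inputQueue) {
        Deque<String> failureQueue = new ArrayDeque<>();
        while (!inputQueue.isEmpty()) {
            String item = inputQueue.poll();
            try {
                fetch(item);
            } catch (RuntimeException e) {
                failureQueue.add(item);
            }
        }
        return failureQueue;
    }

    public static void main(String[] args) {
        Deque<String> q1 = new ArrayDeque<>(List.of("bad-object", "good-object"));
        // Every attempt is wasted on the bad object; the good one never runs.
        System.out.println("wasted attempts: " + fetchWithRevert(q1, 5));

        Deque<String> q2 = new ArrayDeque<>(List.of("bad-object", "good-object"));
        System.out.println("routed to failure: " + fetchWithFailureRoute(q2));
    }
}
```

With multiple concurrent processor instances, as in the report, the buggy pattern multiplies the wasted bandwidth: each thread keeps pulling the same reverted items.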



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
