[ https://issues.apache.org/jira/browse/NIFI-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867575#comment-16867575 ]
Kefevs Pirkibo commented on NIFI-6367:
--------------------------------------

[~evanthx] - Here is the whole thing. Hope this helps.

org.apache.nifi.processor.exception.FlowFileAccessException: Failed to import data from com.amazonaws.services.s3.model.S3ObjectInputStream@.... for StandardFlowFileRecord[uuid=..., claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=...., container=default, section=...], offset=..., length=...], offset=..., name=..., size=...] due to com.amazonaws.SdkClientException: Unable to verify integrity of data download. Client calculated content hash didn't match hash calculated by Amazon S3. The data may be corrupt.
	at org.apache.nifi.controller.repository.StandardProcessSession.importFrom(StandardProcessSession.java:3004)
	at org.apache.nifi.processors.aws.s3.FetchS3Object.onTrigger(FetchS3Object.java:108)
	at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
	at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
	at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
	at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: com.amazonaws.SdkClientException: Unable to verify integrity of data download. Client calculated content hash didn't match hash calculated by Amazon S3. The data may be corrupt.
	at com.amazonaws.services.s3.internal.DigestValidationInputStream.validateMD5Digest(DigestValidationInputStream.java:79)
	at com.amazonaws.services.s3.internal.DigestValidationInputStream.read(DigestValidationInputStream.java:61)
	at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
	at java.io.FilterInputStream.read(FilterInputStream.java:107)
	at org.apache.nifi.controller.repository.io.TaskTerminationInputStream.read(TaskTerminationInputStream.java:62)
	at org.apache.nifi.stream.io.StreamUtils.copy(StreamUtils.java:35)
	at org.apache.nifi.controller.repository.FileSystemRepository.importFrom(FileSystemRepository.java:744)
	at org.apache.nifi.controller.repository.StandardProcessSession.importFrom(StandardProcessSession.java:2994)
	... 12 common frames omitted

> FetchS3Processor responds to md5 error on download by doing download again, again, and again
> --------------------------------------------------------------------------------------------
>
>                 Key: NIFI-6367
>                 URL: https://issues.apache.org/jira/browse/NIFI-6367
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.7.1
>         Environment: NiFi (CentOS 7.2) with FetchS3Object running towards an S3 environment (non-public). The environment / S3 had errors that introduced md5 errors on under 0.5% of downloads. Downloads with md5 errors accumulated in the input queue of the processor.
>            Reporter: Kefevs Pirkibo
>            Assignee: Evan Reynolds
>            Priority: Critical
>
> (6 months old, but I don't see changes in the relevant parts of the code, though I might be mistaken. This might be hard to replicate, so I suggest a code wizard check whether this is still a problem.)
> Case: NiFi running with FetchS3Object processor(s) towards an S3 environment (non-public). The environment and S3 had, in combination, hardware errors that resulted in sporadic md5 errors on the same files over and over again. Md5 errors resulted in an unhandled AmazonClientException, and the file was downloaded yet again. (Reverted to the input queue, first in line.) In our case this was identified after a number of days, with substantial bandwidth usage. It did not help that the FetchS3Objects were running with multiple instances; after days they had accumulated the bad-md5-checksum files for continuous download.
> Suggest: Someone code-savvy check what happens to files that are downloaded with a bad md5. If they are reverted to the queue due to an uncaught exception or other means, then this is still a potential problem.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
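To make the failure mode concrete: a minimal, self-contained Java sketch of the loop described above. This is NOT the NiFi or AWS SDK API; all names here (Md5MismatchException, fetch, drainWithFailureRouting) are hypothetical stand-ins. It simulates a processor that, instead of letting a digest-mismatch exception propagate (which puts the flowfile back at the head of the input queue and triggers an endless re-download), catches the error and routes the item to a failure queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class Md5RetrySketch {

    // Hypothetical stand-in for com.amazonaws.SdkClientException on digest mismatch.
    static class Md5MismatchException extends RuntimeException {
        Md5MismatchException(String msg) { super(msg); }
    }

    // Simulated download: keys prefixed "bad" always fail their MD5 check,
    // mimicking the deterministic corruption described in the report.
    static void fetch(String key) {
        if (key.startsWith("bad")) {
            throw new Md5MismatchException(
                "Unable to verify integrity of data download for " + key);
        }
    }

    // Reported bug, in miniature: if fetch() were allowed to throw out of this
    // loop, the caller would re-queue the same key at the head of the input
    // queue and fetch it forever. Catching the mismatch and moving the key to
    // a failure queue breaks the cycle; each key is downloaded exactly once.
    public static int drainWithFailureRouting(Deque<String> input,
                                              Deque<String> failure) {
        int downloads = 0;
        while (!input.isEmpty()) {
            String key = input.pollFirst();
            downloads++;
            try {
                fetch(key);
            } catch (Md5MismatchException e) {
                failure.addLast(key); // route to failure instead of re-queueing
            }
        }
        return downloads;
    }

    public static void main(String[] args) {
        Deque<String> input = new ArrayDeque<>(
            java.util.Arrays.asList("good1", "bad1", "good2"));
        Deque<String> failure = new ArrayDeque<>();
        int downloads = drainWithFailureRouting(input, failure);
        System.out.println("downloads=" + downloads + " failed=" + failure);
    }
}
```

In NiFi terms, the analogous fix would be for FetchS3Object to treat the integrity failure as a routable error (e.g. send the flowfile to its failure relationship) rather than letting the exception roll the session back; the sketch above only models that queue behavior, not the actual processor code.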