[jira] [Commented] (NIFI-6367) FetchS3Processor responds to md5 error on download by doing download again, again, and again
[ https://issues.apache.org/jira/browse/NIFI-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918182#comment-16918182 ]

ASF subversion and git services commented on NIFI-6367:
---

Commit e2ca50e66a3b1a7d810ea8eac256d21bca3fd07f in nifi's branch refs/heads/master from Evan Reynolds
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=e2ca50e ]
NIFI-6367 - This closes #3563. more error handling for FetchS3Object
Signed-off-by: Joe Witt

> FetchS3Processor responds to md5 error on download by doing download again, again, and again
>
>   Key: NIFI-6367
>   URL: https://issues.apache.org/jira/browse/NIFI-6367
>   Project: Apache NiFi
>   Issue Type: Bug
>   Components: Core Framework
>   Affects Versions: 1.7.1
>   Environment: NiFi (CentOS 7.2) with FetchS3Object running against an S3 environment (non-public). The environment / S3 had errors that introduced MD5 errors on under 0.5% of downloads. Downloads with MD5 errors accumulated in the input queue of the processor.
>   Reporter: Kefevs Pirkibo
>   Assignee: Evan Reynolds
>   Priority: Critical
>   Fix For: 1.10.0
>
>   Time Spent: 50m
>   Remaining Estimate: 0h
>
> (6 months old, but I don't see changes in the relevant parts of the code, though I might be mistaken. This might be hard to replicate, so I suggest a code wizard check whether this is still a problem.)
> Case: NiFi running with FetchS3Object processor(s) against an S3 environment (non-public). The environment and S3 in combination had hardware errors that resulted in sporadic MD5 errors on the same files over and over again. MD5 errors resulted in an unhandled AmazonClientException, and the file was downloaded yet again. (Reverted to the input queue, first in line.) In our case this was identified after a number of days, with substantial bandwidth usage. It did not help that the FetchS3Object processors were running with multiple instances, and after days they had accumulated the bad-MD5-checksum files for continuous download.
> Suggest: Someone code-savvy check what happens to files that are downloaded with a bad MD5; if they are reverted to the queue due to an uncaught exception or other means, then this is still a potential problem.

--
This message was sent by Atlassian Jira (v8.3.2#803003)
[ https://issues.apache.org/jira/browse/NIFI-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876494#comment-16876494 ]

Evan Reynolds commented on NIFI-6367:
---

[~kefevs] - that did help! Thank you! It didn't throw the handled exceptions in your case; it threw an exception type that tells NiFi to reprocess the flowfile. I added two extra error checks - a null check (as I could see that happen when testing) and a check on that exception to see whether we should really retry or not - [https://github.com/apache/nifi/pull/3562]

I think that will fix it up.
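The two extra checks described above amount to a retry-vs-fail decision. A self-contained sketch of that decision with stand-in types (hypothetical names; the actual change is in the linked pull request, and `ClientError` stands in for the AWS SDK's `AmazonClientException` and its retryable flag):

```java
// Sketch of the retry-vs-fail decision: a null S3 object (the GET was refused
// without an exception) and a non-retryable client error both route the
// flowfile to failure; only errors marked retryable put it back on the queue.
public class FetchDecision {
    // Stand-in for com.amazonaws.AmazonClientException / isRetryable().
    static class ClientError extends RuntimeException {
        final boolean retryable;
        ClientError(String message, boolean retryable) {
            super(message);
            this.retryable = retryable;
        }
    }

    enum Outcome { SUCCESS, ROUTE_TO_FAILURE, ROLLBACK_AND_RETRY }

    static Outcome decide(Object s3Object, ClientError error) {
        if (error != null) {
            return error.retryable ? Outcome.ROLLBACK_AND_RETRY
                                   : Outcome.ROUTE_TO_FAILURE;
        }
        if (s3Object == null) {
            // Previously this was an unchecked NullPointerException path,
            // which rolled the session back and re-queued the flowfile.
            return Outcome.ROUTE_TO_FAILURE;
        }
        return Outcome.SUCCESS;
    }
}
```

Routing to failure instead of rolling back is what stops the endless re-download loop the reporter saw.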
[ https://issues.apache.org/jira/browse/NIFI-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867634#comment-16867634 ]

Kefevs Pirkibo commented on NIFI-6367:
---

[~joewitt] - The file is not routed to failure; it's stuck in the input queue, being run over and over again. I'm assuming it's rolled back, but I'd prefer someone else conclude on the mechanic that keeps the file stuck there.
[ https://issues.apache.org/jira/browse/NIFI-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867597#comment-16867597 ]

Joseph Witt commented on NIFI-6367:
---

Does the flowfile get routed to a failure relationship, or is the session in NiFi rolled back? If it is routed to failure, then it is on the person designing the flow to pick their poison in terms of whether to retry or not. Or, if the person has insufficient failure data to make that decision, we need to offer more context (code change). Or, if it is rolled back, we need to catch/look for this case in particular and ensure it is routed to failure and/or some relationship making it clear that the MD5 doesn't match.

Is the case here that a ListS3 has given a flowfile with a file path and MD5, but then during FetchS3 the MD5 of the downloaded item doesn't match? Or rather that the S3 client lib itself is getting a different MD5 reported as metadata, which it finds doesn't match the actual data?
[ https://issues.apache.org/jira/browse/NIFI-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867575#comment-16867575 ]

Kefevs Pirkibo commented on NIFI-6367:
---

[~evanthx] - Here is the whole thing. Hope this helps.

org.apache.nifi.processor.exception.FlowFileAccessException: Failed to import data from com.amazonaws.services.s3.model.S3ObjectInputStream@ for StandardFlowFileRecord[uuid=..., claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=, container=default, section=...], offset=..., length=...], offset=..., name=..., size=...] due to com.amazonaws.SdkClientException: Unable to verify integrity of data download. Client calculated content hash didn't match hash calculated by Amazon S3. The data may be corrupt.
    at org.apache.nifi.controller.repository.StandardProcessSession.importFrom(StandardProcessSession.java:3004)
    at org.apache.nifi.processors.aws.s3.FetchS3Object.onTrigger(FetchS3Object.java:108)
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
    at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: com.amazonaws.SdkClientException: Unable to verify integrity of data download. Client calculated content hash didn't match hash calculated by Amazon S3. The data may be corrupt.
    at com.amazonaws.services.s3.internal.DigestValidationInputStream.validateMD5Digest(DigestValidationInputStream.java:79)
    at com.amazonaws.services.s3.internal.DigestValidationInputStream.read(DigestValidationInputStream.java:61)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
    at java.io.FilterInputStream.read(FilterInputStream.java:107)
    at org.apache.nifi.controller.repository.io.TaskTerminationInputStream.read(TaskTerminationInputStream.java:62)
    at org.apache.nifi.stream.io.StreamUtils.copy(StreamUtils.java:35)
    at org.apache.nifi.controller.repository.FileSystemRepository.importFrom(FileSystemRepository.java:744)
    at org.apache.nifi.controller.repository.StandardProcessSession.importFrom(StandardProcessSession.java:2994)
    ... 12 common frames omitted
[ https://issues.apache.org/jira/browse/NIFI-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16865843#comment-16865843 ]

Evan Reynolds commented on NIFI-6367:
---

[~kefevs] - thank you! It threw an SdkClientException - that is NOT exactly the same as the issue I found, so I'm really glad you posted that. But SdkClientException extends AmazonClientException, which is caught (I checked the 1.7.1 code and they are caught there too), so I'm not certain I quite have this figured out yet. I made the unit tests throw that exception, and it transferred the file to the failure queue, which isn't the behavior you saw.

Do you still have the stack trace? I'd love to see the lines for FetchS3Object and a few lines of where it went afterwards if that's possible, mostly to use the line numbers to double-check what it did in the code. (Since you had to type it over - two or three filenames and line numbers are all I really need.)
[ https://issues.apache.org/jira/browse/NIFI-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862479#comment-16862479 ]

Kefevs Pirkibo commented on NIFI-6367:
---

From the NiFi logs; had to dig this up from the archives. Not cut'n'paste, so beware of typos.

...
org.apache.nifi.processor.exception.FlowFileAccessException: Failed to import data from com.amazonaws.services.s3.model.S3ObjectInputStream@ for ... due to com.amazonaws.SdkClientException: Unable to verify integrity of data download. Client calculated content hash didn't match hash calculated by Amazon S3. The data may be corrupt.
...
    at org.apache.nifi.controller.repository.StandardProcessSession.importFrom(StandardProcessSession.java:..)
    at org.apache.nifi.processors.aws.s3.FetchS3Object.onTrigger(FetchS3Object.java:..)
...
    at java.lang.Thread.run(Thread.java:..)
Caused by: com.amazonaws.SdkClientException: Unable to verify integrity of data download. Client calculated content hash didn't match hash calculated by Amazon S3. The data may be corrupt.
    at com.amazonaws.services.s3.internal.DigestValidationInputStream.validateMD5Digest(DigestValidationInputStream.java:..)
    at ...

In this case, with s3cmd the output for the same files would be (for reference):

WARNING: MD5 signatures do not match: computed= , received= ...

If there is further interest in log samples I can try to see what I can do.
[ https://issues.apache.org/jira/browse/NIFI-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862419#comment-16862419 ]

Evan Reynolds commented on NIFI-6367:
---

[~ste...@apache.org] - thank you for finding the Hadoop logic - that does match, actually. The GET code you linked calls ChangeTracker, which checks for null: https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/ChangeTracker.java#L174

So mimicking that - I'm thinking we should just add a null check here: https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-aws-bundle/nifi-aws-processors/src/main/java/org/apache/nifi/processors/aws/s3/FetchS3Object.java#L107

And if it's null, log a message and throw an IOException so it uses the current error handling?
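A minimal sketch of the null check proposed above, with a hypothetical helper name (`requireFetched` is not in the NiFi codebase; the real change would live at the linked line in FetchS3Object.java and log through the processor's ComponentLog):

```java
// Sketch: if the client returns null instead of an S3 object, throw an
// IOException so the existing error handling routes the flowfile to failure,
// rather than letting a NullPointerException roll the session back.
import java.io.IOException;

public class NullObjectCheck {
    // s3Object stands in for the value returned by the S3 client's getObject(...).
    static Object requireFetched(Object s3Object, String key) throws IOException {
        if (s3Object == null) {
            throw new IOException("S3 returned no object for key " + key
                    + "; request constraints may not have been met");
        }
        return s3Object;
    }
}
```

The point of throwing IOException rather than a custom type is that the processor's existing catch blocks already treat it as a failure-relationship case.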
[ https://issues.apache.org/jira/browse/NIFI-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861973#comment-16861973 ]

Steve Loughran commented on NIFI-6367:
---

* change tracking logic for both versions and etags: https://github.com/apache/hadoop/tree/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl
* how this is used in GET operations: https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L196
[ https://issues.apache.org/jira/browse/NIFI-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861971#comment-16861971 ]

Steve Loughran commented on NIFI-6367:
---

HADOOP-16085 added tracking of etag/version ID in two ways:
# if the value is known beforehand, we use it in the initial GET
# otherwise, cache the values from the first response and fail if they change on a subsequent GET in the same file read (after a seek/paged read)

The mismatch can then be processed server side or client side:
# server side: see a null and remap to a failure
# client side: validate the header and (based on config) warn or fail

If you use the AWS Transfer Manager, it doesn't handle changes during a copy that well: https://github.com/aws/aws-sdk-java/issues/1644
[ https://issues.apache.org/jira/browse/NIFI-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861484#comment-16861484 ]

Evan Reynolds commented on NIFI-6367:
---

After more looking, I don't think the callback I saw was relevant to this code, so ignore that - it would just be a null check on the value coming back from client.getObject.
[ https://issues.apache.org/jira/browse/NIFI-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861457#comment-16861457 ] Evan Reynolds commented on NIFI-6367: - I can confirm the behavior, and replicated it this way: In FetchS3Object.java in onTrigger, I called request.withMatchingETagConstraint("bad value") to try to trigger this and see what would happen. (I modified the file when downloading it but that did not trigger this so I was just trying to generate the behavior to see what happened and what behavior to replicate in a unit test.) Doing that caused S3 to refuse to download the file - but it did so not by throwing an exception or anything visible, but by returning a null to s3Object. The code then calls s3Object which causes a null pointer failure. That is not handled, so the request is penalized and will try again next time. (I checked Amazon's sample code to see how they were handling it, but they were not checking for nulls either!) It seems like there's a callback that might work to look for an error, or we can just check the null value - but one question that I have to ask is what is the desired behavior? If the file was corrupted on download, then retrying it might actually be the right thing to do. But then it caused problems in this case! But I worry that fixing it for this case will break things for the more common cases. Right now I'm leaning towards either doing nothing, or else seeing if there is a callback that will tell me what the error actually was so we can do a better job logging what happened, and leaving it at that. Thoughts? [~ste...@apache.org] ? 
[ https://issues.apache.org/jira/browse/NIFI-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860991#comment-16860991 ] Steve Loughran commented on NIFI-6367:

1. What's the specific exception stack trace people see on an MD5 error?
2. Whatever is doing the download is presumably retrying, and clearly isn't giving up.

I don't know the NiFi logic here; in hadoop-aws I don't see us handling an MD5 error at all, which is why I'm curious as to what you saw. If it's going through the Amazon SDK and its transfer manager, it may be that bit of code which is trying to be helpful.