[jira] [Commented] (NIFI-6367) FetchS3Processor responds to md5 error on download by doing download again, again, and again

2019-08-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918182#comment-16918182
 ] 

ASF subversion and git services commented on NIFI-6367:
---

Commit e2ca50e66a3b1a7d810ea8eac256d21bca3fd07f in nifi's branch 
refs/heads/master from Evan Reynolds
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=e2ca50e ]

NIFI-6367 - This closes #3563. more error handling for FetchS3Object

Signed-off-by: Joe Witt 


> FetchS3Processor responds to md5 error on download by doing download again, 
> again, and again
> 
>
> Key: NIFI-6367
> URL: https://issues.apache.org/jira/browse/NIFI-6367
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Core Framework
>Affects Versions: 1.7.1
> Environment: NiFi (CentOS 7.2) with FetchS3Object running against an S3 
> environment (non-public). The environment / S3 had errors that introduced md5 
> errors on under 0.5% of downloads. Downloads with md5 errors accumulated in 
> the input queue of the processor.
>Reporter: Kefevs Pirkibo
>Assignee: Evan Reynolds
>Priority: Critical
> Fix For: 1.10.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> (6 months old, but I don't see changes in the relevant parts of the code, 
> though I might be mistaken. This might be hard to replicate, so I suggest a 
> code wizard check whether this is still a problem.)
> Case: NiFi running with FetchS3Object processor(s) against an S3 environment 
> (non-public). The environment and S3 had, in combination, hardware errors that 
> resulted in sporadic md5 errors on the same files over and over again. Md5 
> errors resulted in an unhandled AmazonClientException, and the file was 
> downloaded yet again. (Reverted to the input queue, first in line.) In our 
> case this was identified after a number of days, with substantial bandwidth 
> usage. It did not help that the FetchS3Objects were running with multiple 
> instances, and after days they accumulated the bad md5 checksum files for 
> continuous download.
> Suggest: Someone code-savvy should check what happens to files that are 
> downloaded with a bad md5; if they are reverted to the queue due to an 
> uncaught exception or other means, then this is still a potential problem.





[jira] [Commented] (NIFI-6367) FetchS3Processor responds to md5 error on download by doing download again, again, and again

2019-07-01 Thread Evan Reynolds (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876494#comment-16876494
 ] 

Evan Reynolds commented on NIFI-6367:
-

[~kefevs] - that did help! Thank you!

It didn't throw the handled exceptions in your case; it threw an exception type 
that tells NiFi to reprocess the flowfile.

I added two extra error checks: a null check (as I could see that happen when 
testing) and a check on that exception to decide whether we should really retry 
or not -
[https://github.com/apache/nifi/pull/3562]

I think that will fix it up.
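For context, the kind of handling described there looks roughly like this (an 
illustrative sketch only, not the literal contents of the pull request; it 
assumes the usual onTrigger context of client, request, flowFile, session, and 
REL_FAILURE):

    final S3Object s3Object = client.getObject(request);
    if (s3Object == null) {
        // Check 1: getObject can return null (e.g. an unmet constraint).
        // Route to failure rather than hit a NullPointerException that would
        // roll the session back and retry the same download forever.
        getLogger().error("Nothing was fetched from S3 for {}", new Object[]{flowFile});
        session.transfer(flowFile, REL_FAILURE);
        return;
    }

    try {
        flowFile = session.importFrom(s3Object.getObjectContent(), flowFile);
    } catch (final FlowFileAccessException e) {
        // Check 2: this exception type normally tells NiFi to reprocess the
        // flowfile. If the underlying cause is an S3 client error (such as
        // the hash mismatch in this issue), retrying will not help, so route
        // to failure instead of rethrowing.
        if (e.getCause() instanceof AmazonClientException) {
            getLogger().error("Failed to fetch {} from S3 due to {}", new Object[]{flowFile, e.getCause()});
            session.transfer(flowFile, REL_FAILURE);
            return;
        }
        throw e;
    }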



[jira] [Commented] (NIFI-6367) FetchS3Processor responds to md5 error on download by doing download again, again, and again

2019-06-19 Thread Kefevs Pirkibo (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867634#comment-16867634
 ] 

Kefevs Pirkibo commented on NIFI-6367:
--

[~joewitt] - The file is not routed to failure; it's stuck in the input queue, 
being run over and over again. I'm assuming it's rolled back, but I'd prefer 
someone else confirm the mechanism that keeps the file stuck there.



[jira] [Commented] (NIFI-6367) FetchS3Processor responds to md5 error on download by doing download again, again, and again

2019-06-19 Thread Joseph Witt (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867597#comment-16867597
 ] 

Joseph Witt commented on NIFI-6367:
---

Does the flowfile get routed to a failure relationship, or is the session in 
NiFi rolled back?

If it is routed to failure, then it is on the person designing the flow to pick 
their poison in terms of whether to retry or not.  Or, if the person has 
insufficient failure data to make that decision, we need to offer more context 
(code change).  Or, if it is rolled back, we need to catch/look for this case 
in particular and ensure it is routed to failure and/or some relationship 
making it clear that the md5 doesn't match.

Is the case here that a ListS3 has given a flowfile with a file path and md5, 
but then during FetchS3 the md5 of the downloaded item doesn't match?  Or 
rather, is it that the S3 client lib itself is getting a different md5 reported 
as metadata, which it finds doesn't match the actual data?
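
To make the failure-vs-rollback distinction in the first paragraph concrete, 
the two outcomes in a NiFi processor look roughly like this (illustrative only; 
doFetch is a made-up helper, not FetchS3Object's actual code):

    // Outcome 1: explicit routing -- the flow designer sees and handles 'failure'.
    FlowFile flowFile = session.get();
    if (flowFile == null) {
        return;
    }
    try {
        doFetch(flowFile);                        // hypothetical download step
        session.transfer(flowFile, REL_SUCCESS);
    } catch (final ProcessException e) {
        session.transfer(flowFile, REL_FAILURE);  // routed to the failure relationship
    }

    // Outcome 2: by contrast, if a RuntimeException (e.g. FlowFileAccessException)
    // escapes onTrigger uncaught, the framework rolls the session back: the
    // flowfile returns to the front of the incoming queue and is retried --
    // the "downloaded again and again" behavior reported in this issue.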



[jira] [Commented] (NIFI-6367) FetchS3Processor responds to md5 error on download by doing download again, again, and again

2019-06-19 Thread Kefevs Pirkibo (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867575#comment-16867575
 ] 

Kefevs Pirkibo commented on NIFI-6367:
--

[~evanthx] - Here is the whole thing. Hope this helps.

org.apache.nifi.processor.exception.FlowFileAccessException: Failed to import 
data from com.amazonaws.services.s3.model.S3ObjectInputStream@ for 
StandardFlowFileRecord[uuid=..., claim=StandardContentClaim 
[resourceClaim=StandardResourceClaim[id=, container=default, section=...], 
offset=..., length=...], offset=..., name=..., size=...] 
 due to com.amazonaws.SdkClientException: Unable to verify integrity of data 
download. Client calculated content hash didn't match hash calculated by Amazon 
S3. The data may be corrupt.
 at org.apache.nifi.controller.repository.StandardProcessSession.importFrom(StandardProcessSession.java:3004)
 at org.apache.nifi.processors.aws.s3.FetchS3Object.onTrigger(FetchS3Object.java:108)
 at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
 at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
 at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
 at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Caused by: com.amazonaws.SdkClientException: Unable to verify integrity of data 
download. Client calculated content hash didn't match hash calculated by Amazon 
S3. The data may be corrupt.
 at com.amazonaws.services.s3.internal.DigestValidationInputStream.validateMD5Digest(DigestValidationInputStream.java:79)
 at com.amazonaws.services.s3.internal.DigestValidationInputStream.read(DigestValidationInputStream.java:61)
 at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
 at java.io.FilterInputStream.read(FilterInputStream.java:107)
 at org.apache.nifi.controller.repository.io.TaskTerminationInputStream.read(TaskTerminationInputStream.java:62)
 at org.apache.nifi.stream.io.StreamUtils.copy(StreamUtils.java:35)
 at org.apache.nifi.controller.repository.FileSystemRepository.importFrom(FileSystemRepository.java:744)
 at org.apache.nifi.controller.repository.StandardProcessSession.importFrom(StandardProcessSession.java:2994)
 ... 12 common frames omitted



[jira] [Commented] (NIFI-6367) FetchS3Processor responds to md5 error on download by doing download again, again, and again

2019-06-17 Thread Evan Reynolds (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16865843#comment-16865843
 ] 

Evan Reynolds commented on NIFI-6367:
-

[~kefevs] - thank you! It threw an SdkClientException - that is NOT exactly the 
same as the issue I found, so I'm really glad you posted that. But 
SdkClientException extends AmazonClientException, which is caught (I checked 
the 1.7.1 code and it is caught there too), so I'm not certain I quite have 
this figured out yet. I made the unit tests throw that exception and it 
transferred the file to the failure queue, which isn't the behavior you saw.

Do you still have the stack trace? I'd love to see the lines for FetchS3Object 
and a few lines of where it went afterwards if that's possible, mostly to use 
the line numbers to double check what it did in the code. (Since you had to 
type it over - two or three filenames and line numbers are all I really need.)
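
(For reference, the unit-test experiment mentioned in the first paragraph looks 
roughly like the sketch below. It assumes the processor's S3 client accessor 
can be overridden with a Mockito mock, as NiFi's existing S3 processor tests 
do; the exact property and relationship names here are best-effort guesses.)

    final AmazonS3Client mockS3 = Mockito.mock(AmazonS3Client.class);
    final FetchS3Object processor = new FetchS3Object() {
        @Override
        protected AmazonS3Client getClient() {
            return mockS3;   // assumed override point for injecting the mock
        }
    };
    final TestRunner runner = TestRunners.newTestRunner(processor);
    runner.setProperty(FetchS3Object.REGION, "us-east-1");
    runner.setProperty(FetchS3Object.BUCKET, "test-bucket");
    runner.enqueue(new byte[0], Collections.singletonMap("filename", "test.txt"));

    // Make every download fail the way the reporter's environment did.
    Mockito.when(mockS3.getObject(Mockito.any(GetObjectRequest.class)))
           .thenThrow(new SdkClientException("Unable to verify integrity of data download"));

    runner.run(1);
    runner.assertAllFlowFilesTransferred(FetchS3Object.REL_FAILURE, 1);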



[jira] [Commented] (NIFI-6367) FetchS3Processor responds to md5 error on download by doing download again, again, and again

2019-06-12 Thread Kefevs Pirkibo (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862479#comment-16862479
 ] 

Kefevs Pirkibo commented on NIFI-6367:
--

From the NiFi logs; I had to dig this up from the archives. Not cut'n'paste, so 
beware of typos.
...
org.apache.nifi.processor.exception.FlowFileAccessException: Failed to import 
data from com.amazonaws.services.s3.model.S3ObjectInputStream@ for
...
due to com.amazonaws.SdkClientException: Unable to verify integrity of data 
download. Client calculated content hash didn't match hash calculated by Amazon 
S3. The data may be corrupt.
...
 at org.apache.nifi.controller.repository.StandardProcessSession.importFrom(StandardProcessSession:..)
 at org.apache.nifi.processors.aws.s3.FetchS3Object.onTrigger(FetchS3Object.java:..)
...
 at java.lang.Thread.run(Thread.java:..)
Caused by: com.amazonaws.SdkClientException: Unable to verify integrity of data 
download. Client calculated content hash didn't match hash calculated by Amazon 
S3. The data may be corrupt.
 at com.amazonaws.services.s3.internal.DigestValidationInputStream.validateMD5Digest(DigestValidationInputStream.java:..)
 at
...

In this case, with s3cmd, the output for the same files was (for reference):
WARNING: MD5 signatures do not match: computed=  , received= 

...
If there is further interest in log samples I can try to see what I can do.



[jira] [Commented] (NIFI-6367) FetchS3Processor responds to md5 error on download by doing download again, again, and again

2019-06-12 Thread Evan Reynolds (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862419#comment-16862419
 ] 

Evan Reynolds commented on NIFI-6367:
-

[~ste...@apache.org] - thank you for finding the Hadoop logic - that does 
match, actually. The GET code you linked calls ChangeTracker, which checks for 
null:
https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/ChangeTracker.java#L174

So mimicking that - I'm thinking we should just add a null check here:
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-aws-bundle/nifi-aws-processors/src/main/java/org/apache/nifi/processors/aws/s3/FetchS3Object.java#L107

And if it's null, log a message and throw an IOException so it uses the 
current error handling?
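
Roughly, that suggestion would look like this (a sketch only; client, request, 
and key come from the surrounding onTrigger code, and the message text is 
illustrative):

    final S3Object s3Object = client.getObject(request);
    if (s3Object == null) {
        // Mirror the ChangeTracker approach: treat "no object returned" as an
        // error and let the existing IOException handling route the flowfile
        // to failure, instead of letting a NullPointerException roll the
        // session back.
        throw new IOException("Nothing was returned from S3 for " + key);
    }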



[jira] [Commented] (NIFI-6367) FetchS3Processor responds to md5 error on download by doing download again, again, and again

2019-06-12 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861973#comment-16861973
 ] 

Steve Loughran commented on NIFI-6367:
--

* change tracking logic for both versions and etags: 
https://github.com/apache/hadoop/tree/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl

* how this is used in GET operations: 
https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L196




[jira] [Commented] (NIFI-6367) FetchS3Processor responds to md5 error on download by doing download again, again, and again

2019-06-12 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861971#comment-16861971
 ] 

Steve Loughran commented on NIFI-6367:
--

HADOOP-16085 added tracking of etag/version ID in two ways:

# if the value is known beforehand, we use it in the initial GET
# otherwise, we cache the values from the first response and fail if a 
subsequent GET in the same file read (after a seek/paged read) returns 
something different; the mismatch can be handled server side or client side:
#* server side: see a null and remap it to a failure
#* client side: validate the header and (based on config) warn or fail

If you use the AWS Transfer Manager, it doesn't handle changes during a copy 
that well: https://github.com/aws/aws-sdk-java/issues/1644
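
A rough sketch of that tracking idea, reduced to etags only (illustrative; this 
is not Hadoop's actual ChangeTracker, just the shape of it against the AWS SDK 
types):

    import java.io.IOException;
    import com.amazonaws.services.s3.model.GetObjectRequest;
    import com.amazonaws.services.s3.model.ObjectMetadata;

    // Caches the etag seen on the first response and flags any later change.
    final class EtagTracker {
        private String etag;   // null until the first response has been seen

        // Server-side variant: if the etag is already known, constrain the GET
        // so a changed object comes back as an unmet-constraint (null) response.
        void applyConstraint(final GetObjectRequest request) {
            if (etag != null) {
                request.withMatchingETagConstraint(etag);
            }
        }

        // Client-side variant: remember the first etag, fail (or warn) if a
        // later read of the same file returns a different one.
        void check(final ObjectMetadata metadata) throws IOException {
            final String seen = metadata.getETag();
            if (etag == null) {
                etag = seen;
            } else if (!etag.equals(seen)) {
                throw new IOException("Object changed during read: expected etag "
                        + etag + " but the server returned " + seen);
            }
        }
    }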



[jira] [Commented] (NIFI-6367) FetchS3Processor responds to md5 error on download by doing download again, again, and again

2019-06-11 Thread Evan Reynolds (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861484#comment-16861484
 ] 

Evan Reynolds commented on NIFI-6367:
-

After more looking, I don't think the callback I saw was relevant to this code, 
so ignore that - it would just be a null check on the value coming back from 
client.getObject.



[jira] [Commented] (NIFI-6367) FetchS3Processor responds to md5 error on download by doing download again, again, and again

2019-06-11 Thread Evan Reynolds (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861457#comment-16861457
 ] 

Evan Reynolds commented on NIFI-6367:
-

I can confirm the behavior, and replicated it this way:

In FetchS3Object.java, in onTrigger, I called 
request.withMatchingETagConstraint("bad value") to try to trigger this and see 
what would happen. (I had modified the file while downloading it, but that did 
not trigger this, so I was just trying to generate the behavior to see what 
happened and what to replicate in a unit test.)

Doing that caused S3 to refuse to download the file - but it did so not by 
throwing an exception or anything visible, but by returning a null to s3Object. 
The code then dereferences s3Object, which causes a null pointer failure. That 
is not handled, so the request is penalized and will be tried again next time. 
(I checked Amazon's sample code to see how they were handling it, but they were 
not checking for nulls either!)
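
In code form, that reproduction looks roughly like this (illustrative; bucket, 
key, and client come from the surrounding onTrigger code):

    // Reproduction sketch: an ETag constraint that can never match.
    final GetObjectRequest request = new GetObjectRequest(bucket, key);
    request.withMatchingETagConstraint("bad value");   // as in the experiment above

    final S3Object s3Object = client.getObject(request);
    // With the unmet constraint the SDK does not throw; it returns null.
    // The existing code then dereferences s3Object (s3Object.getObjectContent()),
    // hits a NullPointerException, the session is rolled back, the flowfile is
    // penalized, and the download is attempted again on the next trigger.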

It seems like there's a callback that might work for looking for an error, or 
we can just check the null value - but one question I have to ask is: what is 
the desired behavior? If the file was corrupted on download, then retrying 
might actually be the right thing to do. But it caused problems in this case, 
and I worry that fixing it for this case will break things for the more common 
cases.

Right now I'm leaning towards either doing nothing, or else seeing if there is 
a callback that will tell me what the error actually was so we can do a better 
job logging what happened, and leaving it at that. 

Thoughts? [~ste...@apache.org] ? 



[jira] [Commented] (NIFI-6367) FetchS3Processor responds to md5 error on download by doing download again, again, and again

2019-06-11 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860991#comment-16860991
 ] 

Steve Loughran commented on NIFI-6367:
--

# What's the specific exception stack trace people see on an MD5 error?
# Whatever is doing the D/L is presumably retrying, and clearly isn't giving 
up. I don't know the NiFi logic here; in hadoop-aws I don't see us handling an 
MD5 error at all, which is why I'm curious as to what you saw. If it's going 
through the Amazon SDK and its transfer manager, it may be that bit of code 
which is trying to be helpful.
