Re: NIFI-4715 : ListS3 list duplicate files when incoming file throughput to S3 is high

2017-12-26 Thread Koji Kawamura
Hi Milan,

Thanks for your contribution! I reviewed the PR and posted a comment there.
Would you check that?

Koji

On Sat, Dec 23, 2017 at 7:15 AM, Milan Das wrote:

> I have logged a defect in NiFi: ListS3 generates duplicate flow files when
> S3 throughput is high.
>
> Root cause:
> A file is uploaded to S3 while a ListS3 listing is in progress.
> In onTrigger, maxTimestamp is initialized to 0L.
> This clears the tracked keys, as per the code below.
>
> When the lastModifiedTime of an S3 object equals currentTimestamp and its
> key has already been listed, the object should be skipped. Because the keys
> are cleared, the same file is loaded again.
> I think the fix should be to initialize maxTimestamp with currentTimestamp,
> not 0L.
>
> https://issues.apache.org/jira/browse/NIFI-4715
>
> The fix I made seems OK and is working for us.
>
> long maxTimestamp = currentTimestamp;
>
> I wanted to check the thoughts of other experts, or whether there is any
> other known fix.
>
> Regards,
>
> *Milan Das*
> Sr. System Architect
>
> email: m...@interset.com
> mobile: +1 678 216 5660
>
> www.interset.com


NIFI-4715 : ListS3 list duplicate files when incoming file throughput to S3 is high

2017-12-22 Thread Milan Das
I have logged a defect in NiFi: ListS3 generates duplicate flow files when S3
throughput is high.

Root cause:
A file is uploaded to S3 while a ListS3 listing is in progress.
In onTrigger, maxTimestamp is initialized to 0L.
This clears the tracked keys, as per the code below.

When the lastModifiedTime of an S3 object equals currentTimestamp and its key
has already been listed, the object should be skipped. Because the keys are
cleared, the same file is loaded again.
I think the fix should be to initialize maxTimestamp with currentTimestamp, not 0L.

https://issues.apache.org/jira/browse/NIFI-4715

 

The fix I made seems OK and is working for us.

long maxTimestamp = currentTimestamp;
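
To make the reasoning above concrete, here is a minimal sketch of the timestamp and key tracking as described in this message. It is a simplified, hypothetical model and not the actual ListS3 source: the class ListingModel, the method listOnce, and the flag seedWithCurrentTimestamp are illustrative names, and the assumption that a single key set is cleared whenever a newer timestamp is seen is taken from the description above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the listing state described above.
// This is NOT the NiFi ListS3 source; all names here are illustrative.
class ListingModel {
    private long currentTimestamp = 0L;                       // newest lastModified seen so far
    private final Set<String> currentKeys = new HashSet<>();  // keys already listed at that timestamp
    private final boolean seedWithCurrentTimestamp;           // true = proposed fix, false = 0L seed

    ListingModel(boolean seedWithCurrentTimestamp) {
        this.seedWithCurrentTimestamp = seedWithCurrentTimestamp;
    }

    // One listing pass over a bucket snapshot (key -> lastModified in millis).
    // Returns the keys emitted by this pass.
    List<String> listOnce(Map<String, Long> bucket) {
        long maxTimestamp = seedWithCurrentTimestamp ? currentTimestamp : 0L;
        List<String> emitted = new ArrayList<>();
        for (Map.Entry<String, Long> entry : bucket.entrySet()) {
            long lastModified = entry.getValue();
            // Skip older objects, and same-timestamp objects that were already listed.
            if (lastModified < currentTimestamp
                    || (lastModified == currentTimestamp && currentKeys.contains(entry.getKey()))) {
                continue;
            }
            emitted.add(entry.getKey());
            if (lastModified > maxTimestamp) {
                maxTimestamp = lastModified;
                // With a 0L seed this fires even when the timestamp did not
                // advance past currentTimestamp, forgetting already-listed keys.
                currentKeys.clear();
            }
            if (lastModified == maxTimestamp) {
                currentKeys.add(entry.getKey());
            }
        }
        currentTimestamp = Math.max(currentTimestamp, maxTimestamp);
        return emitted;
    }

    public static void main(String[] args) {
        // Object "a" is listed first; "b" lands with the same lastModified
        // while a listing is in progress and only shows up in later passes.
        Map<String, Long> first = new LinkedHashMap<>();
        first.put("a", 100L);
        Map<String, Long> later = new LinkedHashMap<>(first);
        later.put("b", 100L);

        for (boolean seeded : new boolean[] {false, true}) {
            ListingModel model = new ListingModel(seeded);
            model.listOnce(first);
            model.listOnce(later);
            System.out.println("seeded=" + seeded + " third pass: " + model.listOnce(later));
        }
    }
}
```

In this model, the third pass with the 0L seed emits "a" again, while the pass seeded with currentTimestamp emits nothing. Whether the real processor behaves the same way depends on details this sketch leaves out, such as state persistence and paging.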

 

I wanted to check the thoughts of other experts, or whether there is any other
known fix.

Regards,

Milan Das
Sr. System Architect
email: m...@interset.com
mobile: +1 678 216 5660
www.interset.com