Github user adamlamar commented on the issue:

    https://github.com/apache/nifi/pull/2361
  
    @ijokarumawak From the <a href="https://docs.aws.amazon.com/AmazonS3/latest/API/v2-RESTBucketGET.html#v2-RESTBucketGET-requests">AWS S3 API documentation</a> (see the `continuation-token` section):
    
    > Amazon S3 lists objects in UTF-8 character encoding in lexicographical order
    
    I really wish we could take the approach you suggested (it would certainly make things easier), but since the entries are returned in lexicographical (alphabetical) key order rather than by modification time, we must iterate over the entire listing before updating `currentTimestamp`. Otherwise we risk skipping keys that are newer than the old `currentTimestamp` but older than a key seen earlier in the listing. The lexicographical ordering also matches my experience when using the API.
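
To make the ordering problem concrete, here is a minimal sketch. It is not the actual ListS3 code: `ListingSketch`, `Entry`, `Result`, and both method names are hypothetical, and the in-memory list stands in for a paginated S3 listing returned in key order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the actual ListS3 implementation.
class ListingSketch {
    // Stand-in for an S3 object summary: key plus last-modified time (epoch millis).
    static final class Entry {
        final String key;
        final long lastModified;
        Entry(String key, long lastModified) {
            this.key = key;
            this.lastModified = lastModified;
        }
    }

    // Outcome of one listing pass: the keys emitted and the timestamp to persist.
    static final class Result {
        final List<String> emitted;
        final long newTimestamp;
        Result(List<String> emitted, long newTimestamp) {
            this.emitted = emitted;
            this.newTimestamp = newTimestamp;
        }
    }

    // Compares every entry against the timestamp from the previous run and
    // advances it only after the whole listing has been consumed.
    static Result updateAfterFullPass(List<Entry> listingInKeyOrder, long currentTimestamp) {
        List<String> emitted = new ArrayList<>();
        long maxSeen = currentTimestamp;
        for (Entry e : listingInKeyOrder) {
            if (e.lastModified > currentTimestamp) {
                emitted.add(e.key);
                maxSeen = Math.max(maxSeen, e.lastModified);
            }
        }
        return new Result(emitted, maxSeen);
    }

    // Advances the timestamp as soon as a newer entry is seen. Because the
    // listing is sorted by key, not by time, later entries can be skipped.
    static Result updateEagerly(List<Entry> listingInKeyOrder, long currentTimestamp) {
        List<String> emitted = new ArrayList<>();
        long timestamp = currentTimestamp;
        for (Entry e : listingInKeyOrder) {
            if (e.lastModified > timestamp) {
                emitted.add(e.key);
                timestamp = e.lastModified;
            }
        }
        return new Result(emitted, timestamp);
    }
}
```

With keys `a.txt`, `b.txt`, `c.txt` modified at times 100, 300, and 200 and a starting `currentTimestamp` of 50, `updateAfterFullPass` emits all three keys, while `updateEagerly` moves the timestamp to 300 after `b.txt` and then skips `c.txt`.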
    
    Unfortunately, this also means that duplicates are possible when a listing fails, as in the `IOException` scenario you mentioned. This is an existing limitation in ListS3.
    
    I appreciate your help getting this reviewed! :)

