[ https://issues.apache.org/jira/browse/NIFI-2631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450709#comment-15450709 ]
Joseph Gresock commented on NIFI-2631:
--------------------------------------

I'm totally fine with committing after every batch -- I tend to leave the option there when modifying someone else's code, since I'm not sure whether they had a use case I hadn't thought of. But if Adam agrees, I'd say let's just make it part of the behavior.

> ListS3 improvements: "Use versions" and "Commit mode"
> -----------------------------------------------------
>
>                 Key: NIFI-2631
>                 URL: https://issues.apache.org/jira/browse/NIFI-2631
>             Project: Apache NiFi
>          Issue Type: Improvement
>    Affects Versions: 0.7.0
>            Reporter: Joseph Gresock
>            Assignee: Joseph Gresock
>            Priority: Minor
>             Fix For: 1.1.0, 0.8.0
>
>
> Our team needs to be able to list individual versions in S3. We also ran
> into a use case where a bucket with many objects (over 1 million in our case)
> seemed to cause ListS3 to run forever. The S3 list command finished in a few
> minutes, but we believe it was taking a very long time for NiFi to commit all
> the flow files at once.
> To handle this use case, we added a Commit Mode property to ListS3 that
> allows you to specify that you want to commit "Per page" vs. "Once". This has
> proven to correctly emit the flow files as the S3 paging progresses.
> We also implemented support for S3 List Versions, which includes the
> "s3.version" and "s3.isLatest" attributes if applicable. The "s3.version"
> attribute can in turn be used in the FetchS3 processor.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
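
The difference between the "Per page" and "Once" commit modes described in the issue can be illustrated with a small standalone sketch. This is not the ListS3 implementation and does not use the NiFi or AWS APIs; `list_pages`, `run_listing`, and the `commit` callback are hypothetical names that stand in for a paged S3 listing and a session commit.

```python
from typing import Callable, Iterable, Iterator, List


def list_pages(keys: List[str], page_size: int) -> Iterator[List[str]]:
    """Yield keys one page at a time (a stand-in for a paged S3 listing)."""
    for start in range(0, len(keys), page_size):
        yield keys[start:start + page_size]


def run_listing(
    pages: Iterable[List[str]],
    commit: Callable[[List[str]], None],
    per_page: bool,
) -> int:
    """Collect listed keys and commit them; return how many commits happened.

    per_page=True  -> commit after every page, so results flow out as
                      paging progresses.
    per_page=False -> hold everything and commit once at the end.
    """
    pending: List[str] = []
    commits = 0
    for page in pages:
        pending.extend(page)
        if per_page:
            commit(pending)
            commits += 1
            pending = []  # start a fresh batch; the committed list is untouched
    if pending:
        # "Once" mode: a single commit covering everything that was listed.
        commit(pending)
        commits += 1
    return commits
```

With five keys and a page size of two, the per-page mode hands three small batches to `commit`, while the once mode hands over a single batch of five. For a bucket with over a million objects, the second case is the one large commit the issue suspects of being slow.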