[ https://issues.apache.org/jira/browse/NIFI-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joey Frazee updated NIFI-7886:
------------------------------
    Affects Version/s:     (was: 1.13.2)
                           (was: 1.13.1)

> FetchAzureBlobStorage, FetchS3Object, and FetchGCSObject processors should be able to fetch ranges
> --------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-7886
>                 URL: https://issues.apache.org/jira/browse/NIFI-7886
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>    Affects Versions: 1.12.0, 1.13.0
>            Reporter: Paul Kelly
>            Assignee: Paul Kelly
>            Priority: Minor
>              Labels: azureblob, gcs, s3
>             Fix For: 1.14.0
>
>          Time Spent: 5h
>  Remaining Estimate: 0h
>
> Azure Blob Storage, AWS S3, and Google Cloud Storage all support retrieving
> byte ranges of stored objects. Current versions of the NiFi processors for
> these services do not support fetching by byte range.
>
> Fetching by range would enable several enhancements:
> * Parallelized downloads
> ** Faster speeds if the bandwidth-delay product of the connection is lower
> than the available bandwidth
> ** Load distribution over a cluster
> * Cost savings
> ** If the file is large and only part of it is needed, just the desired part
> can be downloaded, saving bandwidth costs by not retrieving unnecessary bytes
> ** A download failure would only require retrying the failed segment, rather
> than the full file
> * Downloading extremely large files
> ** Ability to download files that are larger than the available content
> repository by downloading a segment and moving it off to a system with more
> capacity before downloading another segment
>
> Some of these enhancements would require an upstream processor to generate
> multiple flow files, each covering a different part of the overall range.
> Something like this:
> ListS3 -> ExecuteGroovyScript (to split into multiple flow files with
> different range attributes) -> FetchS3Object.
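
The splitting step described in the issue (one flow file per part of the overall range) could be sketched roughly as follows. This is only an illustration in Python, not the script or processor code from the issue: the function names and the attribute names `range.start`, `range.length`, and `http.range` are hypothetical, and the segment size is an arbitrary choice. The only external fact relied on is that HTTP byte ranges are written as `bytes=first-last` with an inclusive last byte.

```python
def split_into_ranges(total_size, segment_size):
    """Return (start, length) pairs covering bytes [0, total_size) with no gaps or overlap."""
    if total_size < 0:
        raise ValueError("total_size must be non-negative")
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    ranges = []
    start = 0
    while start < total_size:
        # The last segment may be shorter than segment_size.
        length = min(segment_size, total_size - start)
        ranges.append((start, length))
        start += length
    return ranges


def to_flow_file_attributes(start, length):
    """Build per-segment attributes; the attribute names are hypothetical."""
    if length <= 0:
        raise ValueError("length must be positive")
    last = start + length - 1  # HTTP Range headers use an inclusive last byte
    return {
        "range.start": str(start),
        "range.length": str(length),
        "http.range": f"bytes={start}-{last}",
    }


if __name__ == "__main__":
    # Example: a 10-byte object split into segments of at most 4 bytes.
    for start, length in split_into_ranges(10, 4):
        print(to_flow_file_attributes(start, length))
```

Each dictionary would correspond to one flow file handed to the fetch processor, so segments can be fetched in parallel, spread across a cluster, or retried individually.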