Hi Samrat, Thanks for driving this! I think it would be good to start a separate [DISCUSS][FLIP-555] thread. We can refer to this thread there as previous discussion, but IMO the FLIP should have its own thread, following the FLIP process [1].
I'm happy to review myself in the next couple days. Best, Ferenc [1] https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals On Tuesday, February 3rd, 2026 at 22:42, Samrat Deb <[email protected]> wrote: > > > Hi, > I conducted a benchmarking comparison of state checkpointing to S3, > comparing the proposed native S3 implementation with flink-s3-fs-presto. > The results are promising. The native implementation performs better under > the setup used. > PTAL at the benchmark document for detailed analysis with logs and setup > details[1] > > As a next step, FLIP-555[2] is out for review. PTAL > > Cheers, > Samrat > > [1] > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406620396 > [2] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-555%3A+Flink+Native+S3+FileSystem > > > On Wed, Nov 12, 2025 at 1:21 AM Samrat Deb [email protected] wrote: > > > Hi Gabor, > > > > Apologies for the delayed response. > > > > > - A migration guide would be excellent from the old connectors. That way > > > users can see how much effort it is. > > > > Yes, that’s one of the key aspects. I’ve tested the patch on S3. The > > configuration remains exactly the same. The only change required is to > > place the new `flink-s3-fs-native` JAR in the `plugins` directory and > > remove the `flink-s3-fs-hadoop` JAR from there. > > I haven’t documented a detailed design or migration plan yet. I’m waiting > > for the first round of benchmark and comparison test results. > > > > > - One of the key points from operational perspective is to have a way to > > > make IOPS usage > > > configurable. As on oversimplified explanation just to get a taste this > > > can > > > be kept under control in 2 ways and places: > > > 1. In Hadoop s3a set `fs.s3a.limit.total` > > > 2. In connector set `s3.multipart.upload.min.file.size` and > > > `s3.multipart.upload.min.part.size` > > > Do I understand it correctly that this is intended to be covered by the > > > following configs? > > > > > | s3.upload.min.part.size | 5242880 | Minimum part size for multipart > > > uploads (5MB) | > > > | s3.upload.max.concurrent.uploads | CPU cores | Maximum concurrent > > > uploads > > > per stream | > > > > Yes, the POC patch currently includes three configurations[1]: > > 1. `s3.upload.min.part.size` > > 2. `s3.upload.max.concurrent.uploads` > > 3. `s3.read.buffer.size` > > > > The idea is to start by supporting configurable IOPS through these > > parameters. > > Do you think these minimal configs are sufficient to begin with? > > > > > > I am now drafting a formal benchmark plan based on these specifics and > > > > will share it with this thread in the coming days for feedback. > > > > Waiting for the details. > > > > Still Waiting for my employer to approve resources for the purpose 😅 > > > > Cheers, > > Samrat > > > > [1] > > https://github.com/apache/flink/pull/27187/files#diff-f1e31c70c03cb943bc0e62fe456ca8d0b6bb63ae56c062d68f54ce2806b43f45R38 > > > > On Wed, Nov 5, 2025 at 5:34 PM Gabor Somogyi [email protected] > > wrote: > > > > > Hi Samrat, > > > > > > Thanks for the contribution! I've had a slight look at the code which is > > > promising. > > > > > > I've a couple of questions/remarks: > > > - A migration guide would be excellent from the old connectors. That way > > > users can see how much effort it is. > > > - One of the key points from operational perspective is to have a way to > > > make IOPS usage > > > configurable. 
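For reference, a minimal sketch of how the three POC options above (`s3.upload.min.part.size`, `s3.upload.max.concurrent.uploads`, `s3.read.buffer.size`) could be declared with Flink's ConfigOptions API. The keys, the 5 MB part-size default, and the CPU-cores default come from this thread; the read-buffer default, the types, and the descriptions are assumptions for illustration only:

```java
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

/** Sketch of the IOPS-related options mentioned in the thread (option keys taken from the POC patch). */
public class NativeS3ConfigOptions {

    // Minimum part size for multipart uploads; 5242880 bytes (5 MB) per the table in this thread.
    public static final ConfigOption<Long> UPLOAD_MIN_PART_SIZE =
            ConfigOptions.key("s3.upload.min.part.size")
                    .longType()
                    .defaultValue(5L * 1024 * 1024)
                    .withDescription("Minimum part size in bytes for S3 multipart uploads.");

    // Maximum concurrent part uploads per output stream; the thread lists "CPU cores" as the default.
    public static final ConfigOption<Integer> UPLOAD_MAX_CONCURRENT_UPLOADS =
            ConfigOptions.key("s3.upload.max.concurrent.uploads")
                    .intType()
                    .defaultValue(Runtime.getRuntime().availableProcessors())
                    .withDescription("Maximum number of concurrent part uploads per stream.");

    // Read-side buffer size; the 64 KB default here is an assumption, not taken from the POC.
    public static final ConfigOption<Integer> READ_BUFFER_SIZE =
            ConfigOptions.key("s3.read.buffer.size")
                    .intType()
                    .defaultValue(64 * 1024)
                    .withDescription("Buffer size in bytes used when reading objects from S3.");
}
```

In the Flink configuration these would then just be plain keys, e.g. `s3.upload.max.concurrent.uploads: 8`.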
As on oversimplified explanation just to get a taste this > > > can > > > be kept under control in 2 ways and places: > > > 1. In Hadoop s3a set `fs.s3a.limit.total` > > > 2. In connector set `s3.multipart.upload.min.file.size` and > > > `s3.multipart.upload.min.part.size` > > > Do I understand it correctly that this is intended to be covered by the > > > following configs? > > > > > > | s3.upload.min.part.size | 5242880 | Minimum part size for multipart > > > uploads (5MB) | > > > | s3.upload.max.concurrent.uploads | CPU cores | Maximum concurrent > > > uploads > > > per stream | > > > > > > > I am now drafting a formal benchmark plan based on these specifics and > > > > will share it with this thread in the coming days for feedback. > > > > Waiting for the details. > > > > > > BR, > > > G > > > > > > On Wed, Nov 5, 2025 at 7:08 AM Samrat Deb [email protected] wrote: > > > > > > > Hi all, > > > > > > > > I have a working POC for the Native S3 filesystem, which is now > > > > available > > > > as a draft PR [1]. > > > > The POC is functional and has been validated in a local setup with > > > > Minio. > > > > It's important to note that it does not yet have complete test coverage. > > > > > > > > The immediate next step is to conduct a comprehensive benchmark to > > > > compare > > > > its performance against the existing `flink-s3-fs-hadoop` and > > > > `flink-s3-fs-presto` implementations. > > > > > > > > I've had a very meaningful discussion with Piotr Nowojski about this > > > > offline. I am grateful for his detailed guidance on defining a rigorous > > > > benchmarking strategy, including specific cluster configurations, job > > > > workloads, and key metrics for evaluating both checkpoint/recovery > > > > performance and pure throughput. > > > > I am now drafting a formal benchmark plan based on these specifics and > > > > will > > > > share it with this thread in the coming days for feedback. > > > > > > > > Cheers, > > > > Samrat > > > > > > > > [1] https://github.com/apache/flink/pull/27187 > > > > > > > > On Wed, Oct 29, 2025 at 9:31 PM Samrat Deb [email protected] > > > > wrote: > > > > > > > > > thank you Martijn for clarifying . > > > > > i will proceed with creating a task. > > > > > > > > > > Thanks Mate for the pointer to Minio for testing. > > > > > minio is good to use for testing . > > > > > > > > > > Cheers, > > > > > Samrat > > > > > > > > > > On Mon, 27 Oct 2025 at 11:55 PM, Mate Czagany [email protected] > > > > > wrote: > > > > > > > > > > > Hi, > > > > > > > > > > > > Just to add to the MinIO licensing concerns, I could not see any > > > > > > recent > > > > > > change to the license itself, they have changed the license from > > > > > > Apache > > > > > > 2.0 > > > > > > to AGPL-3.0 in 2021, and the Docker image used by the tests (which > > > > > > is > > > > > > from > > > > > > 2022) already contains the AGPL-3.0 license. This should not be an > > > > > > issue > > > > > > as > > > > > > Flink does not distribute nor makes MinIO available over the > > > > > > network, > > > > > > it's > > > > > > only used by the tests. > > > > > > > > > > > > What's changed recently is that MinIO no longer publishes Docker > > > > > > images > > > > > > to > > > > > > the public [1], so it might be worth it to look into using > > > > > > alternative > > > > > > solutions in the future, e.g. Garage [2]. 
> > > > > > > > > > > > Best regards, > > > > > > Mate > > > > > > > > > > > > [1] > > > > > > https://github.com/minio/minio/issues/21647#issuecomment-3418675115 > > > > > > [2] https://garagehq.deuxfleurs.fr/ > > > > > > > > > > > > On Mon, Oct 27, 2025 at 5:48 PM Ferenc Csaky > > > > > > <[email protected] > > > > > > > > > > > wrote: > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > Really nice to see people chime into this thread. I agree with > > > > > > > Martijn > > > > > > > about the > > > > > > > development approach. There will be some iterations until we can > > > > > > > stabilize > > > > > > > this anyways, > > > > > > > so we can try to shoot getting out a good enough MVP, then fix > > > > > > > issues > > > > > > > + > > > > > > > reach feature > > > > > > > parity with the existing implementations on the go. > > > > > > > > > > > > > > I am not a licensing expert but AFAIK the previous images that > > > > > > > were > > > > > > > released under the > > > > > > > acceptable license can be continued to use. For most integration > > > > > > > tests, > > > > > > > we > > > > > > > use an > > > > > > > ancient image anyways [1]. There is another place where the latest > > > > > > > img > > > > > > > gets pulled [2], > > > > > > > I guess it would be good to apply an explicit that tag there. But > > > > > > > AFAIK > > > > > > > they stop > > > > > > > publishing to Docker Hub, so I would anticipate we cannot end up > > > > > > > pulling > > > > > > > an image with > > > > > > > a forbidden license. > > > > > > > > > > > > > > Best, > > > > > > > Ferenc > > > > > > > > > > > > > > [1] > > > > > > https://github.com/apache/flink/blob/fd1a97768b661f19783afe70d93a0a8d3d625b2a/flink-test-utils-parent/flink-test-utils-junit/src/main/java/org/apache/flink/util/DockerImageVersions.java#L39 > > > > > > > > > > [2] > > > > > > https://github.com/apache/flink/blob/fd1a97768b661f19783afe70d93a0a8d3d625b2a/flink-end-to-end-tests/test-scripts/common_s3_minio.sh#L51 > > > > > > > > > > On Sunday, October 26th, 2025 at 22:05, Martijn Visser < > > > > > > > [email protected]> wrote: > > > > > > > > > > > > > > > Hi Samrat, > > > > > > > > > > > > > > > > First of all, thanks for the proposal. It's long overdue to get > > > > > > > > this > > > > > > > > in a > > > > > > > > better state. > > > > > > > > > > > > > > > > With regards to the schemes, I would say to ship an initial > > > > > > > > release > > > > > > > > that > > > > > > > > does not include support for s3a and s3p, and focus first on > > > > > > > > getting > > > > > > > > this > > > > > > > > new implementation into a stable state. When that's done, as a > > > > > > > > follow-up, > > > > > > > > we can consider adding support for s3a and s3p on this > > > > > > > > implementation, > > > > > > > > and > > > > > > > > when that's there consider deprecating the older > > > > > > > > implementations. It > > > > > > > > will > > > > > > > > probably take multiple releases before we have this in a stable > > > > > > > > state. > > > > > > > > > > > > > > > > Not directly related to this, but given that MinIO decided to > > > > > > > > change > > > > > > > > their > > > > > > > > license, do we also need to refactor existing tests to not use > > > > > > > > MinIO > > > > > > > > anymore but something else? 
> > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > Martijn > > > > > > > > > > > > > > > > On Sat, Oct 25, 2025 at 1:38 AM Samrat Deb [email protected] > > > > > > > > wrote: > > > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > > > One clarifying question regarding the URI schemes: > > > > > > > > > > > > > > > > > > Currently, the Flink ecosystem uses multiple schemes to > > > > > > > > > differentiate > > > > > > > > > between S3 implementations: s3a:// for the Hadoop-based > > > > > > > > > connector > > > > > > > > > and > > > > > > > > > s3p://[1] for the Presto-based one, which is often recommended > > > > > > > > > for > > > > > > > > > checkpointing. > > > > > > > > > > > > > > > > > > A key goal of the proposed flink-s3-fs-native is to unify > > > > > > > > > these > > > > > > > > > into a > > > > > > > > > single implementation. With that in mind, what should be the > > > > > > > > > strategy > > > > > > > > > for > > > > > > > > > scheme support? Should the new native s3 filesystem register > > > > > > > > > only > > > > > > > > > for > > > > > > > > > the > > > > > > > > > simple s3:// scheme, aiming to deprecate the others? Or would > > > > > > > > > it > > > > > > > > > be > > > > > > > > > beneficial to also support s3a:// and s3p:// to provide a > > > > > > > > > smoother > > > > > > > > > migration path for users who may have these schemes in their > > > > > > > > > existing > > > > > > > > > job > > > > > > > > > configurations? > > > > > > > > > Cheers, > > > > > > > > > Samrat > > > > > > > > > > > > > > > > > > [1] https://github.com/generalui/s3p > > > > > > > > > > > > > > > > > > On Wed, Oct 22, 2025 at 6:31 PM Piotr Nowojski > > > > > > > > > [email protected] > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > Hi Samrat, > > > > > > > > > > > > > > > > > > > > > 1. Even if the specifics are hazy, could you recall the > > > > > > > > > > > general > > > > > > > > > > > nature of those concerns? For instance, were they related > > > > > > > > > > > to > > > > > > > > > > > S3's > > > > > > > > > > > eventual > > > > > > > > > > > consistency model, which has since improved, the atomicity > > > > > > > > > > > of > > > > > > > > > > > Multipart > > > > > > > > > > > Upload commits, or perhaps complex failure/recovery > > > > > > > > > > > scenarios > > > > > > > > > > > during > > > > > > > > > > > the > > > > > > > > > > > commit phase? > > > > > > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > > > > > > > *8. *The flink-s3-fs-presto connector explicitly throws an > > > > > > > > > > > `UnsupportedOperationException` when > > > > > > > > > > > `createRecoverableWriter()` is > > > > > > > > > > > called. > > > > > > > > > > > Was this a deliberate design choice to keep the Presto > > > > > > > > > > > connector > > > > > > > > > > > lightweight and optimized specifically for checkpointing, > > > > > > > > > > > or > > > > > > > > > > > were > > > > > > > > > > > there > > > > > > > > > > > other technical challenges that prevented its > > > > > > > > > > > implementation > > > > > > > > > > > at > > > > > > > > > > > the > > > > > > > > > > > time? > > > > > > > > > > > Any context on this would be very helpful > > > > > > > > > > > > > > > > > > > > I very vaguely remember that at least one of those concerns > > > > > > > > > > was > > > > > > > > > > with > > > > > > > > > > respect to how long > > > > > > > > > > does it take for the S3 to make some certain operations > > > > > > > > > > visible. 
> > > > > > > > > > That you > > > > > > > > > > think you have > > > > > > > > > > uploaded and committed a file, but in reality it might not > > > > > > > > > > be > > > > > > > > > > visible for > > > > > > > > > > tens of seconds. > > > > > > > > > > > > > > > > > > > > Sorry, I don't remember more (or even if there was more). I > > > > > > > > > > was > > > > > > > > > > only > > > > > > > > > > superficially involved > > > > > > > > > > in the S3 connector back then - just participated/overheard > > > > > > > > > > some > > > > > > > > > > discussions. > > > > > > > > > > > > > > > > > > > > > 2. It's clear that implementing an efficient > > > > > > > > > > > PathsCopyingFileSystem[2] > > > > > > > > > > > is > > > > > > > > > > > a non-negotiable requirement for performance. Is there any > > > > > > > > > > > benchmark > > > > > > > > > > > numbers available that can be used as reference and > > > > > > > > > > > evaluate > > > > > > > > > > > new > > > > > > > > > > > implementation deviation ? > > > > > > > > > > > > > > > > > > > > I only have the numbers that I put in the original Flip [1]. > > > > > > > > > > I > > > > > > > > > > don't > > > > > > > > > > remember the benchmark > > > > > > > > > > setup, but it must have been something simple. Like just let > > > > > > > > > > some > > > > > > > > > > job > > > > > > > > > > accumulate 1GB of state > > > > > > > > > > and measure how long the state downloading phase of recovery > > > > > > > > > > was > > > > > > > > > > taking. > > > > > > > > > > > > > > > > > > > > > 3. Do you recall the workload characteristics for that > > > > > > > > > > > PoC? > > > > > > > > > > > Specifically, > > > > > > > > > > > was the 30-40% performance advantage of s5cmd observed > > > > > > > > > > > when > > > > > > > > > > > copying > > > > > > > > > > > many > > > > > > > > > > > small files (like checkpoint state) or larger, > > > > > > > > > > > multi-gigabyte > > > > > > > > > > > files? > > > > > > > > > > > > > > > > > > > > It was just a regular mix of compacted RocksDB sst files, > > > > > > > > > > with > > > > > > > > > > total > > > > > > > > > > state > > > > > > > > > > size 1 or at most > > > > > > > > > > a couple of GBs. So most of the files were around ~64MB or > > > > > > > > > > ~128MB, > > > > > > > > > > with a > > > > > > > > > > couple of > > > > > > > > > > smaller L0 files, and maybe one larger L2 file. > > > > > > > > > > > > > > > > > > > > > 4. The idea of a switchable implementation sounds great. > > > > > > > > > > > Would > > > > > > > > > > > you > > > > > > > > > > > envision this as a configuration flag (e.g., > > > > > > > > > > > s3.native.copy.strategy=s5cmd > > > > > > > > > > > or s3.native.copy.strategy=sdk) that selects the backend > > > > > > > > > > > implementation > > > > > > > > > > > at > > > > > > > > > > > runtime? Also on contrary is it worth adding configuration > > > > > > > > > > > that > > > > > > > > > > > exposes > > > > > > > > > > > some level of implementation level information ? > > > > > > > > > > > > > > > > > > > > I think something like that should be fine, assuming that > > > > > > > > > > `s5cmd` > > > > > > > > > > will > > > > > > > > > > again > > > > > > > > > > prove significantly faster and/or more cpu efficient. 
If > > > > > > > > > > not, if > > > > > > > > > > the > > > > > > > > > > SDKv2 > > > > > > > > > > has > > > > > > > > > > already improved and caught up with the `s5cmd`, then it > > > > > > > > > > probably > > > > > > > > > > doesn't > > > > > > > > > > make sense to keep `s5cmd` support. > > > > > > > > > > > > > > > > > > > > > 5. My understanding is that the key takeaway here is to > > > > > > > > > > > avoid > > > > > > > > > > > the > > > > > > > > > > > file-by-file stream-based copy used in the vanilla > > > > > > > > > > > connector > > > > > > > > > > > and > > > > > > > > > > > leverage > > > > > > > > > > > bulk operations, which PathsCopyingFileSystem[2] enables. > > > > > > > > > > > This > > > > > > > > > > > seems > > > > > > > > > > > most > > > > > > > > > > > critical during state download on recovery. please suggest > > > > > > > > > > > if > > > > > > > > > > > my > > > > > > > > > > > inference > > > > > > > > > > > is in right direction > > > > > > > > > > > > > > > > > > > > Yes, but you should also make the bult transfer > > > > > > > > > > configurable. > > > > > > > > > > How > > > > > > > > > > many > > > > > > > > > > bulk > > > > > > > > > > transfers > > > > > > > > > > can be happening in parallel etc. > > > > > > > > > > > > > > > > > > > > > 6. The warning about `s5cmd` causing OOMs sounds like > > > > > > > > > > > indication to > > > > > > > > > > > consider `S3TransferManager`[3] implementation, which > > > > > > > > > > > might > > > > > > > > > > > offer > > > > > > > > > > > more > > > > > > > > > > > granular control over buffering and in-flight requests. Do > > > > > > > > > > > you > > > > > > > > > > > think > > > > > > > > > > > exploring more on `S3TransferManager` would be valuable ? > > > > > > > > > > > > > > > > > > > > I'm pretty sure if you start hundreds of bulk transfers in > > > > > > > > > > parallel > > > > > > > > > > via > > > > > > > > > > the > > > > > > > > > > `S3TransferManager` you can get the same problems with > > > > > > > > > > running > > > > > > > > > > out of > > > > > > > > > > memory or exceeding available network throughput. I don't > > > > > > > > > > know > > > > > > > > > > if > > > > > > > > > > `S3TransferManager` is better or worse in that regard to be > > > > > > > > > > honest. > > > > > > > > > > > > > > > > > > > > > 7. The insight on AWS aggressively dropping packets > > > > > > > > > > > instead of > > > > > > > > > > > gracefully > > > > > > > > > > > throttling is invaluable. Currently i have limited > > > > > > > > > > > understanding > > > > > > > > > > > on how > > > > > > > > > > > aws > > > > > > > > > > > behaves at throttling I will deep dive more into it and > > > > > > > > > > > look for clarification based on findings or doubt. To > > > > > > > > > > > counter > > > > > > > > > > > this, > > > > > > > > > > > were > > > > > > > > > > > you thinking of a configurable rate limiter within the > > > > > > > > > > > filesystem > > > > > > > > > > > itself > > > > > > > > > > > (e.g., setting max bandwidth or max concurrent requests), > > > > > > > > > > > or > > > > > > > > > > > something > > > > > > > > > > > more > > > > > > > > > > > dynamic that could adapt to network conditions? > > > > > > > > > > > > > > > > > > > > Flat rate limiting is tricky because AWS offers burst > > > > > > > > > > network > > > > > > > > > > capacity, > > > > > > > > > > which > > > > > > > > > > comes very handy, and in the vast majority of cases works > > > > > > > > > > fine. 
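On the bulk-transfer parallelism and `S3TransferManager` points above, a minimal sketch of how the SDK v2 transfer manager on top of the CRT-based async client can cap in-flight requests; that concurrency limit is also one coarse lever for the throttling concerns discussed here. Bucket, keys, target paths, and the concurrency value are placeholders, not anything from the POC:

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.transfer.s3.S3TransferManager;
import software.amazon.awssdk.transfer.s3.model.DownloadFileRequest;

public class BulkDownloadSketch {

    public static void main(String[] args) {
        // CRT-based async client; maxConcurrency bounds the number of in-flight requests,
        // i.e. "how many bulk transfers can happen in parallel" exposed as a single knob.
        try (S3AsyncClient s3 = S3AsyncClient.crtBuilder()
                        .maxConcurrency(16) // placeholder; would come from a Flink config option
                        .build();
                S3TransferManager tm = S3TransferManager.builder().s3Client(s3).build()) {

            String bucket = "my-checkpoint-bucket";                      // placeholder
            List<String> keys = List.of("chk-42/1.sst", "chk-42/2.sst"); // placeholders
            Path targetDir = Paths.get("/tmp/recovery");                 // placeholder

            // Start all downloads up front; the SDK schedules them within the concurrency limit.
            List<CompletableFuture<?>> futures = keys.stream()
                    .map(key -> (CompletableFuture<?>) tm.downloadFile(DownloadFileRequest.builder()
                                    .getObjectRequest(b -> b.bucket(bucket).key(key))
                                    .destination(targetDir.resolve(key.replace('/', '_')))
                                    .build())
                            .completionFuture())
                    .collect(Collectors.toList());

            // Wait for the whole bulk download to finish before proceeding with recovery.
            futures.forEach(CompletableFuture::join);
        }
    }
}
```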
> > > > > > > > > > But > > > > > > > > > > for > > > > > > > > > > some jobs > > > > > > > > > > if you exceed that burst capacity, AWS starts dropping your > > > > > > > > > > packets > > > > > > > > > > and > > > > > > > > > > then the > > > > > > > > > > problems happen. On the other hand, if rate limit to your > > > > > > > > > > normal > > > > > > > > > > capacity, > > > > > > > > > > you > > > > > > > > > > are leaving a lot of network throughput unused during > > > > > > > > > > recoveries. > > > > > > > > > > > > > > > > > > > > At the same time AWS doesn't share details for the burst > > > > > > > > > > capacity, so > > > > > > > > > > it's > > > > > > > > > > sometimes > > > > > > > > > > tricky to configure the whole system properly. I don't have > > > > > > > > > > an > > > > > > > > > > universal > > > > > > > > > > good answer > > > > > > > > > > for that :( > > > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > > Piotrek > > > > > > > > > > > > > > > > > > > > wt., 21 paź 2025 o 21:40 Samrat Deb [email protected] > > > > > > > > > > napisał(a): > > > > > > > > > > > > > > > > > > > > > Hi Gabor/ Ferenc > > > > > > > > > > > > > > > > > > > > > > Thank you for sharing the pointer and valuable feedback. > > > > > > > > > > > > > > > > > > > > > > The link to the custom `XmlResponsesSaxParser`[1] looks > > > > > > > > > > > scary > > > > > > > > > > > 😦 > > > > > > > > > > > and contains hidden complexity. > > > > > > > > > > > > > > > > > > > > > > 1. Could you share some context on why this custom parser > > > > > > > > > > > was > > > > > > > > > > > necessary? > > > > > > > > > > > Was it to work around a specific bug, a performance issue, > > > > > > > > > > > or > > > > > > > > > > > an > > > > > > > > > > > inconsistency in the S3 XML API responses that the default > > > > > > > > > > > AWS > > > > > > > > > > > SDK > > > > > > > > > > > parser > > > > > > > > > > > couldn't handle at the time? With sdk v2 what are core > > > > > > > > > > > functionality > > > > > > > > > > > that > > > > > > > > > > > is required to be intensively tested ? > > > > > > > > > > > > > > > > > > > > > > 2. You mentioned it has no Hadoop dependency, which is > > > > > > > > > > > great > > > > > > > > > > > news. > > > > > > > > > > > For > > > > > > > > > > > a > > > > > > > > > > > new native S3 connector, would integration simply require > > > > > > > > > > > implementing > > > > > > > > > > > a > > > > > > > > > > > new S3DelegationTokenProvider/Receiver pair using the AWS > > > > > > > > > > > SDK, > > > > > > > > > > > or > > > > > > > > > > > are > > > > > > > > > > > there > > > > > > > > > > > more subtle integration points with the framework that > > > > > > > > > > > should > > > > > > > > > > > be > > > > > > > > > > > accounted? > > > > > > > > > > > > > > > > > > > > > > 3. I remember solving Serialized Throwable exception issue > > > > > > > > > > > [2] > > > > > > > > > > > leading > > > > > > > > > > > to > > > > > > > > > > > a new bug [3], where an initial fix led to a regression > > > > > > > > > > > that > > > > > > > > > > > Gabor > > > > > > > > > > > later > > > > > > > > > > > solved with Ferenc providing a detailed root cause > > > > > > > > > > > insights > > > > > > > > > > > [4] > > > > > > > > > > > 😅. > > > > > > > > > > > Its hard to fully sure that all scenarios are covered > > > > > > > > > > > properly. > > > > > > > > > > > This is > > > > > > > > > > > one > > > > > > > > > > > of the example, there can be other unknowns. 
> > > > > > > > > > > what would be the best approach to test for and prevent > > > > > > > > > > > such > > > > > > > > > > > regressions > > > > > > > > > > > or > > > > > > > > > > > unknown unknowns, especially in the most sensitive parts > > > > > > > > > > > of > > > > > > > > > > > the > > > > > > > > > > > filesystem > > > > > > > > > > > logic? > > > > > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > > > Samrat > > > > > > > > > > > > > > > > > > > > > > [1] > > > > > > https://github.com/apache/flink/blob/0e4e6d7082e83f098d0c1a94351babb3ea407aa8/flink-filesystems/flink-s3-fs-base/src/main/java/com/amazonaws/services/s3/model/transform/XmlResponsesSaxParser.java > > > > > > > > > > > > > > [2] https://issues.apache.org/jira/browse/FLINK-28513 > > > > > > > > > > > [3] https://github.com/apache/flink/pull/25231 > > > > > > > > > > > [4] > > > > > > > > > > > https://github.com/apache/flink/pull/25231#issuecomment-2312059662 > > > > > > > > > > > > > > > > > > > > > > On Tue, 21 Oct 2025 at 3:49 PM, Gabor Somogyi < > > > > > > > > > > > [email protected] > > > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > Hi Samrat, > > > > > > > > > > > > > > > > > > > > > > > > +1 on the direction that we move away from hadoop. > > > > > > > > > > > > > > > > > > > > > > > > This is a long standing discussion to replace the > > > > > > > > > > > > mentioned > > > > > > > > > > > > 2 > > > > > > > > > > > > connectors > > > > > > > > > > > > with something better. > > > > > > > > > > > > Both of them has it's own weaknesses, I've fixed several > > > > > > > > > > > > blockers > > > > > > > > > > > > inside > > > > > > > > > > > > them. > > > > > > > > > > > > > > > > > > > > > > > > There are definitely magic inside them, please see this > > > > > > > > > > > > [1] > > > > > > > > > > > > for > > > > > > > > > > > > example > > > > > > > > > > > > and > > > > > > > > > > > > there are more🙂 > > > > > > > > > > > > I think the most sensitive part is the recovery because > > > > > > > > > > > > hard > > > > > > > > > > > > to > > > > > > > > > > > > test > > > > > > > > > > > > all > > > > > > > > > > > > cases. > > > > > > > > > > > > > > > > > > > > > > > > @Ferenc > > > > > > > > > > > > > > > > > > > > > > > > > One thing that comes to my mind that will need some > > > > > > > > > > > > > changes > > > > > > > > > > > > > and its > > > > > > > > > > > > > involvement > > > > > > > > > > > > > to this change is not trivial is the delegation token > > > > > > > > > > > > > framework. > > > > > > > > > > > > > Currently > > > > > > > > > > > > > it > > > > > > > > > > > > > is also tied to the Hadoop stuff and has some abstract > > > > > > > > > > > > > classes > > > > > > > > > > > > > in the > > > > > > > > > > > > > base > > > > > > > > > > > > > S3 FS > > > > > > > > > > > > > module. > > > > > > > > > > > > > > > > > > > > > > > > The delegation token framework has no dependency on > > > > > > > > > > > > hadoop > > > > > > > > > > > > so > > > > > > > > > > > > there > > > > > > > > > > > > is > > > > > > > > > > > > no > > > > > > > > > > > > blocker on the road, > > > > > > > > > > > > but I'm here to help if any question appears. 
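On the `S3DelegationTokenProvider`/`Receiver` question, the provider side essentially boils down to obtaining temporary credentials via the AWS SDK and handing them to the framework for distribution to the task managers. A minimal sketch of just the credential-fetching step, assuming STS session tokens are the mechanism; the wiring into Flink's delegation token interfaces is deliberately left out:

```java
import software.amazon.awssdk.services.sts.StsClient;
import software.amazon.awssdk.services.sts.model.Credentials;
import software.amazon.awssdk.services.sts.model.GetSessionTokenRequest;

public class S3SessionCredentialsSketch {

    public static void main(String[] args) {
        try (StsClient sts = StsClient.create()) {
            // Ask STS for short-lived session credentials; the duration would be configurable.
            Credentials creds = sts.getSessionToken(
                            GetSessionTokenRequest.builder().durationSeconds(3600).build())
                    .credentials();

            // A provider would serialize access key, secret key, session token and expiration,
            // and return them to the delegation token framework; the matching receiver on the
            // task managers would plug them into the S3 client's credentials provider.
            System.out.println("Session credentials valid until " + creds.expiration());
        }
    }
}
```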
> > > > > > > > > > > > > > > > > > > > > > > > BR, > > > > > > > > > > > > G > > > > > > > > > > > > > > > > > > > > > > > > [1] > > > > > > https://github.com/apache/flink/blob/0e4e6d7082e83f098d0c1a94351babb3ea407aa8/flink-filesystems/flink-s3-fs-base/src/main/java/com/amazonaws/services/s3/model/transform/XmlResponsesSaxParser.java#L95-L104 > > > > > > > > > > > > > > > On Tue, Oct 14, 2025 at 8:19 PM Samrat Deb > > > > > > > > > > > > [email protected] > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > Hi All, > > > > > > > > > > > > > > > > > > > > > > > > > > Poorvank (cc'ed) and I are writing to start a > > > > > > > > > > > > > discussion > > > > > > > > > > > > > about > > > > > > > > > > > > > a > > > > > > > > > > > > > potential > > > > > > > > > > > > > improvement for Flink, creating a new, native S3 > > > > > > > > > > > > > filesystem > > > > > > > > > > > > > independent > > > > > > > > > > > > > of > > > > > > > > > > > > > Hadoop/Presto. > > > > > > > > > > > > > > > > > > > > > > > > > > The goal of this proposal is to address several > > > > > > > > > > > > > challenges > > > > > > > > > > > > > related > > > > > > > > > > > > > to > > > > > > > > > > > > > Flink's S3 integration, simplifying > > > > > > > > > > > > > flink-s3-filesystem. > > > > > > > > > > > > > If > > > > > > > > > > > > > this > > > > > > > > > > > > > discussion > > > > > > > > > > > > > gains positive traction, the next step would be to > > > > > > > > > > > > > move > > > > > > > > > > > > > forward > > > > > > > > > > > > > with > > > > > > > > > > > > > a > > > > > > > > > > > > > formalised FLIP. > > > > > > > > > > > > > > > > > > > > > > > > > > The Challenges with the Current S3 Connectors > > > > > > > > > > > > > Currently, Flink offers two primary S3 filesystems, > > > > > > > > > > > > > flink-s3-fs-hadoop[1] > > > > > > > > > > > > > and flink-s3-fs-presto[2]. While functional, this > > > > > > > > > > > > > dual-connector > > > > > > > > > > > > > approach > > > > > > > > > > > > > has few issues: > > > > > > > > > > > > > > > > > > > > > > > > > > 1. The flink-s3-fs-hadoop connector adds an additional > > > > > > > > > > > > > dependency > > > > > > > > > > > > > to > > > > > > > > > > > > > manage. Upgrades like AWS SDK v2 are more dependent on > > > > > > > > > > > > > Hadoop/Presto > > > > > > > > > > > > > to > > > > > > > > > > > > > support first and leverage in flink-s3-filesystem. > > > > > > > > > > > > > Sometimes > > > > > > > > > > > > > it's > > > > > > > > > > > > > restrictive to leverage features directly from the AWS > > > > > > > > > > > > > SDK. > > > > > > > > > > > > > > > > > > > > > > > > > > 2. The flink-s3-fs-presto connector was introduced to > > > > > > > > > > > > > mitigate > > > > > > > > > > > > > the > > > > > > > > > > > > > performance issues of the Hadoop connector, especially > > > > > > > > > > > > > for > > > > > > > > > > > > > checkpointing. > > > > > > > > > > > > > However, it lacks a RecoverableWriter implementation. > > > > > > > > > > > > > Sometimes it's confusing for Flink users, highlighting > > > > > > > > > > > > > the > > > > > > > > > > > > > need > > > > > > > > > > > > > for a > > > > > > > > > > > > > single, unified solution. 
> > > > > > > > > > > > > > > > > > > > > > > > > > Proposed Solution: > > > > > > > > > > > > > A Native, Hadoop-Free S3 Filesystem > > > > > > > > > > > > > > > > > > > > > > > > > > I propose we develop a new filesystem, let's call it > > > > > > > > > > > > > flink-s3-fs-native, > > > > > > > > > > > > > built directly on the modern AWS SDK for Java v2. This > > > > > > > > > > > > > approach > > > > > > > > > > > > > would > > > > > > > > > > > > > be > > > > > > > > > > > > > free of any Hadoop or Presto dependencies. I have done > > > > > > > > > > > > > a > > > > > > > > > > > > > small > > > > > > > > > > > > > prototype > > > > > > > > > > > > > to > > > > > > > > > > > > > validate [3] > > > > > > > > > > > > > > > > > > > > > > > > > > This is motivated by trino<>s3 [4]. The Trino project > > > > > > > > > > > > > successfully > > > > > > > > > > > > > undertook a similar migration, moving from > > > > > > > > > > > > > Hadoop-based > > > > > > > > > > > > > object > > > > > > > > > > > > > storage > > > > > > > > > > > > > clients to their own native implementations. > > > > > > > > > > > > > > > > > > > > > > > > > > The new Flink S3 filesystem would: > > > > > > > > > > > > > > > > > > > > > > > > > > 1. Provide a single, unified connector for all S3 > > > > > > > > > > > > > interactions, > > > > > > > > > > > > > from > > > > > > > > > > > > > state > > > > > > > > > > > > > backends to sinks. > > > > > > > > > > > > > > > > > > > > > > > > > > 2. Implement a high-performance S3RecoverableWriter > > > > > > > > > > > > > using > > > > > > > > > > > > > S3's > > > > > > > > > > > > > Multipart > > > > > > > > > > > > > Upload feature, ensuring exactly-once sink semantics. > > > > > > > > > > > > > > > > > > > > > > > > > > 3. Offer a clean, self-contained dependency, > > > > > > > > > > > > > drastically > > > > > > > > > > > > > simplifying > > > > > > > > > > > > > setup > > > > > > > > > > > > > and eliminating external dependencies. > > > > > > > > > > > > > > > > > > > > > > > > > > A Phased Migration Path > > > > > > > > > > > > > To ensure a smooth transition, we could adopt a phased > > > > > > > > > > > > > approach on > > > > > > > > > > > > > a > > > > > > > > > > > > > very > > > > > > > > > > > > > high level : > > > > > > > > > > > > > > > > > > > > > > > > > > Phase 1: > > > > > > > > > > > > > Introduce the new native S3 filesystem as an optional, > > > > > > > > > > > > > parallel > > > > > > > > > > > > > plugin. > > > > > > > > > > > > > This would allow for community testing and adoption > > > > > > > > > > > > > without > > > > > > > > > > > > > breaking > > > > > > > > > > > > > existing setups. > > > > > > > > > > > > > > > > > > > > > > > > > > Phase 2: > > > > > > > > > > > > > Once the native connector achieves feature parity and > > > > > > > > > > > > > proven > > > > > > > > > > > > > stability, > > > > > > > > > > > > > we > > > > > > > > > > > > > will update the documentation to recommend it as the > > > > > > > > > > > > > default > > > > > > > > > > > > > choice > > > > > > > > > > > > > for > > > > > > > > > > > > > all > > > > > > > > > > > > > S3 use cases. 
> > > > > > > > > > > > > > > > > > > > > > > > > > Phase 3: > > > > > > > > > > > > > In a future major release, the legacy > > > > > > > > > > > > > flink-s3-fs-hadoop > > > > > > > > > > > > > and > > > > > > > > > > > > > flink-s3-fs-presto connectors could be formally > > > > > > > > > > > > > deprecated, > > > > > > > > > > > > > with > > > > > > > > > > > > > clear > > > > > > > > > > > > > migration guides provided for users. > > > > > > > > > > > > > > > > > > > > > > > > > > I would love to hear the community's thoughts on this. > > > > > > > > > > > > > > > > > > > > > > > > > > A few questions to start the discussion: > > > > > > > > > > > > > > > > > > > > > > > > > > 1. What are the biggest pain points with the current > > > > > > > > > > > > > S3 > > > > > > > > > > > > > filesystem? > > > > > > > > > > > > > > > > > > > > > > > > > > 2. Are there any critical features from the Hadoop S3A > > > > > > > > > > > > > client > > > > > > > > > > > > > that > > > > > > > > > > > > > are > > > > > > > > > > > > > essential to replicate in a native implementation? > > > > > > > > > > > > > > > > > > > > > > > > > > 3. Would a simplified, non-dependent S3 experience be > > > > > > > > > > > > > a > > > > > > > > > > > > > valuable > > > > > > > > > > > > > improvement for Flink use cases? > > > > > > > > > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > > > > > Samrat > > > > > > > > > > > > > > > > > > > > > > > > > > [1] > > > > > > https://github.com/apache/flink/tree/master/flink-filesystems/flink-s3-fs-hadoop > > > > > > > > > > > > > > > > [2] > > > > > > https://github.com/apache/flink/tree/master/flink-filesystems/flink-s3-fs-presto > > > > > > > > > > > > > > > > [3] https://github.com/Samrat002/flink/pull/4 > > > > > > > > > > > > > [4] > > > > > > https://github.com/trinodb/trino/tree/master/lib/trino-filesystem-s3
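To make point 2 of the proposal (an `S3RecoverableWriter` built on S3 Multipart Upload) more concrete, below is a minimal sketch of the underlying AWS SDK v2 calls. Bucket and key names are placeholders, and the mapping to the RecoverableWriter lifecycle noted in the comments is one plausible reading rather than a design settled in this thread:

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.CompletedMultipartUpload;
import software.amazon.awssdk.services.s3.model.CompletedPart;
import software.amazon.awssdk.services.s3.model.UploadPartResponse;

public class MultipartUploadSketch {

    public static void main(String[] args) {
        String bucket = "my-bucket"; // placeholder
        String key = "part-0-0";     // placeholder

        try (S3Client s3 = S3Client.create()) {
            // 1. Start the multipart upload. The uploadId is what a recoverable writer would
            //    have to persist so that an in-progress upload can survive a failover.
            String uploadId = s3.createMultipartUpload(b -> b.bucket(bucket).key(key)).uploadId();

            // 2. Upload a part. Every part except the last must be at least 5 MB, which is
            //    the constraint behind the s3.upload.min.part.size default mentioned above.
            byte[] data = "example payload".getBytes(StandardCharsets.UTF_8);
            UploadPartResponse part = s3.uploadPart(
                    b -> b.bucket(bucket).key(key).uploadId(uploadId).partNumber(1),
                    RequestBody.fromBytes(data));

            // 3. Commit by completing the upload with the ETags of all uploaded parts. In a
            //    RecoverableWriter this would roughly correspond to the commit step, with the
            //    completed-part list carried in the persisted recoverable state.
            s3.completeMultipartUpload(b -> b.bucket(bucket).key(key).uploadId(uploadId)
                    .multipartUpload(CompletedMultipartUpload.builder()
                            .parts(List.of(CompletedPart.builder()
                                    .partNumber(1)
                                    .eTag(part.eTag())
                                    .build()))
                            .build()));
        }
    }
}
```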
