Hi all,

Thanks Aleksandr and Ryan for the feedback.

Samrat and I looked closely at the existing flink-gs-fs-hadoop implementation. The current code is already split in a useful way: the generic FileSystem path is still Hadoop-backed, via GSFileSystem, GSFileSystemFactory, and ConfigUtils, but the RecoverableWriter path is already implemented directly on top of the Google Cloud Storage client. In particular, *GSRecoverableWriter, GSRecoverableFsDataOutputStream, GSRecoverableWriterCommitter,* and GSBlobStorage do not depend on Hadoop.

So IMO the scope for *flink-gs-fs-native* looks fairly concrete. A first version should be able to reuse the existing recoverable writer path, while replacing the Hadoop-backed generic filesystem operations with direct GCS client operations: *open/read, getFileStatus, listStatus, delete, mkdirs, rename/copy semantics*, and *native factory/auth/configuration* handling. The native S3 filesystem from FLIP-555 <https://cwiki.apache.org/confluence/display/FLINK/FLIP-555%3A+Flink+Native+S3+FileSystem> looks like a useful reference point for this, since it implements Flink's FileSystem directly on the AWS SDK.

On Aleksandr's umbrella FLIP for cross-cloud support: we would like to coordinate with that effort so the GCS work fits naturally into the broader Hadoop-less cloud filesystem direction.

Ryan, thanks for offering to test an alpha. Production validation from a large GCS user would be very useful once there is an initial optional plugin.

One more reference: FLINK-19481 <https://issues.apache.org/jira/browse/FLINK-19481> has been open since 2020 for adding a native GCS FileSystem. This discussion seems like a good opportunity to revive that work with the newer FLIP-555 direction as a model.

Thanks,
Poorvank

On Tue, Jun 23, 2026 at 7:14 PM Ryan van Huuksloot via dev <[email protected]> wrote:

> Hello,
>
> I wanted to jump in and say that I think this is a great effort. We've had
> many issues with Hadoop being a dependency.
>
> Given our other priorities at Shopify, we don't have time to contribute in
> 2026. However, when an alpha release is available, we would be happy to run
> it against our system.
>
> Thanks,
> Ryan van Huuksloot
> Staff Engineer, Infrastructure | Streaming Platform
>
> On Tue, Jun 23, 2026 at 4:46 AM Samrat Deb <[email protected]> wrote:
>
> > Thank you, Aleksandr, for adding more to the proposal.
> > Looking forward to collaborating on this project.
> >
> > Best,
> > Samrat
> >
> > On Mon, Jun 22, 2026 at 10:46 PM Aleksandr Iushmanov <[email protected]> wrote:
> >
> > > Hi Samrat,
> > >
> > > Thank you for working on this. I agree that the community would benefit
> > > from introduction of the native filesystem implementation due to similar
> > > motivation to the one raised in [1]. I am actively working on an
> > > "Umbrella" FLIP for cross-clouds support and your proposal naturally
> > > fills in the gap for GCS cloud.
> > >
> > > Speaking of pain points related to hadoop connectors, I would like to
> > > mention:
> > > 1. Complexity of CVE management.
> > > 2. Challenges with dependency upgrades including Java version upgrades.
> > > 3. Lack of support for client-side encryption with custom key providers
> > >    (especially in cross-cloud manner).
> > >
> > > I am looking forward to collaborating with you on hadoop-less flink file
> > > systems support.
> > >
> > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-555%3A+Flink+Native+S3+FileSystem
> > >
> > > Kind regards,
> > > Alex
> > >
> > > On Mon, 22 Jun 2026 at 06:44, Samrat Deb <[email protected]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > Poorvank (cc'ed) & I would like to start a discussion about a potential
> > > > improvement for Flink's Google Cloud Storage integration to create a
> > > > native GCS filesystem independent of Hadoop.
> > > > Earlier we were able to do this for S3 [1].
> > > >
> > > > The entire effort is to move forward to a Hadoop-free Flink Filesystem
> > > > and unlock potential performance benefits for Flink's focus
> > > > requirements.
> > > >
> > > > The goal of this proposal is to explore whether Flink would benefit
> > > > from a first-class GCS filesystem implementation built directly on top
> > > > of Google Cloud Storage client libraries rather than relying on the
> > > > Hadoop connector. If the discussion gains positive traction, the next
> > > > step would be to prepare a formal FLIP.
> > > >
> > > > The Current State
> > > >
> > > > Today, Flink's GCS support is provided through flink-gs-fs-hadoop [2],
> > > > which is based on Google's Cloud Storage Hadoop connector [3].
> > > >
> > > > This approach has served Flink well, but it also introduces some
> > > > limitations:
> > > >
> > > > 1. Flink's GCS integration depends on the Hadoop filesystem abstraction
> > > >    and the Hadoop-based GCS connector. As a result, upgrades and
> > > >    feature adoption are tied to the evolution of those external
> > > >    components.
> > > > 2. The dependency stack is larger than necessary for users who only
> > > >    require Google Cloud Storage support. In practice, users must bring
> > > >    in Hadoop-based components even though the underlying storage
> > > >    system is an object store.
> > > > 3. Leveraging new capabilities from Google Cloud Storage often requires
> > > >    waiting for support to become available through the Hadoop
> > > >    connector before Flink can benefit from them.
> > > >
> > > > Proposed Direction
> > > >
> > > > I would like to explore the feasibility of a new filesystem
> > > > implementation, tentatively named flink-gs-fs-native, built directly on
> > > > top of Google Cloud Storage client libraries.
> > > >
> > > > The goals would be:
> > > >
> > > > 1. Provide a Hadoop-independent implementation of Flink's FileSystem
> > > >    API for Google Cloud Storage.
> > > > 2. Reduce dependency complexity and make the GCS integration easier to
> > > >    maintain and evolve.
> > > > 3. Allow Flink to adopt new Google Cloud Storage features and
> > > >    performance improvements directly, without depending on Hadoop
> > > >    abstractions.
> > > > 4. Continue supporting Flink features such as checkpointing,
> > > >    savepoints, state backends, and file sinks through a native
> > > >    implementation.
> > > >
> > > > A Possible Migration Path
> > > >
> > > > To ensure a smooth transition, a phased approach could be considered:
> > > >
> > > > Phase 1:
> > > > Introduce the native GCS filesystem as an optional plugin alongside the
> > > > existing flink-gs-fs-hadoop connector.
> > > >
> > > > Phase 2:
> > > > Gather community feedback, validate production readiness, and achieve
> > > > feature parity with the existing implementation.
> > > >
> > > > Phase 3:
> > > > If the native implementation proves mature and broadly adopted, discuss
> > > > whether the Hadoop-based implementation should remain, be deprecated,
> > > > or continue to coexist.
> > > >
> > > > Questions for the Community
> > > >
> > > > 1. What are the biggest pain points users face today with
> > > >    flink-gs-fs-hadoop?
> > > > 2. Are there any critical capabilities provided by the Hadoop-based GCS
> > > >    connector that would be difficult or undesirable to reimplement?
> > > > 3. Would a Hadoop-independent GCS filesystem provide meaningful value
> > > >    for your Flink deployments?
> > > > 4. Are there specific GCS features or operational concerns that should
> > > >    be considered from the beginning?
> > > >
> > > > Looking forward to hearing the community's thoughts.
> > > >
> > > > Best,
> > > > Samrat
> > > >
> > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-555%3A+Flink+Native+S3+FileSystem
> > > >
> > > > [2] https://github.com/apache/flink/tree/master/flink-filesystems/flink-gs-fs-hadoop
> > > >
> > > > [3] https://github.com/GoogleCloudDataproc/hadoop-connectors/tree/master/gcs
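
P.S. A rough sketch of why "rename/copy semantics", listStatus, and mkdirs are called out among the generic filesystem operations at the top of this mail. On a flat object namespace there are no real directories and no atomic rename, so these calls have to be emulated with name prefixes and copy-then-delete. The class below is purely illustrative: FlatBucketSketch and all of its methods are hypothetical names, the "bucket" is an in-memory sorted map, and nothing here is the Google Cloud Storage client API or code from flink-gs-fs-hadoop. The empty "dir/" marker object used for mkdirs is one possible convention, assumed here only for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for a bucket: a flat, sorted map of object name to bytes. */
final class FlatBucketSketch {
    private final TreeMap<String, byte[]> objects = new TreeMap<>();

    private static String dirPrefix(String path) {
        return path.isEmpty() || path.endsWith("/") ? path : path + "/";
    }

    /** All object names that start with the given prefix, in sorted order. */
    private List<String> namesUnder(String prefix) {
        List<String> names = new ArrayList<>();
        for (String name : objects.tailMap(prefix, true).keySet()) {
            if (!name.startsWith(prefix)) break; // sorted map: the prefix range has ended
            names.add(name);
        }
        return names;
    }

    void put(String name, byte[] data) { objects.put(name, data.clone()); }

    byte[] read(String name) {
        byte[] data = objects.get(name);
        if (data == null) throw new IllegalArgumentException("No such object: " + name);
        return data.clone();
    }

    /** mkdirs analogue: write an empty marker object named "dir/" (assumed convention). */
    void mkdirs(String dir) { objects.put(dirPrefix(dir), new byte[0]); }

    /** A directory exists only implicitly: some object name starts with "path/". */
    boolean isDirectory(String path) {
        String prefix = dirPrefix(path);
        String next = objects.ceilingKey(prefix);
        return next != null && next.startsWith(prefix);
    }

    /** listStatus analogue: immediate children only, derived from prefix and "/" delimiter. */
    List<String> list(String dir) {
        String prefix = dirPrefix(dir);
        Set<String> children = new LinkedHashSet<>();
        for (String name : namesUnder(prefix)) {
            String rest = name.substring(prefix.length());
            if (rest.isEmpty()) continue; // the directory marker itself
            int slash = rest.indexOf('/');
            children.add(prefix + (slash < 0 ? rest : rest.substring(0, slash + 1)));
        }
        return new ArrayList<>(children);
    }

    /** rename analogue: copy each object to its new name, then delete the old one. Not atomic. */
    int rename(String src, String dst) {
        if (objects.containsKey(src)) {
            objects.put(dst, objects.get(src)); // copy
            objects.remove(src);                // delete
            return 1;
        }
        String from = dirPrefix(src);
        String to = dirPrefix(dst);
        List<String> names = namesUnder(from);
        for (String name : names) {
            objects.put(to + name.substring(from.length()), objects.get(name)); // copy
            objects.remove(name);                                                // delete
        }
        return names.size();
    }

    public static void main(String[] args) {
        FlatBucketSketch bucket = new FlatBucketSketch();
        bucket.put("chk/1/state", new byte[] {1});
        bucket.put("chk/1/meta", new byte[] {2});
        bucket.put("chk/2/state", new byte[] {3});
        System.out.println(bucket.list("chk"));               // [chk/1/, chk/2/]
        System.out.println(bucket.rename("chk/1", "done/1")); // 2
        System.out.println(bucket.isDirectory("chk/1"));      // false
    }
}
```

The point of the sketch is only that a "directory rename" touches every object under the prefix and can be observed half-done, which is the kind of behaviour the FLIP would need to spell out for the native implementation.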
