Re: [DISCUSSION] Native GCS Filesystem in Apache Flink

Poorvank Bhatia Wed, 24 Jun 2026 11:38:04 -0700

Hi all,

Thanks Aleksandr and Ryan for the feedback.


Samrat and I looked closely at the existing flink-gs-fs-hadoop
implementation. The current code is already split in a useful way: the
generic FileSystem path is still Hadoop-backed, via
GSFileSystem, GSFileSystemFactory, and ConfigUtils, but the
RecoverableWriter path is already implemented directly on top of the Google
Cloud Storage client.

In particular, GSRecoverableWriter, GSRecoverableFsDataOutputStream,
GSRecoverableWriterCommitter, and GSBlobStorage do not depend on Hadoop.

So IMO the scope for *flink-gs-fs-native* looks fairly concrete. A first
version should be able to reuse the existing recoverable writer path, while
replacing the Hadoop-backed generic filesystem operations with direct GCS
client operations: *open/read, getFileStatus, listStatus, delete, mkdirs,
rename/copy semantics*, and *native factory/auth/configuration* handling.
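
To make the directory-style operations in that list a bit more concrete,
here is a rough sketch of how listStatus, getFileStatus-style directory
checks, and rename could map onto object names, assuming a bucket with a
flat namespace. FlatBucketSketch and its method names are hypothetical, and
an in-memory sorted map stands in for the Google Cloud Storage client, so
this is an illustration of the semantics rather than proposed code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for one bucket: sorted object name -> content. */
final class FlatBucketSketch {
    private final TreeMap<String, byte[]> objects = new TreeMap<>();

    void put(String name, byte[] data) {
        objects.put(name, data);
    }

    boolean exists(String name) {
        return objects.containsKey(name);
    }

    private static String asPrefix(String path) {
        return path.isEmpty() || path.endsWith("/") ? path : path + "/";
    }

    /** Assumption: a path is a "directory" if at least one object name starts with "path/". */
    boolean isDirectory(String path) {
        String prefix = asPrefix(path);
        String next = objects.ceilingKey(prefix);
        return next != null && next.startsWith(prefix);
    }

    /** Immediate children only, similar to a prefix listing with "/" as delimiter. */
    List<String> listChildren(String dir) {
        String prefix = asPrefix(dir);
        List<String> children = new ArrayList<>();
        for (String name : objects.tailMap(prefix, true).keySet()) {
            if (!name.startsWith(prefix)) {
                break; // sorted order: we have left the prefix range
            }
            String rest = name.substring(prefix.length());
            if (rest.isEmpty()) {
                continue; // skip a zero-byte directory marker, if one exists
            }
            int slash = rest.indexOf('/');
            String child = slash < 0 ? name : prefix + rest.substring(0, slash + 1);
            // Names sharing a sub-prefix are adjacent, so comparing with the last entry dedupes.
            if (children.isEmpty() || !children.get(children.size() - 1).equals(child)) {
                children.add(child);
            }
        }
        return children;
    }

    /** Rename as copy-then-delete per object; not atomic across objects. Returns objects moved. */
    int rename(String src, String dst) {
        if (objects.containsKey(src)) {
            objects.put(dst, objects.remove(src));
            return 1;
        }
        String srcPrefix = asPrefix(src);
        String dstPrefix = asPrefix(dst);
        List<String> names = new ArrayList<>();
        for (String name : objects.tailMap(srcPrefix, true).keySet()) {
            if (!name.startsWith(srcPrefix)) {
                break;
            }
            names.add(name);
        }
        for (String name : names) {
            objects.put(dstPrefix + name.substring(srcPrefix.length()), objects.get(name)); // copy
            objects.remove(name); // delete
        }
        return names.size();
    }

    public static void main(String[] args) {
        FlatBucketSketch bucket = new FlatBucketSketch();
        bucket.put("checkpoints/chk-1/_metadata", new byte[0]);
        bucket.put("checkpoints/chk-2/_metadata", new byte[0]);
        System.out.println(bucket.listChildren("checkpoints"));
        System.out.println(bucket.rename("checkpoints/chk-1", "archive/chk-1"));
    }
}
```

The point of the sketch is that a directory rename touches every object
under the prefix and can be observed half-done, which is one of the
semantics a native implementation would need to decide on explicitly.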

The native S3 filesystem from FLIP-555
<https://cwiki.apache.org/confluence/display/FLINK/FLIP-555%3A+Flink+Native+S3+FileSystem>
looks like a useful reference point for this, since it implements Flink's
FileSystem directly on the AWS SDK.

On Aleksandr's umbrella FLIP for cross-cloud support: we would like to
coordinate with that effort so the GCS work fits naturally into the broader
Hadoop-less cloud filesystem direction.

Ryan, thanks for offering to test an alpha. Production validation from a
large GCS user would be very useful once there is an initial optional
plugin.

One more reference: FLINK-19481
<https://issues.apache.org/jira/browse/FLINK-19481> has been open since
2020 for adding a native GCS FileSystem. This discussion seems like a good
opportunity to revive that work with the newer FLIP-555 direction as a
model.

Thanks,
Poorvank

On Tue, Jun 23, 2026 at 7:14 PM Ryan van Huuksloot via dev <
[email protected]> wrote:

> Hello,
>
> I wanted to jump in and say that I think this is a great effort. We've had
> many issues with Hadoop being a dependency.
>
> Given our other priorities at Shopify, we don't have time to contribute in
> 2026. However, when an alpha release is available, we would be happy to run
> it against our system.
>
> Thanks,
> Ryan van Huuksloot
> Staff Engineer, Infrastructure | Streaming Platform
>
>
> On Tue, Jun 23, 2026 at 4:46 AM Samrat Deb <[email protected]> wrote:
>
> > Thank you, Aleksandr, for adding more to the proposal.
> > Looking forward to collaborating on this project.
> >
> > Best,
> > Samrat
> >
> > On Mon, Jun 22, 2026 at 10:46 PM Aleksandr Iushmanov <[email protected]>
> > wrote:
> >
> > > Hi Samrat,
> > >
> > > Thank you for working on this. I agree that the community would benefit
> > > from the introduction of a native filesystem implementation, for
> > > reasons similar to those raised in [1]. I am actively working on an
> > > "Umbrella" FLIP for cross-cloud support, and your proposal naturally
> > > fills the gap for GCS.
> > >
> > > Speaking of pain points related to Hadoop connectors, I would like to
> > > mention:
> > > 1. Complexity of CVE management.
> > > 2. Challenges with dependency upgrades, including Java version upgrades.
> > > 3. Lack of support for client-side encryption with custom key providers
> > > (especially in a cross-cloud manner).
> > >
> > > I am looking forward to collaborating with you on Hadoop-less Flink
> > > filesystem support.
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-555%3A+Flink+Native+S3+FileSystem
> > >
> > > Kind regards,
> > > Alex
> > >
> > >
> > > On Mon, 22 Jun 2026 at 06:44, Samrat Deb <[email protected]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > Poorvank (cc'ed) and I would like to start a discussion about a
> > > > potential improvement to Flink's Google Cloud Storage integration: a
> > > > native GCS filesystem that is independent of Hadoop. We did the same
> > > > for S3 earlier [1].
> > > >
> > > > The entire effort is to move forward to a Hadoop-free Flink Filesystem
> > > > and unlock potential performance benefits for Flink's focus
> > > > requirements.
> > > >
> > > > The goal of this proposal is to explore whether Flink would benefit
> > > > from a first-class GCS filesystem implementation built directly on top
> > > > of Google Cloud Storage client libraries rather than relying on the
> > > > Hadoop connector. If the discussion gains positive traction, the next
> > > > step would be to prepare a formal FLIP.
> > > >
> > > > The Current State
> > > > Today, Flink's GCS support is provided through flink-gs-fs-hadoop [2],
> > > > which is based on Google's Cloud Storage Hadoop connector [3].
> > > >
> > > > This approach has served Flink well, but it also introduces some
> > > > limitations:
> > > >
> > > >    1. Flink's GCS integration depends on the Hadoop filesystem
> > > >       abstraction and the Hadoop-based GCS connector. As a result,
> > > >       upgrades and feature adoption are tied to the evolution of those
> > > >       external components.
> > > >    2. The dependency stack is larger than necessary for users who only
> > > >       require Google Cloud Storage support. In practice, users must
> > > >       bring in Hadoop-based components even though the underlying
> > > >       storage system is an object store.
> > > >    3. Leveraging new capabilities from Google Cloud Storage often
> > > >       requires waiting for support to become available through the
> > > >       Hadoop connector before Flink can benefit from them.
> > > >
> > > > Proposed Direction
> > > >
> > > > I would like to explore the feasibility of a new filesystem
> > > > implementation, tentatively named flink-gs-fs-native, built directly on
> > > > top of Google Cloud Storage client libraries.
> > > >
> > > > The goals would be:
> > > >
> > > >    1. Provide a Hadoop-independent implementation of Flink's FileSystem
> > > >       API for Google Cloud Storage.
> > > >    2. Reduce dependency complexity and make the GCS integration easier
> > > >       to maintain and evolve.
> > > >    3. Allow Flink to adopt new Google Cloud Storage features and
> > > >       performance improvements directly, without depending on Hadoop
> > > >       abstractions.
> > > >    4. Continue supporting Flink features such as checkpointing,
> > > >       savepoints, state backends, and file sinks through a native
> > > >       implementation.
> > > >
> > > > A Possible Migration Path
> > > >
> > > > To ensure a smooth transition, a phased approach could be considered:
> > > >
> > > > Phase 1:
> > > > Introduce the native GCS filesystem as an optional plugin alongside
> > > > the existing flink-gs-fs-hadoop connector.
> > > >
> > > > Phase 2:
> > > > Gather community feedback, validate production readiness, and achieve
> > > > feature parity with the existing implementation.
> > > >
> > > > Phase 3:
> > > > If the native implementation proves mature and broadly adopted,
> > > > discuss whether the Hadoop-based implementation should remain, be
> > > > deprecated, or continue to coexist.
> > > >
> > > > Questions for the Community
> > > >
> > > >    1. What are the biggest pain points users face today with
> > > >       flink-gs-fs-hadoop?
> > > >    2. Are there any critical capabilities provided by the Hadoop-based
> > > >       GCS connector that would be difficult or undesirable to
> > > >       reimplement?
> > > >    3. Would a Hadoop-independent GCS filesystem provide meaningful
> > > >       value for your Flink deployments?
> > > >    4. Are there specific GCS features or operational concerns that
> > > >       should be considered from the beginning?
> > > >
> > > > Looking forward to hearing the community's thoughts.
> > > >
> > > > Best,
> > > > Samrat
> > > >
> > > > [1]
> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-555%3A+Flink+Native+S3+FileSystem
> > > >
> > > > [2]
> > > > https://github.com/apache/flink/tree/master/flink-filesystems/flink-gs-fs-hadoop
> > > >
> > > > [3]
> > > > https://github.com/GoogleCloudDataproc/hadoop-connectors/tree/master/gcs
> > > >
> > >
> >
>
