Sounds great, thank you!

> I was wondering whether the label name could be picked up from the file
> system's defined scheme, e.g. from FileSystemFactory.getScheme(). Would
> that be better, or unnecessary complexity with little return?

I think it'd make more sense with Approach B (i.e. some MonitoringFileSystem
that is a wrapper around the actual file system); but for Approach D I don't
see much benefit.

Regards,
Roman


On Wed, May 13, 2026 at 6:45 PM Samrat Deb <[email protected]> wrote:

> Hi Roman,
> Thanks for the review.
>
>
> > 1. Is it possible to expose file size metrics? It might be helpful to
> > troubleshoot slow recoveries caused by downloading many small files, for
> > example.
>
> Yes, this is feasible, and I'll include it. The native S3 FS already has
> the raw data at the call sites: NativeS3InputStream wraps a
> GetObjectResponse, so response.contentLength() gives the object size before
> the first byte is read, and NativeS3OutputStream already accumulates the
> byte count before the final PutObject is issued.
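> As a rough sketch (not actual flink-s3-fs-native code; the class and hook
> names below are hypothetical stand-ins, and SizeHistogram stands in for
> Flink's org.apache.flink.metrics.Histogram), the size histograms could be
> fed at exactly those two call sites:

```java
import java.util.ArrayList;
import java.util.List;

public class FileSizeMetricsSketch {

    /** Stand-in for a metrics histogram; real code would use Flink's Histogram. */
    static final class SizeHistogram {
        private final List<Long> samples = new ArrayList<>();
        synchronized void update(long bytes) { samples.add(bytes); }
        synchronized long count() { return samples.size(); }
    }

    static final SizeHistogram READ_SIZES = new SizeHistogram();
    static final SizeHistogram WRITE_SIZES = new SizeHistogram();

    // Hypothetical hook where the input stream is opened:
    // response.contentLength() is known before the first byte is read.
    static void onGetObject(long contentLength) {
        READ_SIZES.update(contentLength);
    }

    // Hypothetical hook where the output stream issues the final PutObject:
    // the byte count has already been accumulated by the stream.
    static void onPutObject(long bytesWritten) {
        WRITE_SIZES.update(bytesWritten);
    }

    public static void main(String[] args) {
        onGetObject(1_024);         // e.g. a small state file
        onPutObject(5 * 1_048_576); // e.g. a 5 MiB checkpoint part
        System.out.println("reads=" + READ_SIZES.count()
                + " writes=" + WRITE_SIZES.count());
    }
}
```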
>
> > Is bulkCopyHelper covered by the proposal? I think it would be helpful
> > to have requests.size() and total bytes received as metrics
>
> Yes, NativeS3BulkCopyHelper will be in scope, but in a later phase. This
> is important for multipart uploads and for catching zombie files. The
> high-level idea is to expose:
> 1. s3.bulk-copy.files - Histogram of files per batch (i.e.,
> requests.size() per copyFiles invocation). This is the "requests.size()"
> signal you mentioned.
> 2. s3.bulk-copy.bytes - Counter of total bytes transferred. After each
> FileDownload.completionFuture() resolves, Files.size(destinationPath)
> gives the exact byte count without additional S3 API calls.
> 3. s3.bulk-copy.duration.ms - Histogram of end-to-end copyFiles
> wall-clock time.
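> A minimal sketch of how the three signals could hang together (all names
> and types here are stand-ins, not the real helper; file sizes are passed
> as plain longs in place of Files.size(destinationPath) so the snippet is
> self-contained):

```java
import java.util.List;
import java.util.concurrent.atomic.LongAdder;

public class BulkCopyMetricsSketch {
    // Stand-ins; real code would register these on the task's MetricGroup.
    static final LongAdder totalBytes = new LongAdder(); // s3.bulk-copy.bytes
    static long lastBatchSize;    // fed into the s3.bulk-copy.files histogram
    static long lastDurationMs;   // fed into the s3.bulk-copy.duration.ms histogram

    /** Hypothetical wrapper around one copyFiles invocation. */
    static void recordCopyFiles(List<Long> fileSizes, long wallClockMs) {
        lastBatchSize = fileSizes.size();   // the requests.size() signal
        fileSizes.forEach(totalBytes::add); // exact byte counts, no extra S3 calls
        lastDurationMs = wallClockMs;
    }

    public static void main(String[] args) {
        recordCopyFiles(List.of(4_096L, 1_048_576L), 250);
        System.out.println(totalBytes.sum()); // 1052672
    }
}
```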
>
> WDYT about adding these metrics?
>
> > 3. Ideally, such metrics should be exposed by other file systems; then
> > I'd suggest having "s3n" as a label rather than a part of the metric name
>
> Yes, adding "s3n" as a label rather than as part of the metric name is a
> good idea. I will update the FLIP and add a section about it.
> MetricGroup.addGroup("filesystem_type", "s3n") creates a key-value
> labelled subgroup, which the Prometheus reporter flattens to
> requests{filesystem_type="s3n"}.
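> To make the flattening concrete, here is a tiny stand-in (not the real
> Prometheus reporter) that renders a metric name with its label subgroup
> the way the reporter would:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FileSystemLabelSketch {
    /** Renders metric{key="value",...} from the collected subgroup labels. */
    static String prometheusName(String metric, Map<String, String> labels) {
        StringBuilder sb = new StringBuilder(metric).append('{');
        labels.forEach((k, v) -> sb.append(k).append("=\"").append(v).append("\","));
        if (!labels.isEmpty()) sb.setLength(sb.length() - 1); // drop trailing comma
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        Map<String, String> labels = new LinkedHashMap<>();
        // from metricGroup.addGroup("filesystem_type", "s3n")
        labels.put("filesystem_type", "s3n");
        System.out.println(prometheusName("requests", labels));
        // requests{filesystem_type="s3n"}
    }
}
```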
>
> I was wondering whether the label name could be picked up from the file
> system's defined scheme, e.g. from FileSystemFactory.getScheme(). Would
> that be better, or unnecessary complexity with little return?
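> For what it's worth, a sketch of the scheme-derived variant (the interface
> below is a local stand-in for org.apache.flink.core.fs.FileSystemFactory;
> the wiring is hypothetical):

```java
public class SchemeLabelSketch {

    /** Local stand-in for Flink's FileSystemFactory. */
    interface FileSystemFactory {
        String getScheme();
    }

    /** The label would come from the scheme instead of being hard-coded per plugin. */
    static String filesystemTypeLabel(FileSystemFactory factory) {
        // e.g. metricGroup.addGroup("filesystem_type", factory.getScheme())
        return factory.getScheme();
    }

    public static void main(String[] args) {
        FileSystemFactory nativeS3 = () -> "s3"; // hypothetical scheme value
        System.out.println(filesystemTypeLabel(nativeS3));
    }
}
```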
>
> > We use something similar to Approach B internally; I don't think it "Adds
> > overhead to the per-record path"
> > (because we don't have per-record file operations); but it lacks
> > lower-level signals indeed.
> > So the recommended approach makes sense to me.
>
> Yes, for lower-level signals Approach D is better and makes it easy to
> add metrics as requirements evolve.
>
> Bests,
> Samrat
>
>
> On Tue, May 5, 2026 at 4:32 PM Roman Khachatryan <[email protected]> wrote:
>
> > Hi Samrat,
> >
> > Thanks for the proposal, such a feature would be very helpful!
> >
> > I have several questions:
> > 1. Is it possible to expose file size metrics? It might be helpful to
> > troubleshoot slow recoveries caused by downloading many small files, for
> > example.
> > 2. Is bulkCopyHelper covered by the proposal? I think it would be helpful
> > to have requests.size() and total bytes received as metrics
> > 3. Ideally, such metrics should be exposed by other file systems; then
> > I'd suggest having "s3n" as a label rather than a part of the metric name
> >
> > As for the "Open questions for community discussion" section, I agree
> > with both points:
> > - enable the feature by default and
> > - don't correlate with checkpoints (it might be more tricky than
> > ThreadLocal).
> >
> > We use something similar to Approach B internally; I don't think it "Adds
> > overhead to the per-record path"
> > (because we don't have per-record file operations); but it lacks
> > lower-level signals indeed.
> > So the recommended approach makes sense to me.
> >
> > Regards,
> > Roman
> >
> >
> > On Tue, May 5, 2026 at 11:58 AM Samrat Deb <[email protected]> wrote:
> >
> > > Hi All,
> > >
> > > I'd like to open a discussion on FLIP-576: Filesystem-Plugin
> > > Observability for flink-s3-fs-native [1].
> > >
> > > Apache Flink's filesystem layer is critical to core operations like
> > > checkpoints, savepoints, and state access, most of which rely heavily
> > > on S3. Despite this, the current S3<>Flink observability offers little
> > > insight into underlying issues. Engineers lack visibility into key
> > > failure signals, including S3 throttling, retry behaviour, slow
> > > operations, load distribution, multipart upload leaks, and
> > > intermittent stream failures. As a result, diagnosing production
> > > issues often requires manual correlation across logs and external
> > > systems, making troubleshooting slow and unreliable. This
> > > observability gap significantly impacts the operability of Flink in
> > > real-world large-scale deployments.
> > > This FLIP addresses that gap by building observability support for
> > > the native S3 FS.
> > >
> > > Looking forward to your feedback.
> > >
> > > Bests,
> > > Samrat
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=421957173
> > >
> >
>
