Hi Roman,
Thanks for the review.

> 1. Is it possible to expose file size metrics? It might be helpful to
> troubleshoot slow recoveries caused by downloading many small files for
> example

Yes, this is feasible, and I'll include it. The native S3 FS already has
the raw data at the call sites: NativeS3InputStream wraps a
GetObjectResponse, so response.contentLength() gives the object size
before the first byte is read, and NativeS3OutputStream already
accumulates the byte count before the final PutObject is issued.
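To illustrate the idea, here is a minimal sketch of recording the object size when a stream is opened. SizeHistogram and onStreamOpened are hypothetical stand-ins; real code would register a Histogram on the filesystem's MetricGroup instead.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a Flink Histogram; real code would register one
// on the filesystem's MetricGroup.
class SizeHistogram {
    final List<Long> values = new ArrayList<>();
    void update(long v) { values.add(v); }
}

public class InputStreamSizeMetricSketch {
    // Hypothetical hook: called when the input stream wraps a response
    // whose content length is known before the first byte is read.
    static void onStreamOpened(long contentLength, SizeHistogram histogram) {
        histogram.update(contentLength);
    }

    public static void main(String[] args) {
        SizeHistogram h = new SizeHistogram();
        onStreamOpened(1_024L, h);     // e.g. a 1 KiB object
        onStreamOpened(5_242_880L, h); // e.g. a 5 MiB object
        System.out.println(h.values);  // [1024, 5242880]
    }
}
```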

> Is bulkCopyHelper covered by the proposal? I think it would be helpful
> to have requests.size() and total bytes received as metrics

Yes, NativeS3BulkCopyHelper will be in scope, but in the next phase. This
is important for multipart uploads and for catching zombie files. At a
high level, the idea is to expose:
1. s3.bulk-copy.files - Histogram of files per batch (i.e., requests.size()
per copyFiles invocation). This is the "requests.size()" signal you
mentioned.
2. s3.bulk-copy.bytes - Counter of total bytes transferred. After each
FileDownload.completionFuture() resolves, Files.size(destinationPath) gives
the exact byte count without additional S3 API calls.
3. s3.bulk-copy.duration.ms - Histogram of end-to-end copyFiles wall-clock
time.

WDYT about adding these values?
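A sketch of how the three signals could be recorded around a completed copyFiles batch. The recordBulkCopy helper and the static metric fields are assumptions for illustration; real code would use Flink's Counter/Histogram on the filesystem MetricGroup.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class BulkCopyMetricsSketch {
    // Stand-ins for Flink metrics.
    static long bulkCopyBytes = 0;                        // s3.bulk-copy.bytes
    static List<Integer> filesPerBatch = new ArrayList<>(); // s3.bulk-copy.files
    static List<Long> durationMs = new ArrayList<>();       // s3.bulk-copy.duration.ms

    // Hypothetical wrapper invoked once all downloads in a batch complete.
    static void recordBulkCopy(List<Path> destinations, long startNanos)
            throws IOException {
        filesPerBatch.add(destinations.size()); // requests.size() per invocation
        for (Path p : destinations) {
            bulkCopyBytes += Files.size(p);     // exact bytes, no extra S3 calls
        }
        durationMs.add((System.nanoTime() - startNanos) / 1_000_000);
    }

    public static void main(String[] args) throws IOException {
        long start = System.nanoTime();
        Path a = Files.createTempFile("bulk", ".bin");
        Files.write(a, new byte[128]);
        recordBulkCopy(List.of(a), start);
        System.out.println(filesPerBatch + " " + bulkCopyBytes); // [1] 128
    }
}
```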

> 3. Ideally, such metrics should be exposed by other file systems; then I'd
> suggest having "s3n" as a label rather than a part of the metric name

Yes, using "s3n" as a label rather than as part of the metric name is a
good idea. I will update the FLIP and add a section about it.
So MetricGroup.addGroup("filesystem_type", "s3n") creates a key-value
labelled subgroup. In Prometheus reporters, this flattens to
requests{filesystem_type="s3n"}.
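As a quick illustration of that flattening (a sketch of the idea, not the actual Prometheus reporter code), a key-value subgroup becomes a label set on the metric name:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class LabelFlatteningSketch {
    // Mimics how a key-value metric subgroup is rendered as Prometheus labels.
    static String flatten(String metric, Map<String, String> labels) {
        String rendered = labels.entrySet().stream()
                .map(e -> e.getKey() + "=\"" + e.getValue() + "\"")
                .collect(Collectors.joining(","));
        return metric + "{" + rendered + "}";
    }

    public static void main(String[] args) {
        Map<String, String> labels = new LinkedHashMap<>();
        // Corresponds to addGroup("filesystem_type", "s3n").
        labels.put("filesystem_type", "s3n");
        System.out.println(flatten("requests", labels)); // requests{filesystem_type="s3n"}
    }
}
```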

I was also wondering whether the label value could be derived from the
filesystem's registered scheme, e.g. via FileSystemFactory.getScheme().
Would that be better, or is it unnecessary complexity with little return?

> We use something similar to Approach B internally; I don't think it "Adds
> overhead to the per-record path"
> (because we don't have per-record file operations); but it lacks
> lower-level signals indeed.
> So the recommended approach makes sense to me.

Yes, for the lower-level signals, Approach D is better and makes it easy
to add metrics as new requirements come up.

Bests,
Samrat


On Tue, May 5, 2026 at 4:32 PM Roman Khachatryan <[email protected]> wrote:

> Hi Samrat,
>
> Thanks for the proposal, such a feature would be very helpful!
>
> I have several questions:
> 1. Is it possible to expose file size metrics? It might be helpful to
> troubleshoot slow recoveries caused by downloading many small files for
> example
> 2. Is bulkCopyHelper covered by the proposal? I think it would be helpful
> to have requests.size() and total bytes received as metrics
> 3. Ideally, such metrics should be exposed by other file systems; then I'd
> suggest having "s3n" as a label rather than a part of metric name
>
> As for the "Open questions for community discussion" section, I agree with
> both points:
> - enable the feature by default and
> - don't correlate with checkpoints (it might be more tricky than
> ThreadLocal).
>
> We use something similar to Approach B internally; I don't think it "Adds
> overhead to the per-record path"
> (because we don't have per-record file operations); but it lacks
> lower-level signals indeed.
> So the recommended approach makes sense to me.
>
> Regards,
> Roman
>
>
> On Tue, May 5, 2026 at 11:58 AM Samrat Deb <[email protected]> wrote:
>
> > Hi All,
> >
> > I'd like to open a discussion on FLIP-576: Filesystem-Plugin
> Observability
> > for (flink-s3-fs-native)[1].
> >
> > Apache Flink’s filesystem layer is critical to core operations like
> > checkpoints, savepoints, and state access. Most of which rely heavily on
> > S3. Despite this, the current observability in s3<>flink is offering
> little
> > insight into underlying issues. Engineers lack visibility into key
> failure
> > signals, including S3 throttling, retry behaviour, slow operations, load
> > distribution, multipart upload leaks, and intermittent stream failures.
> As
> > a result, diagnosing production issues often requires manual correlation
> > across logs and external systems, making troubleshooting slow and
> > unreliable. This observability gap significantly impacts the operability
> of
> > Flink in real-world large-scale deployments.
> > This FLIP proposal addresses the same and builds support for native S3
> FS.
> >
> > Looking forward to your feedback.
> >
> > Bests,
> > Samrat
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=421957173
> >
>
