On Thu, 12 Feb 2026 at 10:39, Romain Manni-Bucau <[email protected]>
wrote:

> Hi all,
>
> Is it intended that S3FileIO doesn't wire [aws
> sdk].ClientOverrideConfiguration.Builder#addMetricPublisher so basically,
> compared to hadoop-aws you can't retrieve metrics from Spark (or any other
> engine) and send them to a collector in a centralized manner?
> Is there another intended way?
>

There's already a PR up awaiting review by committers:
https://github.com/apache/iceberg/pull/15122
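For context, the SDK-level hook in question looks like this; a minimal sketch assuming AWS SDK for Java v2 on the classpath. LoggingMetricPublisher is just a stand-in for whatever collector you'd actually wire in (e.g. CloudWatchMetricPublisher, or a custom MetricPublisher forwarding to Spark):

```java
import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration;
import software.amazon.awssdk.metrics.LoggingMetricPublisher;
import software.amazon.awssdk.services.s3.S3Client;

// Sketch: wiring a MetricPublisher into the S3 client at the SDK level.
// This is the hook that S3FileIO doesn't currently expose.
S3Client s3 = S3Client.builder()
    .overrideConfiguration(
        ClientOverrideConfiguration.builder()
            // Any MetricPublisher works here; LoggingMetricPublisher
            // just logs the collected metrics after each request.
            .addMetricPublisher(LoggingMetricPublisher.create())
            .build())
    .build();
```

Once a publisher is registered, the SDK calls its publish() method with a MetricCollection per API call, which is where a centralized collector would hook in.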



>
> For plain hadoop-aws the workaround is to use (by reflection)
> S3AInstrumentation.getMetricsSystem().allSources() and wire it to a spark
> sink.
>

The intended way to do it there is to use the IOStatistics API, which not
only gives you access to the s3a stats (the Google Cloud connector collects
its statistics the same way) but also supports explicit per-thread
collection for stream reads and writes....

try setting

fs.iostatistics.logging.level info

to see what gets measured
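For reference, the equivalent core-site.xml entry (assuming you're setting it through the Hadoop configuration rather than an engine-specific conf prefix):

```xml
<property>
  <name>fs.iostatistics.logging.level</name>
  <value>info</value>
</property>
```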


> To be clear I do care about the byte written/read but more importantly
> about the latency, number of requests, statuses etc. The stats exposed
> through FileSystem in iceberg are < 10 whereas we should get >> 100 stats
> (taking hadoop as a ref).
>

The AWS SDK metrics are a very limited set:

software.amazon.awssdk.core.metrics.CoreMetric

The retry count is useful here, as it measures retries beneath any
application code. With the REST signer, it'd also make sense to collect
signing time, since the RPC call to the signing endpoint would be included.

-steve
