Hi Steve,

Are you referring to org.apache.iceberg.io.FileIOMetricsContext and
org.apache.hadoop.fs.FileSystem.Statistics.StatisticsData? They miss most
of what I'm looking for (429 responses, to cite a single one).
software.amazon.awssdk.metrics helps a bit but is not sink friendly.
Compared to hadoop-aws, combining the Iceberg native metrics and the AWS S3
client ones partly compensates for the gap, but what I would love to see
is org.apache.hadoop.fs.s3a.S3AInstrumentation and, more particularly,
org.apache.hadoop.fs.s3a.S3AInstrumentation.InputStreamStatistics#InputStreamStatistics
(I'm mainly reading in my use cases).
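
As an illustration of what "sink friendly" could mean here, below is a minimal sketch in plain Java of per-status request counters plus a flat snapshot that a metrics sink could push. RequestStats, its method names and its metric keys are all hypothetical; nothing in it is an Iceberg, hadoop-aws or AWS SDK API. The assumption is that some hook, for example a publisher registered through addMetricPublisher, would call record() once per HTTP attempt.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical helper for illustration only: not part of Iceberg,
// hadoop-aws or the AWS SDK.
final class RequestStats {
    private final Map<Integer, LongAdder> countsByStatus = new ConcurrentHashMap<>();
    private final LongAdder totalLatencyNanos = new LongAdder();
    private final LongAdder requests = new LongAdder();

    // Record one HTTP attempt: its status code and how long it took.
    void record(int httpStatus, long latencyNanos) {
        countsByStatus.computeIfAbsent(httpStatus, k -> new LongAdder()).increment();
        totalLatencyNanos.add(latencyNanos);
        requests.increment();
    }

    // Number of attempts seen with the given status (0 if never seen).
    long count(int httpStatus) {
        LongAdder adder = countsByStatus.get(httpStatus);
        return adder == null ? 0L : adder.sum();
    }

    long requestCount() {
        return requests.sum();
    }

    // Mean latency in nanoseconds, or 0 when nothing was recorded.
    long meanLatencyNanos() {
        long n = requests.sum();
        return n == 0 ? 0L : totalLatencyNanos.sum() / n;
    }

    // Flat, sorted view (metric name -> value) that a sink can iterate over.
    Map<String, Long> snapshot() {
        Map<String, Long> out = new TreeMap<>();
        countsByStatus.forEach((status, adder) -> out.put("http.status." + status, adder.sum()));
        out.put("requests", requests.sum());
        out.put("latency.mean.nanos", meanLatencyNanos());
        return out;
    }
}
```

With something of this shape, the 429 case is simply count(429), and an engine-side sink only needs to poll snapshot() periodically.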


Romain Manni-Bucau
@rmannibucau <https://x.com/rmannibucau> | .NET Blog
<https://dotnetbirdie.github.io/> | Blog <https://rmannibucau.github.io/> | Old
Blog <http://rmannibucau.wordpress.com> | Github
<https://github.com/rmannibucau> | LinkedIn
<https://www.linkedin.com/in/rmannibucau> | Book
<https://www.packtpub.com/en-us/product/java-ee-8-high-performance-9781788473064>
Javaccino founder (Java/.NET service - contact via linkedin)


On Thu, 12 Feb 2026 at 15:50, Steve Loughran <[email protected]>
wrote:

>
>
> On Thu, 12 Feb 2026 at 10:39, Romain Manni-Bucau <[email protected]>
> wrote:
>
>> Hi all,
>>
>> Is it intended that S3FileIO doesn't wire [aws
>> sdk].ClientOverrideConfiguration.Builder#addMetricPublisher so basically,
>> compared to hadoop-aws you can't retrieve metrics from Spark (or any other
>> engine) and send them to a collector in a centralized manner?
>> Is there another intended way?
>>
>
> There's already a PR up, awaiting review by committers:
> https://github.com/apache/iceberg/pull/15122
>
>
>
>>
>> For plain hadoop-aws the workaround is to use (by reflection)
>> S3AInstrumentation.getMetricsSystem().allSources() and wire it to a
>> spark sink.
>>
>
> The intended way to do it there is to use the IOStatistics API, which not
> only lets you get at the s3a stats: Google Cloud collects stats the same way,
> and there's explicit per-thread collection for stream reads and
> writes.
>
> try setting
>
> fs.iostatistics.logging.level info
>
> to see what gets measured
>
>
>> To be clear, I do care about the bytes written/read but, more importantly,
>> about the latency, number of requests, statuses, etc. The stats exposed
>> through FileSystem in Iceberg number fewer than 10, whereas we should get
>> far more than 100 (taking Hadoop as a reference).
>>
>
> AWS metrics are a very limited set:
>
> software.amazon.awssdk.core.metrics.CoreMetric
>
> The retry count is good here as it measures stuff beneath any application
> code. With the rest signer, it'd make sense to also collect signing time,
> as the RPC call to the signing endpoint would be included.
>
> -steve
>
