Commented inline

Romain Manni-Bucau
@rmannibucau <https://x.com/rmannibucau> | .NET Blog
<https://dotnetbirdie.github.io/> | Blog <https://rmannibucau.github.io/> | Old
Blog <http://rmannibucau.wordpress.com> | Github
<https://github.com/rmannibucau> | LinkedIn
<https://www.linkedin.com/in/rmannibucau> | Book
<https://www.packtpub.com/en-us/product/java-ee-8-high-performance-9781788473064>
Javaccino founder (Java/.NET service - contact via linkedin)


On Thu, 12 Feb 2026 at 21:13, Steve Loughran <[email protected]> wrote:

>
>
> you get all thread local stats for a specific thread
> from IOStatisticsContext.getCurrentIOStatisticsContext().getIOStatistics()
>

How is that supposed to work? My understanding is that it is basically a
thread-local-like implementation backed by a map, the important point being
that it only works in the bound thread, whereas the data is pulled by the
sink from a scheduled executor thread, so I would still need to maintain my
own registry and sync it with the Spark metrics system, no?
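
Concretely, what I picture (only a sketch against the Hadoop 3.4.x API; the
AtomicReference holder class below is my own, not anything Hadoop or Spark
provide) is to snapshot in the bound task thread and let the scheduled
reporter thread only ever read already-taken snapshots:

    import java.util.concurrent.atomic.AtomicReference;
    import org.apache.hadoop.fs.statistics.IOStatisticsContext;
    import org.apache.hadoop.fs.statistics.IOStatisticsSnapshot;
    import static org.apache.hadoop.fs.statistics.IOStatisticsSupport.snapshotIOStatistics;

    public final class TaskStatsHolder {
        // shared holder; the reporter thread only sees immutable snapshots
        private static final AtomicReference<IOStatisticsSnapshot> LATEST =
            new AtomicReference<>();

        // call from the task/worker thread, i.e. the thread the context is bound to
        public static void publishFromTaskThread() {
            LATEST.set(snapshotIOStatistics(
                IOStatisticsContext.getCurrentIOStatisticsContext().getIOStatistics()));
        }

        // call from the scheduled executor thread that feeds the Spark sink
        public static IOStatisticsSnapshot latest() {
            return LATEST.get();
        }
    }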


>
> take a snapshot of that and you have something JSON marshallable or Java
> serializable which aggregates nicely
>
> Call  IOStatisticsContext.getCurrentIOStatisticsContext().reset() when
> your worker thread starts a specific task to ensure you only get the stats
> for that task (s3a & I think gcs).
>

Do you mean implementing my own S3A or FileIO? That is the instrumentation I
tried to avoid, since I think it should be built in, not done in apps.
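
That said, if the suggestion is only an application-level hook around each
task, something like this (a sketch; the onTaskStart/onTaskEnd hook names are
hypothetical, not an existing API) is what I would end up writing, without
touching S3A itself:

    import org.apache.hadoop.fs.statistics.IOStatisticsContext;
    import org.apache.hadoop.fs.statistics.IOStatisticsSnapshot;
    import static org.apache.hadoop.fs.statistics.IOStatisticsSupport.snapshotIOStatistics;

    final class TaskIoStats {
        void onTaskStart() {
            // drop whatever earlier tasks accumulated on this worker thread
            IOStatisticsContext.getCurrentIOStatisticsContext().reset();
        }

        IOStatisticsSnapshot onTaskEnd() {
            // everything the s3a/gcs streams recorded for this thread since reset()
            return snapshotIOStatistics(
                IOStatisticsContext.getCurrentIOStatisticsContext().getIOStatistics());
        }
    }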


>
> from the fs you getIOStatistics() and you get all the stats of all
> filesystems and streams after close(). which from a quick look at some s3
> io to a non-aws store shows a couple of failures, interestingly enough. We
> collect separate averages for success and failure on every op so you can
> see the difference.
>
> the JMX stats we collect are a very small subset of the statistics; stuff
> like "bytes drained in close" and time to wait for an executor in the
> thread pool (action_executor_acquired) are important as they're generally a
> sign of misconfiguration
>

Yep, my high-level focus is to see whether the setup or the tables need
tuning, so 429s, volume and latencies are key there.
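
For that, filtering a snapshot down to request counts and latency means before
shipping it to a sink would be enough on my side (a sketch; the prefix filter
is my own choice, the statistic names are the ones in the dump quoted below):

    import java.util.Map;
    import org.apache.hadoop.fs.statistics.IOStatisticsSnapshot;
    import org.apache.hadoop.fs.statistics.MeanStatistic;

    final class RequestLatencyReport {
        static void log(IOStatisticsSnapshot snapshot) {
            // request volume: raw counters such as object_put_request, store_io_request
            snapshot.counters().entrySet().stream()
                .filter(e -> e.getKey().startsWith("object_")
                    || e.getKey().startsWith("store_io_"))
                .forEach(e -> System.out.printf("count %s=%d%n", e.getKey(), e.getValue()));
            // latency: per-operation means, e.g. object_put_request.mean in the dump below
            for (Map.Entry<String, MeanStatistic> e : snapshot.meanStatistics().entrySet()) {
                System.out.printf("mean  %s=%.1f (samples=%d)%n",
                    e.getKey(), e.getValue().mean(), e.getValue().getSamples());
            }
        }
    }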


>
>
> 2026-02-12 20:05:24,587 [main] INFO  statistics.IOStatisticsLogging
> (IOStatisticsLogging.java:logIOStatisticsAtLevel(269)) - IOStatistics:
> counters=((action_file_opened=1)
> (action_http_get_request=1)
> (action_http_head_request=26)
> (audit_request_execution=70)
> (audit_span_creation=22)
> (directories_created=4)
> (directories_deleted=2)
> (files_copied=2)
> (files_copied_bytes=14)
> (files_created=1)
> (files_deleted=4)
> (filesystem_close=1)
> (filesystem_initialization=1)
> (object_bulk_delete_request=1)
> (object_copy_requests=2)
> (object_delete_objects=6)
> (object_delete_request=4)
> (object_list_request=31)
> (object_metadata_request=26)
> (object_put_bytes=7)
> (object_put_request=5)
> (object_put_request_completed=5)
> (op_create=1)
> (op_createfile=2)
> (op_createfile.failures=1)
> (op_delete=3)
> (op_get_file_status=7)
> (op_get_file_status.failures=4)
> (op_hflush=1)
> (op_hsync=1)
> (op_list_files=2)
> (op_list_files.failures=1)
> (op_list_status=2)
> (op_list_status.failures=1)
> (op_mkdirs=2)
> (op_open=1)
> (op_rename=2)
> (store_client_creation=1)
> (store_io_request=70)
> (stream_read_bytes=7)
> (stream_read_close_operations=1)
> (stream_read_closed=1)
> (stream_read_opened=1)
> (stream_read_operations=1)
> (stream_read_remote_stream_drain=1)
> (stream_read_seek_policy_changed=1)
> (stream_read_total_bytes=7)
> (stream_write_block_uploads=2)
> (stream_write_bytes=7)
> (stream_write_total_data=14)
> (stream_write_total_time=290));
>
> gauges=();
>
> minimums=((action_executor_acquired.min=0)
> (action_file_opened.min=136)
> (action_http_get_request.min=140)
> (action_http_head_request.min=107)
> (filesystem_close.min=13)
> (filesystem_initialization.min=808)
> (object_bulk_delete_request.min=257)
> (object_delete_request.min=117)
> (object_list_request.min=113)
> (object_put_request.min=121)
> (op_create.min=148)
> (op_createfile.failures.min=111)
> (op_delete.min=117)
> (op_get_file_status.failures.min=226)
> (op_get_file_status.min=1)
> (op_list_files.failures.min=391)
> (op_list_files.min=138)
> (op_list_status.failures.min=458)
> (op_list_status.min=1056)
> (op_mkdirs.min=709)
> (op_rename.min=1205)
> (store_client_creation.min=718)
> (store_io_rate_limited_duration.min=0)
> (stream_read_remote_stream_drain.min=1));
>
> maximums=((action_executor_acquired.max=0)
> (action_file_opened.max=136)
> (action_http_get_request.max=140)
> (action_http_head_request.max=270)
> (filesystem_close.max=13)
> (filesystem_initialization.max=808)
> (object_bulk_delete_request.max=257)
> (object_delete_request.max=149)
> (object_list_request.max=1027)
> (object_put_request.max=289)
> (op_create.max=148)
> (op_createfile.failures.max=111)
> (op_delete.max=273)
> (op_get_file_status.failures.max=262)
> (op_get_file_status.max=254)
> (op_list_files.failures.max=391)
> (op_list_files.max=138)
> (op_list_status.failures.max=458)
> (op_list_status.max=1056)
> (op_mkdirs.max=2094)
> (op_rename.max=1523)
> (store_client_creation.max=718)
> (store_io_rate_limited_duration.max=0)
> (stream_read_remote_stream_drain.max=1));
>
> means=((action_executor_acquired.mean=(samples=1, sum=0, mean=0.0000))
> (action_file_opened.mean=(samples=1, sum=136, mean=136.0000))
> (action_http_get_request.mean=(samples=1, sum=140, mean=140.0000))
> (action_http_head_request.mean=(samples=26, sum=3543, mean=136.2692))
> (filesystem_close.mean=(samples=1, sum=13, mean=13.0000))
> (filesystem_initialization.mean=(samples=1, sum=808, mean=808.0000))
> (object_bulk_delete_request.mean=(samples=1, sum=257, mean=257.0000))
> (object_delete_request.mean=(samples=4, sum=525, mean=131.2500))
> (object_list_request.mean=(samples=31, sum=5651, mean=182.2903))
> (object_put_request.mean=(samples=5, sum=1066, mean=213.2000))
> (op_create.mean=(samples=1, sum=148, mean=148.0000))
> (op_createfile.failures.mean=(samples=1, sum=111, mean=111.0000))
> (op_delete.mean=(samples=3, sum=523, mean=174.3333))
> (op_get_file_status.failures.mean=(samples=4, sum=992, mean=248.0000))
> (op_get_file_status.mean=(samples=3, sum=365, mean=121.6667))
> (op_list_files.failures.mean=(samples=1, sum=391, mean=391.0000))
> (op_list_files.mean=(samples=1, sum=138, mean=138.0000))
> (op_list_status.failures.mean=(samples=1, sum=458, mean=458.0000))
> (op_list_status.mean=(samples=1, sum=1056, mean=1056.0000))
> (op_mkdirs.mean=(samples=2, sum=2803, mean=1401.5000))
> (op_rename.mean=(samples=2, sum=2728, mean=1364.0000))
> (store_client_creation.mean=(samples=1, sum=718, mean=718.0000))
> (store_io_rate_limited_duration.mean=(samples=5, sum=0, mean=0.0000))
> (stream_read_remote_stream_drain.mean=(samples=1, sum=1, mean=1.0000)));
>
> Anyway, no, S3FileIO doesn't have any of that. Keeps the code simple,
> which is in its favour.
>

Hmm, to be honest, in my land that is "simple but not prod friendly" versus
"more complex but usable in prod".
Does that mean it will not be enhanced?
Another thing I don't get is why Spark doesn't reuse hadoop-aws: it would at
least make mixing data sources nicer and concentrate the work in a single
location (where it is already done).

Happy to help next week if you think it is generally interesting and there is
a consensus on the "how".
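
To make the "how" a bit more concrete, one option I can imagine (a sketch
only; the "iostatistics" source name and the whole bridge class are
hypothetical, not existing Spark or Iceberg code) is a Dropwizard-backed
Spark Source fed from the per-task snapshots:

    import com.codahale.metrics.Gauge;
    import com.codahale.metrics.MetricRegistry;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.LongAdder;
    import org.apache.hadoop.fs.statistics.IOStatisticsSnapshot;
    import org.apache.spark.metrics.source.Source;

    // hypothetical bridge: aggregate IOStatistics counters into Spark metrics
    public final class IoStatisticsSource implements Source {
        private final MetricRegistry registry = new MetricRegistry();
        private final ConcurrentHashMap<String, LongAdder> counters = new ConcurrentHashMap<>();

        @Override public String sourceName() { return "iostatistics"; }
        @Override public MetricRegistry metricRegistry() { return registry; }

        // call with the snapshot taken at the end of each task
        public void add(IOStatisticsSnapshot snapshot) {
            snapshot.counters().forEach((name, value) ->
                counters.computeIfAbsent(name, n -> {
                    LongAdder adder = new LongAdder();
                    registry.register(n, (Gauge<Long>) adder::sum);
                    return adder;
                }).add(value));
        }
    }
    // registered once per executor, e.g.
    // SparkEnv.get().metricsSystem().registerSource(new IoStatisticsSource());

The aggregation here is additive only; a real bridge would also have to decide
what to do with the min/max/mean duration statistics.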


>
>
> On Thu, 12 Feb 2026 at 18:40, Romain Manni-Bucau <[email protected]>
> wrote:
>
>> hmm, I'm not sure what you propose to link it to Spark sinks, but
>> S3AInstrumentation.getMetricSystem().allSources for hadoop-aws and
>> MetricsPublisher for iceberg are the "least worst" solutions I came up with.
>> Clearly dirty, but more efficient than reinstrumenting the whole stack
>> everywhere (pull vs push mode).
>>
>> Do you mean I should wrap everything to read the thread local every time
>> and maintain the registry in the Spark MetricsSystem?
>>
>> Another way to see it is to expose JMX when using hadoop-aws; these are the
>> graphs I want to get into Grafana at some point.
>>
>> Romain Manni-Bucau
>> @rmannibucau <https://x.com/rmannibucau> | .NET Blog
>> <https://dotnetbirdie.github.io/> | Blog <https://rmannibucau.github.io/> |
>> Old Blog <http://rmannibucau.wordpress.com> | Github
>> <https://github.com/rmannibucau> | LinkedIn
>> <https://www.linkedin.com/in/rmannibucau> | Book
>> <https://www.packtpub.com/en-us/product/java-ee-8-high-performance-9781788473064>
>> Javaccino founder (Java/.NET service - contact via linkedin)
>>
>>
>> On Thu, 12 Feb 2026 at 19:19, Steve Loughran <[email protected]> wrote:
>>
>>>
>>> ok, stream level.
>>>
>>> No, it's not the same.
>>>
>>> For those s3a input stream stats, you don't need to go into the s3a
>>> internals
>>> 1. every source of IOStats implements InputStreamStatistics, which is
>>> hadoop-common code
>>> 2. in close() s3a input streams update thread level IOStatisticsContext (
>>> https://issues.apache.org/jira/browse/HADOOP-17461 ... some
>>> stabilisation so use with Hadoop 3.4.0/Spark 4.0+)
>>>
>>> The thread stuff is so streams opened and closed in, say, the parquet
>>> reader, update stats just for that worker thread even though you never get
>>> near the stream instance itself.
>>>
>>> Regarding iceberg fileio stats, well, maybe someone could add it to the
>>> classes. Spark 4+ could think about collecting the stats for each task and
>>> aggregating, as that was the original goal. You get that aggregation
>>> indirectly on s3a with the s3a committers, similar through abfs, but really
>>> spark should just collect and report it itself.
>>>
>>>
>>> On Thu, 12 Feb 2026 at 17:03, Romain Manni-Bucau <[email protected]>
>>> wrote:
>>>
>>>> Hi Steve,
>>>>
>>>> Are you referring to org.apache.iceberg.io.FileIOMetricsContext and
>>>> org.apache.hadoop.fs.FileSystem.Statistics.StatisticsData? They miss most
>>>> of what I'm looking for (429s, to cite a single one).
>>>> software.amazon.awssdk.metrics helps a bit but is not sink friendly.
>>>> Compared to hadoop-aws, combining the Iceberg-native and AWS S3 client
>>>> metrics kind of compensates for the gap, but what I would love to see
>>>> is org.apache.hadoop.fs.s3a.S3AInstrumentation and more particularly
>>>> org.apache.hadoop.fs.s3a.S3AInstrumentation.InputStreamStatistics#InputStreamStatistics
>>>> (I'm mainly reading, for my use cases).
>>>>
>>>>
>>>> Romain Manni-Bucau
>>>> @rmannibucau <https://x.com/rmannibucau> | .NET Blog
>>>> <https://dotnetbirdie.github.io/> | Blog
>>>> <https://rmannibucau.github.io/> | Old Blog
>>>> <http://rmannibucau.wordpress.com> | Github
>>>> <https://github.com/rmannibucau> | LinkedIn
>>>> <https://www.linkedin.com/in/rmannibucau> | Book
>>>> <https://www.packtpub.com/en-us/product/java-ee-8-high-performance-9781788473064>
>>>> Javaccino founder (Java/.NET service - contact via linkedin)
>>>>
>>>>
>>>> On Thu, 12 Feb 2026 at 15:50, Steve Loughran <[email protected]> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Thu, 12 Feb 2026 at 10:39, Romain Manni-Bucau <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Is it intended that S3FileIO doesn't wire [aws
>>>>>> sdk].ClientOverrideConfiguration.Builder#addMetricPublisher, so that,
>>>>>> compared to hadoop-aws, you can't retrieve metrics from Spark (or any other
>>>>>> engine) and send them to a collector in a centralized manner?
>>>>>> Is there another intended way?
>>>>>>
>>>>>
>>>>> already a PR up awaiting review by committers
>>>>> https://github.com/apache/iceberg/pull/15122
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> For plain hadoop-aws the workaround is to use (by reflection)
>>>>>> S3AInstrumentation.getMetricsSystem().allSources() and wire it to a
>>>>>> spark sink.
>>>>>>
>>>>>
>>>>> The intended way to do it there is to use the IOStatistics API, which
>>>>> lets you get at the s3a stats (google cloud collects stuff the same way),
>>>>> and there's explicit collection of things per thread for stream read
>>>>> and write....
>>>>>
>>>>> try setting
>>>>>
>>>>> fs.iostatistics.logging.level info
>>>>>
>>>>> to see what gets measured
>>>>>
>>>>>
>>>>>> To be clear, I do care about the bytes written/read but more
>>>>>> importantly about the latency, number of requests, statuses, etc. The stats
>>>>>> exposed through FileSystem in Iceberg number < 10, whereas we should get >> 100
>>>>>> stats (taking Hadoop as a reference).
>>>>>>
>>>>>
>>>>> AWS metrics are a very limited set
>>>>>
>>>>> software.amazon.awssdk.core.metrics.CoreMetric
>>>>>
>>>>> The retry count is good here as it measures stuff beneath any
>>>>> application code. With the rest signer, it'd make sense to also collect
>>>>> signing time, as the RPC call to the signing endpoint would be included.
>>>>>
>>>>> -steve
>>>>>
>>>>
