Agreed, the idea was to use these stats - though they could be pulled
from Thanos (or similar) too - to say "ok, if we go down that path we'll
be throttled"; a kind of cost estimation. But you are right.

Romain Manni-Bucau
@rmannibucau <https://x.com/rmannibucau> | .NET Blog
<https://dotnetbirdie.github.io/> | Blog <https://rmannibucau.github.io/> | Old
Blog <http://rmannibucau.wordpress.com> | Github
<https://github.com/rmannibucau> | LinkedIn
<https://www.linkedin.com/in/rmannibucau> | Book
<https://www.packtpub.com/en-us/product/java-ee-8-high-performance-9781788473064>
Javaccino founder (Java/.NET service - contact via linkedin)


On Fri, Feb 13, 2026 at 19:30, Steve Loughran <[email protected]> wrote:

>
>
> On Fri, 13 Feb 2026 at 12:53, Romain Manni-Bucau <[email protected]>
> wrote:
>
>> Hi Steve,
>>
>> Fully agree with you on all the points - and thanks for the details,
>> BTW.
>> My main concern - and why I sent the mail - is "what is provided by
>> default".
>> To make it more concrete: what "stops" me from going further is that it
>> is not built-in, so basically I have to redo it myself...and if I have to
>> pull in all that logic, I could just as well redo my full Spark
>> integration, see what I mean?
>>
>> I know most vendors have solved it somehow, so my question is: do we
>> want to integrate it as a standard in Iceberg?
>> Should it relate to the REST catalog, to have metrics awareness/stats
>> there?
>>
>
> Generally different stats, though:
> catalog: tables, files, etc.
> stuff collected by the engine during a query: engine-specific, and based
> on the configuration and deployment.
>
> Probably more broadly relevant: end-to-end telemetry, where the metrics
> go to whatever telemetry DB is deployed.
>
> There's also the need for the endpoint signers to log something about
> every request that was signed, especially because those requests will end
> up in the CloudTrail log as actions by the assumed role, not the
> principal querying the table.
>
>
>> Any willingness to move in that direction?
>>
>>
>>
>> On Fri, Feb 13, 2026 at 12:02, Steve Loughran <[email protected]> wrote:
>>
>>>
>>>
>>> On Thu, 12 Feb 2026 at 20:52, Romain Manni-Bucau <[email protected]>
>>> wrote:
>>>
>>>> Commented inline
>>>>
>>>>
>>>>
>>>> On Thu, Feb 12, 2026 at 21:13, Steve Loughran <[email protected]> wrote:
>>>>
>>>>>
>>>>>
>>>>> You get all the thread-local stats for a specific thread from
>>>>> IOStatisticsContext.getCurrentIOStatisticsContext().getIOStatistics()
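>>>>>
>>>>> A minimal sketch of that lookup (assuming Hadoop 3.3.5+, where the
>>>>> IOStatisticsContext API landed):
>>>>>
>>>>>   import org.apache.hadoop.fs.statistics.IOStatistics;
>>>>>   import org.apache.hadoop.fs.statistics.IOStatisticsContext;
>>>>>   import org.apache.hadoop.fs.statistics.IOStatisticsLogging;
>>>>>
>>>>>   // stats gathered so far by IO done on the current thread
>>>>>   IOStatistics stats = IOStatisticsContext
>>>>>       .getCurrentIOStatisticsContext()
>>>>>       .getIOStatistics();
>>>>>   System.out.println(
>>>>>       IOStatisticsLogging.ioStatisticsToPrettyString(stats));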
>>>>>
>>>>
>>>> How is it supposed to work? My understanding is that it is basically a
>>>> thread-local-like impl based on a map - the important point being that
>>>> it works within the same bound thread - whereas the data is pulled from
>>>> the sink in a scheduled executor thread, so I would still need to do my
>>>> own registry/sync it with the Spark metrics system, no?
>>>>
>>>>
>>>>>
>>>>> Take a snapshot of that and you have something JSON-marshallable or
>>>>> Java-serializable which aggregates nicely.
>>>>>
>>>>> Call IOStatisticsContext.getCurrentIOStatisticsContext().reset() when
>>>>> your worker thread starts a specific task, to ensure you only get the
>>>>> stats for that task (s3a & I think gcs).
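>>>>>
>>>>> Per task, something like this (runTask() is a stand-in for the task's
>>>>> actual work):
>>>>>
>>>>>   import org.apache.hadoop.fs.statistics.IOStatisticsContext;
>>>>>   import org.apache.hadoop.fs.statistics.IOStatisticsSnapshot;
>>>>>
>>>>>   IOStatisticsContext ctx =
>>>>>       IOStatisticsContext.getCurrentIOStatisticsContext();
>>>>>   ctx.reset();      // drop stats from earlier work on this thread
>>>>>   runTask();        // stand-in: the task's IO-doing work
>>>>>   IOStatisticsSnapshot snap = ctx.snapshot();  // serializable copy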
>>>>>
>>>>
>>>> Do you mean implementing my own S3A or FileIO? This is the
>>>> instrumentation I tried to avoid, since I think it should be built-in,
>>>> not in apps.
>>>>
>>>
>>> More that Spark worker threads need to reset the stats once they pick
>>> up their next piece of work, collect the changes, then push the stats up
>>> on task commit; job commit then aggregates these.
>>>
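>>> On the job-commit side, the snapshots merge via aggregate(); roughly
>>> (taskSnapshots being a hypothetical collection of the per-task
>>> snapshots shipped on task commit):
>>>
>>>   import org.apache.hadoop.fs.statistics.IOStatisticsSnapshot;
>>>
>>>   IOStatisticsSnapshot jobStats = new IOStatisticsSnapshot();
>>>   for (IOStatisticsSnapshot t : taskSnapshots) {
>>>     jobStats.aggregate(t);  // merges counters, gauges, min/max/mean
>>>   }
>>>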
>>> The s3a committers do all this behind the scenes (first into the
>>> intermediate manifest, then into the final _SUCCESS file). Now that
>>> Spark builds against a Hadoop version with the API, someone could
>>> consider doing it there and lining it up with the Spark history server.
>>> Then whatever FS client, input stream or other instrumented component
>>> would just add its numbers.
>>>
>>>
>>>>
>>>>>
>>>>> From the fs, getIOStatistics() gives you all the stats of all
>>>>> filesystems and streams after close() - which, from a quick look at
>>>>> some S3 IO against a non-AWS store, shows a couple of failures,
>>>>> interestingly enough. We collect separate averages for success and
>>>>> failure on every op, so you can see the difference.
>>>>>
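>>>>> For example, a sketch of pulling those from a filesystem instance
>>>>> (retrieveIOStatistics returns null when the source doesn't implement
>>>>> IOStatisticsSource):
>>>>>
>>>>>   import org.apache.hadoop.fs.FileSystem;
>>>>>   import org.apache.hadoop.fs.statistics.IOStatistics;
>>>>>   import org.apache.hadoop.fs.statistics.IOStatisticsLogging;
>>>>>   import org.apache.hadoop.fs.statistics.IOStatisticsSupport;
>>>>>
>>>>>   FileSystem fs = getFileSystem();  // stand-in: e.g. an S3AFileSystem
>>>>>   IOStatistics fsStats = IOStatisticsSupport.retrieveIOStatistics(fs);
>>>>>   if (fsStats != null) {
>>>>>     System.out.println(
>>>>>         IOStatisticsLogging.ioStatisticsToPrettyString(fsStats));
>>>>>   }
>>>>>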
>>>>> The JMX stats we collect are a very small subset of the statistics;
>>>>> stuff like "bytes drained in close" and time waiting for an executor
>>>>> in the thread pool (action_executor_acquired) is important, as it's
>>>>> generally a sign of misconfiguration.
>>>>>
>>>>
>>>> Yep, my high-level focus is to see whether the setup or the tables
>>>> must be tuned, so 429s, volume and latencies are key there.
>>>>
>>>
>>> If you turn on AWS S3 server logging you will get the counts of 503
>>> throttle events and the paths; 429 is what other stores return. Bear in
>>> mind that the recipients of the throttle events may not be the callers
>>> triggering them...things like bulk delete (hello, compaction) can
>>> throttle other work going on against the same shard.
>>>
>>>
>>>> Another thing I don't get: why not reuse hadoop-aws in Spark? It
>>>> would at least make it possible to mix datasources more nicely and
>>>> concentrate the work in a single location (it is already done).
>>>>
>>>>
>>>>
>>>
>>> Well, in Cloudera we do. Nothing to stop you.
>>>
>>> I also have a PoC of an S3 signer for Hadoop 3.4.3+ which gets its
>>> credentials from the REST server - it simply wraps the existing one but
>>> picks up its binding info from the filesystem Configuration.
>>>
>>> -Steve
>>>
>>>
>>>
>>>
