[I] Iceberg input source: adopt Iceberg's native Arrow reader stack for forward-compatibility with Iceberg spec evolution and performance improvements (druid)

via GitHub Thu, 21 May 2026 09:34:42 -0700


Shekharrajak opened a new issue, #19498:
URL: https://github.com/apache/druid/issues/19498


   ### Description
   
   Add an opt-in Iceberg reader path that delegates reading, delete 
application, and type handling to Iceberg's official iceberg-arrow library, 
with the goal of eventually replacing Druid's hand-written Iceberg reader path. 
Both readers will be available initially. This stops shipping a custom 
Druid-side fork of Iceberg's reader semantics and lets Druid automatically 
inherit every Iceberg spec evolution (V2 deletes → V3 deletion vectors / row 
lineage → V4 and beyond), reader optimisation (pushdown, statistics, 
vectorisation), and format coverage (Parquet/ORC/Avro and future formats) as 
soon as we bump the Iceberg dependency.
   
   #### Current
   
   * IcebergNativeRecordReader is a Druid-maintained reader.
   * Every Iceberg spec improvement (new delete encodings, partition 
statistics, manifest changes, deletion vectors in V3, row lineage, etc.) 
requires bespoke Druid implementation work.
   
   #### After changes
   
   * New IcebergArrowReader activated by useArrowReader: true in the input 
spec; defaults to false initially.
   * Druid converts the resulting Arrow VectorSchemaRoot batches into 
MapBasedInputRow via one small adapter; InputRow remains the firewall and 
nothing else in Druid sees Arrow.
   * Iceberg dependency bumps automatically deliver new spec features and 
optimisations to Druid users with no Druid code change.
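
The adapter step above is essentially a column-to-row pivot. Here is a minimal sketch of that pivot, assuming plain Java collections as a stand-in for an Arrow VectorSchemaRoot so that it runs without Arrow on the classpath. The class name `ColumnarBatchToRows` and the method `toRows` are hypothetical and not part of Druid or Iceberg. In a real adapter the columns would presumably come from the batch's field vectors, and each resulting map would be wrapped in a MapBasedInputRow.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: pivots a column-oriented batch into per-row maps.
 * The Map<String, List<Object>> is a stand-in for an Arrow VectorSchemaRoot;
 * it is an assumption made for illustration, not the proposed implementation.
 */
final class ColumnarBatchToRows {
    private ColumnarBatchToRows() {}

    /** Returns one map per row, keyed by column name, in column order. */
    static List<Map<String, Object>> toRows(Map<String, List<Object>> columns, int rowCount) {
        // Every column must supply exactly rowCount values.
        for (Map.Entry<String, List<Object>> col : columns.entrySet()) {
            if (col.getValue().size() != rowCount) {
                throw new IllegalArgumentException(
                    "Column " + col.getKey() + " has " + col.getValue().size()
                        + " values, expected " + rowCount);
            }
        }
        List<Map<String, Object>> rows = new ArrayList<>(rowCount);
        for (int i = 0; i < rowCount; i++) {
            // LinkedHashMap keeps column order and allows null values.
            Map<String, Object> row = new LinkedHashMap<>();
            for (Map.Entry<String, List<Object>> col : columns.entrySet()) {
                row.put(col.getKey(), col.getValue().get(i));
            }
            rows.add(row);
        }
        return rows;
    }
}
```

Keeping the pivot in one small class like this is what would let InputRow stay the only boundary type: everything downstream sees ordinary maps, never Arrow vectors.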
   
   ### Motivation
   
   This is the first, foundational step towards the Arrow integration tracked 
in https://github.com/apache/druid/issues/19456, and towards getting Druid and 
Arrow working together.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


