Re: [I] Physical plan proto roundtrip for null-valued scalar of a `Struct(Dict)` data type [datafusion]

via GitHub Tue, 27 Jan 2026 22:11:58 -0800


kumarUjjawal commented on issue #20011:
URL: https://github.com/apache/datafusion/issues/20011#issuecomment-3809196861


   @Jefffrey PR #14227 intentionally drops dict_id from DataFusion’s protobuf 
schema (Arrow deprecated it and it isn’t stable/meaningful schema metadata). 
   I looked into other ways to resolve this in `ScalarNestedValue` 
(ScalarValue list/struct/map), where we serialize via Arrow IPC. IPC still 
needs dict IDs, but they’re assigned during schema encoding and aren’t carried 
in our protobuf Schema.
   
   I was thinking we keep proto free of `dict_id` and treat dict IDs as an 
internal IPC detail:
   
     1. when encoding nested scalars, seed DictionaryTracker by encoding the 
schema first, then encode the batch;
     2. when decoding, reconstruct an IPC schema from the protobuf schema 
(round-trip through arrow-ipc) and use `arrow_ipc::reader::read_dictionary` to 
build `dict_by_id` before reading the record batch.
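
   To illustrate why both steps can agree on dict IDs without storing them in 
proto, here is a minimal std-only sketch. It is a toy model, not arrow-ipc 
code: `Ty`, `assign_dict_ids` and `dict_ids` are hypothetical names, and it 
assumes IDs are handed out sequentially in the order dictionary fields are 
visited while walking the schema.

```rust
/// Hypothetical, simplified stand-in for an Arrow data type (not arrow-rs).
#[derive(Debug, Clone)]
enum Ty {
    Int32,
    Utf8,
    /// Dictionary-encoded values; the key type is omitted for brevity.
    Dictionary(Box<Ty>),
    Struct(Vec<(String, Ty)>),
}

/// Walk the type depth-first and give each dictionary field the next free ID.
/// Assumption: this mirrors "IDs are assigned during schema encoding".
fn assign_dict_ids(path: &str, ty: &Ty, next_id: &mut i64, out: &mut Vec<(String, i64)>) {
    match ty {
        Ty::Dictionary(value) => {
            out.push((path.to_string(), *next_id));
            *next_id += 1;
            // The value type may itself contain dictionaries.
            assign_dict_ids(&format!("{path}.values"), value, next_id, out);
        }
        Ty::Struct(fields) => {
            for (name, child) in fields {
                assign_dict_ids(&format!("{path}.{name}"), child, next_id, out);
            }
        }
        Ty::Int32 | Ty::Utf8 => {}
    }
}

/// Derive the (field path, dict ID) pairs from a schema that stores no IDs.
fn dict_ids(schema: &Ty) -> Vec<(String, i64)> {
    let mut out = Vec::new();
    let mut next_id = 0;
    assign_dict_ids("$", schema, &mut next_id, &mut out);
    out
}

fn main() {
    // Struct(Dict) shape similar to the one in the issue title.
    let schema = Ty::Struct(vec![
        ("a".to_string(), Ty::Dictionary(Box::new(Ty::Utf8))),
        ("b".to_string(), Ty::Int32),
        ("c".to_string(), Ty::Dictionary(Box::new(Ty::Utf8))),
    ]);

    // Encoder side: IDs derived while "encoding" the schema.
    let writer_ids = dict_ids(&schema);
    // Decoder side: only the ID-free schema is available, yet the IDs match.
    let reader_ids = dict_ids(&schema.clone());

    assert_eq!(writer_ids, reader_ids);
    println!("{writer_ids:?}");
}
```

   In the real change the traversal would be whatever arrow-ipc does when it 
encodes the schema; the sketch only shows that a deterministic walk lets the 
reader rebuild the same IDs from the protobuf schema alone.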
   
   Do you have any thoughts on this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

