tustvold opened a new issue #209:
URL: https://github.com/apache/arrow-datafusion/issues/209


   The dictionary support added in #1262 hydrates dictionaries (expands them 
to their plain values) for Arrow Flight. In some situations it is possible to 
do better than this.
   
   This is somewhat complicated because dictionaries may be shared across 
columns for some record batches, whereas the dictionary ID is encoded in the 
schema and must be constant for a given column.
   
   A very basic protocol would assign each column in the schema a unique 
dictionary ID, and before sending each record batch send out a non-differential 
dictionary update containing the dictionary for the column within that record 
batch.
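
   A minimal sketch of that basic protocol, using only the standard library. 
The `Message`, `DictColumn`, `assign_dict_ids` and `encode_batch` names are 
hypothetical stand-ins for illustration, not the arrow or arrow-flight API, 
and string values with `u32` keys are assumed for simplicity:

```rust
/// Simplified stand-ins for the messages a sender would emit, in order.
/// (Hypothetical types for illustration; not the arrow-flight API.)
#[derive(Debug, PartialEq)]
pub enum Message {
    /// Dictionary update. `is_delta: false` means it replaces any
    /// dictionary previously sent under the same ID.
    Dictionary { id: i64, is_delta: bool, values: Vec<String> },
    /// Record batch whose dictionary columns carry only keys.
    Batch { keys: Vec<Vec<u32>> },
}

/// One dictionary-encoded column within a single record batch.
pub struct DictColumn {
    pub values: Vec<String>,
    pub keys: Vec<u32>,
}

/// Give every column its own dictionary ID. The IDs live in the schema,
/// so they are chosen once and stay constant for the whole stream.
pub fn assign_dict_ids(num_columns: usize) -> Vec<i64> {
    (0..num_columns as i64).collect()
}

/// Before the batch itself, send one non-differential dictionary per
/// column, containing that column's dictionary for this batch.
pub fn encode_batch(dict_ids: &[i64], columns: &[DictColumn]) -> Vec<Message> {
    assert_eq!(dict_ids.len(), columns.len(), "one dictionary ID per column");
    let mut out = Vec::with_capacity(columns.len() + 1);
    for (id, col) in dict_ids.iter().zip(columns) {
        out.push(Message::Dictionary {
            id: *id,
            is_delta: false,
            values: col.values.clone(),
        });
    }
    out.push(Message::Batch {
        keys: columns.iter().map(|c| c.keys.clone()).collect(),
    });
    out
}
```

   Note that in this sketch two columns sharing one dictionary within a batch 
still produce two dictionary messages, one per ID, and every batch resends 
its dictionaries in full.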
   
   This is potentially wasteful, and would likely need heuristics for deciding 
when it is better to hydrate the values and/or re-encode the dictionary, but 
it should be easy to implement.



