bkietz commented on pull request #7536:
URL: https://github.com/apache/arrow/pull/7536#issuecomment-649154064


   Actually, on reflection, I'm not sure it's worthwhile to check the count of
unique values at all. In any given batch, a virtual column would be materialized
with a single-item dictionary, so `int8` indices should always suffice. (The
exception would be if we want to always support concatenating a materialized
table's chunks, though even then we could promote the index type on
concatenation.)
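
   To make the reasoning above concrete, here is a toy model in plain Python.
It is NOT the Arrow API or this PR's implementation; every name in it
(`smallest_index_type`, `materialize_virtual_column`, `concatenate`) is
hypothetical. It assumes a dictionary-encoded column is just a list of indices
plus a dictionary, and that the index width is the narrowest signed integer type
able to address every dictionary entry. Per batch, a partition column has one
distinct value, so `int8` is enough. Concatenation unifies the dictionaries and
re-derives the index type, promoting it only when the combined number of
distinct values requires it.

```python
# Toy model of dictionary-encoded columns (hypothetical, NOT the Arrow API):
# a column is (indices, dictionary, index_type), where index_type is the
# narrowest signed integer type that can address every dictionary entry.

INDEX_TYPES = [("int8", 2**7 - 1), ("int16", 2**15 - 1),
               ("int32", 2**31 - 1), ("int64", 2**63 - 1)]


def smallest_index_type(dictionary_size):
    """Return the narrowest signed index type able to address dictionary_size values."""
    max_index = max(dictionary_size - 1, 0)
    for name, limit in INDEX_TYPES:
        if max_index <= limit:
            return name
    raise ValueError("dictionary too large")


def materialize_virtual_column(value, num_rows):
    """Materialize a partition (virtual) column for a single batch.

    Every row holds the same partition value, so the dictionary has exactly one
    entry, every index is 0, and int8 indices always suffice.
    """
    return {"indices": [0] * num_rows, "dictionary": [value],
            "index_type": smallest_index_type(1)}


def concatenate(chunks):
    """Concatenate chunks by unifying their dictionaries and remapping indices.

    The index type is recomputed from the unified dictionary size, so it is
    promoted only if the combined number of distinct values needs it.
    """
    unified, positions, indices = [], {}, []
    for chunk in chunks:
        # Position of each chunk-local dictionary entry in the unified dictionary.
        remap = []
        for v in chunk["dictionary"]:
            if v not in positions:
                positions[v] = len(unified)
                unified.append(v)
            remap.append(positions[v])
        indices.extend(remap[i] for i in chunk["indices"])
    return {"indices": indices, "dictionary": unified,
            "index_type": smallest_index_type(len(unified))}


# Usage: 300 batches, each from a different partition.
batches = [materialize_virtual_column(f"part={i}", 3) for i in range(300)]
print(batches[0]["index_type"])          # int8
print(concatenate(batches)["index_type"])  # int16 (300 distinct values > 128)
```

   In this model no per-batch check of the unique-value count is needed; the
only place a wider index type can arise is concatenation, where it can be
computed from the unified dictionary.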
   
   At this point it seems preferable to remove `max_partition_dictionary_size`
in favor of a boolean flag and to always infer
`dictionary<indices=int8, values=utf8>`.
   
   @jorisvandenbossche ?