bkietz commented on pull request #7536: URL: https://github.com/apache/arrow/pull/7536#issuecomment-649154064
Actually, on reflection: I'm not sure it's worthwhile to check the count of unique values at all. In any given batch a virtual column would be materialized with a single-item dictionary so `int8` should always suffice. (Unless we want to always support concatenation of a materialized table's chunks, though even in that case we could promote the index type on concatenation...). Currently it seems preferable to remove `max_partition_dictionary_size` in favor of a boolean flag and always infer `dictionary<indices=int8, values=utf8>`. @jorisvandenbossche ?
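To make the argument concrete, here is a minimal pure-Python sketch of the idea, not Arrow's implementation. It models a per-batch partition column as a single-item dictionary with `int8` indices, and shows how the index type could be promoted only when chunks are concatenated. All names here (`DictChunk`, `materialize_partition_column`, `concatenate`, `smallest_index_type`) are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass

# Signed index types, smallest first, with the largest index each can hold.
INDEX_TYPES = [
    ("int8", 2**7 - 1),
    ("int16", 2**15 - 1),
    ("int32", 2**31 - 1),
    ("int64", 2**63 - 1),
]


def smallest_index_type(dictionary_size):
    """Return the narrowest signed type that can index `dictionary_size` values."""
    for name, max_index in INDEX_TYPES:
        if dictionary_size - 1 <= max_index:
            return name
    raise ValueError("dictionary too large to index")


@dataclass
class DictChunk:
    """Toy stand-in for a dictionary-encoded column chunk (hypothetical)."""
    dictionary: list
    indices: list
    index_type: str


def materialize_partition_column(value, num_rows):
    # Within one batch the partition value is constant, so the dictionary
    # has exactly one entry and every index is 0: int8 always suffices.
    return DictChunk([value], [0] * num_rows, "int8")


def concatenate(chunks):
    # Unify the per-chunk dictionaries, remap the indices, and promote the
    # index type only if the unified dictionary needs a wider type.
    unified, position, indices = [], {}, []
    for chunk in chunks:
        for i in chunk.indices:
            value = chunk.dictionary[i]
            if value not in position:
                position[value] = len(unified)
                unified.append(value)
            indices.append(position[value])
    return DictChunk(unified, indices, smallest_index_type(len(unified)))


# 200 batches, each from a different partition directory (made-up values).
chunks = [materialize_partition_column(f"part-{k}", 3) for k in range(200)]
combined = concatenate(chunks)
```

Under these assumptions every individual chunk keeps `int8` indices, and the wider type is chosen only at concatenation time, once the unified dictionary exceeds 128 entries.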
