lquerel commented on issue #506: URL: https://github.com/apache/arrow-rs/issues/506#issuecomment-989216649
Another issue with the existing implementation is that it returns a DictionaryKeyOverflowError in situations where one would not reasonably expect it. Consider the following scenario:

* Let's imagine a dictionary column of type DataType::Dictionary(Box::new(**DataType::UInt8**), Box::new(DataType::Utf8)).
* The dictionary represents an enumeration with 10 distinct values.
* Because dictionary columns are currently concatenated without deduplication, it is very easy to overflow the key type. In my example, concatenating 26 batches (each containing 10 rows, with each row holding a different value of the enum) returns a DictionaryKeyOverflowError: the concatenated dictionary grows to 26 × 10 = 260 entries, while UInt8 keys can address at most 256.

This issue makes UInt8 dictionary keys unusable in any context where batches may be concatenated.
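To make the arithmetic concrete, here is a minimal std-only sketch that models a UInt8-keyed string dictionary column and compares naive concatenation with a deduplicating merge. It is a simplified model under stated assumptions, not the arrow-rs API: `DictColumn`, `KeyOverflow`, `enum_batch`, `concat_naive` and `concat_dedup` are hypothetical names introduced for illustration only. The naive variant appends every input dictionary and shifts keys by a running offset, which overflows at batch 26. The dedup variant remaps each input key to a shared dictionary, so only the 10 distinct values are stored.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a dictionary-encoded Utf8 column with u8 keys
/// (NOT the arrow-rs DictionaryArray type).
#[derive(Debug, Clone, PartialEq)]
struct DictColumn {
    keys: Vec<u8>,
    values: Vec<String>,
}

/// Hypothetical error mirroring the idea of DictionaryKeyOverflowError.
#[derive(Debug, PartialEq)]
struct KeyOverflow;

/// One batch: 10 rows, each holding a different enum value.
fn enum_batch() -> DictColumn {
    let values: Vec<String> = (0..10).map(|i| format!("variant_{}", i)).collect();
    DictColumn { keys: (0..10).collect(), values }
}

/// Naive concatenation: append every dictionary as-is, shift keys by offset.
fn concat_naive(cols: &[DictColumn]) -> Result<DictColumn, KeyOverflow> {
    let mut out = DictColumn { keys: Vec::new(), values: Vec::new() };
    for col in cols {
        let offset = out.values.len();
        out.values.extend(col.values.iter().cloned());
        // u8 keys can address at most 256 dictionary entries (0..=255).
        if out.values.len() > u8::MAX as usize + 1 {
            return Err(KeyOverflow);
        }
        out.keys.extend(col.keys.iter().map(|&k| (offset + k as usize) as u8));
    }
    Ok(out)
}

/// Deduplicating concatenation: each distinct value is stored only once.
fn concat_dedup(cols: &[DictColumn]) -> Result<DictColumn, KeyOverflow> {
    let mut out = DictColumn { keys: Vec::new(), values: Vec::new() };
    let mut index: HashMap<String, u8> = HashMap::new();
    for col in cols {
        // remap[old_key] = key of the same value in the merged dictionary.
        let mut remap = Vec::with_capacity(col.values.len());
        for v in &col.values {
            let existing = index.get(v).copied();
            let key = match existing {
                Some(k) => k,
                None => {
                    // A 257th distinct value cannot be given a u8 key.
                    if out.values.len() > u8::MAX as usize {
                        return Err(KeyOverflow);
                    }
                    let k = out.values.len() as u8;
                    out.values.push(v.clone());
                    index.insert(v.clone(), k);
                    k
                }
            };
            remap.push(key);
        }
        out.keys.extend(col.keys.iter().map(|&k| remap[k as usize]));
    }
    Ok(out)
}

fn main() {
    let batches: Vec<DictColumn> = (0..26).map(|_| enum_batch()).collect();
    // 25 batches -> 250 dictionary entries: still fits in u8 keys.
    assert!(concat_naive(&batches[..25]).is_ok());
    // 26 batches -> 260 entries > 256: naive concatenation overflows.
    assert_eq!(concat_naive(&batches), Err(KeyOverflow));
    // Deduplication keeps the dictionary at 10 entries for all 260 rows.
    let merged = concat_dedup(&batches).unwrap();
    println!("{} rows, {} distinct values", merged.keys.len(), merged.values.len());
}
```

With deduplication, the key type only has to cover the number of distinct values (10 here), not the total dictionary length summed over all batches. The overflow check then fires only when there are genuinely more than 256 distinct values.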
