lquerel commented on issue #506:
URL: https://github.com/apache/arrow-rs/issues/506#issuecomment-989216649


   Another issue with the existing implementation is that 
DictionaryKeyOverflowError is returned in situations where it would not 
reasonably be expected. Consider the following scenario:
   * The dictionary column type is: 
DataType::Dictionary(Box::new(**DataType::UInt8**), Box::new(DataType::Utf8))
   * The dictionary represents an enumeration with 10 distinct values.
   * Because dictionary columns are currently concatenated without 
deduplication, it is very easy to overflow the key type. A UInt8 key can 
address at most 256 dictionary values, so concatenating 26 batches (each 
containing 10 rows, each row holding a different value of the enum) produces 
260 dictionary entries and returns a DictionaryKeyOverflowError.
   
   This issue makes UInt8 dictionary keys unusable in any context where 
batches may be concatenated. 
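
   The arithmetic behind the scenario above can be reproduced in a small model 
that does not use arrow-rs at all. This is only a sketch in plain Rust: 
`DictBatch`, `enum_batch`, `concat_naive` and `concat_dedup` are hypothetical 
names made up for illustration, and the two functions merely imitate 
concatenation without and with deduplication of dictionary values. They are not 
the arrow-rs implementation.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a dictionary-encoded column with u8 keys.
/// `keys[i]` is an index into `values`. This is NOT an arrow-rs type.
struct DictBatch {
    keys: Vec<u8>,
    values: Vec<String>,
}

/// One batch of 10 rows, each row holding a different enum value.
fn enum_batch() -> DictBatch {
    DictBatch {
        keys: (0..10u8).collect(),
        values: (0..10).map(|i| format!("variant_{}", i)).collect(),
    }
}

/// Appends every input dictionary as-is and shifts the keys by the number
/// of values already present. Fails once a shifted key exceeds u8::MAX.
fn concat_naive(batches: &[DictBatch]) -> Result<DictBatch, String> {
    let mut out = DictBatch { keys: Vec::new(), values: Vec::new() };
    for b in batches {
        let offset = out.values.len();
        for &k in &b.keys {
            let new_key = u8::try_from(offset + k as usize)
                .map_err(|_| "dictionary key overflow".to_string())?;
            out.keys.push(new_key);
        }
        out.values.extend(b.values.iter().cloned());
    }
    Ok(out)
}

/// Reuses the key of a value that is already in the output dictionary,
/// so the dictionary only grows when a new distinct value appears.
fn concat_dedup(batches: &[DictBatch]) -> Result<DictBatch, String> {
    let mut out = DictBatch { keys: Vec::new(), values: Vec::new() };
    let mut index: HashMap<String, u8> = HashMap::new();
    for b in batches {
        for &k in &b.keys {
            let value = &b.values[k as usize];
            let existing = index.get(value).copied();
            let key = match existing {
                Some(key) => key,
                None => {
                    let new_key = u8::try_from(out.values.len())
                        .map_err(|_| "dictionary key overflow".to_string())?;
                    out.values.push(value.clone());
                    index.insert(value.clone(), new_key);
                    new_key
                }
            };
            out.keys.push(key);
        }
    }
    Ok(out)
}
```

   Under these assumptions, `concat_naive` succeeds on 25 such batches (keys 
0 through 249) but fails on 26, because the 26th batch would need keys 250 
through 259. `concat_dedup` on the same 26 batches keeps only 10 dictionary 
values, so the keys stay well within the UInt8 range.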



