lquerel opened a new issue, #14670:
URL: https://github.com/apache/arrow/issues/14670

   Arrow allows declaring a string dictionary with different index types (e.g. 
uint8, uint16, ...). Unfortunately, we don't always know the exact cardinality 
of a column in advance, so choosing the proper index type up front is not 
always feasible and we have to rely on some kind of adaptive approach. 
   
   I'd like to automatically detect when a dictionary overflows (during 
insertion) and then fall back to a larger index type, or directly to a plain 
string type once I've reached the maximum index size I want for a specific 
application. So far, I haven't found an effective way to detect this overflow. 
The `AppendString` method doesn't return an error in case of overflow, so I 
have to insert all my data, then check every dictionary and call the `Offset` 
method to check whether the offset is greater than the maximum value for the 
current index type. 
   
   Is there a better approach to handling dictionary overflow with the Go Arrow SDK?


