lquerel opened a new issue, #14670: URL: https://github.com/apache/arrow/issues/14670
Arrow allows declaring a string dictionary with different index types (e.g. uint8, uint16, ...). Unfortunately, we don't always know the exact cardinality of a column in advance, so choosing the right index type up front is not always feasible and we have to rely on some kind of adaptive approach. I'd like to detect automatically, during insertion, when a dictionary overflows its index type, and then fall back to a larger index type, or directly to a plain string column once I've reached the maximum index size I want for a specific application.

So far, I haven't found an effective way to detect this overflow. The `AppendString` method doesn't return an error in case of overflow, so I have to insert all my data, then go through all my dictionaries and call the `Offset` method to check whether the offset is greater than the maximum value for the current index type.

Is there a better approach to handling dictionary overflow with the Go Arrow SDK?
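To make the adaptive behaviour I'm after concrete, here is a minimal sketch written with the standard library only. It does not use the Arrow Go builders, and `AdaptiveDict`, `NewAdaptiveDict`, `Append`, `IndexBits` and `Len` are hypothetical names invented for illustration, not part of any Arrow API. It assumes unsigned indices, so an n-bit index can address 2^n distinct values. The idea is to count distinct values at insertion time, widen the index when the dictionary is full, and decode everything back to plain strings once the application's maximum width would be exceeded.

```go
package main

import "fmt"

// AdaptiveDict is a hypothetical helper, not part of the Arrow Go API.
// It detects index overflow while rows are appended, not afterwards.
type AdaptiveDict struct {
	index   map[string]uint32 // value -> dictionary position
	values  []string          // dictionary entries, in insertion order
	codes   []uint32          // one dictionary position per row
	plain   []string          // rows as plain strings, used only after fallback
	bits    int               // current index width (8, 16 or 32); 0 after fallback
	maxBits int               // widest index type the application accepts
}

// NewAdaptiveDict starts with 8-bit indices. maxBits must be 8, 16 or 32.
func NewAdaptiveDict(maxBits int) *AdaptiveDict {
	return &AdaptiveDict{index: make(map[string]uint32), bits: 8, maxBits: maxBits}
}

// IndexBits reports the current index width, or 0 once the column has
// fallen back to plain strings.
func (d *AdaptiveDict) IndexBits() int { return d.bits }

// Len reports the number of rows appended so far.
func (d *AdaptiveDict) Len() int {
	if d.bits == 0 {
		return len(d.plain)
	}
	return len(d.codes)
}

// Append adds one row and reports whether the representation changed
// (index widened, or column converted to plain strings).
func (d *AdaptiveDict) Append(s string) bool {
	if d.bits == 0 {
		d.plain = append(d.plain, s)
		return false
	}
	code, seen := d.index[s]
	changed := false
	if !seen {
		// Assumption: unsigned indices, so n bits address 2^n distinct values.
		if uint64(len(d.values)) == uint64(1)<<uint(d.bits) {
			changed = true
			if d.bits*2 > d.maxBits {
				d.fallBack()
				d.plain = append(d.plain, s)
				return true
			}
			d.bits *= 2
		}
		code = uint32(len(d.values))
		d.index[s] = code
		d.values = append(d.values, s)
	}
	d.codes = append(d.codes, code)
	return changed
}

// fallBack decodes every row into a plain string and drops the dictionary.
func (d *AdaptiveDict) fallBack() {
	d.plain = make([]string, len(d.codes), len(d.codes)+1)
	for i, c := range d.codes {
		d.plain[i] = d.values[c]
	}
	d.index, d.values, d.codes, d.bits = nil, nil, nil, 0
}

func main() {
	d := NewAdaptiveDict(16)
	for i := 0; i < 300; i++ {
		if d.Append(fmt.Sprintf("value-%d", i)) {
			fmt.Printf("row %d: now using %d-bit indices\n", i, d.IndexBits())
		}
	}
}
```

With a maximum of 16 bits, the widening is triggered by the 257th distinct value. Rebuilding the actual Arrow array with the wider index type (or as a string array) is deliberately left out of the sketch, since that is the part I don't know how to do efficiently with the current builders.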