tustvold commented on PR #9011:
URL: https://github.com/apache/arrow-rs/pull/9011#issuecomment-3666519277

   Unfortunately dictionary encoding is best effort, and writers will fall back
to different encodings if the dictionary gets too large. As a result, you need
to know whether all the pages are dictionary encoded before you can make this
optimisation, and IIRC this information is not encoded anywhere but the page
header itself...
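   A minimal Rust sketch of why the fallback matters. Everything here
(`PageEncoding`, `write_pages`, `all_pages_dictionary_encoded`, and the
entry-count limit) is a hypothetical illustration, not the `parquet` crate's
API: the toy writer dictionary-encodes pages until the dictionary grows past an
assumed limit, then writes that page and every later one without the
dictionary. A reader can only tell whether a dictionary-only shortcut is safe
by checking the encoding of every page.

```rust
// Hypothetical, simplified model of what a reader learns from page headers.
// These are NOT the parquet crate's types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PageEncoding {
    Dictionary, // values stored as indices into the dictionary page
    Plain,      // written after the writer fell back
}

/// Toy best-effort writer (assumption): dictionary-encodes pages until the
/// number of distinct values exceeds `max_dict_entries`, then falls back to
/// plain encoding for that page and all later pages in the column chunk.
fn write_pages(pages: &[Vec<&str>], max_dict_entries: usize) -> Vec<PageEncoding> {
    let mut dict: Vec<&str> = Vec::new();
    let mut fallen_back = false;
    let mut encodings = Vec::with_capacity(pages.len());
    for page in pages {
        if !fallen_back {
            for &v in page {
                if !dict.contains(&v) {
                    dict.push(v);
                }
            }
            if dict.len() > max_dict_entries {
                fallen_back = true;
            }
        }
        encodings.push(if fallen_back {
            PageEncoding::Plain
        } else {
            PageEncoding::Dictionary
        });
    }
    encodings
}

/// A dictionary-based optimisation is only sound if *every* data page is
/// dictionary encoded; in this model that needs every page's encoding.
fn all_pages_dictionary_encoded(encodings: &[PageEncoding]) -> bool {
    encodings.iter().all(|e| *e == PageEncoding::Dictionary)
}

fn main() {
    let low_cardinality = vec![vec!["a", "b"], vec!["a", "b"], vec!["b", "a"]];
    let high_cardinality = vec![vec!["a", "b"], vec!["c", "d"], vec!["e", "f"]];

    let low = write_pages(&low_cardinality, 3);
    let high = write_pages(&high_cardinality, 3);

    // prints "[Dictionary, Dictionary, Dictionary] -> true"
    println!("{:?} -> {}", low, all_pages_dictionary_encoded(&low));
    // prints "[Dictionary, Plain, Plain] -> false"
    println!("{:?} -> {}", high, all_pages_dictionary_encoded(&high));
}
```

   In the high-cardinality case the toy writer falls back after the first
page, so any optimisation that relies on the dictionary alone would be unsound
for that column chunk even though its first page is dictionary encoded.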
   
   Putting this aside, there are likely some challenges around typing with
this approach.
   
   IMO Bloom filters are the recommended way to handle this sort of thing;
dictionaries are more of an encoding optimisation.
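   For contrast, a toy Bloom filter in Rust. It is only an illustration of the
technique: Parquet's own Bloom filters use a split-block design hashed with
xxHash, whereas this sketch (`BloomFilter`, `might_contain` are hypothetical
names) uses `k` seeded `DefaultHasher` passes over a plain bit vector. The
point is that the filter answers "definitely absent" or "maybe present" for a
whole column chunk, independently of how its pages ended up being encoded.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy Bloom filter (NOT Parquet's split-block Bloom filter).
struct BloomFilter {
    bits: Vec<bool>,
    num_hashes: u64,
}

impl BloomFilter {
    fn new(num_bits: usize, num_hashes: u64) -> Self {
        BloomFilter { bits: vec![false; num_bits], num_hashes }
    }

    /// Bit position for `value` under hash function number `seed`.
    fn index<T: Hash>(&self, value: &T, seed: u64) -> usize {
        let mut hasher = DefaultHasher::new();
        seed.hash(&mut hasher);
        value.hash(&mut hasher);
        (hasher.finish() % self.bits.len() as u64) as usize
    }

    fn insert<T: Hash>(&mut self, value: &T) {
        for seed in 0..self.num_hashes {
            let i = self.index(value, seed);
            self.bits[i] = true;
        }
    }

    /// `false` means definitely absent; `true` means possibly present.
    fn might_contain<T: Hash>(&self, value: &T) -> bool {
        (0..self.num_hashes).all(|seed| self.bits[self.index(value, seed)])
    }
}

fn main() {
    let mut filter = BloomFilter::new(1024, 3);
    for v in ["apple", "banana", "cherry"] {
        filter.insert(&v);
    }
    // No false negatives: every inserted value is reported as possibly present.
    assert!(filter.might_contain(&"banana"));
    // May (rarely) be a false positive, so a `true` here still needs a scan.
    println!("may contain 'durian': {}", filter.might_contain(&"durian"));
}
```

   A reader can skip a column chunk whenever `might_contain` returns `false`,
without caring whether the writer fell back from dictionary encoding.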


-- 
This is an automated message from the Apache Git Service.