+1 to 1-3.  On 4, what do you mean by test?  Do you mean we should assume
the default encoding and use that?  Is there a versioning concept in the
bloom filters that will make it easy to determine whether a file is pre- or
post-ORC-101?

Alan.

> On Sep 7, 2016, at 08:57, Owen O'Malley <[email protected]> wrote:
> 
> All,
>   Dain Sundstrom pointed out to me in personal email that the ORC bloom
> filters are currently using the default character encoding. That makes the
> bloom filters non-portable between different computers that use different
> default encodings. I've filed ORC-101 to address it, but I want to have a
> wider discussion. I'd propose that we:
> 
> 1. create a new WriterVersion for ORC-101.
> 2. move the bloom filter code from storage-api into ORC.
> 3. consistently use UTF-8 when creating new bloom filters
> 4. for ORC files older than ORC-101, test the default encoding instead of
> UTF-8
> 
> Thoughts?
> 
> .. Owen
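
To make the portability problem and point 4 concrete, here is a minimal Java sketch. It is not the ORC implementation: the class name, the `charsetFor` helper, and the boolean "written after ORC-101" flag are hypothetical stand-ins for whatever WriterVersion check ends up being used. It only shows that the same string yields different bytes (and so different bloom filter hash inputs) under different encodings, and one way a reader might pick the charset to probe with.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BloomEncodingSketch {

    // Hypothetical helper: choose the charset used to turn a string into
    // bloom filter key bytes. The boolean stands in for a WriterVersion
    // comparison, which is an assumption and not the actual ORC API.
    static Charset charsetFor(boolean writtenAfterOrc101, Charset readerDefault) {
        // Newer files: always UTF-8. Older files: fall back to the default
        // encoding, which may or may not match the one the writer used.
        return writtenAfterOrc101 ? StandardCharsets.UTF_8 : readerDefault;
    }

    // The bytes that would be hashed into the bloom filter for a value.
    static byte[] keyBytes(String value, Charset charset) {
        return value.getBytes(charset);
    }

    public static void main(String[] args) {
        String value = "caf\u00e9"; // contains one non-ASCII character

        byte[] latin1 = keyBytes(value, StandardCharsets.ISO_8859_1);
        byte[] utf8 = keyBytes(value, StandardCharsets.UTF_8);

        // The two encodings disagree on the non-ASCII character, so a filter
        // built with one and probed with the other can miss the value.
        System.out.println("ISO-8859-1: " + Arrays.toString(latin1));
        System.out.println("UTF-8:      " + Arrays.toString(utf8));
        System.out.println("same bytes: " + Arrays.equals(latin1, utf8));

        // Charset a reader on this machine would use for an older file.
        System.out.println("pre-ORC-101 probe charset: "
            + charsetFor(false, Charset.defaultCharset()));
    }
}
```

Pure ASCII values encode identically in both charsets, so the mismatch only shows up for non-ASCII strings, which is part of why it is easy to miss.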
