+1 to 1-3. On 4, what do you mean by "test"? Do you mean assume the file was written with the default encoding and use that? Is there a versioning concept in the bloom filters that would make it easy to determine whether a file is pre- or post-ORC-101?
Alan.

> On Sep 7, 2016, at 08:57, Owen O'Malley <[email protected]> wrote:
>
> All,
> Dain Sundstrom pointed out to me in personal email that the ORC bloom
> filters are currently using the default character encoding. That makes the
> bloom filters non-portable between different computers that use different
> default encodings. I've filed ORC-101 to address it, but I want to have a
> wider discussion. I'd propose that we:
>
> 1. create a new WriterVersion for ORC-101.
> 2. move the bloom filter code from storage-api into ORC.
> 3. consistently use UTF-8 when creating new bloom filters
> 4. for ORC files older than ORC-101, test the default encoding instead of
> UTF-8
>
> Thoughts?
>
> .. Owen
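
For anyone following the thread, here is a minimal sketch of why the encoding matters, assuming the bloom filter hashes the raw bytes of each string. The class name, `keyBytes`, `charsetFor`, and the boolean flag are hypothetical stand-ins for illustration, not the ORC or storage-api API. ISO-8859-1 stands in for a non-UTF-8 platform default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BloomEncodingSketch {
    // Hypothetical helper: the bytes a bloom filter would hash for a string key.
    static byte[] keyBytes(String value, Charset cs) {
        return value.getBytes(cs);
    }

    // Hypothetical reader-side choice, one possible reading of proposal items 3 and 4:
    // UTF-8 for files written after ORC-101, the platform default for older files.
    static Charset charsetFor(boolean writtenAfterOrc101) {
        return writtenAfterOrc101 ? StandardCharsets.UTF_8 : Charset.defaultCharset();
    }

    public static void main(String[] args) {
        String value = "caf\u00e9"; // contains one non-ASCII character

        byte[] utf8 = keyBytes(value, StandardCharsets.UTF_8);
        byte[] latin1 = keyBytes(value, StandardCharsets.ISO_8859_1);

        System.out.println(Arrays.toString(utf8));   // [99, 97, 102, -61, -87]
        System.out.println(Arrays.toString(latin1)); // [99, 97, 102, -23]

        // Different bytes mean different hash inputs, so a filter built on one
        // machine can report "absent" for a key that is present when probed on another.
        System.out.println(Arrays.equals(utf8, latin1)); // false
    }
}
```

Pure-ASCII keys encode identically under both charsets, so the mismatch would only show up for non-ASCII data.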
