> On Dec 12, 2016, at 4:48 PM, Dain Sundstrom <[email protected]> wrote:
> On Dec 12, 2016, at 4:36 PM, Owen O'Malley <[email protected]> wrote:
>>> I think this should also be documented in the statistics section which
>>> also uses UTF-16 BE, which is at least consistent, but still annoying for
>>> everything other than Java.
>>
>> Yes, it should be documented and we should replace it with UTF-8. (Although
>> changes to the serialized form are always painful.)
>
> I think we can do something similar to the bloom filter code, where we add a
> StringUtf8Stats object and have a transition period where we can produce both.
I was looking at the proto changes to TimestampStatistics, and I think
the same thing could work here. We add:
    optional string minimumUtf8 = 4;
    optional string maximumUtf8 = 5;
and then update the writer to write just the UTF-8 version (or both during a
transition).
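
To illustrate why the min/max values themselves can differ between the two
encodings, here is a minimal sketch. It assumes the existing statistics are
ordered the way Java's String.compareTo orders strings (by UTF-16 code unit),
and that the UTF-8 fields would be ordered by unsigned UTF-8 bytes. The class
name Utf8OrderSketch and the helper compareUtf8 are hypothetical and are not
part of ORC.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not ORC code: shows that Java's String.compareTo
// (UTF-16 code unit order) and unsigned UTF-8 byte order can disagree.
final class Utf8OrderSketch {

    // Compare two strings by their UTF-8 bytes, treating each byte as unsigned.
    static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int d = (x[i] & 0xff) - (y[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        // All shared bytes are equal, so the shorter string sorts first.
        return x.length - y.length;
    }

    public static void main(String[] args) {
        String bmp = "\uFF5E";           // U+FF5E, a single UTF-16 code unit
        String astral = "\uD83D\uDE00";  // U+1F600, stored as a surrogate pair

        // UTF-16 code unit order: 0xFF5E sorts after the high surrogate 0xD83D.
        System.out.println(bmp.compareTo(astral) > 0);

        // UTF-8 byte order: lead byte 0xEF sorts before lead byte 0xF0.
        System.out.println(compareUtf8(bmp, astral) < 0);
    }
}
```

Both lines should print true: the same pair of strings sorts in opposite
directions under the two orderings. So for a column containing both kinds of
characters, the UTF-8 minimum and maximum would have to be computed
separately rather than by re-encoding the existing values.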
-dain