> On Dec 12, 2016, at 4:48 PM, Dain Sundstrom <d...@iq80.com> wrote:
>
> On Dec 12, 2016, at 4:36 PM, Owen O'Malley <omal...@apache.org> wrote:
>>> I think this should also be documented in the statistics section which
>>> also uses UTF-16 BE, which is at least consistent, but still annoying for
>>> everything other than Java.
>>
>> Yes, it should be documented and we should replace it with UTF-8. (Although
>> changes to the serialized form are always painful.)
>
> I think we can do something similar to the bloom filter code, where we add a
> StringUtf8Stats object and have a transition period where we can produce both.

I was looking at the proto changes to TimestampStatistics, and I think the same thing could work here. We add:

  optional string minimumUtf8 = 4;
  optional string maximumUtf8 = 5;

and then update the writer to write just the UTF-8 version (or both during a transition).

-dain
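
For illustration, here is a minimal sketch of how the transition could look on both sides: a writer that always fills the new UTF-8 fields and optionally the existing ones, and a reader that prefers the UTF-8 fields when they are present. It is not ORC code. `StringStats`, `StatsTransition`, `write`, `readMinimum`, and `readMaximum` are hypothetical names, and the reader-side fallback is an assumption about how a transition period might be handled.

```java
import java.util.Optional;

// Hypothetical stand-in for the string statistics message; not the generated ORC class.
final class StringStats {
    final String minimum;      // existing field, null when not written
    final String maximum;      // existing field, null when not written
    final String minimumUtf8;  // proposed field 4, null when not written
    final String maximumUtf8;  // proposed field 5, null when not written

    StringStats(String minimum, String maximum, String minimumUtf8, String maximumUtf8) {
        this.minimum = minimum;
        this.maximum = maximum;
        this.minimumUtf8 = minimumUtf8;
        this.maximumUtf8 = maximumUtf8;
    }
}

final class StatsTransition {
    // Writer side: always fill the UTF-8 fields; fill the existing fields
    // only while the transition flag is on.
    static StringStats write(String min, String max, boolean writeLegacyToo) {
        return new StringStats(
                writeLegacyToo ? min : null,
                writeLegacyToo ? max : null,
                min,
                max);
    }

    // Reader side (assumed behavior): prefer the UTF-8 field, fall back to the existing one.
    static Optional<String> readMinimum(StringStats stats) {
        return Optional.ofNullable(stats.minimumUtf8 != null ? stats.minimumUtf8 : stats.minimum);
    }

    static Optional<String> readMaximum(StringStats stats) {
        return Optional.ofNullable(stats.maximumUtf8 != null ? stats.maximumUtf8 : stats.maximum);
    }

    public static void main(String[] args) {
        StringStats transitional = write("apple", "pear", true);   // both sets of fields
        StringStats utf8Only = write("apple", "pear", false);      // new fields only
        StringStats oldFile = new StringStats("apple", "pear", null, null);

        System.out.println(readMinimum(transitional));
        System.out.println(readMaximum(utf8Only));
        System.out.println(readMinimum(oldFile));
    }
}
```

With this shape, files written before the change remain readable through the fallback, and the existing fields can stop being written once older readers are no longer a concern.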