Is it required that the StringStatistics min and max be the actual min and max values for the column? I ask for two reasons. First, I’d like to be able to “trim” values if the min or max is very large. Second, as a workaround for the UTF-16BE sorting problem (bug?), I’d like to trim values at the first surrogate pair, so that the stored value is slightly smaller than the actual min or slightly larger than the actual max, and is still a valid UTF-8 sequence.
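
To make the idea concrete, here is a rough sketch of the trimming I have in mind. It is not taken from any ORC code: the function names and the `limit` parameter (counted in code points here, where a real writer would more likely cap bytes) are made up for illustration, and it assumes the input holds no unpaired surrogates. The min is cut to a prefix ending before the first character that would need a surrogate pair, and the max is cut the same way and then has its last remaining character bumped up by one, so both bounds hold whether strings are compared as UTF-8 bytes or as UTF-16 code units.

```python
MAX_BMP = 0xFFFF
SURROGATES = range(0xD800, 0xE000)


def first_supplementary(s):
    """Index of the first code point above U+FFFF, or len(s) if there is none."""
    for i, ch in enumerate(s):
        if ord(ch) > MAX_BMP:
            return i
    return len(s)


def trimmed_min(s, limit):
    """Return a prefix of s; a prefix never sorts after s in either ordering."""
    return s[:min(limit, first_supplementary(s))]


def trimmed_max(s, limit):
    """Return a BMP-only string that sorts at or after s in both orderings.

    Returns None when no such bound can be built (hypothetical convention:
    the caller would then leave the max out of the statistics).
    """
    cut = min(limit, first_supplementary(s))
    if cut == len(s):
        return s  # nothing was trimmed, so the real max is already safe
    prefix = s[:cut]
    # Bump the right-most character that can be incremented within the BMP.
    while prefix:
        nxt = ord(prefix[-1]) + 1
        if nxt in SURROGATES:
            nxt = 0xE000  # skip the surrogate block
        if nxt <= MAX_BMP:
            return prefix[:-1] + chr(nxt)
        prefix = prefix[:-1]  # last char was U+FFFF, so carry to the left
    return None
```

With this, a value like "abc" followed by an emoji and more text would be stored as min "abc" or max "abd". Readers could still use the stats for pruning, but could no longer assume the min or max is a value that actually occurs in the column.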
Thoughts? -dain
