Hi,

Tim, I added your suggestion to introduce a new ColumnOrder to PARQUET-1222 <https://issues.apache.org/jira/browse/PARQUET-1222> as the preferred solution.

Alex, not writing min/max if there is a NaN is indeed a feasible quick fix, but I think it would be better to just ignore NaNs for the purposes of min/max stats. For reading, we can ignore stats that contain a NaN. We also shouldn't use stats when looking for a NaN. -0 and +0 will still be problematic, though.

Jim, fmax is indeed very close to IEEE-754's maxNum, but its handling of -0 and +0 is implementation-dependent, as Zoltan Borok-Nagy pointed out to me: "This function is not required to be sensitive to the sign of zero, although some implementations additionally enforce that if one argument is +0 and the other is -0, then +0 is returned." [1 <http://en.cppreference.com/w/c/numeric/math/fmax>]

Br,

Zoltan

On Fri, Feb 16, 2018 at 6:57 PM Jim Apple <jbap...@cloudera.com> wrote:

> On Fri, Feb 16, 2018 at 9:44 AM, Zoltan Borok-Nagy
> <borokna...@cloudera.com> wrote:
> > I would just like to mention that the fmax() / fmin() functions in C/C++
> > Math library follow the aforementioned IEEE 754-2008 min and max
> > specification:
> > http://en.cppreference.com/w/c/numeric/math/fmax
> >
> > I think this behavior is also the most intuitive and useful regarding to
> > statistics. If we want to select the max value, I think it's reasonable to
> > ignore nulls and not-numbers.
>
> It should be noted that this is different than the total ordering
> predicate. With that predicate, -NaN < -inf < negative numbers < -0.0
> < +0.0 < positive numbers < +inf < +NaN
>
> fmax appears to be closest to IEEE-754's maxNum, but not quite
> matching for some corner cases (-0.0, signalling NaN), but I'm not
> 100% sure on that.
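
For illustration, here is a minimal Python sketch of the NaN handling proposed at the top of this message: skip NaNs when computing min/max, ignore stats that contain a NaN when reading, and never use stats when looking for a NaN. The function names are hypothetical and are not part of parquet-mr or any other Parquet API. It also shows why -0 and +0 remain a problem under these assumptions.

```python
import math
from typing import Iterable, Optional, Tuple

Stats = Optional[Tuple[float, float]]


def min_max_ignoring_nan(values: Iterable[float]) -> Stats:
    """Writer side (hypothetical helper): (min, max) over non-NaN values.

    Returns None when no non-NaN value exists, i.e. no stats are written.
    """
    lo = hi = None
    for v in values:
        if math.isnan(v):
            continue  # a NaN never contributes to min/max
        if lo is None or v < lo:
            lo = v
        if hi is None or v > hi:
            hi = v
    return None if lo is None else (lo, hi)


def stats_usable(stats: Stats) -> bool:
    """Reader side: ignore missing stats and stats that contain a NaN."""
    return stats is not None and not any(math.isnan(x) for x in stats)


def may_contain(stats: Stats, target: float) -> bool:
    """Return False only when the stats prove that target is absent."""
    if math.isnan(target) or not stats_usable(stats):
        return True  # stats say nothing about NaNs, so never prune
    lo, hi = stats
    return lo <= target <= hi


# The sign of a zero bound depends on the input order, because
# -0.0 < 0.0 is False under ordinary floating-point comparison.
a = min_max_ignoring_nan([0.0, -0.0])
b = min_max_ignoring_nan([-0.0, 0.0])
print(math.copysign(1.0, a[0]), math.copysign(1.0, b[0]))
```

With ordinary comparisons the two zeros compare equal, so the pruning check above still works. The trouble would start if a reader interpreted the stored bounds with an ordering that distinguishes -0.0 from +0.0, such as the total ordering predicate quoted above.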