Hi,

Tim, I added your suggestion to introduce a new ColumnOrder to PARQUET-1222
<https://issues.apache.org/jira/browse/PARQUET-1222> as the preferred
solution.

Alex, not writing min/max if there is a NaN is indeed a feasible quick fix,
but I think it would be better to just ignore NaNs for the purposes of
min/max stats. For reading, we can ignore stats that contain a NaN. We also
shouldn't use stats when looking for a NaN. -0 and +0 will still be
problematic, though.
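
To make the proposal concrete, here is a rough sketch in Python of what I
mean. It is illustrative only: the names float_min_max and stats_usable are
made up for this mail and are not part of parquet-mr or any other Parquet
implementation.

```python
import math

def float_min_max(values):
    """Writer side: compute (min, max) over the non-null, non-NaN values.

    Returns None when no such value exists, meaning no min/max is written.
    """
    lo = hi = None
    for v in values:
        if v is None or math.isnan(v):
            continue  # nulls and NaNs do not take part in the stats
        if lo is None or v < lo:
            lo = v
        if hi is None or v > hi:
            hi = v
    return None if lo is None else (lo, hi)

def stats_usable(stats, looking_for_nan=False):
    """Reader side: decide whether min/max may be used for pruning."""
    if stats is None or looking_for_nan:
        return False  # stats say nothing about NaNs, so never prune on them
    lo, hi = stats
    # Stats that contain a NaN cannot be trusted, so ignore them.
    return not (math.isnan(lo) or math.isnan(hi))
```

The signed-zero issue is still visible here: for the values [0.0, -0.0] the
sketch records +0.0 as the min, because -0.0 < 0.0 is false.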

Jim, fmax is indeed very close to IEEE-754's maxNum, but its handling of
-0 and +0 is implementation-dependent, as Zoltan Borok-Nagy pointed out to
me: "This function is not required to be sensitive to the sign of zero,
although some implementations additionally enforce that if one argument is
+0 and the other is -0, then +0 is returned." [1]

[1] <http://en.cppreference.com/w/c/numeric/math/fmax>
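
The same kind of ambiguity shows up with any min/max built on ordinary
comparison. The sketch below uses Python's built-in min, not C's fmax, so it
is only an analogy; the helper name sign_of_min is made up for this mail.
Since -0.0 == 0.0, the comparison cannot pick a winner and the result
depends on argument order.

```python
import math

def sign_of_min(a, b):
    """Return the sign (+1.0 or -1.0) of min(a, b), also for zero results."""
    return math.copysign(1.0, min(a, b))

# The two zeros compare equal, so min() keeps whichever argument came first.
first_negative = sign_of_min(-0.0, 0.0)
first_positive = sign_of_min(0.0, -0.0)
```

Here first_negative and first_positive differ, even though both calls see
the same two values.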

Br,

Zoltan



On Fri, Feb 16, 2018 at 6:57 PM Jim Apple <jbap...@cloudera.com> wrote:

> On Fri, Feb 16, 2018 at 9:44 AM, Zoltan Borok-Nagy
> <borokna...@cloudera.com> wrote:
> > I would just like to mention that the fmax() / fmin() functions in C/C++
> > Math library follow the aforementioned IEEE 754-2008 min and max
> > specification:
> > http://en.cppreference.com/w/c/numeric/math/fmax
> >
> > I think this behavior is also the most intuitive and useful regarding to
> > statistics. If we want to select the max value, I think it's reasonable
> > to ignore nulls and not-numbers.
>
> It should be noted that this is different than the total ordering
> predicate. With that predicate, -NaN < -inf < negative numbers < -0.0
> < +0.0 < positive numbers < +inf < +NaN
>
> fmax appears to be closest to IEEE-754's maxNum, but not quite
> matching for some corner cases (-0.0, signalling NaN), but I'm not
> 100% sure on that.
>
