yyanyy commented on pull request #1747: URL: https://github.com/apache/iceberg/pull/1747#issuecomment-734048396
> > Do you have comment on the case of "this may result in v2 returning more files than v1" when literal is not NaN but the data to be compared have NaN? We might need to accept that to keep behavior of comparing with NaN consistent across different files?
>
> I don't think this is a v2 problem, it is a bug in how we currently handle NaN right?

Thanks for pointing this out! After thinking about this, I realized that my original concern probably shouldn't be a problem. My concern was that making v2 return exactly the same result as v1 for NaN comparisons would require extra effort, since the behavior of the metrics evaluators now changes. However, comparing against NaN is actually an invalid operation. Regardless of how each individual engine treats it (e.g. I think Spark considers NaN the max value: for a column `col` containing NaNs, `where col > 0` always returns the NaN records), that should be something to fix on the engine side.
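
To make the difference concrete, here is a minimal Python sketch of the two behaviors discussed above: a plain IEEE 754 comparison, where any comparison involving NaN is false, and an ordering that treats NaN as larger than every other value, which is what I believe Spark does. This is not Iceberg or Spark code; the function names and the sample column are hypothetical and only illustrate why the two can return different rows for `col > 0`.

```python
import math

nan = float("nan")


def ieee_greater_than(value: float, literal: float) -> bool:
    """Plain IEEE 754 comparison: any comparison involving NaN is False."""
    return value > literal


def nan_as_max_greater_than(value: float, literal: float) -> bool:
    """Hypothetical engine-style ordering where NaN sorts above all other values."""
    if math.isnan(value):
        # NaN is greater than everything except another NaN.
        return not math.isnan(literal)
    if math.isnan(literal):
        # No non-NaN value is greater than NaN under this ordering.
        return False
    return value > literal


# Hypothetical column values, one of which is NaN.
column = [1.0, -2.0, nan]

# Rows selected by `col > 0` under each interpretation.
ieee_rows = [v for v in column if ieee_greater_than(v, 0.0)]
nan_max_rows = [v for v in column if nan_as_max_greater_than(v, 0.0)]

print("IEEE 754 comparison keeps:", ieee_rows)
print("NaN-as-max ordering keeps:", nan_max_rows)
```

Under the IEEE 754 reading the NaN row is dropped, while under the NaN-as-max reading it is kept, so file pruning based on one reading can disagree with an engine that uses the other.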
