yyanyy commented on pull request #1747:
URL: https://github.com/apache/iceberg/pull/1747#issuecomment-734048396


   > > Do you have a comment on the case of "this may result in v2 returning more 
files than v1" when the literal is not NaN but the data to be compared has NaN? We 
might need to accept that to keep the behavior of comparing with NaN consistent 
across different files?
   > 
   > I don't think this is a v2 problem; it is a bug in how we currently handle 
NaN, right?
   
   Thanks for pointing this out! After thinking about it, I realized that my 
original concern probably isn't a problem. I was worried that making v2 
return exactly the same result as v1 for NaN comparisons would require 
extra effort, since the behavior of the metrics evaluators now changes. 
However, comparing with NaN is actually an invalid operation, and 
regardless of how each individual engine treats it (e.g. I think Spark 
considers NaN as the max value, since for a column `col` containing NaNs, 
`where col > 0` always returns the NaN records), that should be fixed on 
the engine side. 
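
   To illustrate why engines can disagree here, below is a minimal sketch in 
plain Java (not Iceberg or Spark code; the class and method names are 
hypothetical). It contrasts the IEEE 754 `>` operator, under which any 
comparison with NaN is false, with the total ordering of `Double.compare`, 
which places NaN above every other value and so resembles the "NaN as max" 
behavior I believe Spark has.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
final class NanComparisonSketch {

  // IEEE 754 semantics: any ordered comparison involving NaN is false.
  static boolean ieeeGreaterThan(double value, double literal) {
    return value > literal;
  }

  // Total ordering from Double.compare: NaN sorts above every other
  // double value, including positive infinity.
  static boolean totalOrderGreaterThan(double value, double literal) {
    return Double.compare(value, literal) > 0;
  }

  // Keeps the values that satisfy "value > literal" under IEEE 754 rules.
  static List<Double> filterIeee(List<Double> values, double literal) {
    return values.stream()
        .filter(v -> ieeeGreaterThan(v, literal))
        .collect(Collectors.toList());
  }

  // Keeps the values that satisfy "value > literal" under the total ordering.
  static List<Double> filterTotalOrder(List<Double> values, double literal) {
    return values.stream()
        .filter(v -> totalOrderGreaterThan(v, literal))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Double> col = Arrays.asList(-1.0, 2.0, Double.NaN);
    System.out.println(filterIeee(col, 0.0));       // prints [2.0]
    System.out.println(filterTotalOrder(col, 0.0)); // prints [2.0, NaN]
  }
}
```

   The same predicate `col > 0` keeps or drops the NaN row depending on which 
ordering is applied, which is why the exact result is an engine-side decision.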
   




