zhongyujiang opened a new issue, #9130:
URL: https://github.com/apache/iceberg/issues/9130

   ### Apache Iceberg version
   
   1.4.2 (latest release)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   I found that when a column contains NaN values, querying an Iceberg table 
and querying a Parquet table sometimes return inconsistent results, even though 
the data is the same.
   
   For example, in the following query, there is only one NaN value in the 
result: 
   
![image](https://github.com/apache/iceberg/assets/42907416/efeb0f8a-f7f7-4cc5-9c60-83e07b6d1e55)
   
   The same query on a Parquet table returns two NaN values:
   
![image](https://github.com/apache/iceberg/assets/42907416/4a14944a-463a-47cf-a084-18ee685c46f0)
   
   I searched a bit and found that Spark treats NaN as larger than any other 
numeric value, while 
[Iceberg](https://iceberg.apache.org/spec/#manifests) does not allow NaN values 
in lower or upper bounds:
   
   
![image](https://github.com/apache/iceberg/assets/42907416/ffba59ec-de17-4712-b00f-aeffa6638bc5)
   
   So I guess the cause is that the first file written was filtered out by 
Iceberg's file filtering at query time.
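
To make the suspected mechanism concrete, here is a minimal sketch in plain Python. It is not Iceberg or Spark code: the helper names, the two example files, and the `v > 2.0` predicate are all assumptions for illustration (the actual query is only shown in the screenshots above). It only models bounds that exclude NaN and a comparison that treats NaN as larger than any other number.

```python
import math

def spark_gt(value, literal):
    """Model of `value > literal` where NaN is larger than any other number."""
    if math.isnan(value):
        return not math.isnan(literal)  # NaN > x holds unless x is also NaN
    if math.isnan(literal):
        return False                    # nothing non-NaN is greater than NaN
    return value > literal

def bounds_without_nan(values):
    """Lower/upper bounds over non-NaN values only (NaN is kept out of bounds)."""
    non_nan = [v for v in values if not math.isnan(v)]
    return (min(non_nan), max(non_nan)) if non_nan else (None, None)

def file_may_match_gt(values, literal):
    """Hypothetical min/max pruning: keep a file only if upper bound > literal."""
    _, upper = bounds_without_nan(values)
    return upper is not None and upper > literal

# Two hypothetical data files, each holding one NaN.
file_a = [1.0, float("nan")]   # bounds: (1.0, 1.0)
file_b = [3.0, float("nan")]   # bounds: (3.0, 3.0)
literal = 2.0

# Scan every file (what a plain Parquet read would do in this model).
full_scan = [v for f in (file_a, file_b) for v in f if spark_gt(v, literal)]

# Prune files by bounds first, then apply the same row-level predicate.
pruned_scan = [
    v
    for f in (file_a, file_b)
    if file_may_match_gt(f, literal)
    for v in f
    if spark_gt(v, literal)
]

nan_full = sum(math.isnan(v) for v in full_scan)
nan_pruned = sum(math.isnan(v) for v in pruned_scan)
print(nan_full, nan_pruned)  # 2 1
```

In this model `file_a` is skipped because its upper bound (1.0) is not greater than 2.0, so its NaN row is lost even though the row-level predicate would have matched it. That would be consistent with seeing one NaN from Iceberg and two from Parquet, if pruning really works this way for the query in question.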
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

