zhongyujiang opened a new issue, #9130: URL: https://github.com/apache/iceberg/issues/9130
### Apache Iceberg version

1.4.2 (latest release)

### Query engine

Spark

### Please describe the bug 🐞

I found that when a column contains NaN values, querying an Iceberg table and querying a Parquet table can return inconsistent results, even if the data is the same.

For example, the following query against the Iceberg table returns only one NaN value:

![image](https://github.com/apache/iceberg/assets/42907416/efeb0f8a-f7f7-4cc5-9c60-83e07b6d1e55)

The same query on a Parquet table returns two NaN values:

![image](https://github.com/apache/iceberg/assets/42907416/4a14944a-463a-47cf-a084-18ee685c46f0)

I searched a bit and found that Spark treats NaN as larger than any other numeric value. On the other hand, [Iceberg](https://iceberg.apache.org/spec/#manifests) does not allow NaN values in lower or upper bounds:

![image](https://github.com/apache/iceberg/assets/42907416/ffba59ec-de17-4712-b00f-aeffa6638bc5)

So my guess is that the first written file, which contains one of the NaN rows, was filtered out by Iceberg's file filtering during the query.
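The suspected mechanism can be illustrated with a small toy model. This is only a sketch of the hypothesis in this report, not Iceberg's or Spark's actual code: the file contents, the `> 1.0` predicate, and all function names are hypothetical, and the real query is only visible in the screenshots. It assumes bounds are computed over non-NaN values only and that pruning for a greater-than predicate looks at the upper bound alone.

```python
import math

NAN = float("nan")

def file_bounds(values):
    """Lower/upper bounds over non-NaN values only (NaN never appears in a bound)."""
    non_nan = [v for v in values if not math.isnan(v)]
    if not non_nan:
        return None, None
    return min(non_nan), max(non_nan)

def may_contain_greater_than(bounds, literal):
    """Bounds-only pruning check for `col > literal` (hypothetical, simplified)."""
    _, upper = bounds
    if upper is None:
        return False  # no non-NaN values recorded in this toy model
    return upper > literal

def spark_greater_than(value, literal):
    """Row-level comparison where NaN is larger than any other numeric value."""
    if math.isnan(value):
        return not math.isnan(literal)
    return value > literal

# Hypothetical data: two files, each holding one NaN row.
files = {
    "file_a": [0.5, NAN],  # non-NaN upper bound 0.5, so pruned for `> 1.0`
    "file_b": [2.0, NAN],  # non-NaN upper bound 2.0, so kept
}
literal = 1.0

# Scan every file (what a plain Parquet scan without this pruning would do).
full_scan = [v for vals in files.values() for v in vals
             if spark_greater_than(v, literal)]

# Scan only the files that survive bounds-based pruning.
pruned_scan = [v for vals in files.values()
               if may_contain_greater_than(file_bounds(vals), literal)
               for v in vals if spark_greater_than(v, literal)]

nan_count_full = sum(math.isnan(v) for v in full_scan)
nan_count_pruned = sum(math.isnan(v) for v in pruned_scan)
print(nan_count_full, nan_count_pruned)
```

In this model the NaN row in `file_a` matches the predicate under Spark's ordering, but the file is skipped because its upper bound ignores NaN, which would be consistent with seeing one NaN from Iceberg and two from Parquet.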