laserninja commented on PR #16110:
URL: https://github.com/apache/iceberg/pull/16110#issuecomment-4363055087
> **Can this happen with current code?**

Yes, but not through a direct `alwaysFalse()` scan filter. When
`table.newScan().filter(Expressions.alwaysFalse())` is used, Iceberg's
manifest-level filtering short-circuits and returns no file tasks, so
`ParquetFilters.convert()` is never called. However, the bug is reachable when:
- A predicate is applied on a column that does **not exist in an older
Parquet file** (schema evolution) — the Iceberg manifest evaluator may include
the file (null bounds → "could match"), but `ConvertFilterToParquet` binds
against the Parquet file's schema, resolves the predicate to
`AlwaysFalse.INSTANCE`, and the bug triggers.
- The lower-level `Parquet.read()` API is called directly with an
`alwaysFalse()` filter.
**What the bug actually does:** Both `AlwaysTrue` and `AlwaysFalse` are
Iceberg-internal `FilterPredicate` placeholders whose `accept()` method throws
`UnsupportedOperationException("AlwaysTrue is a placeholder only")`. So the bug
causes an exception, not silently incorrect results.
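
As a rough, self-contained illustration of that failure mode (not the actual Iceberg or Parquet classes), the sketch below models a visitor-style predicate with two placeholder values whose `accept()` throws. Every name in it (`PlaceholderSketch`, `Predicate`, `Visitor`, `Leaf`, `convertUnguarded`, `convertGuarded`, the `"NOOP"` and `"SKIP_ALL"` results) is hypothetical. It assumes the conversion step screens out only the always-true placeholder. The guarded variant is one possible way to avoid the exception, not necessarily what this PR does.

```java
// Hypothetical model of the placeholder problem; none of these types are Iceberg's.
final class PlaceholderSketch {

  /** Stand-in for a visitor-style predicate interface. */
  interface Predicate {
    String accept(Visitor visitor);
  }

  /** Stand-in for the consumer that walks real predicates. */
  interface Visitor {
    String visitLeaf(String description);
  }

  /** Placeholders carry no structure, so visiting them is unsupported. */
  enum Placeholder implements Predicate {
    ALWAYS_TRUE,
    ALWAYS_FALSE;

    @Override
    public String accept(Visitor visitor) {
      throw new UnsupportedOperationException(name() + " is a placeholder only");
    }
  }

  /** An ordinary predicate that a visitor can handle. */
  static final class Leaf implements Predicate {
    private final String description;

    Leaf(String description) {
      this.description = description;
    }

    @Override
    public String accept(Visitor visitor) {
      return visitor.visitLeaf(description);
    }
  }

  /** Assumed buggy shape: only ALWAYS_TRUE is screened out before visiting. */
  static String convertUnguarded(Predicate pred) {
    if (pred == null || pred == Placeholder.ALWAYS_TRUE) {
      return "NOOP";
    }
    // ALWAYS_FALSE falls through to here and throws from accept().
    return pred.accept(desc -> "FILTER(" + desc + ")");
  }

  /** One possible guard: handle ALWAYS_FALSE explicitly before visiting. */
  static String convertGuarded(Predicate pred) {
    if (pred == null || pred == Placeholder.ALWAYS_TRUE) {
      return "NOOP";
    }
    if (pred == Placeholder.ALWAYS_FALSE) {
      return "SKIP_ALL";
    }
    return pred.accept(desc -> "FILTER(" + desc + ")");
  }

  public static void main(String[] args) {
    System.out.println(convertGuarded(new Leaf("x > 5")));
    System.out.println(convertGuarded(Placeholder.ALWAYS_FALSE));
    try {
      convertUnguarded(Placeholder.ALWAYS_FALSE);
    } catch (UnsupportedOperationException e) {
      System.out.println("unguarded conversion threw: " + e.getMessage());
    }
  }
}
```

In this model the unguarded path fails loudly rather than returning wrong rows, which matches the point above that the symptom is an exception and not silently incorrect results.
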
**On a `TableScan`-level test:** I can write one using the schema evolution
scenario above: write files without a column, add the column, scan with a
predicate on that column, and verify that no exception is thrown. Would you
prefer that scenario, or do you have a specific case in mind? I can also add a
`Parquet.read()`-level test that directly demonstrates the exception thrown
before the fix.