joyhaldar commented on issue #14995:
URL: https://github.com/apache/iceberg/issues/14995#issuecomment-3724505484
I think the root cause is in Spark's optimizer, not Iceberg.
From what I understand, Spark's `LikeSimplification` rule normally converts
`LIKE 'prefix%'` to `StartsWith("prefix")`. However, when the pattern contains
the escape character (e.g. `\_` to match a literal underscore), Spark [skips this
optimization entirely](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala#L766-L772):
```scala
if (pattern.contains(escapeChar)) {
  // Although there are patterns can be optimized if we handle the escape first, we just
  // skip this rule if pattern contains any escapeChar for simplicity.
  None
}
```
Since Spark doesn't convert the `LIKE` to `StartsWith`, the predicate never gets
pushed down to Iceberg for file pruning. If this is correct, Iceberg's code is
fine; it simply never receives the filter.
Could you try running the same query with **Trino** or another query engine
against your Iceberg table? If file pruning works there, that would support the
idea that this is a Spark-specific limitation.
In the meantime, using `startswith()` directly should work:
```sql
WHERE startswith(file_path, 'warehouse/iceberg/good_facts/')
```
Also, please feel free to ignore this entirely, just sharing my observations.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]