dejangvozdenac opened a new issue, #13328:
URL: https://github.com/apache/iceberg/issues/13328
### Apache Iceberg version
1.9.1 (latest release)
### Query engine
Trino
### Please describe the bug 🐞
In Spark, we create a table with a nested struct field `address.street`. The
outer field `address` is optional, but the inner field `street` is required.
When querying from Trino with the condition `address.street is null` and
projection pushdown disabled, Trino reads the entire file and returns the rows
where `address` is null (and thus `address.street` is null). However, with
projection pushdown enabled, Trino delegates the planning decision to Iceberg,
which seems to find no eligible files to read, so no rows are returned.
I can't find anything in the docs that says what the right behavior is here
(that is, whether `address.street is null` means that `address` exists and
`address.street` is null, or that `address.street` is not set in that row in
any way), but agreement between Iceberg and Trino is essential.
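To make the two possible readings concrete, here is a minimal Python sketch. It is only an illustrative model, not Iceberg or Trino code: the `Row` and `Address` classes and both predicate functions are hypothetical names, and the two rows mirror the ones inserted in the reproduction further down.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Address:
    street: str  # required: never None when the struct itself is present


@dataclass
class Row:
    id: int
    address: Optional[Address]  # optional: the whole struct may be None


def street_is_null_propagating(row: Row) -> bool:
    """Reading A: a null parent struct makes every nested field null."""
    return row.address is None or row.address.street is None


def street_is_null_leaf_only(row: Row) -> bool:
    """Reading B: only a present struct can have a null street."""
    return row.address is not None and row.address.street is None


# Same shape as the two rows in the reproduction (hypothetical model).
rows = [Row(0, None), Row(1, Address("123 Main St"))]

ids_a = [r.id for r in rows if street_is_null_propagating(r)]
ids_b = [r.id for r in rows if street_is_null_leaf_only(r)]
print(ids_a)  # [0]
print(ids_b)  # []
```

Under reading A the row with `id = 0` matches, which is what Trino returns with projection pushdown disabled. Under reading B nothing can match, because `street` is required whenever `address` is present; that is consistent with the empty result seen with pushdown enabled, although the sketch does not claim this is how Iceberg actually arrives at it.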
Here are the spark-sql commands that I used to create the table:
```
spark-sql> CREATE TABLE default.dejan_test (
  id INT NOT NULL,
  name STRING NOT NULL,
  age INT NOT NULL,
  address STRUCT<
    street: STRING NOT NULL,
    address_info: STRUCT<
      city: STRING NOT NULL,
      county: STRING NOT NULL,
      state: STRING NOT NULL>>)
USING iceberg;
spark-sql> INSERT INTO default.dejan_test (id, name, age, address)
VALUES (
  0,
  'Jane Doe',
  27,
  NULL
);
spark-sql> INSERT INTO default.dejan_test (id, name, age, address)
VALUES (
  1,
  'John Doe',
  30,
  STRUCT(
    '123 Main St',
    STRUCT('San Francisco', 'San Francisco County', 'California')
  )
);
```
Here are the two different results we get from Trino:
```
trino>
set session iceberg.projection_pushdown_enabled=false;
SET SESSION
trino>
select
id
from
iceberg.default.dejan_test
where
address.street is null;
id
----
0
(1 row)
Query 20250613_033713_00001_xn59q, FINISHED, 1 node
Splits: 2 total, 2 done (100.00%)
2.85 [2 rows, 4.43KiB] [0 rows/s, 1.56KiB/s]
trino>
set session iceberg.projection_pushdown_enabled=true;
SET SESSION
trino>
select
id
from
iceberg.default.dejan_test
where
address.street is null;
id
----
(0 rows)
Query 20250613_034027_00008_xn59q, FINISHED, 1 node
Splits: 1 total, 1 done (100.00%)
0.36 [0 rows, 0B] [0 rows/s, 0B/s]
```
Full issue and reproduction steps can be found here:
https://github.com/trinodb/trino/issues/20511#issuecomment-2968932230
### Willingness to contribute
- [ ] I can contribute a fix for this bug independently
- [x] I would be willing to contribute a fix for this bug with guidance from
the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]