YanivZalach opened a new issue, #15708:
URL: https://github.com/apache/iceberg/issues/15708
## Problem
When a table schema contains a column named `ICEZVALUE`, running a Z-order
rewrite produces a misleading error. When the rewrite is skipped (e.g.
`min-input-files` not met), no error is thrown, making the bug inconsistent
and hard to diagnose.
Error seen when rewrite runs:
```
Cannot write incompatible data for the table (The table name): Cannot find
data for the output column `ICEZVALUE`
```
This gives no indication that `ICEZVALUE` is a reserved internal name.
## Root Cause
`SparkZOrderFileRewriteRunner` adds a column named `ICEZVALUE` to store the
interleaved Z-order bytes. If a user column with the same name already exists,
it is silently overwritten with binary data, and the write-back then fails.
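The collision can be illustrated with a small pure-Python model. This is only a sketch, not Iceberg or Spark code: `with_column`, `drop_column`, and `model_zorder_rewrite` are hypothetical helpers, and the assumption that the helper column is dropped again before the write-back is mine, not something confirmed above.

```python
# Pure-Python model of the name collision; NOT Iceberg or Spark code.
Z_COLUMN = "ICEZVALUE"  # helper column name, as reported in this issue


def with_column(row, name, value):
    """Add a column, or replace an existing column that has the same name."""
    updated = dict(row)
    updated[name] = value
    return updated


def drop_column(row, name):
    """Remove a column if it is present."""
    return {key: val for key, val in row.items() if key != name}


def model_zorder_rewrite(row):
    # Assumed steps: attach the helper column, sort by it (omitted here),
    # then drop the helper column before writing the row back.
    staged = with_column(row, Z_COLUMN, b"\x00\x01")  # placeholder bytes
    return drop_column(staged, Z_COLUMN)


user_row = {"time_col": "2024-01-01", "col_a": 1, "ICEZVALUE": "a"}
rewritten = model_zorder_rewrite(user_row)

# Columns the table schema expects but the rewritten row no longer has.
missing = [col for col in user_row if col not in rewritten]
print(missing)  # ['ICEZVALUE']
```

Under these assumptions the user's `ICEZVALUE` value is first replaced and then removed, which would be consistent with the "Cannot find data for the output column" message.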
## Steps to Reproduce
**Environment:**
- Iceberg: `1.4.2`
- Spark: `3.5.1`
```python
from datetime import datetime
from pyspark.sql import Row, SparkSession
# Assumes the session is already configured with the Iceberg runtime and catalog.
spark = SparkSession.builder.getOrCreate()
spark.sql("DROP TABLE IF EXISTS spark_catalog.default.check_table")
spark.sql("""
CREATE TABLE spark_catalog.default.check_table (
time_col timestamp,
col_a bigint,
ICEZVALUE string
)
USING iceberg
PARTITIONED BY (days(time_col))
TBLPROPERTIES ('format-version' = '2')
""")
data = [
Row(time_col=datetime(2024, 1, 1), col_a=1, ICEZVALUE="a"),
Row(time_col=datetime(2024, 1, 2), col_a=2, ICEZVALUE="b"),
Row(time_col=datetime(2024, 1, 3), col_a=3, ICEZVALUE="c"),
]
spark.createDataFrame(data).coalesce(1).writeTo(
"spark_catalog.default.check_table"
).append()
# Pass 1: skipped - no error
spark.sql("""
CALL spark_catalog.system.rewrite_data_files(
table => 'spark_catalog.default.check_table',
strategy => 'sort',
sort_order => 'zorder(col_a)',
options => map('min-input-files', '2')
)
""")
# Pass 2: runs - triggers the bug
spark.sql("""
CALL spark_catalog.system.rewrite_data_files(
table => 'spark_catalog.default.check_table',
strategy => 'sort',
sort_order => 'zorder(col_a)',
options => map('rewrite-all', 'true')
)
""")
```
## Actual Behavior
- Pass 1 (rewrite skipped): no error, silent.
- Pass 2 (rewrite runs): a misleading `CANNOT_FIND_DATA` `AnalysisException` is thrown.
## Expected Behavior
A clear `IllegalArgumentException` thrown early, explaining that `ICEZVALUE`
is a reserved internal column name used by Iceberg Z-order rewrite.
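As a rough illustration of the requested check, here is a minimal Python sketch. The helper name `validate_no_reserved_column`, the `case_sensitive` flag, and the error wording are all hypothetical; a real fix would live in Iceberg's Java code and throw `IllegalArgumentException`, for which `ValueError` stands in here.

```python
RESERVED_Z_COLUMN = "ICEZVALUE"  # name reported in this issue


def validate_no_reserved_column(column_names, case_sensitive=False):
    """Fail early if any column name collides with the reserved Z-order column.

    Hypothetical helper. Case-insensitive matching is an assumption made
    because Spark resolves column names case-insensitively by default.
    """
    for name in column_names:
        if case_sensitive:
            clash = name == RESERVED_Z_COLUMN
        else:
            clash = name.lower() == RESERVED_Z_COLUMN.lower()
        if clash:
            raise ValueError(
                f"Cannot Z-order rewrite: column '{name}' conflicts with "
                f"'{RESERVED_Z_COLUMN}', a reserved internal column name "
                "used by the Iceberg Z-order rewrite"
            )


# A schema without the reserved name passes silently.
validate_no_reserved_column(["time_col", "col_a"])

# The schema from the reproduction above is rejected up front.
try:
    validate_no_reserved_column(["time_col", "col_a", "ICEZVALUE"])
except ValueError as err:
    print(err)
```

Until such a check exists upstream, the same function could be used as a pre-flight guard by passing it `spark.table("spark_catalog.default.check_table").columns` before calling `rewrite_data_files`.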