ludlows opened a new issue, #6759:
URL: https://github.com/apache/iceberg/issues/6759
### Feature Request / Improvement
### Improvement
### Background
We understand that rewriteDataFiles is recommended to run periodically.
In production, we would like to run rewriteDataFiles on an Iceberg table
once a month using the Spark SQL procedure rewrite_data_files.
For convenience, we add the following SQL command to each daily ETL job:
`
call catalog.system.rewrite_data_files(table=>'hive.iceberg_table', where =>
"truncated(load_date, 6) = '$LASTMONTH' and substr('$TODAY', 7,2) = '03'" )
`
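For instance, when $TODAY = '20230208', substr('$TODAY', 7, 2) is '08', so
the where condition is always false and we expected rewrite_data_files to
exit directly without doing any work.
Instead, we get an exception. The same failure can be reproduced with a
where condition that is trivially false: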
`
call catalog.system.rewrite_data_files(table=>'hive.iceberg_table', where
=>" '01'='03' ")
`
The failure is an AnalysisException thrown by the Scala code linked below,
because the Option that should hold the filter derived from the where
condition is empty.
https://github.com/apache/iceberg/blob/32a8ef52ddf20aa2068dfff8f9e73bd5d27ef610/spark/v3.3/spark/src/main/scala/org/apache/spark/sql/execution/datasources/SparkExpressionConverter.scala#L47
### Our Request
So, would it be possible to make rewrite_data_files exit directly, without
an exception, when the where condition is deterministically false?
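
In the meantime, one possible workaround is to evaluate the day-of-month guard in the job itself and only issue the CALL on the run day, so the always-false condition never reaches Spark. The following is only a sketch: build_rewrite_call is a hypothetical helper, and it assumes $TODAY is formatted as yyyyMMdd and $LASTMONTH as yyyyMM, which the command above suggests but does not state. The load_date filter is copied unchanged from that command.

```python
from datetime import datetime, timedelta
from typing import Optional


def build_rewrite_call(
    today: str,
    table: str = "hive.iceberg_table",
    run_day: str = "03",
) -> Optional[str]:
    """Return the CALL statement on the monthly run day, otherwise None.

    Hypothetical helper. Assumes `today` is formatted as yyyyMMdd.
    """
    # Same check as substr('$TODAY', 7, 2) = '03', but done before any SQL is
    # issued (SQL substr is 1-indexed, Python slices are 0-indexed).
    if today[6:8] != run_day:
        return None

    # Assumption: $LASTMONTH is the previous calendar month as yyyyMM.
    first_of_month = datetime.strptime(today, "%Y%m%d").replace(day=1)
    last_month = (first_of_month - timedelta(days=1)).strftime("%Y%m")

    # The load_date filter is taken verbatim from the original command.
    return (
        f"call catalog.system.rewrite_data_files(table=>'{table}', "
        f"where => \"truncated(load_date, 6) = '{last_month}'\")"
    )
```

The daily job would then run the returned statement (for example with spark.sql(...) on an existing SparkSession) only when the helper does not return None. This does not replace the request above; it only keeps the procedure from being called with a condition that is known to be false.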
### Query engine
Spark
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]