rjayapalan commented on issue #9689: URL: https://github.com/apache/iceberg/issues/9689#issuecomment-1936419501
@nastra I cannot share the exact code I used because it contains confidential information, but these are the steps I used to reproduce the issue:

1. Create an Iceberg table with 189 columns, with the following table properties specified at table creation:

```
spark.sql("""
CREATE TABLE hive.dev.test (
........
........
........
)
USING iceberg
TBLPROPERTIES (
'format-version' = '2',
'write.metadata.delete-after-commit.enabled' = 'true',
'write.metadata.previous-versions-max' = '25')
"""
)
```

2. Load close to 1 million rows from the source table into the above target table using the MERGE command. Repeat the load at least 6 times so that 6 commits/snapshots are produced.

3. Run the following maintenance procedures in this order:

```
from datetime import datetime, timedelta

now_str = datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S")
olderthan_str = (datetime.utcnow() - timedelta(days=3)).strftime("%Y-%m-%d %H:%M:%S")

spark.sql("CALL hive.system.rewrite_data_files(table => 'dev.test', strategy => 'sort' , sort_order => 'usertoken ASC NULLS LAST' )").show(truncate=False)
spark.sql(f"CALL hive.system.expire_snapshots(table => 'dev.test', older_than => TIMESTAMP '{now_str}', retain_last => 2)").show(truncate=False)
spark.sql("CALL hive.system.rewrite_manifests(table => 'dev.test', use_caching => true)").show(truncate=False)
spark.sql(f"CALL hive.system.remove_orphan_files(table => 'dev.test', older_than => TIMESTAMP '{olderthan_str}')").show(truncate=False)
```

4. Read all the data from the Iceberg table and write it out as Parquet files. This step should succeed without any issues.

```
df = spark.table("hive.dev.test")
df.write.mode("overwrite").parquet("s3://bucket/tmp/dev_unload/")
```

5. Add a new column to the above Iceberg table:

```
spark.sql("""alter table hive.dev.test add column testcolumn string""")
```

6. Repeat `STEP #2` and then `STEP #3`.

7. Executing `STEP #4` at this point gave me the error.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
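Because the reproduction runs the maintenance procedures from `STEP #3` twice (once before and once after the schema change), it may help to generate the four `CALL` statements from one place so both runs are identical apart from the timestamps. The following is only a sketch under assumptions: `build_maintenance_calls` and its parameter names are hypothetical helpers that are not part of the original report, and it uses a timezone-aware UTC clock in place of `datetime.utcnow()`. The procedure names and arguments are copied from the steps above.

```python
from datetime import datetime, timedelta, timezone

TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def build_maintenance_calls(catalog, table, sort_order, now=None,
                            retain_last=2, orphan_age_days=3):
    """Return the four maintenance CALL statements in the order used in STEP #3.

    Hypothetical helper: it only builds SQL strings and does not touch Spark.
    """
    # Use an explicit UTC clock unless the caller pins a timestamp.
    now = now or datetime.now(timezone.utc)
    now_str = now.strftime(TS_FORMAT)
    older_than_str = (now - timedelta(days=orphan_age_days)).strftime(TS_FORMAT)

    return [
        # 1. Rewrite data files with a sort strategy.
        f"CALL {catalog}.system.rewrite_data_files(table => '{table}', "
        f"strategy => 'sort', sort_order => '{sort_order}')",
        # 2. Expire snapshots older than "now", keeping the last N.
        f"CALL {catalog}.system.expire_snapshots(table => '{table}', "
        f"older_than => TIMESTAMP '{now_str}', retain_last => {retain_last})",
        # 3. Rewrite manifests.
        f"CALL {catalog}.system.rewrite_manifests(table => '{table}', "
        f"use_caching => true)",
        # 4. Remove orphan files older than the configured age.
        f"CALL {catalog}.system.remove_orphan_files(table => '{table}', "
        f"older_than => TIMESTAMP '{older_than_str}')",
    ]
```

With an active Spark session, each statement returned by `build_maintenance_calls("hive", "dev.test", "usertoken ASC NULLS LAST")` could then be passed to `spark.sql(stmt).show(truncate=False)`, once for `STEP #3` and again for `STEP #6`.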