rjayapalan commented on issue #9689:
URL: https://github.com/apache/iceberg/issues/9689#issuecomment-1936419501

   @nastra I can't share the exact code I used because it contains confidential information, but these are the steps I used to reproduce the issue:
   
   1. Create an Iceberg table with 189 columns, specifying the following table properties at creation time:
   ```
   spark.sql("""
       CREATE TABLE hive.dev.test (
         ........
         ........
         ........
       )
       USING iceberg
       TBLPROPERTIES (
         'format-version' = '2',
         'write.metadata.delete-after-commit.enabled' = 'true',
         'write.metadata.previous-versions-max' = '25')
   """
   )
   ```
   2. Load close to 1 million rows from a source table into the above target table using a MERGE command. Repeat the load at least 6 times so that 6 commits/snapshots are produced.
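   Since the real code is confidential, a rough sketch of this load might look as follows (the source table name `hive.dev.test_source`, the join key `usertoken`, and the MERGE clauses are placeholders, not the original code):
   ```
   # Hypothetical sketch of the Step 2 load; table names, the join key,
   # and the MERGE clauses are placeholders, not the real ones.
   MERGE_SQL = """
   MERGE INTO hive.dev.test AS t
   USING hive.dev.test_source AS s
     ON t.usertoken = s.usertoken
   WHEN MATCHED THEN UPDATE SET *
   WHEN NOT MATCHED THEN INSERT *
   """

   def run_merge(spark):
       # Run once per batch; repeating this at least 6 times
       # produces at least 6 commits/snapshots on the target table.
       spark.sql(MERGE_SQL)
   ```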
   3. Run the following maintenance procedures, in this order:
   ```
   from datetime import datetime, timedelta

   now_str = datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S")
   olderthan_str = (datetime.utcnow() - timedelta(days=3)).strftime("%Y-%m-%d %H:%M:%S")

   spark.sql("CALL hive.system.rewrite_data_files(table => 'dev.test', strategy => 'sort', sort_order => 'usertoken ASC NULLS LAST')").show(truncate=False)
   spark.sql(f"CALL hive.system.expire_snapshots(table => 'dev.test', older_than => TIMESTAMP '{now_str}', retain_last => 2)").show(truncate=False)
   spark.sql("CALL hive.system.rewrite_manifests(table => 'dev.test', use_caching => true)").show(truncate=False)
   spark.sql(f"CALL hive.system.remove_orphan_files(table => 'dev.test', older_than => TIMESTAMP '{olderthan_str}')").show(truncate=False)
   ```
   4. Now read all the data from the Iceberg table and write it out as Parquet files. At this point, this step succeeds without any issues.
   ```
   df = spark.table("hive.dev.test")
   df.write.mode("overwrite").parquet("s3://bucket/tmp/dev_unload/")
   ```
   5. Now add a new column to the above Iceberg table:
   ```
   spark.sql("""ALTER TABLE hive.dev.test ADD COLUMN testcolumn string""")
   ```
   6. Repeat `STEP #2` and then `STEP #3`.
   7. Now executing `STEP #4` produces the error reported in this issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

