[I] fast_forward command not merging branches within AWS Glue [iceberg]

via GitHub Thu, 19 Oct 2023 09:01:31 -0700


lime-squeeze opened a new issue, #8881:
URL: https://github.com/apache/iceberg/issues/8881


   ### Query engine
   
   Spark 3.3 within AWS Glue 4.0, using `iceberg-spark-runtime-3.3_2.12-1.4.0.jar`
   
   ### Question
   
   I am testing branching via Spark SQL within AWS Glue. I can create a branch and write to it, but when I try to merge the branch into `main` with the `fast_forward` procedure, nothing happens. The query completes without error, but when I query the `main` branch the data is unchanged. Below is what I am executing.
   
   ```
   # DROP
   spark.sql("""
   DROP TABLE IF EXISTS iceberg_catalog.db.employee
   """)
   spark.sql("""
   DROP TABLE IF EXISTS iceberg_catalog.db.employee_stg
   """)
   
   # CREATE
   spark.sql("""
   CREATE TABLE iceberg_catalog.db.employee (
       employee_id int,
       name string,
       salary double,
       eff_start_dt date,
       eff_end_dt date,
       etl_state string
   )
   USING iceberg
   LOCATION 's3://<bucket/db/tbl_nm>'
   TBLPROPERTIES (
       'format-version'='2',
       'write.format.default'='parquet',
       'write.target-file-size-bytes'='536870912',
       'write.parquet.compression-codec'='snappy',
       'history.expire.max-snapshot-age-ms'='86400000',
       'write.wap.enabled'='true',
       'write.object-storage.enabled'='true',
       'external.table.purge'='true'
   )
   """)
   
   spark.sql("""
   CREATE TABLE iceberg_catalog.db.employee_stg (
       employee_id int,
       name string,
       salary double,
       load_date date
   )
   USING iceberg
   LOCATION 's3://<bucket/db/stg_tbl_nm>'
   TBLPROPERTIES (
       'format-version'='2',
       'write.format.default'='parquet',
       'write.target-file-size-bytes'='536870912',
       'write.parquet.compression-codec'='snappy',
       'history.expire.max-snapshot-age-ms'='86400000',
       'write.wap.enabled'='true',
       'write.object-storage.enabled'='true',
       'external.table.purge'='true'
   )
   """)
   
   
   # INSERT INITIAL VALUES
   spark.sql("""
   insert into iceberg_catalog.db.employee
   values  (101, 'tom',  9000, date('2022-01-01'), date('9999-12-31'), 'current'),
           (102, 'sara', 5000, date('2022-02-01'), date('9999-12-31'), 'current'),
           (103, 'bob',  9000, date('2022-01-01'), date('9999-12-31'), 'current'),
           (104, 'mike', 5000, date('2022-11-01'), date('9999-12-31'), 'current')
   """)
   
   spark.sql("""
   insert into iceberg_catalog.db.employee_stg
   values  (102, 'sara', 7000, date('2023-09-27')),
           (104, 'mike', 7000, date('2023-09-27'))
   """)
   
   
   # APPLY UPDATES ON BRANCH
   spark.sql("""
   ALTER TABLE iceberg_catalog.db.employee DROP BRANCH IF EXISTS `validation-branch`
   """)
   
   spark.sql("""
   ALTER TABLE iceberg_catalog.db.employee
   CREATE BRANCH `validation-branch`
   RETAIN 7 DAYS
   WITH SNAPSHOT RETENTION 2 SNAPSHOTS
   """)
   
   spark.sql("""
   SET spark.wap.branch = 'validation-branch';
   """)
   
   spark.sql("""
   MERGE INTO iceberg_catalog.db.employee as T
   USING 
       iceberg_catalog.db.employee_stg as S
   ON T.employee_id = S.employee_id
   WHEN MATCHED and etl_state = 'current'
   THEN UPDATE 
       SET etl_state='history',
           eff_end_dt=S.load_date
   """)
   spark.sql("""
   INSERT INTO iceberg_catalog.db.employee
   select  employee_id, name, salary, load_date, date('9999-12-31'), 'current'
   from iceberg_catalog.db.employee_stg
   """)
   
   # MERGE BRANCHES
   spark.sql("""
   CALL iceberg_catalog.system.fast_forward(
       'iceberg_catalog.datransforms.employee_iceberg_demo',
       'main',
       'validation-branch')
   """)
   ```
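For context on what the final `CALL` is expected to do: as I understand Iceberg's fast-forward, it moves one branch (here `main`) to the head snapshot of another (`validation-branch`), and only when `main`'s current snapshot is an ancestor of that head. The toy model below is pure Python and is not Iceberg's implementation; `ToyTable` and the snapshot ids 1 to 3 are hypothetical (1 = the initial insert on `main`, 2 = the MERGE on the branch, 3 = the INSERT on the branch).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyTable:
    """Hypothetical model of a table's snapshot history and branch refs."""

    # snapshot id -> parent snapshot id (None for the first snapshot)
    parents: Dict[int, Optional[int]] = field(default_factory=dict)
    # branch name -> snapshot id the branch currently points at
    refs: Dict[str, int] = field(default_factory=dict)

    def is_ancestor(self, ancestor: int, snapshot: Optional[int]) -> bool:
        # Walk parent pointers from `snapshot` back to the first snapshot.
        while snapshot is not None:
            if snapshot == ancestor:
                return True
            snapshot = self.parents[snapshot]
        return False

    def fast_forward(self, branch: str, to: str) -> None:
        # Only a pure fast-forward is allowed: the current head of `branch`
        # must be an ancestor of (or equal to) the head of `to`.
        current, target = self.refs[branch], self.refs[to]
        if not self.is_ancestor(current, target):
            raise ValueError(f"cannot fast-forward {branch}: not an ancestor of {to}")
        self.refs[branch] = target


# 1 = initial INSERT on main, 2 = MERGE on the branch, 3 = INSERT on the branch
table = ToyTable(parents={1: None, 2: 1, 3: 2}, refs={"main": 1, "validation-branch": 3})
table.fast_forward("main", "validation-branch")
print(table.refs)  # {'main': 3, 'validation-branch': 3}
```

In this model a successful fast-forward leaves `main` and `validation-branch` on the same snapshot, so a read of `main` afterwards should return the same rows as the branch.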
   
   While still on `validation-branch`, querying the data returns the following result, as expected:
   ```
   +-----------+-------+------+------------+----------+---------+
   |employee_id|name   |salary|eff_start_dt|eff_end_dt|etl_state|
   +-----------+-------+------+------------+----------+---------+
   |102        |sara   |5000.0|2022-02-01  |2023-09-27|history  |
   |104        |mike   |5000.0|2022-11-01  |2023-09-27|history  |
   |101        |tom    |9000.0|2022-01-01  |9999-12-31|current  |
   |103        |bob    |9000.0|2022-01-01  |9999-12-31|current  |
   |102        |sara   |7000.0|2023-09-27  |9999-12-31|current  |
   |104        |mike   |7000.0|2023-09-27  |9999-12-31|current  |
   +-----------+-------+------+------------+----------+---------+
   ```
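These branch rows are what the MERGE plus INSERT pair should produce (a type-2 slowly changing dimension update: close the matched `current` rows, then add a new `current` row per staged record). A minimal pure-Python model of that expected result, assuming rows are plain tuples and ignoring Spark entirely; `apply_scd2` is a hypothetical helper, not part of any library:

```python
from datetime import date

OPEN_END = date(9999, 12, 31)

# employee before the branch writes: (id, name, salary, eff_start_dt, eff_end_dt, etl_state)
employee = [
    (101, "tom", 9000.0, date(2022, 1, 1), OPEN_END, "current"),
    (102, "sara", 5000.0, date(2022, 2, 1), OPEN_END, "current"),
    (103, "bob", 9000.0, date(2022, 1, 1), OPEN_END, "current"),
    (104, "mike", 5000.0, date(2022, 11, 1), OPEN_END, "current"),
]
# employee_stg: (id, name, salary, load_date)
staging = [
    (102, "sara", 7000.0, date(2023, 9, 27)),
    (104, "mike", 7000.0, date(2023, 9, 27)),
]


def apply_scd2(target, stg):
    """Model of the MERGE (close matched current rows) followed by the INSERT."""
    load_dates = {emp_id: load_date for emp_id, _, _, load_date in stg}
    result = []
    for emp_id, name, salary, start, end, state in target:
        # MERGE ... WHEN MATCHED AND etl_state = 'current' THEN UPDATE
        if emp_id in load_dates and state == "current":
            result.append((emp_id, name, salary, start, load_dates[emp_id], "history"))
        else:
            result.append((emp_id, name, salary, start, end, state))
    # INSERT INTO ... SELECT ... FROM employee_stg: one new current row per staged record
    for emp_id, name, salary, load_date in stg:
        result.append((emp_id, name, salary, load_date, OPEN_END, "current"))
    return result


for row in sorted(apply_scd2(employee, staging)):
    print(row)
```

Sorting is only for readable output; the table itself has no guaranteed row order.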
   
   If I switch over to the main branch, I see ONLY the initial inserts:
   ```
   +-----------+-------+------+------------+----------+---------+
   |employee_id|name   |salary|eff_start_dt|eff_end_dt|etl_state|
   +-----------+-------+------+------------+----------+---------+
   |101        |tom    |9000.0|2022-01-01  |9999-12-31|current  |
   |102        |sara   |5000.0|2022-02-01  |9999-12-31|current  |
   |103        |bob    |9000.0|2022-01-01  |9999-12-31|current  |
   |104        |mike   |5000.0|2022-11-01  |9999-12-31|current  |
   +-----------+-------+------+------------+----------+---------+
   ```
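One way to narrow this down is to compare the snapshot each ref points at, using Iceberg's `refs` metadata table (which has `name`, `type` and `snapshot_id` columns). If `main` and `validation-branch` report the same `snapshot_id`, the fast-forward was committed and the stale result comes from the read of `main`; if they differ, `main` was never moved. A small sketch of that check, where `ref_heads` and `fast_forward_applied` are hypothetical helpers operating on plain dicts (the PySpark query is shown only as a comment and is not executed here):

```python
from typing import Dict, Iterable, Mapping


def ref_heads(refs_rows: Iterable[Mapping]) -> Dict[str, int]:
    """Map ref name -> snapshot_id from rows of the `refs` metadata table."""
    return {row["name"]: row["snapshot_id"] for row in refs_rows}


def fast_forward_applied(refs_rows, branch="main", source="validation-branch") -> bool:
    # After a successful fast-forward both refs point at the same snapshot.
    heads = ref_heads(refs_rows)
    return heads[branch] == heads[source]


# With PySpark, the rows could be fetched roughly like this (assumed, not run here):
#   rows = [r.asDict() for r in spark.sql(
#       "SELECT name, type, snapshot_id FROM iceberg_catalog.db.employee.refs").collect()]
#   print(fast_forward_applied(rows))
```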
   
   Is there anything here that I am missing or doing wrong?



