lime-squeeze opened a new issue, #8881:
URL: https://github.com/apache/iceberg/issues/8881
### Query engine
Spark 3.3 within AWS Glue 4.0 and using
iceberg-spark-runtime-3.3_2.12-1.4.0.jar
### Question
I am testing branching via Spark SQL within AWS Glue. I can create a branch and
modify it, but when I try to merge the branch into main with the `fast_forward`
procedure, nothing happens. The call completes without error, but when I query
the main branch the data is unchanged. Below is what I am executing.
```
# DROP
spark.sql(f"""
DROP TABLE IF EXISTS iceberg_catalog.db.employee
""")
spark.sql(f"""
DROP TABLE IF EXISTS iceberg_catalog.db.employee_stg
""")
# CREATE
spark.sql("""
CREATE TABLE iceberg_catalog.db.employee (
employee_id int,
name string,
salary double,
eff_start_dt date,
eff_end_dt date,
etl_state string
)
USING iceberg
LOCATION 's3://<bucket/db/tbl_nm>'
TBLPROPERTIES (
'format-version'='2',
'write.format.default'='parquet',
'write.target-file-size-bytes'='536870912',
'write.parquet.compression-codec'='snappy',
'history.expire.max-snapshot-age-ms'='86400000',
'write.wap.enabled'='true',
'write.object-storage.enabled'=true,
'external.table.purge'='true'
)
""")
spark.sql("""
CREATE TABLE iceberg_catalog.db.employee_stg (
employee_id int,
name string,
salary double,
load_date date
)
USING iceberg
LOCATION 's3://<bucket/db/stg_tbl_nm>'
TBLPROPERTIES (
'format-version'='2',
'write.format.default'='parquet',
'write.target-file-size-bytes'='536870912',
'write.parquet.compression-codec'='snappy',
'history.expire.max-snapshot-age-ms'='86400000',
'write.wap.enabled'='true',
'write.object-storage.enabled'=true,
'external.table.purge'='true'
)
""")
# INSERT INITIAL VALUES
spark.sql("""
insert into iceberg_catalog.db.employee
values (101, 'tom', 9000, date('2022-01-01'), date('9999-12-31'), 'current'),
(102, 'sara', 5000, date('2022-02-01'), date('9999-12-31'), 'current'),
(103, 'bob', 9000, date('2022-01-01'), date('9999-12-31'), 'current'),
(104, 'mike', 5000, date('2022-11-01'), date('9999-12-31'), 'current')
""")
spark.sql("""
insert into iceberg_catalog.db.employee_stg
values (102, 'sara', 7000, date('2023-09-27')),
(104, 'mike', 7000, date('2023-09-27'))
""")
# APPLY UPDATES ON BRANCH
spark.sql("""
ALTER TABLE iceberg_catalog.db.employee DROP BRANCH IF EXISTS
`validation-branch`
""")
spark.sql("""
ALTER TABLE iceberg_catalog.db.employee
CREATE BRANCH `validation-branch`
RETAIN 7 DAYS
WITH SNAPSHOT RETENTION 2 SNAPSHOTS
""")
spark.sql("""
SET spark.wap.branch = 'validation-branch';
""")
spark.sql(f"""
MERGE INTO iceberg_catalog.db.employee as T
USING
iceberg_catalog.db.employee_stg as S
ON T.employee_id = S.employee_id
WHEN MATCHED and etl_state = 'current'
THEN UPDATE
SET etl_state='history',
eff_end_dt=S.load_date
""")
spark.sql(f"""
INSERT INTO iceberg_catalog.db.employee
select employee_id, name, salary, load_date, date('9999-12-31'), 'current'
from iceberg_catalog.db.employee_stg
""")
# MERGE BRANCHES
spark.sql("""
CALL iceberg_catalog.system.fast_forward('iceberg_catalog.datransforms.employee_iceberg_demo', 'main', 'validation-branch')
""")
```
While still on `validation-branch`, querying the data yields the expected
result:
```
+-----------+----+------+------------+----------+---------+
|employee_id|name|salary|eff_start_dt|eff_end_dt|etl_state|
+-----------+----+------+------------+----------+---------+
|102        |sara|5000.0|2022-02-01  |2023-09-27|history  |
|104        |mike|5000.0|2022-11-01  |2023-09-27|history  |
|101        |tom |9000.0|2022-01-01  |9999-12-31|current  |
|103        |bob |9000.0|2022-01-01  |9999-12-31|current  |
|102        |sara|7000.0|2023-09-27  |9999-12-31|current  |
|104        |mike|7000.0|2023-09-27  |9999-12-31|current  |
+-----------+----+------+------------+----------+---------+
```
If I switch over to the main branch, I see ONLY the initial inserts:
```
+-----------+----+------+------------+----------+---------+
|employee_id|name|salary|eff_start_dt|eff_end_dt|etl_state|
+-----------+----+------+------------+----------+---------+
|101        |tom |9000.0|2022-01-01  |9999-12-31|current  |
|102        |sara|5000.0|2022-02-01  |9999-12-31|current  |
|103        |bob |9000.0|2022-01-01  |9999-12-31|current  |
|104        |mike|5000.0|2022-11-01  |9999-12-31|current  |
+-----------+----+------+------------+----------+---------+
```
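One way to check whether main actually moved, independent of which branch the session reads from, would be to compare the snapshot IDs that each ref points at. The following is only a minimal sketch: it assumes the `refs` metadata table is available in this Iceberg version and exposes `name` and `snapshot_id` columns, that `spark` is the same Glue session as above, and the helper names (`refs_query`, `snapshot_by_ref`, `is_fast_forwarded`) are hypothetical.

```python
def refs_query(table: str) -> str:
    """Build a query against the Iceberg `refs` metadata table of `table`."""
    return f"SELECT name, type, snapshot_id FROM {table}.refs"


def snapshot_by_ref(rows) -> dict:
    """Map each ref name (branch or tag) to the snapshot ID it points at."""
    return {row["name"]: row["snapshot_id"] for row in rows}


def is_fast_forwarded(refs: dict, target: str, source: str) -> bool:
    """True when both refs exist and `target` points at the same snapshot as `source`."""
    return target in refs and source in refs and refs[target] == refs[source]


# Usage against a live session (assumed, not run here):
# rows = spark.sql(refs_query("iceberg_catalog.db.employee")).collect()
# refs = snapshot_by_ref(rows)
# print(refs, is_fast_forwarded(refs, "main", "validation-branch"))
```

If `main` and `validation-branch` still report different snapshot IDs after the `CALL`, the procedure did not move main on this table.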
Is there anything here that I am missing or doing wrong?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]