edwinchoi commented on pull request #1508:
URL: https://github.com/apache/iceberg/pull/1508#issuecomment-702272183


   Thanks for elaborating. My thoughts...
   
   > For the write-audit-publish (WAP) pattern, there is an option to only 
stage a commit and not update the table's current-snapshot-id. In this case, 
the writer updates the table by creating a new snapshot. Then an auditor reads 
the snapshot and validates it (with row counts, for example), and if the 
snapshot looks good, commits the snapshot as the current table state. This 
allows reports to be validated before going live.
   
   I don't think WAP is working as expected under RTAS (`REPLACE TABLE ... AS 
SELECT`).
   
   ```scala
   spark.sql("""
   CREATE TABLE test.ns.tbl
   USING iceberg
   TBLPROPERTIES ('write.wap.enabled'='true')
   AS SELECT * FROM VALUES (1, "Alice"), (2, "Bob") AS (id, fname)
   """)
   spark.conf.set("spark.wap.id", "12345")
   spark.sql("""
   CREATE OR REPLACE TABLE test.ns.tbl
   USING iceberg
   AS SELECT * FROM VALUES (1, 5, "alice"), (2, 3, "bob") AS (id, name_len, name)
   """)
   spark.conf.unset("spark.wap.id")
   ```
   
   After running this, the schema from the staged change shows up, but the 
data that _should_ exist isn't accessible: `DESC test.ns.tbl` shows the new 
schema, while `SELECT * FROM test.ns.tbl` comes up empty. Even a simple schema 
change such as `ALTER TABLE ... ADD COLUMN ...` takes effect prematurely.
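   
   For contrast, here is a toy model of the behaviour I would expect from the 
description quoted above. It is only a sketch: `Snapshot`, `TableState`, 
`stage`, and `publish` are hypothetical names, not Iceberg classes, and tying 
the schema to each snapshot is an assumption made for the illustration.
   
   ```scala
   // Toy model of the expected WAP semantics; NOT Iceberg's actual classes.
   final case class Snapshot(id: Long, schema: Seq[String], rows: Seq[Seq[Any]])

   final case class TableState(
       snapshots: Map[Long, Snapshot],
       currentSnapshotId: Option[Long]) {

     // What readers see: schema and rows both come from the current snapshot.
     def current: Option[Snapshot] = currentSnapshotId.flatMap(snapshots.get)

     // Normal commit: record the snapshot and make it current.
     def commit(s: Snapshot): TableState =
       TableState(snapshots + (s.id -> s), Some(s.id))

     // Staged (WAP) commit: record the snapshot, leave the current pointer alone.
     def stage(s: Snapshot): TableState =
       copy(snapshots = snapshots + (s.id -> s))

     // Publish: an auditor promotes a previously staged snapshot.
     def publish(id: Long): TableState = {
       require(snapshots.contains(id), s"unknown snapshot $id")
       copy(currentSnapshotId = Some(id))
     }
   }

   object WapSketch {
     val v1 = Snapshot(1L, Seq("id", "fname"), Seq(Seq(1, "Alice"), Seq(2, "Bob")))
     val v2 = Snapshot(2L, Seq("id", "name_len", "name"),
       Seq(Seq(1, 5, "alice"), Seq(2, 3, "bob")))

     val created   = TableState(Map.empty, None).commit(v1) // plain CTAS
     val staged    = created.stage(v2)                      // RTAS with a WAP id set
     val published = staged.publish(2L)                     // auditor approves
   }
   ```
   
   In this model, readers of `WapSketch.staged` still see the old schema and 
both old rows, and the new schema only becomes visible in 
`WapSketch.published`. What I'm observing is the new schema with no rows.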
   
   > Also, the table state can be rolled back to a previous snapshot and new 
commits will form a new history afterwards.
   
   I don't believe this changes the notion of what was current at some point in 
time. If you view the timeline from the database's point of view, then rolling 
back doesn't change the fact that at _some point in time_ a snapshot that is 
now inaccessible was visible in the database.
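   
   A minimal sketch of that point of view, assuming the timeline is kept as an 
append-only log of "became current" events. `HistoryEntry`, `currentAsOf`, and 
the timestamps are hypothetical and only illustrate the argument; this is not 
how Iceberg necessarily stores it.
   
   ```scala
   // Hypothetical model: each entry records when a snapshot became current.
   final case class HistoryEntry(madeCurrentAt: Long, snapshotId: Long)

   object TimelineSketch {
     // Snapshot that was current at time t: the latest entry at or before t.
     def currentAsOf(log: Seq[HistoryEntry], t: Long): Option[Long] =
       log.filter(_.madeCurrentAt <= t)
         .sortBy(_.madeCurrentAt)
         .lastOption
         .map(_.snapshotId)

     val log = Seq(
       HistoryEntry(10L, 1L), // snapshot 1 committed
       HistoryEntry(20L, 2L), // snapshot 2 committed
       HistoryEntry(30L, 1L), // rollback to snapshot 1
       HistoryEntry(40L, 3L)  // new commit on top of snapshot 1
     )
   }
   ```
   
   Here `TimelineSketch.currentAsOf(TimelineSketch.log, 25L)` still resolves to 
snapshot 2, even though the rollback at time 30 removed it from the current 
lineage. The rollback appends to the timeline instead of rewriting it.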

