edwinchoi edited a comment on pull request #1508:
URL: https://github.com/apache/iceberg/pull/1508#issuecomment-711506424


   > Also, from what I see, the metadata timestamp is always the same as the 
snapshot timestamp when the metadata is written for a new snapshot.
   
   If you use Spark 3's catalog API, you'll see that the snapshot timestamp and 
the metadata timestamp are _not guaranteed_ to be the same. You can trace the 
call from `SparkCatalog.stageCreateOrReplace`. RTAS applies the changes in a 
transaction, which uses independent calls to `System.currentTimeMillis()` for 
the two timestamps.
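
A minimal sketch of why two independent clock reads can disagree. It is not Iceberg code: `IndependentTimestamps`, `stampSnapshotThenMetadata`, and the injected `LongSupplier` clock are hypothetical names used only for illustration.

```java
import java.util.function.LongSupplier;

class IndependentTimestamps {
    // Hypothetical stand-in for a commit that stamps a snapshot and then the
    // table metadata. Each timestamp comes from its own clock read, as with
    // two separate System.currentTimeMillis() calls.
    static long[] stampSnapshotThenMetadata(LongSupplier clock) {
        long snapshotTimestampMillis = clock.getAsLong();
        long metadataTimestampMillis = clock.getAsLong();
        return new long[] {snapshotTimestampMillis, metadataTimestampMillis};
    }

    public static void main(String[] args) {
        // Fake clock that advances 1 ms per read, so the demo is deterministic.
        long[] now = {1_000L};
        long[] stamps = stampSnapshotThenMetadata(() -> now[0]++);
        System.out.println("snapshot=" + stamps[0] + " metadata=" + stamps[1]);
        // prints "snapshot=1000 metadata=1001"

        // With the real clock the two values may or may not be equal.
        long[] real = stampSnapshotThenMetadata(System::currentTimeMillis);
        System.out.println("equal on real clock: " + (real[0] == real[1]));
    }
}
```

On a real clock the two reads often land in the same millisecond, which is probably why the timestamps usually look identical; nothing guarantees it.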
   
   Try adding tests to `TestCreateTableAsSelect` that do CTAS/RTAS, and you'll 
see that the timestamps are not the same.
   
   Also, after giving this some more thought, you can't even rely on an 
ordering between the snapshot and metadata update timestamps. 
`System.currentTimeMillis()` is not monotonic: clock adjustments via NTP can 
make the second of two consecutive readings earlier than the first. The only 
safe option, then, is to scan the metadata files for the one whose 
`current-snapshot-id` matches the target snapshot ID.
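
The scan could be sketched roughly as follows. This assumes the metadata files have already been parsed into simple records. `MetadataScan`, `MetadataEntry`, and `findBySnapshotId` are hypothetical names, not Iceberg classes, and the sketch returns the first match in whatever order the caller supplies.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class MetadataScan {
    // Hypothetical summary of one parsed metadata file (not an Iceberg class).
    static final class MetadataEntry {
        final String location;
        final long lastUpdatedMillis;
        final long currentSnapshotId;

        MetadataEntry(String location, long lastUpdatedMillis, long currentSnapshotId) {
            this.location = location;
            this.lastUpdatedMillis = lastUpdatedMillis;
            this.currentSnapshotId = currentSnapshotId;
        }
    }

    // Match on snapshot ID only; timestamps are deliberately ignored because
    // they may be out of order after a clock adjustment.
    static Optional<MetadataEntry> findBySnapshotId(List<MetadataEntry> entries, long targetSnapshotId) {
        for (MetadataEntry entry : entries) {
            if (entry.currentSnapshotId == targetSnapshotId) {
                return Optional.of(entry);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The second file carries an earlier timestamp than the first,
        // as could happen if the clock was stepped back between commits.
        List<MetadataEntry> entries = Arrays.asList(
            new MetadataEntry("v1.metadata.json", 3_000L, 11L),
            new MetadataEntry("v2.metadata.json", 2_500L, 22L),
            new MetadataEntry("v3.metadata.json", 2_600L, 33L));

        String found = findBySnapshotId(entries, 22L)
            .map(entry -> entry.location)
            .orElse("not found");
        System.out.println(found); // prints "v2.metadata.json"
    }
}
```

Several metadata files may share the same `current-snapshot-id` (for example, after a change that adds no snapshot), so a real implementation would need to decide which of those matches it wants.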




