pvary commented on issue #9089:
URL: https://github.com/apache/iceberg/issues/9089#issuecomment-1843199768
Flink Writer operators write the data files to their final place - to the
data directory of the table. Then sends the file metadata (file name,
statistics, etc) to the Flink Committer operator.
On snapshot/checkpoint the Flink Committer does a 2 phase commit:
- Phase 1 - snapshotState - creates a single temporary manifest file in the
metadata directory, which stores the metadata for every files received in the
given snapshot. Also stores the manifest file path (this is a bit simplified
for clarity) in the job state.
- Phase 2 - notifyCheckpointComplete - reads the temporary manifest file and
creates an iceberg commit from it. The commit summary for this commit will
contain:
- JobId
- OperatorId
- CheckpointId
When the Flink job restarts, it reads the table commits, and finds the
highest checkpointId for the jobId and operatorId in the history. Also it reads
the state, to checks if we have data which is already snapshotted, but not yet
committed to the table. If we have uncommitted data we commit it at this stage.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]