openinx commented on a change in pull request #2863:
URL: https://github.com/apache/iceberg/pull/2863#discussion_r704872129
##########
File path:
flink/src/main/java/org/apache/iceberg/flink/sink/BaseDeltaTaskWriter.java
##########
@@ -70,11 +73,19 @@ public void write(RowData row) throws IOException {
switch (row.getRowKind()) {
case INSERT:
case UPDATE_AFTER:
+ if (upsert) {
+ writer.delete(row);
+ }
writer.write(row);
break;
- case DELETE:
case UPDATE_BEFORE:
+ if (upsert) {
+ break; // UPDATE_BEFORE is not necessary for UPDATE, we do nothing to prevent delete one row twice
+ }
Review comment:
That's a good question, @kbendick! Let's look at the out-of-order problem
along two dimensions:
1. Is it possible to produce out-of-order events within a single Iceberg
transaction? First, to keep the data semantics correct between the source
table and the Iceberg sink table, the records consumed from the source table
must arrive in the correct order. Second, the streaming job needs to shuffle
by the equality fields so that records with the same key are dispatched to
the same parallel task; otherwise, out-of-order issues occur when different
tasks write records with the same equality fields to the Iceberg table. With
both conditions met, the order within a single transaction is guaranteed.
2. The out-of-order issue between two consecutive transactions. In our Flink
streaming integration, we guarantee the [exact commit
order](https://github.com/apache/iceberg/blob/master/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java#L281)
even if a failover happens. For Spark streaming, I think this issue needs
more consideration.
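
To make point 1 concrete, here is a minimal, dependency-free sketch of the idea. It is not the Iceberg or Flink code: `KeyedUpsertSketch`, `taskFor`, `UpsertTask`, and `run` are hypothetical names, the in-memory map only stands in for a task writer, and the `DELETE` branch is an assumption because the hunk above does not show it. It routes every change to a task chosen by hashing the equality key, then applies the upsert rules from the diff inside that task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeyedUpsertSketch {
  // Mirrors the four row kinds used in the switch statement above.
  enum Kind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

  static final class Change {
    final Kind kind;
    final String key;   // stands in for the equality fields
    final String value;

    Change(Kind kind, String key, String value) {
      this.kind = kind;
      this.key = key;
      this.value = value;
    }
  }

  // Hypothetical routing: the same key always maps to the same task index.
  static int taskFor(String key, int parallelism) {
    return Math.floorMod(key.hashCode(), parallelism);
  }

  // Hypothetical stand-in for one parallel writer task running in upsert mode.
  static final class UpsertTask {
    final Map<String, String> rows = new LinkedHashMap<>();

    void write(Change change) {
      switch (change.kind) {
        case INSERT:
        case UPDATE_AFTER:
          rows.remove(change.key);            // delete any existing row for the key
          rows.put(change.key, change.value); // then write the new row
          break;
        case UPDATE_BEFORE:
          break;                              // skipped, so the row is not deleted twice
        case DELETE:
          rows.remove(change.key);            // assumption: not shown in the quoted hunk
          break;
      }
    }
  }

  // Dispatches changes in arrival order, then merges the per-task results.
  static Map<String, String> run(List<Change> changes, int parallelism) {
    List<UpsertTask> tasks = new ArrayList<>();
    for (int i = 0; i < parallelism; i++) {
      tasks.add(new UpsertTask());
    }
    for (Change change : changes) {
      tasks.get(taskFor(change.key, parallelism)).write(change);
    }
    Map<String, String> merged = new LinkedHashMap<>();
    for (UpsertTask task : tasks) {
      merged.putAll(task.rows);
    }
    return merged;
  }

  public static void main(String[] args) {
    List<Change> changes = Arrays.asList(
        new Change(Kind.INSERT, "k1", "v1"),
        new Change(Kind.UPDATE_BEFORE, "k1", "v1"),
        new Change(Kind.UPDATE_AFTER, "k1", "v2"),
        new Change(Kind.INSERT, "k2", "a"),
        new Change(Kind.DELETE, "k2", "a"));
    System.out.println(run(changes, 4));
  }
}
```

Because every change for a given key lands on one task and is applied in arrival order, the per-key result does not depend on how the other tasks interleave. If the same key could reach two tasks, that guarantee would be lost.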
Hopefully, I've answered your question, @kbendick :-)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]