openinx commented on a change in pull request #2863:
URL: https://github.com/apache/iceberg/pull/2863#discussion_r704872129



##########
File path: 
flink/src/main/java/org/apache/iceberg/flink/sink/BaseDeltaTaskWriter.java
##########
@@ -70,11 +73,19 @@ public void write(RowData row) throws IOException {
     switch (row.getRowKind()) {
       case INSERT:
       case UPDATE_AFTER:
+        if (upsert) {
+          writer.delete(row);
+        }
         writer.write(row);
         break;
 
-      case DELETE:
       case UPDATE_BEFORE:
+        if (upsert) {
+          break;  // UPDATE_BEFORE is not necessary for UPDATE, we do nothing to prevent delete one row twice
+        }

Review comment:
   It's a good question, @kbendick! Let's look at the out-of-order issue along
two dimensions:
   
   1. Is it possible to produce out-of-order events within a single Iceberg
transaction? First of all, if we want to maintain correct data semantics
between the source table and the Iceberg sink table, the records consumed from
the source table must arrive in the correct order. Second, the streaming job
needs to shuffle by the equality fields so that records with the same key are
dispatched to the same parallel task; otherwise the out-of-order issue occurs
when different tasks write records with the same equality fields to the
Iceberg table. With both conditions met, the order within a single transaction
is guaranteed.
   
   2. The out-of-order issue between two consecutive transactions. In our Flink
streaming integration, we have guaranteed the [exact commit 
order](https://github.com/apache/iceberg/blob/master/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java#L281)
 even if a failover happens. For Spark streaming, I think we will need to give
this issue more consideration.
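
   To make the two guarantees above concrete, here is a minimal, self-contained
sketch. It is not the Flink or Iceberg code: `OrderingSketch`, `Event`,
`taskFor`, `route` and `commitUpTo` are hypothetical names, the hash-modulo
routing only stands in for a real shuffle on the equality fields, and the
sorted map of pending checkpoints is an assumption about how ordered commits
could be modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical sketch of per-key ordering and ordered commits. */
final class OrderingSketch {

  /** A change event: the equality-field value and its position in the source stream. */
  static final class Event {
    final String key;
    final long sourceSeq;

    Event(String key, long sourceSeq) {
      this.key = key;
      this.sourceSeq = sourceSeq;
    }
  }

  /** Picks a writer task for a key; the same key always maps to the same task. */
  static int taskFor(String key, int parallelism) {
    return Math.floorMod(key.hashCode(), parallelism);
  }

  /** Routes events in source order, so each task sees a given key's events in source order. */
  static List<List<Event>> route(List<Event> events, int parallelism) {
    List<List<Event>> tasks = new ArrayList<>();
    for (int i = 0; i < parallelism; i++) {
      tasks.add(new ArrayList<>());
    }
    for (Event e : events) {
      tasks.get(taskFor(e.key, parallelism)).add(e);
    }
    return tasks;
  }

  /**
   * Commits every pending checkpoint with id <= checkpointId in ascending id order
   * and removes them from the pending map. Checkpoints restored after a failover
   * would therefore still be committed before any newer one.
   */
  static List<String> commitUpTo(NavigableMap<Long, String> pending, long checkpointId) {
    NavigableMap<Long, String> ready = pending.headMap(checkpointId, true);
    List<String> committed = new ArrayList<>(ready.values()); // ascending key order
    ready.clear(); // the view is backed by 'pending', so this removes the entries there
    return committed;
  }

  public static void main(String[] args) {
    List<Event> events = Arrays.asList(
        new Event("a", 1), new Event("b", 2), new Event("a", 3));
    List<List<Event>> tasks = route(events, 4);
    for (Event e : tasks.get(taskFor("a", 4))) {
      System.out.println(e.key + "@" + e.sourceSeq);
    }

    NavigableMap<Long, String> pending = new TreeMap<>();
    pending.put(3L, "manifest-3");
    pending.put(1L, "manifest-1");
    pending.put(2L, "manifest-2");
    System.out.println(commitUpTo(pending, 3L));
  }
}
```

   If two tasks could receive the same key, neither task's local order would
say anything about the order of that key's rows in the table, which is why the
shuffle matters in this sketch.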
   
   Hopefully I've answered your question, @kbendick :-)




