Re: [PR] Flink: Emit watermarks from the IcebergSource [iceberg]

via GitHub Fri, 17 Nov 2023 22:30:30 -0800


pvary commented on code in PR #8553:
URL: https://github.com/apache/iceberg/pull/8553#discussion_r1398107792



##########
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/source/IcebergSource.java:
##########
@@ -453,6 +492,18 @@ public IcebergSource<T> build() {
         contextBuilder.project(FlinkSchemaUtil.convert(icebergSchema, projectedFlinkSchema));
       }
 
+      SerializableRecordEmitter<T> emitter = SerializableRecordEmitter.defaultEmitter();
+      if (watermarkColumn != null) {

Review Comment:
   The focus of this feature is correct watermark generation, so we need to
make sure that the watermarks are emitted in order. That does not automatically
mean that the records need to be emitted in order too. These are two different
aspects of a data stream.
   
   For combined splits we do not advance the watermark, so they cause no
issues for watermark generation. Users can decide whether out-of-order records
are a problem for them. If so, they can set the configuration. If they have
enough memory to keep the state, they can instead decide that reading speed
(combining files into splits) is more important than reading files in order.
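   
   To illustrate the distinction, here is a minimal, self-contained sketch. It is a toy model, not the Iceberg or Flink code: `WatermarkOrderSketch`, `Split`, `Result`, `read` and `sampleSplits` are hypothetical names, and it assumes that a non-combined split advances the watermark to the minimum timestamp it contains while a combined split leaves the watermark untouched.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: all names here are hypothetical, none are Iceberg or Flink APIs.
class WatermarkOrderSketch {

  /** A split holding the event timestamps of its records, in read order. */
  static final class Split {
    final List<Long> timestamps;
    final boolean combined; // true if several files were combined into this split

    Split(List<Long> timestamps, boolean combined) {
      this.timestamps = timestamps;
      this.combined = combined;
    }
  }

  /** What the simulated reader produced. */
  static final class Result {
    final List<Long> records = new ArrayList<>();
    final List<Long> watermarks = new ArrayList<>();
    int lateRecords = 0; // records emitted with a timestamp below the current watermark
  }

  /** Reads the splits in the given order and emits records and watermarks. */
  static Result read(List<Split> splits) {
    Result result = new Result();
    long current = Long.MIN_VALUE;
    for (Split split : splits) {
      if (!split.combined) {
        // Assumption: only a non-combined split may advance the watermark.
        long min = split.timestamps.stream().mapToLong(Long::longValue).min().orElse(current);
        if (min > current) {
          current = min;
          result.watermarks.add(current);
        }
      }
      for (long ts : split.timestamps) {
        if (ts < current) {
          result.lateRecords++;
        }
        result.records.add(ts);
      }
    }
    return result;
  }

  /** True if every value is greater than or equal to the one before it. */
  static boolean isNonDecreasing(List<Long> values) {
    for (int i = 1; i < values.size(); i++) {
      if (values.get(i) < values.get(i - 1)) {
        return false;
      }
    }
    return true;
  }

  /** Sample input: the middle split is a combined one with interleaved timestamps. */
  static List<Split> sampleSplits() {
    return List.of(
        new Split(List.of(10L, 12L, 11L), false),
        new Split(List.of(25L, 14L, 30L), true),
        new Split(List.of(40L, 41L), false));
  }

  public static void main(String[] args) {
    Result result = read(sampleSplits());
    System.out.println("records in order:    " + isNonDecreasing(result.records));
    System.out.println("watermarks in order: " + isNonDecreasing(result.watermarks));
    System.out.println("late records:        " + result.lateRecords);
  }
}
```

   In this model the records arrive out of order, yet the watermarks only move forward and no record falls behind the watermark. The cost is that downstream state has to be kept for longer while the watermark is held back, which is the memory versus reading speed trade-off described above.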



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

