[GitHub] [flink] pnowojski commented on a change in pull request #18354: [FLINK-25650][docs] Added "Interplay with long-running record process…

GitBox Tue, 18 Jan 2022 00:31:02 -0800


pnowojski commented on a change in pull request #18354:
URL: https://github.com/apache/flink/pull/18354#discussion_r786513195




##########
File path: docs/content.zh/docs/ops/state/checkpointing_under_backpressure.md
##########
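@@ -129,6 +129,21 @@ generate the Watermark only after the in-flight data **. If your pipeline uses**
 produce **different results** than when using aligned checkpoints. If your operator depends on the latest Watermark always being available, the workaround is to store the Watermark
 in OperatorState. In that case, the Watermark should be stored per key group in UnionState to support rescaling.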
 
+#### Interplay with long-running record processing
+
+Despite that unaligned checkpoints barriers are able to overtake all other records in the queue.
+The handling of this barrier still can be delayed if the current record takes a lot of time to be processed.
+This situation can occur when firing many timers all at once, for example in windowed operations.
+Second problematic scenario might occur when system is being blocked waiting for more than one
+network buffer availability when processing a single input record. Flink can not interrupt processing of
+a single input record, and unaligned checkpoints have to wait for the currently processed record to be
+fully processed. This can cause problems in two scenarios. Either as a result of serialisation of a large
+record that doesn't fit into single network buffer or in a flatMap operation, that produces many output
+records for one input record. In such scenarios back pressure can block unaligned checkpoints until all
+the network buffers required to process the single input record are available.
+It also can happen in any other situation when the processing of the single record takes a while(a long record).
+As result, the time of the checkpoint can be higher than expected or it can be volatile from time to time.

Review comment:
       ```suggestion
   As a result, the time of the checkpoint can be higher than expected or it can vary.
   ```

##########
File path: docs/content.zh/docs/ops/state/checkpointing_under_backpressure.md
##########
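@@ -129,6 +129,21 @@ generate the Watermark only after the in-flight data **. If your pipeline uses**
 produce **different results** than when using aligned checkpoints. If your operator depends on the latest Watermark always being available, the workaround is to store the Watermark
 in OperatorState. In that case, the Watermark should be stored per key group in UnionState to support rescaling.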
 
+#### Interplay with long-running record processing
+
+Despite that unaligned checkpoints barriers are able to overtake all other records in the queue.
+The handling of this barrier still can be delayed if the current record takes a lot of time to be processed.
+This situation can occur when firing many timers all at once, for example in windowed operations.
+Second problematic scenario might occur when system is being blocked waiting for more than one
+network buffer availability when processing a single input record. Flink can not interrupt processing of
+a single input record, and unaligned checkpoints have to wait for the currently processed record to be
+fully processed. This can cause problems in two scenarios. Either as a result of serialisation of a large
+record that doesn't fit into single network buffer or in a flatMap operation, that produces many output
+records for one input record. In such scenarios back pressure can block unaligned checkpoints until all
+the network buffers required to process the single input record are available.
+It also can happen in any other situation when the processing of the single record takes a while(a long record).

Review comment:
       ```suggestion
   It also can happen in any other situation when the processing of the single record takes a while.
   ```
   I think I would drop the "a long record"
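
   To make the flatMap scenario in the quoted paragraph concrete, here is a minimal sketch in plain Java. It is not Flink code and uses no Flink API; the class `BarrierDelaySketch`, the method `barrierHandledAt`, and the abstract "time steps" are all assumptions made up for illustration. It only models the stated rule that a barrier cannot be handled until the record currently being processed has emitted all of its outputs, each of which may wait for a network buffer.

   ```java
   public class BarrierDelaySketch {

       /**
        * Hypothetical model: returns the time step at which a checkpoint
        * barrier is handled, given that one input record started processing
        * at step 0 and cannot be interrupted.
        */
       static int barrierHandledAt(int outputsPerRecord, int stepsPerBufferWait, int barrierArrival) {
           int clock = 0;
           // Each output record of the single input record waits for a buffer.
           for (int i = 0; i < outputsPerRecord; i++) {
               clock += stepsPerBufferWait;
           }
           // The barrier is handled only after the current record is finished,
           // and never before the barrier has actually arrived.
           return Math.max(clock, barrierArrival);
       }

       public static void main(String[] args) {
           // One output per input: the barrier waits for a single buffer.
           System.out.println(barrierHandledAt(1, 5, 2));
           // A flatMap-like fan-out of 1000 outputs under back pressure.
           System.out.println(barrierHandledAt(1000, 5, 2));
       }
   }
   ```

   Under these assumed numbers the wait grows linearly with the fan-out of a single record, which is the effect the paragraph describes as checkpoint time being higher than expected.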



