MartijnVisser commented on a change in pull request #18781:
URL: https://github.com/apache/flink/pull/18781#discussion_r808836654



##########
File path: docs/content/docs/connectors/datastream/pulsar.md
##########
@@ -417,4 +417,16 @@ If you have a problem with Pulsar when using Flink, keep in mind that Flink only
 and your problem might be independent of Flink and sometimes can be solved by upgrading Pulsar brokers,
 reconfiguring Pulsar brokers or reconfiguring Pulsar connector in Flink.
 
+### Source stop reading records for about 10s when data volume is small

Review comment:
       Will you also add this section to the Chinese documentation?

##########
File path: docs/content/docs/connectors/datastream/pulsar.md
##########
@@ -417,4 +417,16 @@ If you have a problem with Pulsar when using Flink, keep in mind that Flink only
 and your problem might be independent of Flink and sometimes can be solved by upgrading Pulsar brokers,
 reconfiguring Pulsar brokers or reconfiguring Pulsar connector in Flink.
 
+### Source stop reading records for about 10s when data volume is small

Review comment:
       ```suggestion
   ### Messages can be delayed on low volume topics
   ```

##########
File path: docs/content/docs/connectors/datastream/pulsar.md
##########
@@ -417,4 +417,16 @@ If you have a problem with Pulsar when using Flink, keep in mind that Flink only
 and your problem might be independent of Flink and sometimes can be solved by upgrading Pulsar brokers,
 reconfiguring Pulsar brokers or reconfiguring Pulsar connector in Flink.
 
+### Source stop reading records for about 10s when data volume is small
+
+When source connector read from a low volume topic, users might observe a 10s interval between 
+messages. Pulsar source by default buffers messages from pulsar topic before emitting to downstream
+operators until either buffered records has reached `PulsarSourceOptions.PULSAR_MAX_FETCH_RECORDS` 
+or waiting time has reached `PulsarSourceOptions.PULSAR_MAX_FETCH_TIME` (whichever comes first).
+When data volumes is small, e.g, 4 records per second, source will wait  `PULSAR_MAX_FETCH_TIME` 
+(default to 10s) before emit the records. Change either of the 2 options if you want to avoid this 
+behaviour.

Review comment:
       ```suggestion
   When the Pulsar source connector reads from a low volume topic, users might observe a 10-second delay between messages. The Pulsar source buffers messages from topics by default. Before emitting to downstream operators, the number of buffered records must be equal to or larger than `PulsarSourceOptions.PULSAR_MAX_FETCH_RECORDS`. If the data volume is low, filling the buffer can take longer than `PulsarSourceOptions.PULSAR_MAX_FETCH_TIME` (defaults to 10 seconds). In that case, the messages are emitted only after this time has passed.
   
   To avoid this behaviour, change either the maximum number of buffered records or the maximum waiting time.
   ```
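
To make the "whichever comes first" rule from the proposed section concrete, here is a minimal sketch in plain Java. It only models the rule as this thread describes it and is not the connector's implementation: the class `FetchBatchSketch` and the method `firstEmitDelayMs` are hypothetical names, and the record limit of 100 is an assumed value chosen for illustration. The 10 s time limit and the rate of 4 records per second come from the text under review.

```java
/** Hypothetical model of the batching rule; this is not the connector's actual code. */
class FetchBatchSketch {

    /**
     * Time in milliseconds until the first batch is emitted downstream, assuming
     * one record arrives every arrivalIntervalMs milliseconds.
     */
    static long firstEmitDelayMs(int maxFetchRecords, long maxFetchTimeMs, long arrivalIntervalMs) {
        // Time needed for the buffer to collect maxFetchRecords records.
        long bufferFullAfterMs = maxFetchRecords * arrivalIntervalMs;
        // The batch is released by whichever limit is reached first.
        return Math.min(bufferFullAfterMs, maxFetchTimeMs);
    }

    public static void main(String[] args) {
        long intervalMs = 250L; // 4 records per second

        // Assumed record limit of 100 with the 10 s time limit: the timeout wins.
        System.out.println(firstEmitDelayMs(100, 10_000L, intervalMs));
        // Lower record limit: the buffer fills first.
        System.out.println(firstEmitDelayMs(20, 10_000L, intervalMs));
        // Lower time limit: the timeout wins, but much sooner.
        System.out.println(firstEmitDelayMs(100, 1_000L, intervalMs));
    }
}
```

Under these assumptions the three calls yield 10000, 5000 and 1000 milliseconds, which shows why lowering either option shortens the delay on a low volume topic.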




