MartijnVisser commented on a change in pull request #18781:
URL: https://github.com/apache/flink/pull/18781#discussion_r808836654

##########
File path: docs/content/docs/connectors/datastream/pulsar.md
##########
@@ -417,4 +417,16 @@ If you have a problem with Pulsar when using Flink, keep in mind that Flink only
 and your problem might be independent of Flink and sometimes can be solved by upgrading Pulsar brokers,
 reconfiguring Pulsar brokers or reconfiguring Pulsar connector in Flink.
+### Source stop reading records for about 10s when data volume is small

Review comment:
   Will you also add this section to the Chinese documentation?

##########
File path: docs/content/docs/connectors/datastream/pulsar.md
##########
@@ -417,4 +417,16 @@ If you have a problem with Pulsar when using Flink, keep in mind that Flink only
 and your problem might be independent of Flink and sometimes can be solved by upgrading Pulsar brokers,
 reconfiguring Pulsar brokers or reconfiguring Pulsar connector in Flink.
+### Source stop reading records for about 10s when data volume is small

Review comment:
   ```suggestion
   ### Messages can be delayed on low volume topics
   ```

##########
File path: docs/content/docs/connectors/datastream/pulsar.md
##########
@@ -417,4 +417,16 @@ If you have a problem with Pulsar when using Flink, keep in mind that Flink only
 and your problem might be independent of Flink and sometimes can be solved by upgrading Pulsar brokers,
 reconfiguring Pulsar brokers or reconfiguring Pulsar connector in Flink.
+### Source stop reading records for about 10s when data volume is small
+
+When source connector read from a low volume topic, users might observe a 10s interval between
+messages. Pulsar source by default buffers messages from pulsar topic before emitting to downstream
+operators until either buffered records has reached `PulsarSourceOptions.PULSAR_MAX_FETCH_RECORDS`
+or waiting time has reached `PulsarSourceOptions.PULSAR_MAX_FETCH_TIME` (whichever comes first).
+When data volumes is small, e.g, 4 records per second, source will wait `PULSAR_MAX_FETCH_TIME`
+(default to 10s) before emit the records. Change either of the 2 options if you want to avoid this
+behaviour.

Review comment:
   ```suggestion
   When the Pulsar source connector reads from a low volume topic, users might observe a 10-second
   delay between messages. Pulsar buffers messages from topics by default. Before emitting to
   downstream operators, the number of buffered records must be equal to or larger than
   `PulsarSourceOptions.PULSAR_MAX_FETCH_RECORDS`. If the data volume is low, it could be that
   filling up the number of buffered records takes longer than `PULSAR_MAX_FETCH_TIME` (defaults to
   10 seconds). If that's the case, the messages will only be emitted after this time has passed.

   To avoid this behaviour, you need to change either the buffered records or the waiting time.
   ```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
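
For readers of this thread, the "whichever comes first" rule in the proposed docs text can be checked with a little arithmetic. The sketch below is a toy model only, not the connector's code: the class `FetchBufferSketch`, its method names, the steady arrival rate, and the record limit of 100 are all assumptions made for illustration. Only the 10-second wait and the 4 records per second example come from the text under review.

```java
/**
 * Toy model of the flush rule discussed in the review: buffered records are
 * emitted once the record limit OR the time limit is reached, whichever comes
 * first. Hypothetical names throughout; this is not Flink or Pulsar code.
 */
final class FetchBufferSketch {

    /** Milliseconds needed to buffer maxFetchRecords at a steady arrival rate (rounded up). */
    static long fillTimeMillis(long recordsPerSecond, int maxFetchRecords) {
        return (maxFetchRecords * 1000L + recordsPerSecond - 1) / recordsPerSecond;
    }

    /** Delay before a batch is emitted: the earlier of the two limits. */
    static long flushDelayMillis(long recordsPerSecond, int maxFetchRecords, long maxFetchTimeMillis) {
        return Math.min(fillTimeMillis(recordsPerSecond, maxFetchRecords), maxFetchTimeMillis);
    }

    public static void main(String[] args) {
        long maxFetchTimeMillis = 10_000L; // the 10 s wait quoted in the thread
        int maxFetchRecords = 100;         // assumed record limit, for illustration only

        // Low volume topic: the time limit is hit long before the buffer fills.
        System.out.println(flushDelayMillis(4, maxFetchRecords, maxFetchTimeMillis));
        // High volume topic: the record limit is hit first.
        System.out.println(flushDelayMillis(1_000, maxFetchRecords, maxFetchTimeMillis));
        // Lowering the record limit shortens the delay on the low volume topic.
        System.out.println(flushDelayMillis(4, 8, maxFetchTimeMillis));
    }
}
```

At 4 records per second, buffering 100 records would take 25 seconds, so the 10-second timer fires first and that is the delay users observe. Lowering either limit in this model shortens the delay, which matches the advice to change one of the two options.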