liangyepianzhou commented on code in PR #875: URL: https://github.com/apache/pulsar-site/pull/875#discussion_r1554963624
##########
docs/tutorials-redeliver-messages.md:
##########
@@ -0,0 +1,236 @@
+---
+Id: tutorials-redeliver-messages
+title: Consume best practice
+sidebar_label: "Consume best practice"
+description: Learn how to consume messages and redeliver unacknowledged messages in Pulsar.
+---
+
+# Consume Best Practice
+
+## Background Knowledge
+
+### Subscription Types
+
+Pulsar is a distributed message system where messages can be sent to topics by producers and consumed by consumers.
+Consumers can subscribe to the topics in four ways (subscription types):
+
+* **Exclusive**
+* **Failover**
+* **Shared**
+* **Key-shared**
+
+The messages are consumed in order for a single partition in Exclusive and Failover modes and out of order for Shared and
+Key-shared mode. The main difference between Exclusive and Failover is that in Exclusive mode, the consumer is exclusive
+to the entire topic, while in Failover mode, the consumer is exclusive to only one partition. Failover mode allows backup
+consumer connections that are not consumed. The main difference between Shared and Key-shared is whether their dispatch
+strategy is implemented via a key. For more information about subscription type, refer to the [Pulsar website](https://pulsar.apache.org/docs/3.2.x/concepts-messaging/).
+![img.png](../static/img/blog-consume-best-practice/subscription-types.png)
+
+### Acknowledgment
+
+The messages should be acknowledged after they are fully consumed and processed, and then the messages would not be received
+for the same subscription again. Pulsar provides two ways to acknowledge messages:
+
+* **Cumulative acknowledgment**
+* **Individual acknowledgment**
+
+Cumulative acknowledgment receives a message or a message id as a parameter and marks the messages before the message as
+consumed for this subscription. For multiple-partition topics, the cumulative acknowledgment will work for the single
+partition without impacting other partitions. Individual acknowledgment receives a message or a message id as a parameter
+and only marks this message as consumed for this subscription.
+![img_1.png](../static/img/blog-consume-best-practice/acknowledgement-types.png)
+
+### Messages Redeliver Mechanism
+
+There might be instances where the received messages cannot be processed at this time or some errors happened during processing.
+The client needs to redeliver the unacknowledged messages or a particular message after a delay or immediately.
+Pulsar provides at-least-once semantics when the client does not enable transaction because the client may cache some
+messages out of Pulsar when redelivering messages.
+
+Pulsar Consumer API provides four ways to reconsume the unacknowledged messages later:
+
+* **ackTimeout**
+* **deadLetterPolicy**
+  * **reconsumeLaterCumulative**
+  * **reconsumeLater**
+* **negativeAcknowledge**
+* **redeliverUnacknowledgedMessages**
+
+The **ackTimeout** is an automatic protection mechanism. If a consumer configured ackTimeout, the messages will be
+auto-redelivered when the received messages are not acknowledged after a long time. It works at the client side, ensuring
+the unacknowledged messages will be redelivered to another consumer when the connection of a consumer remains active
+but the business system gets stuck or misses to ack some messages. If the consumer gets disconnected when the client crashes,
+the messages will be auto-redelivered by the broker too. However, this mechanism does not wait for the response of acknowledgment,
+so if an acknowledgment fails on the broker side or proxy side, an ack hole may occur.
+
+The **deadLetterPolicy** is a policy in the message queue used to handle messages that cannot be processed properly.
+In many message queue systems (such as RabbitMQ, Google Pub/Sub, Apache Pulsar, etc.), this strategy is implemented.
+In Pulsar, the deadLetterPolicy is implemented at the client side, it creates a new retry letter topic and dead letter
+topic when building a consumer with `deadLetterPolicy` configuration. When a consumer calls `reconsumeLaterCumulative`
+or `reconsumeLater`, the message (method parameter) will be produced to the retry letter topic until the retry time reaches
+the `maxRedeliverCount`. The message will be produced to the dead letter topic when the retry time reaches the `
+maxRedeliverCount`. The main difference between them is that `reconsumeLaterCumulative` will cumulative ack the
+message (method parameter) after it is produced and `reconsumeLater` will individual ack the message (method parameter)
+after it is produced.
+
+The `negativeAcknowledge` is used to redeliver certain unacknowledged messages while `redeliverUnacknowledgedMessages`
+is used to redeliver all the unacknowledged messages received by this consumer. The main difference between them and
+deadLetterPolicy is that there is no new topic created, and there is an unlimited number of redeliveries.
+
+## Best Practice Suggestion
+
+Different scenarios require different best practices. Users who value the order of partition messages and wish to batch
+process data should choose or implement an appropriate routerPolicy to send a batch of ordered messages to the same
+partition. They should also select either Exclusive or Failover subscription modes. For users who do not care about
+message order and those in stream processing scenarios, they can opt to use Shared and Key-shared subscription modes.

Review Comment:
You're correct, stream processing scenarios indeed often care about the order of messages, and batch processing scenarios also pay attention to the order of messages. However, stream processing scenarios use individual ack to acknowledge individual messages, while batch processing scenarios use cumulative ack to acknowledge messages in bulk.
The Shared and Key-shared subscription types are meant to be used with individual ack and do not guarantee message order, so we recommend them for stream processing scenarios where message order is not a concern. This does not mean that stream processing never cares about order. It means that Shared and Key-shared should be chosen only for the stream processing cases that do not.