ivankelly commented on a change in pull request #1466: Topic compaction documentation
URL: https://github.com/apache/incubator-pulsar/pull/1466#discussion_r190841526
##########
File path: site/docs/latest/getting-started/ConceptsAndArchitecture.md
##########
@@ -522,18 +541,55 @@ while (true) {
 To create a reader that will read from the latest available message:
 
 ```java
-MessageId id = MessageId.latest;
-Reader reader = pulsarClient.createReader(topic, id, new ReaderConfiguration());
+Reader<byte[]> reader = pulsarClient.newReader()
+    .topic(topic)
+    .startMessageId(MessageId.latest)
+    .create();
 ```
 
 To create a reader that will read from some message between earliest and latest:
 
 ```java
 byte[] msgIdBytes = // Some byte array
 MessageId id = MessageId.fromByteArray(msgIdBytes);
-Reader reader = pulsarClient.createReader(topic, id, new ReaderConfiguration());
+Reader<byte[]> reader = pulsarClient.newReader()
+    .topic(topic)
+    .startMessageId(id)
+    .create();
 ```
+
+## Topic compaction {#compaction}
+
+Pulsar was built with highly scalable [persistent storage](#persistent-storage) of message data as a primary objective. Pulsar {% popover topics %} enable you to persistently store as many unacknowledged messages as you need while preserving message ordering. By default, Pulsar stores *all* unacknowledged/unprocessed messages produced on a topic. Accumulating many unacknowledged messages on a topic is necessary for many Pulsar use cases but it can also be very time intensive for Pulsar {% popover consumers %} to "rewind" through the entire log of messages.
+
+{% include admonition.html type="success" content="For a more practical guide to topic compaction, see the [Topic compaction cookbook](../../cookbooks/compaction)." %}
+
+For some use cases, however, consumers don't need a complete "image" of the topic log. They may only need a few values to construct a more "shallow" image of the log, perhaps even just the most recent value. For these kinds of use cases Pulsar offers **topic compaction**. When you run compaction on a topic, Pulsar goes through a topic's backlog and removes messages that are *obscured* by later messages, i.e. it goes through the topic on a per-key basis and leaves only the most recent message associated with that key.
+
+Pulsar's topic compaction feature:
+
+* Can help preserve disk space and allow for much more efficient "rewind" of topic logs

Review comment:
It doesn't help with disk space, as we don't delete the old data. It only helps with rewind.
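To make the distinction concrete, here is a minimal sketch of the per-key behaviour the diff describes, written so that it also reflects the point of the review comment: the compacted view is built alongside the original backlog, which is left untouched. This is only an illustration of the semantics, not how Pulsar implements compaction. `CompactionSketch`, `KeyedMessage`, and `compact` are hypothetical names and are not part of the Pulsar client API. The ordering of the compacted view (each key sits at the position of its newest message) is also an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CompactionSketch {

    /** Hypothetical stand-in for a keyed message; not a Pulsar client type. */
    static final class KeyedMessage {
        final String key;
        final String value;

        KeyedMessage(String key, String value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return key + "=" + value;
        }
    }

    /**
     * Builds a compacted view of the backlog: only the last message seen
     * for each key survives. The input list itself is not modified.
     */
    static List<KeyedMessage> compact(List<KeyedMessage> backlog) {
        Map<String, KeyedMessage> latestByKey = new LinkedHashMap<>();
        for (KeyedMessage m : backlog) {
            // Remove first so the surviving entry takes the position of the newest message.
            latestByKey.remove(m.key);
            latestByKey.put(m.key, m);
        }
        return new ArrayList<>(latestByKey.values());
    }

    public static void main(String[] args) {
        List<KeyedMessage> backlog = Arrays.asList(
            new KeyedMessage("AAPL", "170.1"),
            new KeyedMessage("GOOG", "1050.0"),
            new KeyedMessage("AAPL", "171.3"),
            new KeyedMessage("AAPL", "169.8"));

        // A reader of the compacted view rewinds through 2 messages instead of 4.
        System.out.println(compact(backlog)); // prints [GOOG=1050.0, AAPL=169.8]
        // The full backlog is still there, so no space has been reclaimed.
        System.out.println(backlog.size());   // prints 4
    }
}
```

In this sketch the benefit is purely on the read side: a consumer that only needs the latest value per key has fewer messages to replay, while the stored backlog is the same size as before.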