[GitHub] jerrypeng commented on issue #1845: Functions schema integration
jerrypeng commented on issue #1845: Functions schema integration URL: https://github.com/apache/incubator-pulsar/pull/1845#issuecomment-392213345 Yup I agree with @srkukarni This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] jerrypeng commented on a change in pull request #1845: Functions schema integration
jerrypeng commented on a change in pull request #1845: Functions schema integration
URL: https://github.com/apache/incubator-pulsar/pull/1845#discussion_r191030350

## File path: pulsar-functions/instance/src/main/java/org/apache/pulsar/functions/source/PulsarSource.java ##
@@ -60,12 +78,17 @@ public void open(Mapconfig) throws Exception {
         setupSerDe();

         // Setup pulsar consumer
-        this.inputConsumer = this.pulsarClient.newConsumer()
-                .topics(new ArrayList<>(this.pulsarSourceConfig.getTopicSerdeClassNameMap().keySet()))
+        ConsumerBuilder consumerBuilder =
+                this.pulsarClient.newConsumer(emptySchema)
                 .subscriptionName(this.pulsarSourceConfig.getSubscriptionName())
                 .subscriptionType(this.pulsarSourceConfig.getSubscriptionType().get())
-                .ackTimeout(1, TimeUnit.MINUTES)
-                .subscribe();
+                .ackTimeout(1, TimeUnit.MINUTES);
+
+        topicToSerDeMap.forEach(consumerBuilder::addTopic);

Review comment:
topicToSerDeMap is never used and consumer is NOT passed in any topics.
[GitHub] jerrypeng commented on a change in pull request #1845: Functions schema integration
jerrypeng commented on a change in pull request #1845: Functions schema integration
URL: https://github.com/apache/incubator-pulsar/pull/1845#discussion_r191030390

## File path: pulsar-functions/instance/src/main/java/org/apache/pulsar/functions/source/PulsarSource.java ##
@@ -34,17 +40,29 @@ import org.apache.pulsar.io.core.Source;
 import org.jboss.util.Classes;

-import java.util.ArrayList;
-import java.util.HashMap;
-import java.util.Map;
-import java.util.concurrent.TimeUnit;
-
 @Slf4j
 public class PulsarSource implements Source {

     private PulsarClient pulsarClient;
     private PulsarSourceConfig pulsarSourceConfig;
-    private MaptopicToSerDeMap = new HashMap<>();
+    private Map topicToSerDeMap = new HashMap<>();
+
+    private final Schema emptySchema = new Schema() {

Review comment:
Why is this needed?
[GitHub] sijie closed pull request #1844: Update pulsar netty/grpc dependencies
sijie closed pull request #1844: Update pulsar netty/grpc dependencies
URL: https://github.com/apache/incubator-pulsar/pull/1844

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/pom.xml b/pom.xml
index 90fa1fa83b..a59c94f7fe 100644
--- a/pom.xml
+++ b/pom.xml
@@ -123,7 +123,7 @@ flexible messaging model and an intuitive client API.
 4.7.0
 3.5.4-beta
-4.1.21.Final
+4.1.22.Final
 1.0.5
 9.3.11.v20160721
 2.25
@@ -140,8 +140,8 @@ flexible messaging model and an intuitive client API.
 0.5.0
 2.2.1.SP1
 2.4.1
-3.4.0
-1.5.0
+3.5.1
+1.12.0
 1.0.0
 2.8.2
 0.8.3
@@ -266,6 +266,11 @@ flexible messaging model and an intuitive client API.
 org.jboss.netty
 netty
+
+
+io.netty
+netty-*
+
@@ -273,11 +278,23 @@ flexible messaging model and an intuitive client API.
 org.apache.bookkeeper
 stream-storage-java-client
 ${bookkeeper.version}
+
+
+*
+*
+
+
 org.apache.bookkeeper
-bookkeeper-bookkeeper-stats-api
+bookkeeper-common
+${bookkeeper.version}
+
+
+
+org.apache.bookkeeper.stats
+bookkeeper-stats-api
 ${bookkeeper.version}
@@ -356,7 +373,7 @@ flexible messaging model and an intuitive client API.
 com.google.guava
 guava
-20.0
+21.0
@@ -725,6 +742,13 @@ flexible messaging model and an intuitive client API.
 org.apache.distributedlog
 distributedlog-core
 ${bookkeeper.version}
+
+
+
+org.apache.bookkeeper
+bookkeeper-server
+
+
diff --git a/pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/OwnershipCache.java b/pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/OwnershipCache.java
index e681fe74ed..f7dce01f24 100644
--- a/pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/OwnershipCache.java
+++ b/pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/OwnershipCache.java
@@ -18,8 +18,6 @@
  */
 package org.apache.pulsar.broker.namespace;

-import static com.google.common.base.Preconditions.checkState;
-
 import java.util.List;
 import java.util.Map;
 import java.util.Optional;
@@ -47,8 +45,6 @@
 import com.github.benmanes.caffeine.cache.AsyncCacheLoader;
 import com.github.benmanes.caffeine.cache.AsyncLoadingCache;
 import com.github.benmanes.caffeine.cache.Caffeine;
-import com.github.benmanes.caffeine.cache.RemovalCause;
-import com.github.benmanes.caffeine.cache.RemovalListener;
 import com.google.common.collect.Lists;
 import com.google.common.util.concurrent.MoreExecutors;

@@ -163,7 +159,7 @@ public OwnershipCache(PulsarService pulsar, NamespaceBundleFactory bundleFactory
         this.localZkCache = pulsar.getLocalZkCache();
         this.ownershipReadOnlyCache = pulsar.getLocalZkCacheService().ownerInfoCache();
         // ownedBundlesCache contains all namespaces that are owned by the local broker
-        this.ownedBundlesCache = Caffeine.newBuilder().executor(MoreExecutors.sameThreadExecutor())
+        this.ownedBundlesCache = Caffeine.newBuilder().executor(MoreExecutors.directExecutor())
                 .buildAsync(new OwnedServiceUnitCacheLoader());
     }

diff --git a/pulsar-client-tools/pom.xml b/pulsar-client-tools/pom.xml
index 5f044f53e1..8f94210e7d 100644
--- a/pulsar-client-tools/pom.xml
+++ b/pulsar-client-tools/pom.xml
@@ -80,6 +80,22 @@
 ${project.version}
+
+ net.jodah
+ typetools
+
+
+
+ org.apache.bookkeeper
+ stream-storage-java-client
+
+
+ *
+ *
+
+
+
+
diff --git a/pulsar-common/src/main/java/org/apache/pulsar/common/policies/data/FailureDomain.java b/pulsar-common/src/main/java/org/apache/pulsar/common/policies/data/FailureDomain.java
index dd1d09b8af..408b3aae80 100644
--- a/pulsar-common/src/main/java/org/apache/pulsar/common/policies/data/FailureDomain.java
+++ b/pulsar-common/src/main/java/org/apache/pulsar/common/policies/data/FailureDomain.java
@@ -18,11 +18,11 @@
  */
 package org.apache.pulsar.common.policies.data;

+import com.google.common.base.MoreObjects;
+import com.google.common.base.Objects;
 import java.util.HashSet;
 import java.util.Set;

-import com.google.common.base.Objects;
-
 public class FailureDomain {

     public Set brokers = new HashSet();
@@ -47,6 +47,6 @@ public boolean equals(Object obj) {

     @Override
     public
[incubator-pulsar] branch master updated: Update pulsar netty/grpc dependencies (#1844)
This is an automated email from the ASF dual-hosted git repository.

sijie pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-pulsar.git

The following commit(s) were added to refs/heads/master by this push:
     new 9e88b4d  Update pulsar netty/grpc dependencies (#1844)
9e88b4d is described below

commit 9e88b4de751f0fd3c6e1f7d5aaa176c1cf84c52d
Author: Sijie Guo
AuthorDate: Fri May 25 16:14:49 2018 -0700

    Update pulsar netty/grpc dependencies (#1844)

    *Motivation*

    apache/pulsar#1816 breaks the pulsar functions running in process mode.
    This is because it removes the shading and introduces conflicts between
    the netty version that grpc/bk depend on and the netty version pulsar
    depends on. As a result, grpc doesn't work, which fails the health check
    on functions.

    *Solution*

    Update netty/grpc/protobuf to align the versions and avoid conflicts
    between versions.
---
 pom.xml                                            | 34 --
 .../pulsar/broker/namespace/OwnershipCache.java    |  6 +--
 pulsar-client-tools/pom.xml                        | 16 +++
 .../pulsar/common/policies/data/FailureDomain.java |  6 +--
 pulsar-functions/instance/pom.xml                  | 11 +
 pulsar-functions/runtime-all/pom.xml               | 54 --
 pulsar-functions/runtime/pom.xml                   |  5 ++
 .../pulsar/functions/runtime/ProcessRuntime.java   |  1 +
 .../pulsar/functions/worker/FunctionActioner.java  |  6 ++-
 9 files changed, 70 insertions(+), 69 deletions(-)

diff --git a/pom.xml b/pom.xml
index 90fa1fa..a59c94f 100644
--- a/pom.xml
+++ b/pom.xml
@@ -123,7 +123,7 @@ flexible messaging model and an intuitive client API.
 4.7.0
 3.5.4-beta
-4.1.21.Final
+4.1.22.Final
 1.0.5
 9.3.11.v20160721
 2.25
@@ -140,8 +140,8 @@ flexible messaging model and an intuitive client API.
 0.5.0
 2.2.1.SP1
 2.4.1
-3.4.0
-1.5.0
+3.5.1
+1.12.0
 1.0.0
 2.8.2
 0.8.3
@@ -266,6 +266,11 @@ flexible messaging model and an intuitive client API.
 org.jboss.netty
 netty
+
+
+io.netty
+netty-*
+
@@ -273,11 +278,23 @@ flexible messaging model and an intuitive client API.
 org.apache.bookkeeper
 stream-storage-java-client
 ${bookkeeper.version}
+
+
+*
+*
+
+
 org.apache.bookkeeper
-bookkeeper-bookkeeper-stats-api
+bookkeeper-common
+${bookkeeper.version}
+
+
+
+org.apache.bookkeeper.stats
+bookkeeper-stats-api
 ${bookkeeper.version}
@@ -356,7 +373,7 @@ flexible messaging model and an intuitive client API.
 com.google.guava
 guava
-20.0
+21.0
@@ -725,6 +742,13 @@ flexible messaging model and an intuitive client API.
 org.apache.distributedlog
 distributedlog-core
 ${bookkeeper.version}
+
+
+
+org.apache.bookkeeper
+bookkeeper-server
+
+
diff --git a/pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/OwnershipCache.java b/pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/OwnershipCache.java
index e681fe7..f7dce01 100644
--- a/pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/OwnershipCache.java
+++ b/pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/OwnershipCache.java
@@ -18,8 +18,6 @@
  */
 package org.apache.pulsar.broker.namespace;

-import static com.google.common.base.Preconditions.checkState;
-
 import java.util.List;
 import java.util.Map;
 import java.util.Optional;
@@ -47,8 +45,6 @@ import com.fasterxml.jackson.databind.ObjectMapper;
 import com.github.benmanes.caffeine.cache.AsyncCacheLoader;
 import com.github.benmanes.caffeine.cache.AsyncLoadingCache;
 import com.github.benmanes.caffeine.cache.Caffeine;
-import com.github.benmanes.caffeine.cache.RemovalCause;
-import com.github.benmanes.caffeine.cache.RemovalListener;
 import com.google.common.collect.Lists;
 import com.google.common.util.concurrent.MoreExecutors;

@@ -163,7 +159,7 @@ public class OwnershipCache {
         this.localZkCache = pulsar.getLocalZkCache();
         this.ownershipReadOnlyCache = pulsar.getLocalZkCacheService().ownerInfoCache();
         // ownedBundlesCache contains all namespaces that are owned by the local broker
-        this.ownedBundlesCache = Caffeine.newBuilder().executor(MoreExecutors.sameThreadExecutor())
+        this.ownedBundlesCache = Caffeine.newBuilder().executor(MoreExecutors.directExecutor())
                 .buildAsync(new OwnedServiceUnitCacheLoader());
     }

diff --git a/pulsar-client-tools/pom.xml
[GitHub] srkukarni commented on issue #1845: Functions schema integration
srkukarni commented on issue #1845: Functions schema integration
URL: https://github.com/apache/incubator-pulsar/pull/1845#issuecomment-392179231

Most of the changes in this PR touch producer/consumer. Plus, there are a bunch of other changes necessary for functions that are not there yet (like schema validation at client/server side). My suggestion would be to limit this change to the core Pulsar changes and follow up with a different and more comprehensive PR that deals with just functions.
[GitHub] merlimat commented on a change in pull request #1845: Functions schema integration
merlimat commented on a change in pull request #1845: Functions schema integration
URL: https://github.com/apache/incubator-pulsar/pull/1845#discussion_r190985761

## File path: pulsar-client/src/main/java/org/apache/pulsar/client/api/ConsumerBuilder.java ##
@@ -94,6 +94,11 @@
      */
     ConsumerBuilder topics(List topicNames);

+    /**
+     *
+     */
+    ConsumerBuilder addTopic(String topicName, Schema schema);

Review comment:
This is a bit different from the other `topic()` and `topics()` methods. Also, I find it confusing to have a per-topic schema when the consumer itself needs to be created with a schema.
[GitHub] sijie commented on issue #1845: Functions schema integration
sijie commented on issue #1845: Functions schema integration
URL: https://github.com/apache/incubator-pulsar/pull/1845#issuecomment-392140302

@srkukarni @jerrypeng please review
[GitHub] sijie closed pull request #1843: If left unspecified, function tenants and namespaces should have the same behavior as topics
sijie closed pull request #1843: If left unspecified, function tenants and namespaces should have the same behavior as topics
URL: https://github.com/apache/incubator-pulsar/pull/1843

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/pulsar-client-tools-test/src/test/java/org/apache/pulsar/admin/cli/CmdFunctionsTest.java b/pulsar-client-tools-test/src/test/java/org/apache/pulsar/admin/cli/CmdFunctionsTest.java
index 3d0bb2b6e8..de4ab40ca8 100644
--- a/pulsar-client-tools-test/src/test/java/org/apache/pulsar/admin/cli/CmdFunctionsTest.java
+++ b/pulsar-client-tools-test/src/test/java/org/apache/pulsar/admin/cli/CmdFunctionsTest.java
@@ -209,7 +209,7 @@ public void testCreateWithoutTenant() throws Exception {
         });

         CreateFunction creater = cmd.getCreater();
-        assertEquals("tenant", creater.getFunctionConfig().getTenant());
+        assertEquals("public", creater.getFunctionConfig().getTenant());
         verify(functions, times(1)).createFunction(any(FunctionDetails.class), anyString());
     }

@@ -228,8 +228,8 @@ public void testCreateWithoutNamespace() throws Exception {
         });

         CreateFunction creater = cmd.getCreater();
-        assertEquals("tenant", creater.getFunctionConfig().getTenant());
-        assertEquals("namespace", creater.getFunctionConfig().getNamespace());
+        assertEquals("public", creater.getFunctionConfig().getTenant());
+        assertEquals("default", creater.getFunctionConfig().getNamespace());
         verify(functions, times(1)).createFunction(any(FunctionDetails.class), anyString());
     }

diff --git a/pulsar-client-tools/src/main/java/org/apache/pulsar/admin/cli/CmdFunctions.java b/pulsar-client-tools/src/main/java/org/apache/pulsar/admin/cli/CmdFunctions.java
index 21de29b2d3..d565568602 100644
--- a/pulsar-client-tools/src/main/java/org/apache/pulsar/admin/cli/CmdFunctions.java
+++ b/pulsar-client-tools/src/main/java/org/apache/pulsar/admin/cli/CmdFunctions.java
@@ -22,6 +22,8 @@
 import static java.nio.charset.StandardCharsets.UTF_8;
 import static java.util.Objects.isNull;
 import static org.apache.bookkeeper.common.concurrent.FutureUtils.result;
+import static org.apache.pulsar.common.naming.TopicName.DEFAULT_NAMESPACE;
+import static org.apache.pulsar.common.naming.TopicName.PUBLIC_TENANT;

 import com.beust.jcommander.Parameter;
 import com.beust.jcommander.Parameters;
@@ -606,21 +608,11 @@ private void inferMissingFunctionName(FunctionConfig functionConfig) {
     }

     private void inferMissingTenant(FunctionConfig functionConfig) {
-        try {
-            String inputTopic = getUniqueInput(functionConfig);
-            functionConfig.setTenant(TopicName.get(inputTopic).getTenant());
-        } catch (IllegalArgumentException ex) {
-            throw new RuntimeException("You need to specify a tenant for the function", ex);
-        }
+        functionConfig.setTenant(PUBLIC_TENANT);
     }

     private void inferMissingNamespace(FunctionConfig functionConfig) {
-        try {
-            String inputTopic = getUniqueInput(functionConfig);
-            functionConfig.setNamespace(TopicName.get(inputTopic).getNamespacePortion());
-        } catch (IllegalArgumentException ex) {
-            throw new RuntimeException("You need to specify a namespace for the function");
-        }
+        functionConfig.setNamespace(DEFAULT_NAMESPACE);
     }

     private void inferMissingOutput(FunctionConfig functionConfig) {
diff --git a/pulsar-client-tools/src/main/java/org/apache/pulsar/admin/cli/CmdSinks.java b/pulsar-client-tools/src/main/java/org/apache/pulsar/admin/cli/CmdSinks.java
index fa99091efb..822d600851 100644
--- a/pulsar-client-tools/src/main/java/org/apache/pulsar/admin/cli/CmdSinks.java
+++ b/pulsar-client-tools/src/main/java/org/apache/pulsar/admin/cli/CmdSinks.java
@@ -56,6 +56,9 @@
 import net.jodah.typetools.TypeResolver;

+import static org.apache.pulsar.common.naming.TopicName.DEFAULT_NAMESPACE;
+import static org.apache.pulsar.common.naming.TopicName.PUBLIC_TENANT;
+
 @Getter
 @Parameters(commandDescription = "Interface for managing Pulsar Sinks (Egress data from Pulsar)")
 public class CmdSinks extends CmdBase {
@@ -182,9 +185,13 @@ void processArguments() throws Exception {
         if (null != tenant) {
             sinkConfig.setTenant(tenant);
+        } else if (sinkConfig.getTenant() == null) {
+            sinkConfig.setTenant(PUBLIC_TENANT);
         }
         if (null != namespace) {
             sinkConfig.setNamespace(namespace);
+        } else if (sinkConfig.getNamespace() == null) {
+
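
For readers skimming the diff for #1843, the fallback order it appears to introduce for sinks (command-line flag first, then the value already in the config, then a fixed default) can be sketched as follows. This is an illustrative sketch only: `TenantDefaults` and `resolve` are hypothetical names that do not exist in Pulsar, and the constant values `"public"` and `"default"` are taken from the updated test assertions in the diff rather than from the `TopicName` source.

```java
// Sketch only: TenantDefaults and resolve() are hypothetical, not Pulsar APIs.
final class TenantDefaults {
    // Assumed values, mirroring the assertEquals("public", ...) and
    // assertEquals("default", ...) lines in the updated tests.
    static final String PUBLIC_TENANT = "public";
    static final String DEFAULT_NAMESPACE = "default";

    /** Returns the first non-null value among: CLI flag, config value, fallback. */
    static String resolve(String cliValue, String configValue, String fallback) {
        if (cliValue != null) {
            return cliValue;      // an explicit flag always wins
        }
        if (configValue != null) {
            return configValue;   // otherwise keep what the config already holds
        }
        return fallback;          // nothing specified anywhere: use the default
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, null, PUBLIC_TENANT));
        System.out.println(resolve(null, "my-namespace", DEFAULT_NAMESPACE));
        System.out.println(resolve("cli-tenant", "cfg-tenant", PUBLIC_TENANT));
    }
}
```

The point of the change, as the PR title states, is that an unspecified tenant or namespace no longer raises an error or gets inferred from the input topic; it falls back to the same defaults that topics use.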
[GitHub] sijie commented on a change in pull request #1844: Update pulsar netty/grpc dependencies
sijie commented on a change in pull request #1844: Update pulsar netty/grpc dependencies
URL: https://github.com/apache/incubator-pulsar/pull/1844#discussion_r190972630

## File path: pom.xml ##
@@ -121,9 +121,9 @@ flexible messaging model and an intuitive client API.
 false
 1
-4.7.0
+4.8.0-SNAPSHOT
 3.5.4-beta
-4.1.21.Final
+4.1.22.Final

Review comment:
I hesitated to do it for the 2.1.0 release, since I am not sure whether the current grpc 1.12.0 works with 4.1.25. I would suggest sticking to 4.1.22; we can look into bumping it to 4.1.25.
[GitHub] sijie commented on a change in pull request #1844: Update pulsar netty/grpc dependencies
sijie commented on a change in pull request #1844: Update pulsar netty/grpc dependencies
URL: https://github.com/apache/incubator-pulsar/pull/1844#discussion_r190964637

## File path: pom.xml ##
@@ -121,9 +121,9 @@ flexible messaging model and an intuitive client API.
 false
 1
-4.7.0
+4.8.0-SNAPSHOT

Review comment:
The plan is to use 4.8.0-SNAPSHOT once apache/bookkeeper#1441 is merged, and change it to 4.7.1 once that is released. Does that work for you?
[GitHub] merlimat commented on a change in pull request #1844: Update pulsar netty/grpc dependencies
merlimat commented on a change in pull request #1844: Update pulsar netty/grpc dependencies
URL: https://github.com/apache/incubator-pulsar/pull/1844#discussion_r190954166

## File path: pom.xml ##
@@ -121,9 +121,9 @@ flexible messaging model and an intuitive client API.
 false
 1
-4.7.0
+4.8.0-SNAPSHOT
 3.5.4-beta
-4.1.21.Final
+4.1.22.Final

Review comment:
We should try to stick with the latest stable (4.1.25), since it has bugfixes, if there are no conflicts with grpc.
[GitHub] merlimat commented on a change in pull request #1844: Update pulsar netty/grpc dependencies
merlimat commented on a change in pull request #1844: Update pulsar netty/grpc dependencies
URL: https://github.com/apache/incubator-pulsar/pull/1844#discussion_r190953327

## File path: pom.xml ##
@@ -121,9 +121,9 @@ flexible messaging model and an intuitive client API.
 false
 1
-4.7.0
+4.8.0-SNAPSHOT

Review comment:
We should try to avoid going back to using a snapshot dependency here.
[GitHub] mgodave opened a new pull request #1845: Functions schema integration
mgodave opened a new pull request #1845: Functions schema integration
URL: https://github.com/apache/incubator-pulsar/pull/1845

Unify SerDe and Schema. Allow SerDe/Schema to be used when creating PulsarSink and PulsarSource.
[GitHub] XiaoZYang commented on issue #133: delayed message other than delivered right away
XiaoZYang commented on issue #133: delayed message other than delivered right away
URL: https://github.com/apache/incubator-pulsar/issues/133#issuecomment-392103785

Cool, that sounds interesting!
[GitHub] ivankelly commented on a change in pull request #1466: Topic compaction documentation
ivankelly commented on a change in pull request #1466: Topic compaction documentation
URL: https://github.com/apache/incubator-pulsar/pull/1466#discussion_r190840650

## File path: site/docs/latest/cookbooks/compaction.md ##
@@ -0,0 +1,113 @@
+---
+title: Topic compaction cookbook
+tags: [admin, clients, compaction]
+---
+
+Pulsar's [topic compaction](../../getting-started/ConceptsAndArchitecture#compaction) feature enables you to create **compacted** topics in which older, "obscured" entries are pruned from the topic, allowing for faster reads through the topic's history (which messages are deemed obscured/outdated/irrelevant will depend on your use case).
+
+To use compaction:
+
+* You must manually [trigger](#trigger) compaction using the Pulsar administrative API. This will both run a compaction operation *and* mark the topic as a compacted topic.
+* Your {% popover consumers %} must be [configured](#config) to read from compacted topics (or else the messages won't be properly read/processed/acknowledged).
+
+In Pulsar, topic compaction takes place on a *per-key basis*, meaning that messages are compacted based on their key. For the stock ticker use case, the stock symbol---e.g. `AAPL` or `GOOG`---could serve as the key.
+
+## When should I use compacted topics?
+
+The classic example of a topic that could benefit from compaction would be a stock ticker topic through which {% popover consumers %} can access up-to-date values for specific stocks. On a stock ticker topic you only care about the most recent value of each stock; "historical values" don't matter, so there's no need to read through outdated data when processing a topic's messages. For topics where older values are important, for example when you need to process long series of messages in order, compaction is unnecessary and possibly even harmful.
+
+{% include admonition.html type="warning" content="Compaction only works on topics where each message has a key (as in the stock ticker example, where the stock symbol serves as the key). Keys can be thought of as the axis along which compaction is applied." %}
+
+## When should I trigger compaction?
+
+How often you trigger compaction will vary widely based on the use case. If you want a compacted topic to be extremely speedy on read, then you should run compaction fairly frequently.
+
+{% include admonition.html type="warning" title="No automatic compaction" content="Currently, all topic compaction in Pulsar must be initialized manually." %}
+
+## Which messages get compacted?
+
+When you [trigger](#trigger) compaction on a topic, all messages with the following
+
+{% include admonition.html type="warning" title="Message keys are required"
+content="Messages that don't have keys are simply left alone and *never* compacted. In order to use compaction, you'll need to come up with some kind of key-based scheme for messages on the topic." %}
+
+## Triggering compaction {#trigger}
+
+In order to run compaction on a topic, you need to use the [`topics compact`](../../CliTools#pulsar-admin-topics-compact) command for the [`pulsar-admin`](../../CliTools#pulsar-admin) CLI tool. Here's an example:
+
+```bash
+$ bin/pulsar-admin topics compact \
+ persistent://my-tenant/my-namespace/my-topic
+```
+
+The `pulsar-admin` tool runs compaction via the Pulsar [REST API](../../reference/RestApi). To run compaction locally, i.e. *not* through the REST API, you can use the [`pulsar compact-topic`](../../CliTools#pulsar-compact-topic) command. Here's an example:

Review comment:
s/locally/in its own dedicated process/
[GitHub] ivankelly commented on a change in pull request #1466: Topic compaction documentation
ivankelly commented on a change in pull request #1466: Topic compaction documentation
URL: https://github.com/apache/incubator-pulsar/pull/1466#discussion_r190843762

## File path: site/docs/latest/getting-started/ConceptsAndArchitecture.md ##
@@ -522,18 +541,55 @@ while (true) {
 To create a reader that will read from the latest available message:

 ```java
-MessageId id = MessageId.latest;
-Reader reader = pulsarClient.createReader(topic, id, new ReaderConfiguration());
+Readerreader = pulsarClient.newReader()
+.topic(topic)
+.startMessageId(MessageId.latest)
+.create();
 ```

 To create a reader that will read from some message between earliest and latest:

 ```java
 byte[] msgIdBytes = // Some byte array
 MessageId id = MessageId.fromByteArray(msgIdBytes);
-Reader reader = pulsarClient.createReader(topic, id, new ReaderConfiguration());
+Reader reader = pulsarClient.newReader()
+.topic(topic)
+.startMessageId(id)
+.create();
 ```
+
+## Topic compaction {#compaction}
+
+Pulsar was built with highly scalable [persistent storage](#persistent-storage) of message data as a primary objective. Pulsar {% popover topics %} enable you to persistently store as many unacknowledged messages as you need while preserving message ordering. By default, Pulsar stores *all* unacknowledged/unprocessed messages produced on a topic. Accumulating many unacknowledged messages on a topic is necessary for many Pulsar use cases but it can also be very time intensive for Pulsar {% popover consumers %} to "rewind" through the entire log of messages.
+
+{% include admonition.html type="success" content="For a more practical guide to topic compaction, see the [Topic compaction cookbook](../../cookbooks/compaction)." %}
+
+For some use cases, however, consumers don't need a complete "image" of the topic log. They may only need a few values to construct a more "shallow" image of the log, perhaps even just the most recent value. For these kinds of use cases Pulsar offers **topic compaction**. When you run compaction on a topic, Pulsar goes through a topic's backlog and removes messages that are *obscured* by later messages, i.e. it goes through the topic on a per-key basis and leaves only the most recent message associated with that key.
+
+Pulsar's topic compaction feature:
+
+* Can help preserve disk space and allow for much more efficient "rewind" of topic logs
+* Applies only to [persistent topics](#persistent-storage)
+* Is triggered manually via the command line. See the [Topic compaction cookbook](../../cookbooks/compaction)
+* Is conceptually and operationally distinct from [retention and expiry](#message-retention-and-expiry)
+
+{% include admonition.html type="info" title="Topic compaction example: the stock ticker"
+ content="An example use case for a compacted Pulsar topic would be a stock ticker topic. On a stock ticker topic, each message bears a timestamped dollar value for stocks for purchase (with the message key holding the stock symbol, e.g. `AAPL` or `GOOG`). With a stock ticker you may care only about the most recent value(s) of the stock and have no interest in historical data (i.e. you don't need to construct a complete image of the topic's sequence of messages per key). Compaction would be highly beneficial in this case because it would keep consumers from needing to rewind through obscured messages." %}
+
+### How topic compaction works
+
+When topic compaction is triggered [via the CLI](../../cookbooks/compaction), Pulsar will iterate over the entire topic from beginning to end. For each key that it encounters the {% popover broker %} responsible will keep a record of the latest occurrence of that key. When this iterative process is finished, the broker will create a [BookKeeper ledger](#ledgers) to store the compacted topic.
+
+After that, the broker will make a second iteration through each message on the topic. For each message, if the key matches the latest occurrence of that key, then the key's data payload, message ID, and metadata will be written to the newly created BookKeeper ledger. If the key doesn't match the latest then the message will be skipped and left alone. If any given message has an empty payload, it will be skipped and considered deleted (akin to the concept of [tombstones](http://docs.basho.com/riak/kv/2.2.3/using/reference/object-deletion/#tombstones) in key-value databases). At the end of this second iteration through the topic, the newly created BookKeeper ledger is closed and two things are written to the topic's metadata: the ID of the BookKeeper ledger and the message ID of the last compacted message (this is known as the **compaction horizon** of the topic). Once this metadata is written compaction is complete.

Review comment:
The "newly created BookKeeper ledger" should be introduced before we talk about sending data to it.
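
The two-pass procedure described in the quoted "How topic compaction works" text can be sketched in memory as follows. This is a minimal illustration under stated assumptions, not the broker's implementation: `CompactionSketch`, `Msg`, `Result`, and `compact` are hypothetical names, a plain list stands in for the BookKeeper ledger, a `long` stands in for a message ID, every message is assumed to carry a key, and the horizon is modeled as the ID of the last message scanned.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: an in-memory model of two-pass compaction (hypothetical names).
final class CompactionSketch {
    static final class Msg {
        final long id;        // position in the topic, standing in for a message ID
        final String key;     // assumed non-null in this sketch
        final String payload; // an empty string models an empty payload

        Msg(long id, String key, String payload) {
            this.id = id;
            this.key = key;
            this.payload = payload;
        }
    }

    static final class Result {
        final List<Msg> ledger; // stand-in for the compacted BookKeeper ledger
        final long horizon;     // ID of the last message scanned, -1 if the topic is empty

        Result(List<Msg> ledger, long horizon) {
            this.ledger = ledger;
            this.horizon = horizon;
        }
    }

    static Result compact(List<Msg> topic) {
        // Pass 1: record the ID of the latest occurrence of each key.
        Map<String, Long> latest = new HashMap<>();
        long horizon = -1L;
        for (Msg m : topic) {
            latest.put(m.key, m.id);
            horizon = m.id;
        }

        // The "ledger" is created only after the first pass has finished.
        List<Msg> ledger = new ArrayList<>();

        // Pass 2: copy a message only if it is the latest for its key and its
        // payload is non-empty (an empty payload is treated as a delete).
        for (Msg m : topic) {
            boolean isLatest = latest.get(m.key).longValue() == m.id;
            if (isLatest && !m.payload.isEmpty()) {
                ledger.add(m);
            }
        }
        return new Result(ledger, horizon);
    }

    public static void main(String[] args) {
        List<Msg> topic = new ArrayList<>();
        topic.add(new Msg(0, "AAPL", "100"));
        topic.add(new Msg(1, "GOOG", "900"));
        topic.add(new Msg(2, "AAPL", "101")); // obscures message 0
        topic.add(new Msg(3, "MSFT", "50"));
        topic.add(new Msg(4, "MSFT", ""));    // empty payload: MSFT is dropped

        Result r = compact(topic);
        for (Msg m : r.ledger) {
            System.out.println(m.id + " " + m.key + " " + m.payload);
        }
        System.out.println("horizon=" + r.horizon);
    }
}
```

Creating the list between the two loops mirrors the ordering the reviewer asks the documentation to make explicit: the ledger exists before anything is written to it.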
[GitHub] ivankelly commented on a change in pull request #1466: Topic compaction documentation
ivankelly commented on a change in pull request #1466: Topic compaction documentation URL: https://github.com/apache/incubator-pulsar/pull/1466#discussion_r190840793 ## File path: site/docs/latest/cookbooks/compaction.md ## @@ -0,0 +1,113 @@ +--- +title: Topic compaction cookbook +tags: [admin, clients, compaction] +--- + +Pulsar's [topic compaction](../../getting-started/ConceptsAndArchitecture#compaction) feature enables you to create **compacted** topics in which older, "obscured" entries are pruned from the topic, allowing for faster reads through the topic's history (which messages are deemed obscured/outdated/irrelevant will depend on your use case). + +To use compaction: + +* You must manually [trigger](#trigger) compaction using the Pulsar administrative API. This will both run a compaction operation *and* mark the topic as a compacted topic. +* Your {% popover consumers %} must be [configured](#config) to read from compacted topics (or else the messages won't be properly read/processed/acknowledged). + +In Pulsar, topic compaction takes place on a *per-key basis*, meaning that messages are compacted based on their key. For the stock ticker use case, the stock symbol---e.g. `AAPL` or `GOOG`---could serve as the key. + +## When should I use compacted topics? + +The classic example of a topic that could benefit from compaction would be a stock ticker topic through which {% popover consumers %} can access up-to-date values for specific stocks. On a stock ticker topic you only care about the most recent value of each stock; "historical values" don't matter, so there's no need to read through outdated data when processing a topic's messages. For topics where older values are important, for example when you need to process long series of messages in order, compaction is unnecessary and possibly even harmful. 
+ +{% include admonition.html type="warning" content="Compaction only works on topics where each message has a key (as in the stock ticker example, where the stock symbol serves as the key). Keys can be thought of as the axis along which compaction is applied." %} + +## When should I trigger compaction? + +How often you trigger compaction will vary widely based on the use case. If you want a compacted topic to be extremely speedy on read, then you should run compaction fairly frequently. + +{% include admonition.html type="warning" title="No automatic compaction" content="Currently, all topic compaction in Pulsar must be initialized manually." %} + +## Which messages get compacted? + +When you [trigger](#trigger) compaction on a topic, all messages with the following + +{% include admonition.html type="warning" title="Message keys are required" +content="Messages that don't have keys are simply left alone and *never* compacted. In order to use compaction, you'll need to come up with some kind of key-based scheme for messages on the topic." %} + +## Triggering compaction {#trigger} + +In order to run compaction on a topic, you need to use the [`topics compact`](../../CliTools#pulsar-admin-topics-compact) command for the [`pulsar-admin`](../../CliTools#pulsar-admin) CLI tool. Here's an example: + +```bash +$ bin/pulsar-admin topics compact \ + persistent://my-tenant/my-namespace/my-topic +``` + +The `pulsar-admin` tool runs compaction via the Pulsar [REST API](../../reference/RestApi). To run compaction locally, i.e. *not* through the REST API, you can use the [`pulsar compact-topic`](../../CliTools#pulsar-compact-topic) command. Here's an example: + +```bash +$ bin/pulsar compact-topic \ + --topic persistent://my-tenant-namespace/my-topic +``` + +The `pulsar compact-topic` command communicates with [ZooKeeper](https://zookeeper.apache.org) directly. 
In order to establish communication with ZooKeeper, though, the `pulsar` CLI tool will need to have a valid [broker configuration](../../Configuration#broker). You can either supply a proper configuration in `conf/broker.conf` or specify a non-default location for the configuration: + +```bash +$ bin/pulsar compact-topic \ + --broker-conf /path/to/broker.conf \ + --topic persistent://my-tenant/my-namespace/my-topic + +# If the configuration is in conf/broker.conf +$ bin/pulsar compact-topic \ + --topic persistent://my-tenant/my-namespace/my-topic +``` + +## Consumer configuration {#config} + +Pulsar consumers and readers need to be properly configured to read from compacted topics. The sections below show you how to enable compacted topic reads for Pulsar's language clients. If the Review comment: Remove "properly" This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] ivankelly commented on a change in pull request #1466: Topic compaction documentation
ivankelly commented on a change in pull request #1466: Topic compaction documentation URL: https://github.com/apache/incubator-pulsar/pull/1466#discussion_r190840343 ## File path: site/docs/latest/cookbooks/compaction.md ## @@ -0,0 +1,113 @@ +--- +title: Topic compaction cookbook +tags: [admin, clients, compaction] +--- + +Pulsar's [topic compaction](../../getting-started/ConceptsAndArchitecture#compaction) feature enables you to create **compacted** topics in which older, "obscured" entries are pruned from the topic, allowing for faster reads through the topic's history (which messages are deemed obscured/outdated/irrelevant will depend on your use case). + +To use compaction: + +* You must manually [trigger](#trigger) compaction using the Pulsar administrative API. This will both run a compaction operation *and* mark the topic as a compacted topic. +* Your {% popover consumers %} must be [configured](#config) to read from compacted topics (or else the messages won't be properly read/processed/acknowledged). + +In Pulsar, topic compaction takes place on a *per-key basis*, meaning that messages are compacted based on their key. For the stock ticker use case, the stock symbol---e.g. `AAPL` or `GOOG`---could serve as the key. + +## When should I use compacted topics? + +The classic example of a topic that could benefit from compaction would be a stock ticker topic through which {% popover consumers %} can access up-to-date values for specific stocks. On a stock ticker topic you only care about the most recent value of each stock; "historical values" don't matter, so there's no need to read through outdated data when processing a topic's messages. For topics where older values are important, for example when you need to process long series of messages in order, compaction is unnecessary and possibly even harmful. 
+ +{% include admonition.html type="warning" content="Compaction only works on topics where each message has a key (as in the stock ticker example, where the stock symbol serves as the key). Keys can be thought of as the axis along which compaction is applied." %} + +## When should I trigger compaction? + +How often you trigger compaction will vary widely based on the use case. If you want a compacted topic to be extremely speedy on read, then you should run compaction fairly frequently. + +{% include admonition.html type="warning" title="No automatic compaction" content="Currently, all topic compaction in Pulsar must be initialized manually." %} Review comment: must be initiated manually, via the cli or rest api.
[GitHub] ivankelly commented on a change in pull request #1466: Topic compaction documentation
ivankelly commented on a change in pull request #1466: Topic compaction documentation URL: https://github.com/apache/incubator-pulsar/pull/1466#discussion_r190843622 ## File path: site/docs/latest/getting-started/ConceptsAndArchitecture.md ## @@ -522,18 +541,55 @@ while (true) { To create a reader that will read from the latest available message: ```java -MessageId id = MessageId.latest; -Reader reader = pulsarClient.createReader(topic, id, new ReaderConfiguration()); +Reader reader = pulsarClient.newReader() +.topic(topic) +.startMessageId(MessageId.latest) +.create(); ``` To create a reader that will read from some message between earliest and latest: ```java byte[] msgIdBytes = // Some byte array MessageId id = MessageId.fromByteArray(msgIdBytes); -Reader reader = pulsarClient.createReader(topic, id, new ReaderConfiguration()); +Reader reader = pulsarClient.newReader() +.topic(topic) +.startMessageId(id) +.create(); ``` +## Topic compaction {#compaction} + +Pulsar was built with highly scalable [persistent storage](#persistent-storage) of message data as a primary objective. Pulsar {% popover topics %} enable you to persistently store as many unacknowledged messages as you need while preserving message ordering. By default, Pulsar stores *all* unacknowledged/unprocessed messages produced on a topic. Accumulating many unacknowledged messages on a topic is necessary for many Pulsar use cases but it can also be very time intensive for Pulsar {% popover consumers %} to "rewind" through the entire log of messages. + +{% include admonition.html type="success" content="For a more practical guide to topic compaction, see the [Topic compaction cookbook](../../cookbooks/compaction)." %} + +For some use cases, however, consumers don't need a complete "image" of the topic log. They may only need a few values to construct a more "shallow" image of the log, perhaps even just the most recent value. For these kinds of use cases Pulsar offers **topic compaction**.
When you run compaction on a topic, Pulsar goes through a topic's backlog and removes messages that are *obscured* by later messages, i.e. it goes through the topic on a per-key basis and leaves only the most recent message associated with that key. + +Pulsar's topic compaction feature: + +* Can help preserve disk space and allow for much more efficient "rewind" of topic logs +* Applies only to [persistent topics](#persistent-storage) +* Is triggered manually via the command line. See the [Topic compaction cookbook](../../cookbooks/compaction) +* Is conceptually and operationally distinct from [retention and expiry](#message-retention-and-expiry) + +{% include admonition.html type="info" title="Topic compaction example: the stock ticker" + content="An example use case for a compacted Pulsar topic would be a stock ticker topic. On a stock ticker topic, each message bears a timestamped dollar value for stocks for purchase (with the message key holding the stock symbol, e.g. `AAPL` or `GOOG`). With a stock ticker you may care only about the most recent value(s) of the stock and have no interest in historical data (i.e. you don't need to construct a complete image of the topic's sequence of messages per key). Compaction would be highly beneficial in this case because it would keep consumers from needing to rewind through obscured messages." %} + +### How topic compaction works + +When topic compaction is triggered [via the CLI](../../cookbooks/compaction), Pulsar will iterate over the entire topic from beginning to end. For each key that it encounters the {% popover broker %} responsible will keep a record of the latest occurrence of that key. When this iterative process is finished, the broker will create a [BookKeeper ledger](#ledgers) to store the compacted topic. + +After that, the broker will make a second iteration through each message on the topic. 
For each message, if the key matches the latest occurrence of that key, then the key's data payload, message ID, and metadata will be written to the newly created BookKeeper ledger. If the key doesn't match the latest then the message will be skipped and left alone. If any given message has an empty payload, it will be skipped and considered deleted (akin to the concept of [tombstones](http://docs.basho.com/riak/kv/2.2.3/using/reference/object-deletion/#tombstones) in key-value databases). At the end of this second iteration through the topic, the newly created BookKeeper ledger is closed and two things are written to the topic's metadata: the ID of the BookKeeper ledger and the message ID of the last compacted message (this is known as the **compaction horizon** of the topic). Once this metadata is written compaction is complete. Review comment: We shouldn't link to a blog belonging to a company in receivership. There's a wikipedia page for it:
[GitHub] ivankelly commented on a change in pull request #1466: Topic compaction documentation
ivankelly commented on a change in pull request #1466: Topic compaction documentation URL: https://github.com/apache/incubator-pulsar/pull/1466#discussion_r190841526 ## File path: site/docs/latest/getting-started/ConceptsAndArchitecture.md ## @@ -522,18 +541,55 @@ while (true) { To create a reader that will read from the latest available message: ```java -MessageId id = MessageId.latest; -Reader reader = pulsarClient.createReader(topic, id, new ReaderConfiguration()); +Reader reader = pulsarClient.newReader() +.topic(topic) +.startMessageId(MessageId.latest) +.create(); ``` To create a reader that will read from some message between earliest and latest: ```java byte[] msgIdBytes = // Some byte array MessageId id = MessageId.fromByteArray(msgIdBytes); -Reader reader = pulsarClient.createReader(topic, id, new ReaderConfiguration()); +Reader reader = pulsarClient.newReader() +.topic(topic) +.startMessageId(id) +.create(); ``` +## Topic compaction {#compaction} + +Pulsar was built with highly scalable [persistent storage](#persistent-storage) of message data as a primary objective. Pulsar {% popover topics %} enable you to persistently store as many unacknowledged messages as you need while preserving message ordering. By default, Pulsar stores *all* unacknowledged/unprocessed messages produced on a topic. Accumulating many unacknowledged messages on a topic is necessary for many Pulsar use cases but it can also be very time intensive for Pulsar {% popover consumers %} to "rewind" through the entire log of messages. + +{% include admonition.html type="success" content="For a more practical guide to topic compaction, see the [Topic compaction cookbook](../../cookbooks/compaction)." %} + +For some use cases, however, consumers don't need a complete "image" of the topic log. They may only need a few values to construct a more "shallow" image of the log, perhaps even just the most recent value. For these kinds of use cases Pulsar offers **topic compaction**.
When you run compaction on a topic, Pulsar goes through a topic's backlog and removes messages that are *obscured* by later messages, i.e. it goes through the topic on a per-key basis and leaves only the most recent message associated with that key. + +Pulsar's topic compaction feature: + +* Can help preserve disk space and allow for much more efficient "rewind" of topic logs Review comment: It doesn't help with disk space, as we don't delete the old data. It only helps with rewind.
[GitHub] ivankelly commented on a change in pull request #1466: Topic compaction documentation
ivankelly commented on a change in pull request #1466: Topic compaction documentation URL: https://github.com/apache/incubator-pulsar/pull/1466#discussion_r190840491 ## File path: site/docs/latest/cookbooks/compaction.md ## @@ -0,0 +1,113 @@ +--- +title: Topic compaction cookbook +tags: [admin, clients, compaction] +--- + +Pulsar's [topic compaction](../../getting-started/ConceptsAndArchitecture#compaction) feature enables you to create **compacted** topics in which older, "obscured" entries are pruned from the topic, allowing for faster reads through the topic's history (which messages are deemed obscured/outdated/irrelevant will depend on your use case). + +To use compaction: + +* You must manually [trigger](#trigger) compaction using the Pulsar administrative API. This will both run a compaction operation *and* mark the topic as a compacted topic. +* Your {% popover consumers %} must be [configured](#config) to read from compacted topics (or else the messages won't be properly read/processed/acknowledged). + +In Pulsar, topic compaction takes place on a *per-key basis*, meaning that messages are compacted based on their key. For the stock ticker use case, the stock symbol---e.g. `AAPL` or `GOOG`---could serve as the key. + +## When should I use compacted topics? + +The classic example of a topic that could benefit from compaction would be a stock ticker topic through which {% popover consumers %} can access up-to-date values for specific stocks. On a stock ticker topic you only care about the most recent value of each stock; "historical values" don't matter, so there's no need to read through outdated data when processing a topic's messages. For topics where older values are important, for example when you need to process long series of messages in order, compaction is unnecessary and possibly even harmful. 
+ +{% include admonition.html type="warning" content="Compaction only works on topics where each message has a key (as in the stock ticker example, where the stock symbol serves as the key). Keys can be thought of as the axis along which compaction is applied." %} + +## When should I trigger compaction? + +How often you trigger compaction will vary widely based on the use case. If you want a compacted topic to be extremely speedy on read, then you should run compaction fairly frequently. + +{% include admonition.html type="warning" title="No automatic compaction" content="Currently, all topic compaction in Pulsar must be initialized manually." %} + +## Which messages get compacted? + +When you [trigger](#trigger) compaction on a topic, all messages with the following Review comment: missing something at the end here.
[GitHub] ivankelly commented on a change in pull request #1466: Topic compaction documentation
ivankelly commented on a change in pull request #1466: Topic compaction documentation URL: https://github.com/apache/incubator-pulsar/pull/1466#discussion_r190843936 ## File path: site/docs/latest/getting-started/ConceptsAndArchitecture.md ## @@ -522,18 +541,55 @@ while (true) { To create a reader that will read from the latest available message: ```java -MessageId id = MessageId.latest; -Reader reader = pulsarClient.createReader(topic, id, new ReaderConfiguration()); +Reader reader = pulsarClient.newReader() +.topic(topic) +.startMessageId(MessageId.latest) +.create(); ``` To create a reader that will read from some message between earliest and latest: ```java byte[] msgIdBytes = // Some byte array MessageId id = MessageId.fromByteArray(msgIdBytes); -Reader reader = pulsarClient.createReader(topic, id, new ReaderConfiguration()); +Reader reader = pulsarClient.newReader() +.topic(topic) +.startMessageId(id) +.create(); ``` +## Topic compaction {#compaction} + +Pulsar was built with highly scalable [persistent storage](#persistent-storage) of message data as a primary objective. Pulsar {% popover topics %} enable you to persistently store as many unacknowledged messages as you need while preserving message ordering. By default, Pulsar stores *all* unacknowledged/unprocessed messages produced on a topic. Accumulating many unacknowledged messages on a topic is necessary for many Pulsar use cases but it can also be very time intensive for Pulsar {% popover consumers %} to "rewind" through the entire log of messages. + +{% include admonition.html type="success" content="For a more practical guide to topic compaction, see the [Topic compaction cookbook](../../cookbooks/compaction)." %} + +For some use cases, however, consumers don't need a complete "image" of the topic log. They may only need a few values to construct a more "shallow" image of the log, perhaps even just the most recent value. For these kinds of use cases Pulsar offers **topic compaction**.
When you run compaction on a topic, Pulsar goes through a topic's backlog and removes messages that are *obscured* by later messages, i.e. it goes through the topic on a per-key basis and leaves only the most recent message associated with that key. + +Pulsar's topic compaction feature: + +* Can help preserve disk space and allow for much more efficient "rewind" of topic logs +* Applies only to [persistent topics](#persistent-storage) +* Is triggered manually via the command line. See the [Topic compaction cookbook](../../cookbooks/compaction) +* Is conceptually and operationally distinct from [retention and expiry](#message-retention-and-expiry) + +{% include admonition.html type="info" title="Topic compaction example: the stock ticker" + content="An example use case for a compacted Pulsar topic would be a stock ticker topic. On a stock ticker topic, each message bears a timestamped dollar value for stocks for purchase (with the message key holding the stock symbol, e.g. `AAPL` or `GOOG`). With a stock ticker you may care only about the most recent value(s) of the stock and have no interest in historical data (i.e. you don't need to construct a complete image of the topic's sequence of messages per key). Compaction would be highly beneficial in this case because it would keep consumers from needing to rewind through obscured messages." %} + +### How topic compaction works + +When topic compaction is triggered [via the CLI](../../cookbooks/compaction), Pulsar will iterate over the entire topic from beginning to end. For each key that it encounters the {% popover broker %} responsible will keep a record of the latest occurrence of that key. When this iterative process is finished, the broker will create a [BookKeeper ledger](#ledgers) to store the compacted topic. + +After that, the broker will make a second iteration through each message on the topic. 
For each message, if the key matches the latest occurrence of that key, then the key's data payload, message ID, and metadata will be written to the newly created BookKeeper ledger. If the key doesn't match the latest then the message will be skipped and left alone. If any given message has an empty payload, it will be skipped and considered deleted (akin to the concept of [tombstones](http://docs.basho.com/riak/kv/2.2.3/using/reference/object-deletion/#tombstones) in key-value databases). At the end of this second iteration through the topic, the newly created BookKeeper ledger is closed and two things are written to the topic's metadata: the ID of the BookKeeper ledger and the message ID of the last compacted message (this is known as the **compaction horizon** of the topic). Once this metadata is written compaction is complete. + +{% include admonition.html type="info" title="Compaction leaves the original topic intact" %} + +In addition to performing compaction, Pulsar {% popover
[GitHub] ivankelly commented on a change in pull request #1466: Topic compaction documentation
ivankelly commented on a change in pull request #1466: Topic compaction documentation URL: https://github.com/apache/incubator-pulsar/pull/1466#discussion_r190833328 ## File path: site/docs/latest/cookbooks/compaction.md ## @@ -0,0 +1,113 @@ +--- +title: Topic compaction cookbook +tags: [admin, clients, compaction] +--- + +Pulsar's [topic compaction](../../getting-started/ConceptsAndArchitecture#compaction) feature enables you to create **compacted** topics in which older, "obscured" entries are pruned from the topic, allowing for faster reads through the topic's history (which messages are deemed obscured/outdated/irrelevant will depend on your use case). + +To use compaction: + +* You must manually [trigger](#trigger) compaction using the Pulsar administrative API. This will both run a compaction operation *and* mark the topic as a compacted topic. +* Your {% popover consumers %} must be [configured](#config) to read from compacted topics (or else the messages won't be properly read/processed/acknowledged). Review comment: So, if the consumer doesn't set readCompacted, they will read the topic the same way as they would if compaction hadn't taken place. There's no proper/improper-ness about this, as some usecases will want to read the whole thing anyhow.
[GitHub] ivankelly commented on a change in pull request #1466: Topic compaction documentation
ivankelly commented on a change in pull request #1466: Topic compaction documentation URL: https://github.com/apache/incubator-pulsar/pull/1466#discussion_r190841330 ## File path: site/docs/latest/getting-started/ConceptsAndArchitecture.md ## @@ -522,18 +541,55 @@ while (true) { To create a reader that will read from the latest available message: ```java -MessageId id = MessageId.latest; -Reader reader = pulsarClient.createReader(topic, id, new ReaderConfiguration()); +Reader reader = pulsarClient.newReader() +.topic(topic) +.startMessageId(MessageId.latest) +.create(); ``` To create a reader that will read from some message between earliest and latest: ```java byte[] msgIdBytes = // Some byte array MessageId id = MessageId.fromByteArray(msgIdBytes); -Reader reader = pulsarClient.createReader(topic, id, new ReaderConfiguration()); +Reader reader = pulsarClient.newReader() +.topic(topic) +.startMessageId(id) +.create(); ``` +## Topic compaction {#compaction} + +Pulsar was built with highly scalable [persistent storage](#persistent-storage) of message data as a primary objective. Pulsar {% popover topics %} enable you to persistently store as many unacknowledged messages as you need while preserving message ordering. By default, Pulsar stores *all* unacknowledged/unprocessed messages produced on a topic. Accumulating many unacknowledged messages on a topic is necessary for many Pulsar use cases but it can also be very time intensive for Pulsar {% popover consumers %} to "rewind" through the entire log of messages. + +{% include admonition.html type="success" content="For a more practical guide to topic compaction, see the [Topic compaction cookbook](../../cookbooks/compaction)." %} + +For some use cases, however, consumers don't need a complete "image" of the topic log. They may only need a few values to construct a more "shallow" image of the log, perhaps even just the most recent value. For these kinds of use cases Pulsar offers **topic compaction**.
When you run compaction on a topic, Pulsar goes through a topic's backlog and removes messages that are *obscured* by later messages, i.e. it goes through the topic on a per-key basis and leaves only the most recent message associated with that key. Review comment: Remove however
[GitHub] ivankelly commented on a change in pull request #1466: Topic compaction documentation
ivankelly commented on a change in pull request #1466: Topic compaction documentation URL: https://github.com/apache/incubator-pulsar/pull/1466#discussion_r190841826 ## File path: site/docs/latest/getting-started/ConceptsAndArchitecture.md ## @@ -522,18 +541,55 @@ while (true) { To create a reader that will read from the latest available message: ```java -MessageId id = MessageId.latest; -Reader reader = pulsarClient.createReader(topic, id, new ReaderConfiguration()); +Reader reader = pulsarClient.newReader() +.topic(topic) +.startMessageId(MessageId.latest) +.create(); ``` To create a reader that will read from some message between earliest and latest: ```java byte[] msgIdBytes = // Some byte array MessageId id = MessageId.fromByteArray(msgIdBytes); -Reader reader = pulsarClient.createReader(topic, id, new ReaderConfiguration()); +Reader reader = pulsarClient.newReader() +.topic(topic) +.startMessageId(id) +.create(); ``` +## Topic compaction {#compaction} + +Pulsar was built with highly scalable [persistent storage](#persistent-storage) of message data as a primary objective. Pulsar {% popover topics %} enable you to persistently store as many unacknowledged messages as you need while preserving message ordering. By default, Pulsar stores *all* unacknowledged/unprocessed messages produced on a topic. Accumulating many unacknowledged messages on a topic is necessary for many Pulsar use cases but it can also be very time intensive for Pulsar {% popover consumers %} to "rewind" through the entire log of messages. + +{% include admonition.html type="success" content="For a more practical guide to topic compaction, see the [Topic compaction cookbook](../../cookbooks/compaction)." %} + +For some use cases, however, consumers don't need a complete "image" of the topic log. They may only need a few values to construct a more "shallow" image of the log, perhaps even just the most recent value. For these kinds of use cases Pulsar offers **topic compaction**.
When you run compaction on a topic, Pulsar goes through a topic's backlog and removes messages that are *obscured* by later messages, i.e. it goes through the topic on a per-key basis and leaves only the most recent message associated with that key. + +Pulsar's topic compaction feature: + +* Can help preserve disk space and allow for much more efficient "rewind" of topic logs +* Applies only to [persistent topics](#persistent-storage) +* Is triggered manually via the command line. See the [Topic compaction cookbook](../../cookbooks/compaction) +* Is conceptually and operationally distinct from [retention and expiry](#message-retention-and-expiry) Review comment: However, it does respect retention. So if retention has removed a message from the message backlog, this message will also be removed from the compacted topic ledger (or rather, it won't be readable).
[GitHub] ivankelly commented on a change in pull request #1466: Topic compaction documentation
ivankelly commented on a change in pull request #1466: Topic compaction documentation URL: https://github.com/apache/incubator-pulsar/pull/1466#discussion_r190832046 ## File path: site/_data/cli/pulsar.yaml ## @@ -65,6 +65,9 @@ commands: options: - flags: -t, --topic description: The Pulsar topic that you would like to compact Review comment: For description of the compact-topic subcommand, add (in a new process) at the end.
[GitHub] ivankelly commented on a change in pull request #1466: Topic compaction documentation
ivankelly commented on a change in pull request #1466: Topic compaction documentation
URL: https://github.com/apache/incubator-pulsar/pull/1466#discussion_r190832676

## File path: site/docs/latest/cookbooks/compaction.md ##
@@ -0,0 +1,113 @@
+---
+title: Topic compaction cookbook
+tags: [admin, clients, compaction]
+---
+
+Pulsar's [topic compaction](../../getting-started/ConceptsAndArchitecture#compaction) feature enables you to create **compacted** topics in which older, "obscured" entries are pruned from the topic, allowing for faster reads through the topic's history (which messages are deemed obscured/outdated/irrelevant will depend on your use case).
+

Review comment: A simple motivating use case would be helpful here. The one I'm going to use for an upcoming is a stock ticker. Some users will want to see just the latest value for each stock, while some want to see the whole history of how the prices changed over time.
[GitHub] ivankelly commented on a change in pull request #1466: Topic compaction documentation
ivankelly commented on a change in pull request #1466: Topic compaction documentation
URL: https://github.com/apache/incubator-pulsar/pull/1466#discussion_r190833665

## File path: site/docs/latest/cookbooks/compaction.md ##
@@ -0,0 +1,113 @@
+---
+title: Topic compaction cookbook
+tags: [admin, clients, compaction]
+---
+
+Pulsar's [topic compaction](../../getting-started/ConceptsAndArchitecture#compaction) feature enables you to create **compacted** topics in which older, "obscured" entries are pruned from the topic, allowing for faster reads through the topic's history (which messages are deemed obscured/outdated/irrelevant will depend on your use case).
+
+To use compaction:
+
+* You must manually [trigger](#trigger) compaction using the Pulsar administrative API. This will both run a compaction operation *and* mark the topic as a compacted topic.
+* Your {% popover consumers %} must be [configured](#config) to read from compacted topics (or else the messages won't be properly read/processed/acknowledged).
+
+In Pulsar, topic compaction takes place on a *per-key basis*, meaning that messages are compacted based on their key. For the stock ticker use case, the stock symbol---e.g. `AAPL` or `GOOG`---could serve as the key.
+
+## When should I use compacted topics?
+
+The classic example of a topic that could benefit from compaction would be a stock ticker topic through which {% popover consumers %} can access up-to-date values for specific stocks. On a stock ticker topic you only care about the most recent value of each stock; "historical values" don't matter, so there's no need to read through outdated data when processing a topic's messages. For topics where older values are important, for example when you need to process long series of messages in order, compaction is unnecessary and possibly even harmful.

Review comment: The same topic can serve both use cases with compaction. We don't delete the old history when we compact.
Compaction isn't meant to save space, but rather to allow clients to catch up quickly.
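The point that one topic can serve both kinds of reader can be illustrated with a small in-memory sketch. It is a toy model rather than the Pulsar client API: `Tick`, `latestPrices`, and `priceHistory` are hypothetical names, the prices are invented, and the topic is just a list that both readers scan without modifying it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StockTickerViews {
    /** Hypothetical ticker message keyed by stock symbol. */
    public static final class Tick {
        public final String symbol;
        public final double price;

        public Tick(String symbol, double price) {
            this.symbol = symbol;
            this.price = price;
        }
    }

    /** Compacted-style read: the latest price per symbol. The topic is only read. */
    public static Map<String, Double> latestPrices(List<Tick> topic) {
        Map<String, Double> latest = new LinkedHashMap<>();
        for (Tick t : topic) {
            latest.put(t.symbol, t.price); // later ticks overwrite earlier ones
        }
        return latest;
    }

    /** Full-history read: every price published for one symbol, in order. */
    public static List<Double> priceHistory(List<Tick> topic, String symbol) {
        List<Double> history = new ArrayList<>();
        for (Tick t : topic) {
            if (t.symbol.equals(symbol)) {
                history.add(t.price);
            }
        }
        return history;
    }

    public static void main(String[] args) {
        List<Tick> topic = Arrays.asList(
                new Tick("AAPL", 188.0), new Tick("GOOG", 1075.0), new Tick("AAPL", 188.5));
        System.out.println(latestPrices(topic));         // reader that only wants current values
        System.out.println(priceHistory(topic, "AAPL")); // reader that wants the whole history
    }
}
```

A reader that needs to catch up quickly scans one entry per key, while the history reader still sees every tick because nothing was deleted from the underlying list.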
[incubator-pulsar] branch asf-site updated: Updated site at revision e249493
This is an automated email from the ASF dual-hosted git repository.

mmerli pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-pulsar.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new c7a7db4  Updated site at revision e249493
c7a7db4 is described below

commit c7a7db4e444c69d70fc2aa866289ac18a486258a
Author: jenkins
AuthorDate: Fri May 25 08:28:27 2018 +

    Updated site at revision e249493
---
 .../docs/latest/adaptors/PulsarSpark/index.html | 8
 .../docs/latest/adaptors/PulsarStorm/index.html | 6 +++---
 content/docs/latest/admin-api/overview/index.html | 10 -
 content/docs/latest/admin/Authz/index.html | 20 +-
 content/docs/latest/clients/Cpp/index.html | 8
 content/docs/latest/clients/Java/index.html | 12 +--
 content/docs/latest/clients/Python/index.html | 8
 content/docs/latest/clients/WebSocket/index.html | 8
 .../docs/latest/cookbooks/Encryption/index.html | 6 +++---
 .../latest/cookbooks/PartitionedTopics/index.html | 14 ++---
 .../latest/cookbooks/RetentionExpiry/index.html | 14 ++---
 .../cookbooks/message-deduplication/index.html | 10 -
 .../docs/latest/cookbooks/message-queue/index.html | 14 ++---
 .../docs/latest/deployment/Kubernetes/index.html | 4 ++--
 .../docs/latest/deployment/aws-cluster/index.html | 4 ++--
 content/docs/latest/deployment/cluster/index.html | 4 ++--
 content/docs/latest/deployment/instance/index.html | 4 ++--
 .../ConceptsAndArchitecture/index.html | 4 ++--
 .../latest/getting-started/LocalCluster/index.html | 4 ++--
 .../latest/getting-started/Pulsar-2.0/index.html | 2 +-
 .../docs/latest/getting-started/docker/index.html | 4 ++--
 .../docs/latest/project/BinaryProtocol/index.html | 4 ++--
 .../docs/latest/project/SimulationTools/index.html | 2 +-
 .../docs/latest/project/schema-storage/index.html | 4 ++--
 content/docs/latest/reference/CliTools/index.html | 16 +++
 content/docs/latest/reference/RestApi/index.html | 4 ++--
 content/ja/adaptors/PulsarSpark/index.html | 8
 content/ja/adaptors/PulsarStorm/index.html | 6 +++---
 content/ja/admin/AdminInterface/index.html | 14 ++---
 content/ja/admin/Authz/index.html | 24 +++---
 content/ja/admin/ClustersBrokers/index.html | 8
 content/ja/admin/PropertiesNamespaces/index.html | 8
 content/ja/advanced/PartitionedTopics/index.html | 14 ++---
 content/ja/advanced/RetentionExpiry/index.html | 14 ++---
 content/ja/clients/Cpp/index.html | 8
 content/ja/clients/Java/index.html | 8
 content/ja/clients/Python/index.html | 8
 content/ja/clients/WebSocket/index.html | 8
 content/ja/deployment/InstanceSetup/index.html | 8
 content/ja/deployment/Kubernetes/index.html | 4 ++--
 .../ConceptsAndArchitecture/index.html | 2 +-
 content/ja/getting-started/LocalCluster/index.html | 4 ++--
 content/ja/project/BinaryProtocol/index.html | 4 ++--
 content/ja/project/SimulationTools/index.html | 2 +-
 content/ja/reference/CliTools/index.html | 20 +-
 content/ja/reference/RestApi/index.html | 4 ++--
 46 files changed, 187 insertions(+), 187 deletions(-)

diff --git a/content/docs/latest/adaptors/PulsarSpark/index.html b/content/docs/latest/adaptors/PulsarSpark/index.html
index dfdaf27..7cb1819 100644
--- a/content/docs/latest/adaptors/PulsarSpark/index.html
+++ b/content/docs/latest/adaptors/PulsarSpark/index.html
@@ -965,9 +965,9 @@
+ Spark Streaming Pulsar receiver
- Spark Streaming Pulsar receiver
@@ -1187,9 +1187,9 @@
+ Spark Streaming Pulsar receiver
- Spark Streaming Pulsar receiver
@@ -1329,8 +1329,6 @@
-
- Authentication and authorization in Pulsar
@@ -1483,6 +1481,8 @@
+
+ Using Pulsar as a message queue
diff --git a/content/docs/latest/adaptors/PulsarStorm/index.html b/content/docs/latest/adaptors/PulsarStorm/index.html
index 27f6b43..e857716 100644
--- a/content/docs/latest/adaptors/PulsarStorm/index.html
+++ b/content/docs/latest/adaptors/PulsarStorm/index.html
@@ -969,9 +969,9 @@
+ Pulsar adaptor for Apache Storm
- Pulsar adaptor for
[GitHub] sijie commented on issue #1844: Update pulsar netty/grpc dependencies
sijie commented on issue #1844: Update pulsar netty/grpc dependencies
URL: https://github.com/apache/incubator-pulsar/pull/1844#issuecomment-391959885

I am also updating the bk dependencies and will use 4.7.1 in this PR, so please hold off on merging this until bk's change is merged.
[GitHub] sijie commented on issue #1844: Update pulsar netty/grpc dependencies
sijie commented on issue #1844: Update pulsar netty/grpc dependencies
URL: https://github.com/apache/incubator-pulsar/pull/1844#issuecomment-391959671

@srkukarni @jerrypeng ^^
[GitHub] sijie opened a new pull request #1844: Update pulsar netty/grpc dependencies
sijie opened a new pull request #1844: Update pulsar netty/grpc dependencies
URL: https://github.com/apache/incubator-pulsar/pull/1844

*Motivation*

apache/pulsar#1816 breaks Pulsar Functions running in process mode. This is because it removes the shading and introduces conflicts between the netty version that grpc/bk depend on and the netty version that pulsar depends on. As a result, grpc doesn't work, which fails the health check on functions.

*Solution*

Update netty/grpc/protobuf so that their versions are aligned and no longer conflict.