Re: [ANNOUNCE] Apache Pulsar Go Client 0.7.0 released
Good news, and thanks to Chris for the work on this!

--
Thanks
Xiaolong Ran

On Tue, Nov 16, 2021 at 1:41 PM, Chris Kellogg wrote:

> The Apache Pulsar team is proud to announce Apache Pulsar Go Client
> version 0.7.0.
>
> Pulsar is a highly scalable, low latency messaging platform running on
> commodity hardware. It provides simple pub-sub semantics over topics,
> guaranteed at-least-once delivery of messages, automatic cursor management
> for subscribers, and cross-datacenter replication.
>
> For Pulsar release details and downloads, visit:
> https://github.com/apache/pulsar-client-go/releases/tag/v0.7.0
>
> Release Notes are at:
> https://github.com/apache/pulsar-client-go/blob/master/CHANGELOG.md
>
> We would like to thank the contributors that made the release possible.
>
> Regards,
>
> The Pulsar Team
Re: [PIP-78] Reduce redundant producers from partitioned producer
Dear Pulsar community,

I have created a new PR for stats aggregation, https://github.com/apache/pulsar/pull/12401, but I have not yet discussed the wire protocol change. I hope we can discuss it here.

Currently, partitioned producer stats can't be aggregated by any key such as cnx, producerId, producerName, and so on. I think we need some aggregation mechanism. Therefore, I added a new aggregation policy keyed on producerName (with a client-side implementation).

The new protocol field partial_producer_supported is not used for stats aggregation. It is used for backward compatibility.
https://github.com/apache/pulsar/pull/12401/files#diff-f29399fed32e0916cf28452ba71078a3ae5ed77bbaef9f52a925165d8ee66cfdR489

In my understanding, if we introduce a new stats aggregation key on the client side, we need a way to determine whether the feature is enabled on the client side: for example, whether the producer has a specific field or metadata, whether the version (e.g. protocol version) is greater than a threshold, etc.
Of course, if we can introduce the aggregation feature without adding any new key or implementation on the client side, we can support the feature not only for new clients but also for old ones.

I'm looking forward to your opinions or suggestions on this PR.

Regards,
--
Yuri Mizushima
yumiz...@yahoo-corp.jp

On 2021/05/11 14:26, "Yuri Mizushima" wrote:

Dear Pulsar Community,

> I will submit the next PR about PartitionedTopicStats later.
I submitted the next PR for this PIP.
If you have any suggestions, please comment on this PR.
https://github.com/apache/pulsar/pull/10534

Regards,
--
Yuri Mizushima
yumiz...@yahoo-corp.jp

"Yuri Mizushima" wrote:

Dear Pulsar Community,

I submitted the PR for this PIP.
https://github.com/apache/pulsar/pull/10279

This is one part of the implementation.
I will submit the next PR about PartitionedTopicStats later.

Regards,
--
Yuri Mizushima
yumiz...@yahoo-corp.jp

"Yuri Mizushima" wrote:

Sijie,

After sending my previous mail, I watched the meeting recording and now understand the authn/authz issue.
Therefore, I updated the PIP document.
https://github.com/apache/pulsar/wiki/PIP-79%3A-Reduce-redundant-producers-from-partitioned-producer

Regards,
--
Yuri Mizushima
yumiz...@yahoo-corp.jp

"Yuri Mizushima" wrote:

Sijie,

> If the lazy-loading approach sounds attractive to you and you like it,
> maybe the next step is to update the PIP, what do you think?
I think so too.
I will update the PIP after discussing the authn/authz issue.

Regards,
--
Yuri Mizushima
yumiz...@yahoo-corp.jp

"Sijie Guo" wrote:

Hi Yuri,

Regarding the authn/authz issue, @Matteo Merli can probably chime in more about that part.

If the lazy-loading approach sounds attractive to you and you like it, maybe the next step is to update the PIP, what do you think?

- Sijie

On Mon, Feb 8, 2021 at 6:57 PM Yuri Mizushima wrote:

> Michael,
>
> Thank you for your comment!
>
> > Which Pulsar Clients will benefit from this proposal?
> I think that this proposal will be useful to any client.
> In my schedule, if this proposal is accepted, I will implement this
> feature in the Java client.
> If needed, I will then implement the same feature in other clients such as
> C++, Go, etc.
>
> Regards,
> --
> Yuri Mizushima
> yumiz...@yahoo-corp.jp
>
>
> "Michael Marshall" wrote:
>
> Hi Yuri and Sijie,
>
> I definitely like the idea of lazily creating producers as well as
> introducing a way to provide custom routing logic.
>
> Which Pulsar Clients will benefit from this proposal? I’d love to see
> this feature in the go client.
>
> Thanks,
> Michael Marshall
>
> > On Feb 7, 2021, at 9:53 PM, Yuri Mizushima wrote:
> >
> > Sijie,
> >
> > Thank you for sharing!
> >
> > First, I considered your suggestion.
> > I think these implementations sound good.
> >
> > I think we should consider the State of partitioned producer: Ready,
> > Connecting, etc.
> > Currently, partitioned producer
[ANNOUNCE] Apache Pulsar Go Client 0.7.0 released
The Apache Pulsar team is proud to announce Apache Pulsar Go Client version 0.7.0. Pulsar is a highly scalable, low latency messaging platform running on commodity hardware. It provides simple pub-sub semantics over topics, guaranteed at-least-once delivery of messages, automatic cursor management for subscribers, and cross-datacenter replication. For Pulsar release details and downloads, visit: https://github.com/apache/pulsar-client-go/releases/tag/v0.7.0 Release Notes are at: https://github.com/apache/pulsar-client-go/blob/master/CHANGELOG.md We would like to thank the contributors that made the release possible. Regards, The Pulsar Team
Re: [DISCUSS] Add remove-clusters command for namespace in pulsar-admin
+1,

Penghui

On Nov 16, 2021, 9:27 AM +0800, Ruguo Yu wrote:

> Hi Community,
>
> The `pulsar-admin` tool supports the `set-clusters` and `get-clusters`
> commands, so we can set and get the replication clusters for a namespace.
> But it lacks a corresponding `remove-clusters` command to restore the unset
> state. I think it is necessary to add this command so that managing the
> replication clusters is a closed loop.
>
> I have created an issue [1] which contains possible implementation details
> for this problem. Please discuss and share your opinions.
>
> Thanks,
> Ruguo Yu
>
> [1] https://github.com/apache/pulsar/issues/12822
[DISCUSS] Add remove-clusters command for namespace in pulsar-admin
Hi Community,

The `pulsar-admin` tool supports the `set-clusters` and `get-clusters` commands, so we can set and get the replication clusters for a namespace. But it lacks a corresponding `remove-clusters` command to restore the unset state. I think it is necessary to add this command so that managing the replication clusters is a closed loop.

I have created an issue [1] which contains possible implementation details for this problem. Please discuss and share your opinions.

Thanks,
Ruguo Yu

[1] https://github.com/apache/pulsar/issues/12822
Re: [DISCUSS] Add Pulsar io Pulsar connector
I just did a quick search. It's interesting that we don't have a Pulsar connector to move data among Pulsar clusters.
I guess people usually write their own Pulsar client to move data around.

On Mon, Nov 15, 2021 at 3:11 PM ZhangJian He wrote:

> Yes, to move data across different Pulsar clusters which belong to
> different companies or organizations.
>
> Thanks
> ZhangJian He
>
> On Tue, Nov 16, 2021 at 2:50 AM, Neng Lu wrote:
>
> > Hi,
> >
> > What's your new connector used for in the customer use cases?
> > A `pulsar-io-kafka-connector` is used for moving data between kafka and
> > pulsar.
> > But in your case, a `pulsar-io-pulsar-connector`, do you mean you want to
> > move data across different pulsar clusters?
> >
> > On Mon, Nov 15, 2021 at 6:51 AM ZhangJian He wrote:
> >
> > > Dear all
> > >
> > > My team is suggesting that some of our customers use pulsar instead of
> > > kafka for their needs.
> > > Before, my team used a pulsar-io-kafka-connector, now my team wants to
> > > use a pulsar-io-to-pulsar-connector server for these customers.
> > >
> > > And I notice now we don't have a pulsar-io-pulsar-connector.
> > >
> > > Should I develop a connector?
> > > And should the connector be maintained in the pulsar main repo?
> > >
> > > IMO, if we decide to develop a pulsar-io-connector, it's more
> > > reasonable to maintain it in the pulsar main repo. (At least, the
> > > pulsar-io-kafka-connector is in main repo)
> > >
> > > Looking forward to your opinions.
> > >
> > > Thanks
> > > ZhangJian He
Re: [DISCUSS] Add Pulsar io Pulsar connector
Yes, to move data across different Pulsar clusters which belong to different companies or organizations.

Thanks
ZhangJian He

On Tue, Nov 16, 2021 at 2:50 AM, Neng Lu wrote:

> Hi,
>
> What's your new connector used for in the customer use cases?
> A `pulsar-io-kafka-connector` is used for moving data between kafka and
> pulsar.
> But in your case, a `pulsar-io-pulsar-connector`, do you mean you want to
> move data across different pulsar clusters?
>
> On Mon, Nov 15, 2021 at 6:51 AM ZhangJian He wrote:
>
> > Dear all
> >
> > My team is suggesting that some of our customers use pulsar instead of
> > kafka for their needs.
> > Before, my team used a pulsar-io-kafka-connector, now my team wants to
> > use a pulsar-io-to-pulsar-connector server for these customers.
> >
> > And I notice now we don't have a pulsar-io-pulsar-connector.
> >
> > Should I develop a connector?
> > And should the connector be maintained in the pulsar main repo?
> >
> > IMO, if we decide to develop a pulsar-io-connector, it's more reasonable
> > to maintain it in the pulsar main repo. (At least, the
> > pulsar-io-kafka-connector is in main repo)
> >
> > Looking forward to your opinions.
> >
> > Thanks
> > ZhangJian He
[GitHub] [pulsar-dotpulsar] dgiannone87 commented on issue #8: Support - Message encryption
dgiannone87 commented on issue #8: URL: https://github.com/apache/pulsar-dotpulsar/issues/8#issuecomment-969390727 Hi @JarrodJ83 - I know this is an old thread, but wonder if you ever took this on. Have a need for using DotPulsar with Pulsar encryption. Thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@pulsar.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [DISCUSS] Add Pulsar io Pulsar connector
Hi,

What's your new connector used for in the customer use cases?
A `pulsar-io-kafka-connector` is used for moving data between Kafka and Pulsar.
But in your case, with a `pulsar-io-pulsar-connector`, do you mean you want to move data across different Pulsar clusters?

On Mon, Nov 15, 2021 at 6:51 AM ZhangJian He wrote:

> Dear all
>
> My team is suggesting that some of our customers use pulsar instead of
> kafka for their needs.
> Before, my team used a pulsar-io-kafka-connector, now my team wants to use
> a pulsar-io-to-pulsar-connector server for these customers.
>
> And I notice now we don't have a pulsar-io-pulsar-connector.
>
> Should I develop a connector?
> And should the connector be maintained in the pulsar main repo?
>
> IMO, if we decide to develop a pulsar-io-connector, it's more reasonable
> to maintain it in the pulsar main repo. (At least, the
> pulsar-io-kafka-connector is in main repo)
>
> Looking forward to your opinions.
>
> Thanks
> ZhangJian He
[GitHub] [pulsar-helm-chart] javiramos1 commented on issue #164: 2.7.3 helm chart still use 2.7.2 images
javiramos1 commented on issue #164: URL: https://github.com/apache/pulsar-helm-chart/issues/164#issuecomment-969177812 Hi, the chart is still using 2.7.2. I did change the tags to 2.8.1 but it did not work, the pods were stuck in init state.
[DISCUSS] Add Pulsar io Pulsar connector
Dear all

My team is suggesting that some of our customers use Pulsar instead of Kafka for their needs.
Before, my team used a pulsar-io-kafka-connector; now my team wants to use a pulsar-io-to-pulsar-connector server for these customers.

And I notice that we don't currently have a pulsar-io-pulsar-connector.

Should I develop a connector?
And should the connector be maintained in the pulsar main repo?

IMO, if we decide to develop a pulsar-io-connector, it's more reasonable to maintain it in the pulsar main repo. (At least, the pulsar-io-kafka-connector is in the main repo.)

Looking forward to your opinions.

Thanks
ZhangJian He
[DISCUSS] Optimize zookeeper client performance for loading amounts of topics
Hi Pulsar Community,

I have opened an issue [1]: https://github.com/apache/pulsar/issues/12812
Any suggestions will be appreciated.

## Motivation

As described in [2], we are running a Pulsar cluster with about a million topics, and 20 percent of the brokers could break down at the same time. Previously, in [2], I proposed adding a rate limiter to protect zk from surging requests.

Thanks to Penghui Li and Hang Chen, who suggested an alternative way to solve this issue with the zk multi API, which optimizes performance by batching reads or writes. We have done a perf test on zk multi; check it out in [1].

## Goal

Optimize ZooKeeper client performance when loading large numbers of topics.

## API Changes

Three new configs in broker.conf:

- **enableAutoBatchZookeeperOps**: this feature is optional, as it may increase metadata latency with a small number of topics.
- **autoBatchZookeeperOpsMaxNum** and **autoBatchZookeeperOpsMaxDelayMills**: just like the auto-batching parameters in the Pulsar producer, these limit the max number of ops in one batch and the max delay time to wait for a batch.

## Implementation

The basic idea of the implementation is to add two queues (one for read ops and one for write ops) in PulsarZooKeeperClient. All zk ops will be added to a queue first, and a background thread will batch these requests and send them to the zk server in one "multi op".

## Rejected Alternatives

Holding [2] for now, to see the result of this performance optimization.

[1] https://github.com/apache/pulsar/issues/12812
[2] https://github.com/apache/pulsar/issues/12651

---
Thanks,
Haiting Jiang
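
To make the queue-plus-background-thread idea in the Implementation section more concrete, here is a minimal sketch of a size- and delay-bounded batcher. It is only an illustration under stated assumptions: `OpBatcher`, its constructor arguments, and the `sender` callback are hypothetical names, not part of PulsarZooKeeperClient, and the callback merely stands in for one ZooKeeper "multi op" call. The two limits play the roles described for autoBatchZookeeperOpsMaxNum and autoBatchZookeeperOpsMaxDelayMills.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper: collects ops in a queue and hands them to a sender in batches. */
final class OpBatcher<T> implements AutoCloseable {
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
    private final int maxNum;                 // upper bound on ops per batch
    private final long maxDelayMillis;        // upper bound on how long a batch waits to fill
    private final Consumer<List<T>> sender;   // stand-in for a single "multi op" request
    private volatile boolean running = true;
    private final Thread worker;

    OpBatcher(int maxNum, long maxDelayMillis, Consumer<List<T>> sender) {
        this.maxNum = maxNum;
        this.maxDelayMillis = maxDelayMillis;
        this.sender = sender;
        this.worker = new Thread(this::run, "op-batcher");
        this.worker.setDaemon(true);
        this.worker.start();
    }

    /** Callers enqueue an op; the background thread decides when it is sent. */
    void submit(T op) {
        queue.add(op);
    }

    private void run() {
        try {
            // Keep going until close() was called and the queue has been drained.
            while (running || !queue.isEmpty()) {
                T first = queue.poll(maxDelayMillis, TimeUnit.MILLISECONDS);
                if (first == null) {
                    continue; // nothing arrived, re-check the loop condition
                }
                List<T> batch = new ArrayList<>(maxNum);
                batch.add(first);
                long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxDelayMillis);
                // Fill the batch until it is full or the delay budget is used up.
                while (batch.size() < maxNum) {
                    long remaining = deadline - System.nanoTime();
                    if (remaining <= 0) {
                        break;
                    }
                    T next = queue.poll(remaining, TimeUnit.NANOSECONDS);
                    if (next == null) {
                        break;
                    }
                    batch.add(next);
                }
                sender.accept(batch);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Stops accepting the loop after the queue is drained and waits for the worker. */
    @Override
    public void close() {
        running = false;
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A proposal like the one above would presumably use two such batchers, one for reads and one for writes, and would also need to complete each caller's callback from the batched result, which this sketch leaves out.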
Re: [DISCUSSION] PIP-104: Add new consumer type: TableView
Matteo,
sorry for the late reply.

On Thu, Oct 14, 2021 at 01:40, Matteo Merli <matteo.me...@gmail.com> wrote:

> https://github.com/apache/pulsar/issues/12356
>
> --- Pasted here for quoting convenience ---
>
> ## Motivation
>
> In many use cases, applications are using Pulsar consumers or readers to
> fetch all the updates from a topic and construct a map with the latest
> value of each key for the messages that were received. This is very common
> when constructing a local cache of the data.
>
> We want to offer support for this access pattern directly in the Pulsar
> client API, as a way to encapsulate the complexities of setting this up.
>
> ## Goal
>
> Provide a view of the topic data in the form of a read-only map that is
> constantly updated with the latest version of each key.
>
> Additionally, let the application specify a listener so that it can perform
> a scan of the map and then receive notifications when new messages are
> received and applied.
>
> ## API Changes
>
> This proposal will only add a new API on the client side.
>
> A new type of consumer will be added, the `TableView`.
>
> Example:
>
> ```java
> TableView<Integer> tableView = pulsarClient.newTableView(Schema.INT32)
>         .topic(topic)
>         .create();
>
> tableView.get("my-key"); // --> 5
> tableView.get("my-other-key"); // --> 7
> ```
>
> When a `TableView` instance is created, it will be guaranteed to already
> have the latest value for each key, for the current time.
>
> ### API additions
>
> ```java
> interface PulsarClient {
>     // ...
>     <T> TableViewBuilder<T> newTableView(Schema<T> schema);
> }
>
> interface TableViewBuilder<T> {
>     TableViewBuilder<T> loadConf(Map<String, Object> config);
>     TableView<T> create() throws PulsarClientException;
>     CompletableFuture<TableView<T>> createAsync();
>     TableViewBuilder<T> topic(String topic);
>     TableViewBuilder<T> autoUpdatePartitionsInterval(int interval,
>             TimeUnit unit);
> }
>
> interface TableView<T> extends Closeable {
>
>     // Similar methods as java.util.Map
>     int size();
>     boolean isEmpty();
>     boolean containsKey(String key);
>     T get(String key);
>     Set<Map.Entry<String, T>> entrySet();
>     Set<String> keySet();
>     Collection<T> values();
>     void forEach(BiConsumer<String, T> action);
>
>     /**
>      * Performs the given action for each entry in this map until all
>      * entries have been processed or the action throws an exception.
>      *
>      * When all the entries have been processed, the action will be invoked
>      * for every new update that is received from the topic.
>      *
>      * @param action The action to be performed for each entry
>      */
>     void forEachAndListen(BiConsumer<String, T> action);
>
>     /**
>      * Close the table view and releases resources allocated.
>      *
>      * @return a future that can be used to track when the table view has
>      * been closed
>      */
>     CompletableFuture<Void> closeAsync();
> }
> ```
>
> ## Implementation
>
> The `TableView` will be implemented using multiple `Reader` instances, one
> per each partition and will always specify to read starting from the
> compacted view of the topic.
>
> The creation time of a table view can be controlled by configuring the
> topic compaction policies for the given topic or namespace. More frequent
> compaction can lead to very short startup times, as in less data will be
> replayed to reconstruct the `TableView` of the topic.

I think that this feature will add value to Pulsar.

Only one point: do we need to add this to the Pulsar Client API, or can we simply add it as an additional library, maintained in pulsar-adapters?

Adding this to the Pulsar Client will allow users to discover this feature more easily, but on the other hand it would make sense to add it there if we need some additional support.

I am thinking about doing the same, that is, storing the code in pulsar-adapters and not in the main Pulsar repo, for my other proposal about shared State Objects (I will give a PIP name to it soon :-) ).

Enrico

> --
> Matteo Merli
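
The "latest value per key" semantics and the scan-then-listen behaviour of `forEachAndListen` discussed in the proposal above can be sketched without any Pulsar dependency. This is a minimal illustration under stated assumptions, not the proposed implementation: `LatestValueView` and `onMessage` are hypothetical names, messages are fed in by hand instead of being read by `Reader` instances from a compacted topic, and treating a `null` value as a delete is an assumption borrowed from topic compaction rather than something the PIP text states.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

/** Hypothetical in-memory stand-in for a table view: keeps the latest value for each key. */
final class LatestValueView<T> {
    private final Map<String, T> data = new ConcurrentHashMap<>();
    private final List<BiConsumer<String, T>> listeners = new CopyOnWriteArrayList<>();

    /**
     * Applies one message as if it had just been read from the topic.
     * A later message for the same key overwrites the earlier value;
     * a null value removes the key (assumption, see lead-in).
     */
    synchronized void onMessage(String key, T value) {
        if (value == null) {
            data.remove(key);
        } else {
            data.put(key, value);
        }
        for (BiConsumer<String, T> listener : listeners) {
            listener.accept(key, value);
        }
    }

    T get(String key) {
        return data.get(key);
    }

    int size() {
        return data.size();
    }

    /**
     * Runs the action for every current entry, then registers it for future updates.
     * Both steps happen under the same lock as onMessage, so no update can slip
     * in between the scan and the registration.
     */
    synchronized void forEachAndListen(BiConsumer<String, T> action) {
        data.forEach(action);
        listeners.add(action);
    }
}
```

Feeding it the messages `("my-key", 1)`, `("my-other-key", 7)`, `("my-key", 5)` leaves two entries, with `get("my-key")` returning the most recent value, which mirrors the example in the quoted proposal.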
Re: Enhancing the Pulsar Client with some a Shared State API
Jack,

On Mon, Nov 15, 2021 at 10:37, Jack Vanlightly wrote:

> Hi Enrico,
>
> This is interesting. I'm thinking of the KTables part of Kafka Streams and
> also Raft state machines.
>
> You could build something equivalent to a Raft state machine on top of
> Pulsar where WaitForExclusive acts as leader election and the topic as the
> log.

Yes, that's the idea.

> I would be interested in a PIP, that also includes considerations for
> things like:
>
>    - how applications can persist an applied index to avoid having to
>    rebuild state from scratch
>    - how to manage topic size such that it isn't easy for users to end up
>    with inconsistent views of the state. Things like checkpointing, maybe
>    having more control over topic data retention.
>
> Also how does this compare to PIP-104 Table View? (
> http://mail-archives.apache.org/mod_mbox/pulsar-dev/202110.mbox/%3cCA+JmKXbGf3CMy5dxypX=so-gyxrdh0pmttkhy3wnnm4azgw...@mail.gmail.com%3e
> )

This is partially related, but the intent is different.
I don't want to create a "database" using Pulsar (like ksqlDB is for Kafka); that would be interesting, but it is not the scope of this feature.

I have seen a few times that you need some shared state among the instances of your application, and unfortunately you end up adding some additional component, like a small external SQL DB (like PostgreSQL) or something like HerdDB (which is still a SQL DB but can store data on BookKeeper and can also run inside the same JVM as the application).

We already have something like this to manage Pulsar Functions assignments, or in the Kafka Connect Adapter to emulate consumer group management.
I have ended up implementing this in a bunch of other projects.

We need something in Pulsar that is very lightweight but that allows you to synchronize some state among your clients.

A similar tool is present in Pravega.io with the State Synchronizers abstraction:
https://pravega.io/docs/v0.7.1/state-synchronizer-design/

This is why I don't think that we initially need a mechanism to store the data locally.
But that's something that we can provide, one step at a time.

> Finally, have we looked at what the market is asking for? What kind of
> product strategy do we have regarding Pulsar?

AFAIK there is no "strategy"; the project is growing thanks to the contributions of users.

> This kind of thing could end
> up being highly valuable but also a big investment and it would be good to
> know how the community is steering Pulsar to make it the most relevant it
> can be.

I don't want to add a "Pulsar backed Database"; in order to do so it would be better to spin off a new project (or subproject).

Enrico

> Jack
>
> On Wed, Nov 10, 2021 at 5:18 PM Enrico Olivelli wrote:
>
> > Hello,
> > With Pulsar 2.8.0 we have the Exclusive Producer, which allows you to use
> > Pulsar as a consistent write-ahead-log for replicated state machines.
> >
> > It already happened to me a couple of times to need to build some
> > replicated state storage on top of Pulsar and I would like to share some
> > thoughts.
> >
> > We can provide some simple built-in mechanism to share some "state" across
> > several instances of an application without adding some Database or other
> > components to the architecture:
> > - metadata
> > - dynamic configuration
> > - task assignments
> > - key-value database
> >
> > In general we can provide an API to handle a shared distributed Java
> > Object: each client can access the Object and mutate the State,
> > ensuring consistency.
> >
> > I have drafted a small API to build such an abstraction:
> >
> > public interface PulsarDatabase<T> {
> >
> >     /**
> >      * Read from the current state.
> >      * @param reader a function that accesses current state and returns a
> >      * value
> >      * @param latest ensure that the value is the latest
> >      * @return a handle to the result of the operation
> >      */
> >     <K> CompletableFuture<K> read(Function<T, K> reader, boolean latest);
> >
> >     /*
> >      * Execute a mutation on the state.
> >      * The operationsGenerator generates a list of mutations to be
> >      * written to the log, the operationApplier function
> >      * is executed to mutate the state after each successful write
> >      * to the log. Finally the reader function can read from
> >      * the current status before releasing the write lock.
> >      * @param operationsGenerator generates a list of mutations
> >      * @param operationApplier apply each mutation to the current state
> >      * @param reader read from the status while inside the write lock
> >      * @param <K> the returned data type
> >      * @param <O> the operation type
> >      * @return a handle to the completion of the operation
> >      */
> >     <K, O> CompletableFuture<K> write(Function<T, List<O>> operationsGenerator,
> >             Function<T, K> reader);
> > }
> >
> > Using this simple abstraction it is
Re: Enhancing the Pulsar Client with some a Shared State API
Hi Enrico,

This is interesting. I'm thinking of the KTables part of Kafka Streams and also Raft state machines.

You could build something equivalent to a Raft state machine on top of Pulsar where WaitForExclusive acts as leader election and the topic as the log.

I would be interested in a PIP that also includes considerations for things like:

- how applications can persist an applied index to avoid having to rebuild state from scratch
- how to manage topic size such that it isn't easy for users to end up with inconsistent views of the state. Things like checkpointing, maybe having more control over topic data retention.

Also how does this compare to PIP-104 Table View? (
http://mail-archives.apache.org/mod_mbox/pulsar-dev/202110.mbox/%3cCA+JmKXbGf3CMy5dxypX=so-gyxrdh0pmttkhy3wnnm4azgw...@mail.gmail.com%3e
)

Finally, have we looked at what the market is asking for? What kind of product strategy do we have regarding Pulsar? This kind of thing could end up being highly valuable but also a big investment, and it would be good to know how the community is steering Pulsar to make it the most relevant it can be.

Jack

On Wed, Nov 10, 2021 at 5:18 PM Enrico Olivelli wrote:

> Hello,
> With Pulsar 2.8.0 we have the Exclusive Producer, which allows you to use
> Pulsar as a consistent write-ahead-log for replicated state machines.
>
> It already happened to me a couple of times to need to build some
> replicated state storage on top of Pulsar and I would like to share some
> thoughts.
>
> We can provide some simple built-in mechanism to share some "state" across
> several instances of an application without adding some Database or other
> components to the architecture:
> - metadata
> - dynamic configuration
> - task assignments
> - key-value database
>
> In general we can provide an API to handle a shared distributed Java
> Object: each client can access the Object and mutate the State,
> ensuring consistency.
>
> I have drafted a small API to build such an abstraction:
>
> public interface PulsarDatabase<T> {
>
>     /**
>      * Read from the current state.
>      * @param reader a function that accesses current state and returns a
>      * value
>      * @param latest ensure that the value is the latest
>      * @return a handle to the result of the operation
>      */
>     <K> CompletableFuture<K> read(Function<T, K> reader, boolean latest);
>
>     /*
>      * Execute a mutation on the state.
>      * The operationsGenerator generates a list of mutations to be
>      * written to the log, the operationApplier function
>      * is executed to mutate the state after each successful write
>      * to the log. Finally the reader function can read from
>      * the current status before releasing the write lock.
>      * @param operationsGenerator generates a list of mutations
>      * @param operationApplier apply each mutation to the current state
>      * @param reader read from the status while inside the write lock
>      * @param <K> the returned data type
>      * @param <O> the operation type
>      * @return a handle to the completion of the operation
>      */
>     <K, O> CompletableFuture<K> write(Function<T, List<O>> operationsGenerator,
>             Function<T, K> reader);
> }
>
> Using this simple abstraction it is easy to build for instance a
> distributed Java "Map" like this
> https://github.com/eolivelli/pulsar-db/blob/main/src/main/java/org/apache/pulsar/db/PulsarMap.java
>
> I believe that we should add this feature to the Pulsar Client API,
> maybe we can start by adding this in the pulsar-adapters module as it can
> be loosely coupled with the core Pulsar Client
>
> Building distributed data structures on top of that API is simple,
> but the underlying implementation of the core API is not straightforward,
> because there are many edge cases to deal with.
>
> If we provide some recipes that are available out-of-the-box we will
> unleash the secret power of Exclusive producer and we will allow more
> applications to migrate to Pulsar or to choose Pulsar as storage backbone.
>
> You can find the code here https://github.com/eolivelli/pulsar-db, it is
> only a proof-of-concept, but it is already usable.
>
> If there is an interest in this I will be happy to draft a PIP
> and also to send the implementation to the pulsar-adapters repository.
>
> Best regards
>
> Enrico
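
The write path described in the drafted API above (generate operations, append each one to the log, apply it to the state, then read) can be illustrated with a small in-memory sketch. Everything here is an assumption made for illustration: `InMemorySharedState` and `replayInto` are hypothetical names, a plain `List` stands in for the Pulsar topic written by an exclusive producer, the `latest` flag is dropped because a single in-memory copy is always current, and the operation applier is passed to `write` as the drafted javadoc describes. It is not the code from the pulsar-db repository.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.BiConsumer;
import java.util.function.Function;

/** Hypothetical in-memory stand-in for a log-backed shared state (S = state, O = operation). */
final class InMemorySharedState<S, O> {
    private final S state;
    private final List<O> log = new ArrayList<>(); // stand-in for the topic used as a write-ahead log

    InMemorySharedState(S initialState) {
        this.state = initialState;
    }

    /** Reads from the current state under the same lock that guards writes. */
    synchronized <K> CompletableFuture<K> read(Function<S, K> reader) {
        return CompletableFuture.completedFuture(reader.apply(state));
    }

    /** Generates mutations, appends each to the log, applies it, then reads the result. */
    synchronized <K> CompletableFuture<K> write(Function<S, List<O>> operationsGenerator,
                                                BiConsumer<S, O> operationApplier,
                                                Function<S, K> reader) {
        for (O op : operationsGenerator.apply(state)) {
            log.add(op);                        // 1. append to the log first
            operationApplier.accept(state, op); // 2. mutate the state only after the append
        }
        return CompletableFuture.completedFuture(reader.apply(state));
    }

    /** Rebuilds a fresh copy of the state by replaying the log, as another instance would. */
    synchronized S replayInto(S fresh, BiConsumer<S, O> operationApplier) {
        for (O op : log) {
            operationApplier.accept(fresh, op);
        }
        return fresh;
    }
}
```

With a `Map<String, Integer>` as the state and `Map.Entry<String, Integer>` as the operation type, a `write` that emits two "put" operations followed by `replayInto` on an empty map yields the same contents, which is the property that lets the log act as the source of truth. A real implementation would also have to deal with failed appends, fencing of old writers, and catching up readers, none of which this sketch attempts.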