Re: [ANNOUNCE] Yunze Xu as a new PMC member in Apache Pulsar
Congratulations!!! Yunze

Jun Ma wrote on Sat, Dec 31, 2022 at 17:26:
> Congrats and well deserved!
>
> From: Dave Fisher
> Sent: Saturday, December 31, 2022 1:52
> To: dev@pulsar.apache.org
> Subject: Re: [ANNOUNCE] Yunze Xu as a new PMC member in Apache Pulsar
>
> Congratulations Yunze! Well deserved!
>
> All the best,
> Dave
>
> > On Dec 29, 2022, at 4:42 AM, Haiting Jiang wrote:
> >
> > Hi all,
> >
> > The Apache Pulsar Project Management Committee (PMC) has invited Yunze Xu
> > (https://github.com/BewareMyPower) as a member of the PMC and we are
> > pleased to announce that he has accepted.
> >
> > He has been very active in the community over the past few years and has
> > made a lot of great contributions.
> >
> > Welcome Yunze to the Apache Pulsar PMC.
> >
> > Best Regards,
> > Haiting Jiang on behalf of the Pulsar PMC

--
BR,
Qiang Huang
Re: [DISCUSS] PIP-236: Upload AUTO_CONSUME SchemaType to Broker
My concern is: what happens if old clients with other schemas (i.e. the schema is neither null nor AUTO_CONSUME) subscribe to a topic with the AUTO_CONSUME schema? What is the schema compatibility check rule on a topic with the AUTO_CONSUME schema?

Thanks,
Yunze

On Mon, Jan 2, 2023 at 12:38 AM SiNan Liu wrote:
>
> 1. Schema.Type and org.apache.pulsar.common.schema.SchemaType values should
> be the same.
> 2. These changes do not affect producing; they only affect consumer
> subscribe behavior.
> 3. Backward compatibility:
> (1) In org.apache.pulsar.broker.service.ServerCnx#handleSubscribe:
> if (schema != null && schema.getType() != SchemaType.AUTO_CONSUME) {
>     return topic.addSchemaIfIdleOrCheckCompatible(schema)
>             .thenCompose(v -> topic.subscribe(option));
> } else {
>     return topic.subscribe(option);
> }
> For the older Pulsar client, the schema is null if an AUTO_CONSUME consumer
> subscribes to the topic.
> For the new Pulsar client, if an AUTO_CONSUME consumer subscribes to the topic,
> then the schema is not null and schema.getType() = SchemaType.AUTO_CONSUME.
> Whether a new or an old Pulsar client consumes the topic, the broker returns
> topic.subscribe(option).
>
> (2) In org.apache.pulsar.broker.service.persistent.PersistentTopic
> #addSchemaIfIdleOrCheckCompatible:
> @Override
> public CompletableFuture<Void> addSchemaIfIdleOrCheckCompatible(SchemaData schema) {
>     return hasSchema().thenCompose((hasSchema) -> {
>         int numActiveConsumersWithoutAutoSchema = subscriptions.values().stream()
>                 .mapToInt(subscription -> subscription.getConsumers().stream()
>                         .filter(consumer -> consumer.getSchemaType() != SchemaType.AUTO_CONSUME)
>                         .toList().size())
>                 .sum();
>         if (hasSchema
>                 || (!producers.isEmpty())
>                 || (numActiveConsumersWithoutAutoSchema != 0)
>                 || (ledger.getTotalSize() != 0)) {
>             return checkSchemaCompatibleForConsumer(schema);
>         } else {
>             return addSchema(schema).thenCompose(schemaVersion ->
>                     CompletableFuture.completedFuture(null));
>         }
>     });
> }
> There is only one case in which a bug remains.
> First, an old Pulsar client whose consumer schema is AUTO_CONSUME consumes
> the empty topic; then a new or old Pulsar client with another schema
> (e.g. AVRO) consumes the topic.
> The broker will return the error message IncompatibleSchemaException
> ("Topic does not have a schema to check"). The bug in issue 17354 is not
> fixed in this case.
> All the other cases behave normally.
>
> Yunze Xu wrote on Sat, Dec 31, 2022 at 20:23:
>
> > Defining `AutoConsume` as -3 is somewhat strange. Could you clarify if
> > backward compatibility is guaranteed? i.e. if the new Pulsar client
> > uploaded the AUTO_CONSUME schema to the broker, can the old Pulsar
> > clients still produce to or consume from the same topic?
> >
> > Thanks,
> > Yunze
> >
> > On Fri, Dec 30, 2022 at 11:32 PM 思楠刘 wrote:
> > >
> > > Hi all,
> > >
> > > I made a PIP to discuss: https://github.com/apache/pulsar/issues/19113.
> > >
> > > Thanks,
> > > Sinan
> >
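The single failing case described in the quoted reply can be reproduced with a small stand-alone model of the branch in addSchemaIfIdleOrCheckCompatible. This is only a sketch and not broker code: `SchemaCheckSketch`, `TopicState`, `Outcome`, and `decide` are hypothetical names. It also assumes that an old client's AUTO_CONSUME consumer reaches the broker without a schema type (modelled here as a null entry), which is what makes the filter count it as a non-AUTO_CONSUME consumer.

```java
import java.util.Collections;
import java.util.List;

final class SchemaCheckSketch {
    enum SchemaType { AUTO_CONSUME, AVRO }
    enum Outcome { ADD_SCHEMA, CHECK_COMPATIBILITY }

    /** Hypothetical snapshot of the topic fields read by the quoted condition. */
    static final class TopicState {
        final boolean hasSchema;
        final int producers;
        final long ledgerSize;
        // One entry per active consumer; null models an old client that sent no schema.
        final List<SchemaType> consumerTypes;

        TopicState(boolean hasSchema, int producers, long ledgerSize,
                   List<SchemaType> consumerTypes) {
            this.hasSchema = hasSchema;
            this.producers = producers;
            this.ledgerSize = ledgerSize;
            this.consumerTypes = consumerTypes;
        }
    }

    /** Mirrors the quoted if/else: an "idle" topic accepts the schema, otherwise it is checked. */
    static Outcome decide(TopicState t) {
        long consumersWithoutAutoSchema = t.consumerTypes.stream()
                .filter(type -> type != SchemaType.AUTO_CONSUME)
                .count();
        boolean inUse = t.hasSchema
                || t.producers != 0
                || consumersWithoutAutoSchema != 0
                || t.ledgerSize != 0;
        return inUse ? Outcome.CHECK_COMPATIBILITY : Outcome.ADD_SCHEMA;
    }

    public static void main(String[] args) {
        // Empty topic, one consumer from a new client that uploaded AUTO_CONSUME.
        TopicState newClient = new TopicState(false, 0, 0L,
                Collections.singletonList(SchemaType.AUTO_CONSUME));
        // Empty topic, one AUTO_CONSUME consumer from an old client (no type sent).
        TopicState oldClient = new TopicState(false, 0, 0L,
                Collections.<SchemaType>singletonList(null));

        System.out.println("new client first: " + decide(newClient));
        System.out.println("old client first: " + decide(oldClient));
    }
}
```

In the second state the topic has no schema, yet a later AVRO subscriber is sent down the compatibility-check path, which matches the "Topic does not have a schema to check" error described in the reply.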
Re: [DISCUSS] PIP-232: Introduce thread monitor to check if thread is blocked for long time.
This is an interesting proposal. However, I'd suggest changes to the current proposal. I think that the current proposal is too invasive for the Pulsar code base.

"Introduce thread monitor to check if thread is blocking for long time." seems to mean multiple things. When looking at the PR, it seems to be a solution for detecting long-running tasks. Just FYI, BookKeeper has a solution for this in its OrderedExecutor with a setting called enableTaskExecutionStats=true. I'm not saying that it would be the preferred way to implement it.

If the goal is to detect actual blocking code that is run on threads that should run only non-blocking code, there's a better tool called Reactor BlockHound (https://github.com/reactor/BlockHound) for that purpose. For actual profiling of the code base, Java Flight Recorder and Async Profiler are better solutions.

It seems that one part of the problem is that there aren't metrics for the thread pools. As an alternative implementation for the proposed PIP-232, I'd suggest that basic metrics (backlog / queue size, active thread count, number of executed tasks, etc.) are added for the thread pools. For example, Micrometer contains a decorator for many thread pool implementations, https://github.com/micrometer-metrics/micrometer/blob/main/micrometer-core/src/main/java/io/micrometer/core/instrument/binder/jvm/ExecutorServiceMetrics.java . A similar solution would be very useful in Pulsar for adding the thread pool metrics.

Tracking individual tasks requires more resources, and that's why I'd suggest adding the basic metrics and making them enabled by default. Some more advanced metrics would be useful, such as tracking the thread pool queue waiting time. A low-overhead thread pool queue waiting time metric could be added with a sampling approach. The benefit of that is that there won't be a need to wrap all tasks that are executed. There would be several ways to implement the queue waiting time metric.
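One possible shape of the basic metrics and the sampled queue waiting time, sketched with java.util.concurrent only. `SampledQueueWaitExecutor` and all of its method names are hypothetical; this is neither Pulsar nor Micrometer code, and it assumes the pool is a plain ThreadPoolExecutor. Only every N-th task is wrapped to record how long it sat in the queue; the rest are handed to the pool untouched.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

final class SampledQueueWaitExecutor {
    private final ThreadPoolExecutor delegate;
    private final int sampleEvery;
    private final AtomicLong submitted = new AtomicLong();
    private final AtomicLong sampledTasks = new AtomicLong();
    private final AtomicLong totalWaitNanos = new AtomicLong();

    SampledQueueWaitExecutor(ThreadPoolExecutor delegate, int sampleEvery) {
        if (sampleEvery < 1) {
            throw new IllegalArgumentException("sampleEvery must be >= 1");
        }
        this.delegate = delegate;
        this.sampleEvery = sampleEvery;
    }

    void execute(Runnable task) {
        if (submitted.getAndIncrement() % sampleEvery != 0) {
            delegate.execute(task); // not sampled: no wrapper, no extra allocation
            return;
        }
        long enqueuedAt = System.nanoTime();
        delegate.execute(() -> {
            // Time between submission and the moment a worker picked the task up.
            totalWaitNanos.addAndGet(System.nanoTime() - enqueuedAt);
            sampledTasks.incrementAndGet();
            task.run();
        });
    }

    // Basic metrics read directly from the pool; no per-task bookkeeping needed.
    int queueSize() { return delegate.getQueue().size(); }
    int activeThreads() { return delegate.getActiveCount(); }
    long completedTasks() { return delegate.getCompletedTaskCount(); }

    long sampledTasks() { return sampledTasks.get(); }

    double meanSampledWaitMillis() {
        long n = sampledTasks.get();
        return n == 0 ? 0.0 : totalWaitNanos.get() / (double) n / 1_000_000.0;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        SampledQueueWaitExecutor monitored = new SampledQueueWaitExecutor(pool, 5);
        for (int i = 0; i < 10; i++) {
            monitored.execute(() -> { });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("completed=" + monitored.completedTasks()
                + " sampled=" + monitored.sampledTasks()
                + " meanWaitMs=" + monitored.meanSampledWaitMillis());
    }
}
```

The gauges (queue size, active threads, completed tasks) cost nothing per task, so they are the part that could stay enabled by default; the sampling rate controls how much overhead the waiting-time metric adds.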
I assume that "blocking" itself might not be the problem and therefore having basic metrics (backlog size, active threads, executed tasks counter, failed tasks counter) for the thread pools is more essential.

There are a lot of good things about the PIP-232 proposal and I believe that iterating on the ideas will produce a good outcome.

-Lari

On 2022/12/19 12:17:09 adobewjl wrote:
> Hello pulsar community,
> I've opened `PIP-232: Introduce thread monitor to check if thread is blocked
> for long time.` to discuss.
> For more details, please read the PIP at
> https://github.com/apache/pulsar/issues/18985
> I'm looking forward to hearing what you think.
> Also the demo PR link at https://github.com/apache/pulsar/pull/18958
[VOTE] Pulsar Release 2.11.0 Candidate-5
This is the fifth release candidate for Apache Pulsar, version 2.11.0.

This release contains 1617 commits by 61 contributors.
https://github.com/apache/pulsar/compare/v2.10.2...v2.11.0-candidate-5

CI for this release candidate:
https://github.com/Technoboy-/pulsar/pull/26

*** Please download, test and vote on this release. This vote will stay open
for at least 72 hours ***

Note that we are voting upon the source (tag); binaries are provided for
convenience.

Source and binary files:
https://dist.apache.org/repos/dist/dev/pulsar/pulsar-2.11.0-candidate-5

SHA-512 checksums:
5bf77f600ac23c7ec72696655a49801e0755d67fb4c59c7ab30e9449b061aacef61ca227f106e6b7f5f490fc2ac57412ac2af3ff064ba1e6a31c162d4ca2ba6d
./apache-pulsar-2.11.0-bin.tar.gz
0a3cab04b4a95ced3745d8b41fc3fa2632c91f77cb10e5c7e008b09ba2f279fc14de109626a44a9482b5354c5fa17bbc07ee369a32a252e358298e494c0fc1cb
./apache-pulsar-2.11.0-src.tar.gz

Maven staging repo:
https://repository.apache.org/content/repositories/orgapachepulsar-1203/

The tag to be voted upon:
v2.11.0-candidate-5 (97ee1a114aa1e5df05d86a9dd47f7fe7a2194211)
https://github.com/apache/pulsar/releases/tag/v2.11.0-candidate-5

Pulsar's KEYS file containing PGP keys we use to sign the release:
https://dist.apache.org/repos/dist/dev/pulsar/KEYS

https://hub.docker.com/layers/mattison/pulsar-all/2.11.0-rc5/images/sha256-548a518bb18890ddf8e48f0bad62695cfc4f743b9e1223ca45b2e03dba6e9149
https://hub.docker.com/layers/mattison/pulsar/2.11.0-rc5/images/sha256-3b755fb67d49abeb7ab6a76b7123cc474375e3881526db26f43c8cfccdaa3cf6

Please download the source package, and follow the release-candidate-validation
doc to build and run the Pulsar standalone service.
https://pulsar.apache.org/contribute/validate-release-candidate

Since the metadata store has changed from ZK to RocksDB, verifying the
`stateful functions` requires setting the parameter
"export PULSAR_STANDALONE_USE_ZOOKEEPER=1".

Regards
Jiwei Guo (Tboy)
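For voters who prefer to script the checksum comparison, here is a minimal sketch using only the JDK. `Sha512Check` and its methods are hypothetical helper names, not part of the Pulsar release tooling; the file path and expected digest are passed in by the caller, and checking the PGP signature against the KEYS file is a separate step that this sketch does not cover.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Sha512Check {
    /** Streams the input through SHA-512 and returns the digest as lowercase hex. */
    static String sha512Hex(InputStream in) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-512");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-512 is not available in this JVM", e);
        }
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            digest.update(buffer, 0, read);
        }
        StringBuilder hex = new StringBuilder(128);
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Compares a file's digest with the published value, ignoring case and padding. */
    static boolean matches(Path file, String expectedHex) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sha512Hex(in).equalsIgnoreCase(expectedHex.trim());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: Sha512Check <file> <expected-sha512-hex>");
            return;
        }
        System.out.println(matches(Paths.get(args[0]), args[1]) ? "OK" : "MISMATCH");
    }
}
```

It would be run once per downloaded tarball, with the matching checksum from the list above as the second argument.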
Re: [DISCUSS] PIP-234: Support using shared thread pool across multiple Pulsar client instance
+1 to Enrico's suggestion about the API to allow sharing resources among multiple clients. I think this could be a good way to hide the implementation details about event loop groups and prevent exposing Netty classes in the API.

By the way, the timer instance would also need to be shared, besides the event loop instances (the shared timer feature was added to PulsarClientImpl in https://github.com/apache/pulsar/pull/9802). This could also be handled as an internal detail in the PulsarClientGroup shared resources solution.

-Lari

On 2022/12/27 11:11:34 Enrico Olivelli wrote:
> I generally support this proposal,
> this is a problem we have in the Proxy and I have seen it on
> applications that need to connect to multiple different tenants
> and they need different authentication parameters, so they have to
> create many PulsarClient instances.
>
> I have a suggestion:
>
> Exposing all the internals is a good idea for very advanced users,
> but I believe that we should provide some simpler support.
>
> We should have an API to allow sharing resources among multiple
> clients without entering the details.
>
> interface PulsarClientGroup {
> ... put here all the sharable things in the current version...
> }
>
> PulsarClient client = newClient()
> .withSharedResources(pulsarClientGroup)
> ...
>
>
> I think that having a PulsarClientGroup is a good choice for future evolutions
> because the internal thread pools may change: they may be removed, added, or
> change purpose.
>
> If we require users to deal with all the possible sharable resources
> then we have a few risks:
> - people can "forget" to share some resources
> - upgrading the client may lead to not taking into account some new
> "shareable" resources
>
> This is why I believe that we should provide an opaque
>
> Enrico
>
> On Tue, Dec 27, 2022 at 11:07, PengHui Li wrote:
> >
> > Hi all,
> >
> > As discussed at
> > https://lists.apache.org/thread/5obfm17g58n3dnbzyxg57vokgmwyp6hx
> > I have created this proposal to support a shared thread pool across multiple
> > client instances.
> > Here is the proposal link https://github.com/apache/pulsar/issues/19074
> >
> > Please help take a look; I look forward to your suggestions.
> >
> > Thanks,
> > Penghui
> >
> > --
> >
> > ### Motivation
> >
> > The Pulsar client mainly has three thread pools that cooperate with each
> > other to complete message publishing and consumption.
> >
> > - IO threads - Used for handling network packets from the broker
> > - Internal threads - Used for handling internal tasks such as moving the
> > received messages to the internal receiver queue and pulling out the
> > message from the receiver queue to return to users. And the Java client is
> > optimized by the lock-free principle; each consumer will use a pinned
> > internal thread to reduce the lock overhead.
> > - External threads - Used by the message listener
> >
> > All the above thread pools will be created automatically after a Pulsar
> > client instance has been created.
> >
> > But in some cases, users need to create multiple Pulsar client instances
> > in a JVM process because of different authentication settings or other
> > reasons. Each client will have exclusive thread pools, which causes
> > excessive thread usage, wasted memory, and potential performance degradation.
> >
> > It is not a serious problem for previous releases with the default
> > configurations because each thread pool will only have 1 thread by default.
> > But it also doesn't make sense that we only have one thread for each thread
> > pool. We have discussed this part under this [thread](
> > https://lists.apache.org/thread/5obfm17g58n3dnbzyxg57vokgmwyp6hx)
> >
> > So this proposal gives users that require multiple Pulsar client instances
> > in one JVM process the option to use shared thread pools across those
> > instances.
> >
> > ### Goal
> >
> > Provide a public API to use shared thread pools across multiple Pulsar
> > client instances in one JVM process
> >
> > - IO threads
> > - Internal threads
> > - External threads
> >
> > BTW, we already have such an ability internally. It was just hidden from
> > users. Please take a look at #12037 and #13839 to get more details.
> >
> > ### API Changes
> >
> > The following APIs will be introduced to the Java Client when creating a
> > Client instance
> >
> > ```java
> > PulsarClient.builder()
> > .eventLoopGroup(ioEventLoopGroup)
> > .internalExecutorProvider(sharedInternalExecutorProvider)
> > .externalExecutorProvider(sharedExternalExecutorProvider)
> > .scheduledExecutorProvider(sharedScheduledExecutorProvider)
> > ```
>
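To make the "opaque group" idea from this thread concrete, here is a minimal JDK-only sketch. `ClientGroupSketch` and `ClientSketch` are hypothetical stand-ins and do not correspond to any existing or proposed Pulsar class; the sketch assumes the group owns the threads, so closing a single client must not stop them, and it models only one internal pool rather than the IO, external, and timer resources discussed above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical opaque bundle of thread pools shared by several clients. */
final class ClientGroupSketch implements AutoCloseable {
    private final ExecutorService internalExecutor;

    ClientGroupSketch(int internalThreads) {
        this.internalExecutor = Executors.newFixedThreadPool(internalThreads);
    }

    /** Clients receive the shared pool without callers ever seeing its type. */
    ClientSketch newClient(String name) {
        return new ClientSketch(name, internalExecutor);
    }

    boolean isShutdown() {
        return internalExecutor.isShutdown();
    }

    @Override
    public void close() {
        internalExecutor.shutdown(); // only the group stops the shared threads
    }
}

/** Hypothetical client that borrows its threads from the group. */
final class ClientSketch implements AutoCloseable {
    private final String name;
    private final ExecutorService sharedExecutor;

    ClientSketch(String name, ExecutorService sharedExecutor) {
        this.name = name;
        this.sharedExecutor = sharedExecutor;
    }

    String name() {
        return name;
    }

    /** Runs a trivial task on the shared pool and reports which thread ran it. */
    Future<String> whichThread() {
        return sharedExecutor.submit(() -> Thread.currentThread().getName());
    }

    @Override
    public void close() {
        // Nothing to shut down here: the threads belong to the group.
    }
}

final class ClientGroupDemo {
    public static void main(String[] args) throws Exception {
        try (ClientGroupSketch group = new ClientGroupSketch(1);
             ClientSketch a = group.newClient("tenant-a");
             ClientSketch b = group.newClient("tenant-b")) {
            System.out.println(a.name() + " ran on " + a.whichThread().get());
            System.out.println(b.name() + " ran on " + b.whichThread().get());
        }
    }
}
```

Because the clients only ever see the group, pools could later be added, removed, or repurposed inside it without changing client code, which is the upgrade risk raised earlier in the thread.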