Re: [ANNOUNCE] Yunze Xu as a new PMC member in Apache Pulsar

2023-01-02 Thread Qiang Huang
Congratulations, Yunze!!!


On Sat, Dec 31, 2022 at 5:26 PM Jun Ma  wrote:

> Congrats and well deserved!
> 
> From: Dave Fisher 
> Sent: Saturday, December 31, 2022 1:52
> To: dev@pulsar.apache.org 
> Subject: Re: [ANNOUNCE] Yunze Xu as a new PMC member in Apache Pulsar
>
> Congratulations Yunze! Well deserved!
>
> All the best,
> Dave
>
>
> > On Dec 29, 2022, at 4:42 AM, Haiting Jiang  wrote:
> >
> > Hi all,
> >
> > The Apache Pulsar Project Management Committee (PMC) has invited Yunze Xu
> > (https://github.com/BewareMyPower) as a member of the PMC and we are
> > pleased to announce that he has accepted.
> >
> > He has been very active in the community over the past few years and has
> > made a lot of great contributions.
> >
> > Welcome Yunze to the Apache Pulsar PMC.
> >
> > Best Regards,
> > Haiting Jiang on behalf of the Pulsar PMC
>
>

-- 
BR,
Qiang Huang


Re: [DISCUSS] PIP-236: Upload AUTO_CONSUME SchemaType to Broker

2023-01-02 Thread Yunze Xu
My concern is this: if old clients with other schemas (i.e. the schema is
neither null nor AUTO_CONSUME) subscribe to a topic with the AUTO_CONSUME
schema, what will happen? What's the schema compatibility check rule on a
topic with the AUTO_CONSUME schema?

Thanks,
Yunze

On Mon, Jan 2, 2023 at 12:38 AM SiNan Liu  wrote:
>
> 1. Schema.Type and org.apache.pulsar.common.schema.SchemaType values should
> be the same.
> 2. These changes do not affect producers; they only affect consumer
> subscribe behavior.
> 3. Backward compatibility:
> (1) In org.apache.pulsar.broker.service.ServerCnx#handleSubscribe:
>
>     if (schema != null && schema.getType() != SchemaType.AUTO_CONSUME) {
>         return topic.addSchemaIfIdleOrCheckCompatible(schema)
>                 .thenCompose(v -> topic.subscribe(option));
>     } else {
>         return topic.subscribe(option);
>     }
>
> For an older Pulsar client, the schema is null when an AUTO_CONSUME consumer
> subscribes to the topic.
> For a new Pulsar client, when an AUTO_CONSUME consumer subscribes to the
> topic, the schema is not null and schema.getType() == SchemaType.AUTO_CONSUME.
> So for both new and old Pulsar clients, the broker simply returns
> topic.subscribe(option).
>
> (2) In org.apache.pulsar.broker.service.persistent.PersistentTopic
> #addSchemaIfIdleOrCheckCompatible:
>
>     @Override
>     public CompletableFuture<Void> addSchemaIfIdleOrCheckCompatible(SchemaData schema) {
>         return hasSchema().thenCompose((hasSchema) -> {
>             int numActiveConsumersWithoutAutoSchema = subscriptions.values().stream()
>                     .mapToInt(subscription -> subscription.getConsumers().stream()
>                             .filter(consumer -> consumer.getSchemaType() != SchemaType.AUTO_CONSUME)
>                             .toList().size())
>                     .sum();
>             if (hasSchema
>                     || (!producers.isEmpty())
>                     || (numActiveConsumersWithoutAutoSchema != 0)
>                     || (ledger.getTotalSize() != 0)) {
>                 return checkSchemaCompatibleForConsumer(schema);
>             } else {
>                 return addSchema(schema).thenCompose(schemaVersion ->
>                         CompletableFuture.completedFuture(null));
>             }
>         });
>     }
> There is a bug in only one case: an old Pulsar client first subscribes to
> an empty topic with an AUTO_CONSUME consumer, and then a new or old Pulsar
> client subscribes to the topic with a schema (e.g. AVRO).
> The broker returns IncompatibleSchemaException ("Topic does not have a
> schema to check"), so the bug in issue #17354 is not fixed in this case.
> All the other cases work normally.
>
> On Sat, Dec 31, 2022 at 8:23 PM Yunze Xu  wrote:
>
> > Defining `AutoConsume` as -3 is somewhat strange. Could you clarify whether
> > backward compatibility is guaranteed? I.e., if the new Pulsar client
> > uploads the AUTO_CONSUME schema to the broker, can old Pulsar clients
> > still produce to or consume from the same topic?
> >
> > Thanks,
> > Yunze
> >
> > On Fri, Dec 30, 2022 at 11:32 PM 思楠刘  wrote:
> > >
> > > Hi all,
> > >
> > > I made a PIP to discuss: https://github.com/apache/pulsar/issues/19113.
> > >
> > > Thanks,
> > > Sinan
> >
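To follow the broker-side decision SiNan describes, here is a simplified, self-contained model of the two quoted code paths (`ServerCnx#handleSubscribe` and `PersistentTopic#addSchemaIfIdleOrCheckCompatible`). It is a sketch, not broker code: `SchemaType` is a cut-down stand-in, `TopicModel` and `SubscribeOutcome` are hypothetical, a compatibility check against an existing schema is assumed to pass, and it assumes that an old client's AUTO_CONSUME consumer (which subscribes with a null schema) is recorded with a null schema type.

```java
import java.util.ArrayList;
import java.util.List;

// Cut-down, hypothetical stand-in for org.apache.pulsar.common.schema.SchemaType.
enum SchemaType { AVRO, JSON, AUTO_CONSUME }

// Hypothetical results of a subscribe request in this model.
enum SubscribeOutcome {
    SUBSCRIBED,            // no schema handling: schema was null or AUTO_CONSUME
    SCHEMA_ADDED,          // idle topic: the consumer's schema became the topic schema
    COMPATIBILITY_CHECKED, // topic has a schema; the check is assumed to pass here
    NO_SCHEMA_TO_CHECK     // stands for IncompatibleSchemaException("Topic does not have a schema to check")
}

// Hypothetical model of one topic; not the broker's PersistentTopic.
class TopicModel {
    SchemaType topicSchema; // null while the topic has no schema
    int producers;          // connected producers
    long ledgerSize;        // bytes stored on the topic
    // Schema type per connected consumer. Assumption: an old client's AUTO_CONSUME
    // consumer subscribed without a schema and is recorded as null.
    final List<SchemaType> consumerSchemaTypes = new ArrayList<>();

    // Mirrors the branch in ServerCnx#handleSubscribe.
    SubscribeOutcome subscribe(SchemaType consumerSchema) {
        SubscribeOutcome outcome =
                (consumerSchema != null && consumerSchema != SchemaType.AUTO_CONSUME)
                        ? addSchemaIfIdleOrCheckCompatible(consumerSchema)
                        : SubscribeOutcome.SUBSCRIBED;
        if (outcome != SubscribeOutcome.NO_SCHEMA_TO_CHECK) {
            consumerSchemaTypes.add(consumerSchema); // the subscription succeeded
        }
        return outcome;
    }

    // Mirrors PersistentTopic#addSchemaIfIdleOrCheckCompatible with the proposed filter.
    private SubscribeOutcome addSchemaIfIdleOrCheckCompatible(SchemaType schema) {
        long consumersWithoutAutoSchema = consumerSchemaTypes.stream()
                .filter(type -> type != SchemaType.AUTO_CONSUME) // null (old client) is counted
                .count();
        boolean hasSchema = topicSchema != null;
        if (hasSchema || producers > 0 || consumersWithoutAutoSchema != 0 || ledgerSize != 0) {
            // Stands in for checkSchemaCompatibleForConsumer; per the thread it fails without a topic schema.
            return hasSchema ? SubscribeOutcome.COMPATIBILITY_CHECKED : SubscribeOutcome.NO_SCHEMA_TO_CHECK;
        }
        topicSchema = schema; // addSchema(schema)
        return SubscribeOutcome.SCHEMA_ADDED;
    }
}
```

In this model, a new client's AUTO_CONSUME consumer is filtered out, so the topic still counts as idle and the first AVRO consumer registers its schema. An old client's AUTO_CONSUME consumer is recorded as null, so it counts as a consumer "without auto schema" and the first AVRO consumer ends up in the compatibility check on a topic with no schema, which is the single failing case described above.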


Re: [DISCUSS] PIP-232: Introduce thread monitor to check if thread is blocked for long time.

2023-01-02 Thread Lari Hotari
This is an interesting proposal. However, I'd suggest changes to the current 
proposal.

I think that the current proposal is too invasive for the Pulsar code base.
"Introduce thread monitor to check if thread is blocking for long time." seems
to mean multiple things.
Looking at the PR, it seems to be a solution for detecting long-running
tasks. Just FYI, BookKeeper has a solution for this in its
OrderedExecutor with a setting called enableTaskExecutionStats=true. I'm not
saying that it would be the preferred way to implement it.
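Detecting long-running tasks essentially means timing each task. As a minimal JDK-only illustration (a sketch, not BookKeeper's `enableTaskExecutionStats` implementation and not the PIP-232 demo PR), a `ThreadPoolExecutor` subclass can record a start time in `beforeExecute` and count tasks whose run time exceeds a threshold in `afterExecute`. `SlowTaskDetectingExecutor`, the threshold, and `shutdownAndAwait` are hypothetical names chosen for the example.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical executor that counts tasks whose execution time exceeds a threshold.
class SlowTaskDetectingExecutor extends ThreadPoolExecutor {
    private final long thresholdNanos;
    private final ThreadLocal<Long> startNanos = new ThreadLocal<>();
    private final LongAdder slowTasks = new LongAdder();

    SlowTaskDetectingExecutor(int threads, long threshold, TimeUnit unit) {
        super(threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        this.thresholdNanos = unit.toNanos(threshold);
    }

    @Override
    protected void beforeExecute(Thread worker, Runnable task) {
        super.beforeExecute(worker, task);
        startNanos.set(System.nanoTime()); // runs on the worker thread right before the task
    }

    @Override
    protected void afterExecute(Runnable task, Throwable error) {
        super.afterExecute(task, error);
        Long start = startNanos.get();
        startNanos.remove();
        if (start != null && System.nanoTime() - start > thresholdNanos) {
            slowTasks.increment(); // a real implementation might log the task or feed a histogram
        }
    }

    long slowTaskCount() {
        return slowTasks.sum();
    }

    // Demo helper: stop accepting tasks and wait for the queued ones to finish.
    boolean shutdownAndAwait(long timeout, TimeUnit unit) {
        shutdown();
        try {
            return awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Note that this times every task, which is exactly the per-task cost that the following paragraphs weigh against cheaper pool-level metrics.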

If the goal is to detect actual blocking code running on threads that
should run only non-blocking code, there's a better tool for that purpose:
Reactor BlockHound (https://github.com/reactor/BlockHound).
For actual profiling of the code base, Java Flight Recorder and Async Profiler
are better solutions.

It seems that one part of the problem is that there aren't any metrics for the
thread pools. As an alternative to the implementation proposed in PIP-232, I'd
suggest adding basic metrics (backlog / queue size, active thread count, number
of executed tasks, etc.) for the thread pools. For example,
Micrometer contains a decorator for many thread pool implementations:
https://github.com/micrometer-metrics/micrometer/blob/main/micrometer-core/src/main/java/io/micrometer/core/instrument/binder/jvm/ExecutorServiceMetrics.java
A similar solution would be very useful in Pulsar for adding the thread pool
metrics.
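To illustrate how little is needed for the basic metrics: the JDK's `ThreadPoolExecutor` already exposes the backlog (`getQueue().size()`), the active thread count (`getActiveCount()`), and the completed task count (`getCompletedTaskCount()`), so only a failed-task counter has to be added via `afterExecute`. This is a minimal sketch assuming a plain `ThreadPoolExecutor` rather than the executors Pulsar actually uses; `InstrumentedExecutor`, `Snapshot`, and `shutdownAndAwait` are hypothetical, and in practice the values would be exported through Pulsar's metrics (or a decorator such as Micrometer's `ExecutorServiceMetrics`) instead of being read directly.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical executor exposing basic metrics: backlog, active threads, executed and failed tasks.
class InstrumentedExecutor extends ThreadPoolExecutor {
    private final LongAdder failedTasks = new LongAdder();

    InstrumentedExecutor(int threads) {
        super(threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    }

    @Override
    protected void afterExecute(Runnable task, Throwable error) {
        super.afterExecute(task, error);
        // error is non-null when a task passed to execute() threw. Tasks passed to submit()
        // keep their exception inside the returned Future, so they are not counted here.
        if (error != null) {
            failedTasks.increment();
        }
    }

    // Point-in-time values; getActiveCount() and getCompletedTaskCount() are documented as approximate.
    static final class Snapshot {
        final int queueSize;       // backlog: tasks waiting in the queue
        final int activeThreads;   // threads currently running a task
        final long completedTasks; // finished tasks, including failed ones
        final long failedTasks;    // tasks that threw from execute()

        Snapshot(int queueSize, int activeThreads, long completedTasks, long failedTasks) {
            this.queueSize = queueSize;
            this.activeThreads = activeThreads;
            this.completedTasks = completedTasks;
            this.failedTasks = failedTasks;
        }
    }

    Snapshot snapshot() {
        return new Snapshot(getQueue().size(), getActiveCount(), getCompletedTaskCount(), failedTasks.sum());
    }

    // Demo helper: stop accepting tasks and wait for the queued ones to finish.
    boolean shutdownAndAwait(long timeout, TimeUnit unit) {
        shutdown();
        try {
            return awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```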

Tracking individual tasks requires more resources, and that's why I'd suggest
adding the basic metrics and enabling them by default. Some more advanced
metrics would be useful, such as tracking the thread pool queue waiting time.
A low-overhead thread pool queue waiting time metric could be added with a
sampling approach. The benefit is that there won't be a need to wrap all
tasks that are executed. There would be several ways to implement the queue
waiting time metric.
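One possible reading of the sampling approach, sketched with JDK classes only: wrap just 1 out of every N submitted tasks together with its submission timestamp and record how long it waited before a worker started it, while all other tasks pass through unwrapped. `SampledQueueWaitExecutor`, the 1-in-N policy, and the max/average aggregation are assumptions for illustration, not a design agreed on in this thread.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAccumulator;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical executor that measures queue waiting time for 1 out of every N tasks.
class SampledQueueWaitExecutor extends ThreadPoolExecutor {
    private final int sampleEvery;
    private final AtomicLong submitted = new AtomicLong();
    private final LongAdder sampledTasks = new LongAdder();
    private final LongAdder totalWaitNanos = new LongAdder();
    private final LongAccumulator maxWait = new LongAccumulator(Math::max, 0L);

    SampledQueueWaitExecutor(int threads, int sampleEvery) {
        super(threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        if (sampleEvery <= 0) {
            throw new IllegalArgumentException("sampleEvery must be positive");
        }
        this.sampleEvery = sampleEvery;
    }

    @Override
    public void execute(Runnable task) {
        if (submitted.getAndIncrement() % sampleEvery != 0) {
            super.execute(task); // unsampled tasks run unwrapped, with no extra allocation
            return;
        }
        final long enqueuedAt = System.nanoTime();
        super.execute(() -> {
            long waited = System.nanoTime() - enqueuedAt; // time between submission and start
            sampledTasks.increment();
            totalWaitNanos.add(waited);
            maxWait.accumulate(waited);
            task.run();
        });
    }

    long sampledTaskCount() { return sampledTasks.sum(); }

    long maxWaitNanos() { return maxWait.get(); }

    long averageWaitNanos() {
        long n = sampledTasks.sum();
        return n == 0 ? 0 : totalWaitNanos.sum() / n;
    }

    // Demo helper: stop accepting tasks and wait for the queued ones to finish.
    boolean shutdownAndAwait(long timeout, TimeUnit unit) {
        shutdown();
        try {
            return awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Because `AbstractExecutorService.submit` delegates to `execute`, tasks passed to either method are subject to sampling.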

I assume that "blocking" itself might not be the problem, and therefore having
basic metrics (backlog size, active threads, executed tasks counter, failed
tasks counter) for the thread pools is more essential. There are a lot of good
things about the PIP-232 proposal, and I believe that iterating on the ideas
will lead to a good outcome.

-Lari

On 2022/12/19 12:17:09 adobewjl wrote:
> Hello pulsar community,
> I've opened `PIP-232: Introduce thread monitor to check if thread is blocked 
> for long time.` to discuss.
> For more details, please read the PIP at 
> https://github.com/apache/pulsar/issues/18985
> I'm looking forward to hearing what you think. 
> The demo PR is at https://github.com/apache/pulsar/pull/18958


[VOTE] Pulsar Release 2.11.0 Candidate-5

2023-01-02 Thread guo jiwei
This is the fifth release candidate for Apache Pulsar, version 2.11.0.

This release contains 1617 commits by 61 contributors.
https://github.com/apache/pulsar/compare/v2.10.2...v2.11.0-candidate-5

CI for this release candidate
https://github.com/Technoboy-/pulsar/pull/26

*** Please download, test and vote on this release. This vote will stay open
for at least 72 hours ***

Note that we are voting upon the source (tag); binaries are provided for
convenience.

Source and binary files:
https://dist.apache.org/repos/dist/dev/pulsar/pulsar-2.11.0-candidate-5

SHA-512 checksums:

5bf77f600ac23c7ec72696655a49801e0755d67fb4c59c7ab30e9449b061aacef61ca227f106e6b7f5f490fc2ac57412ac2af3ff064ba1e6a31c162d4ca2ba6d  ./apache-pulsar-2.11.0-bin.tar.gz
0a3cab04b4a95ced3745d8b41fc3fa2632c91f77cb10e5c7e008b09ba2f279fc14de109626a44a9482b5354c5fa17bbc07ee369a32a252e358298e494c0fc1cb  ./apache-pulsar-2.11.0-src.tar.gz
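The published SHA-512 values can be checked with standard tools such as `sha512sum` or `shasum -a 512`. As a JDK-only alternative, here is a minimal sketch that streams a downloaded file through `MessageDigest` and compares the hex digest with an expected value; `Sha512Check` is a hypothetical helper name, and the file path and expected hash are supplied by whoever runs it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for checking a release artifact against a published SHA-512 value.
class Sha512Check {
    // Returns the lowercase hex SHA-512 digest of the file's contents.
    static String sha512Hex(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            MessageDigest digest = MessageDigest.getInstance("SHA-512");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read); // stream the file so large tarballs fit in memory
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-512 is not available on this JVM", e);
        }
    }

    // Case-insensitive comparison with an expected digest copied from the vote email.
    static boolean matches(Path file, String expectedHex) {
        return sha512Hex(file).equalsIgnoreCase(expectedHex.trim());
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java Sha512Check <file> <expected-sha512-hex>");
            System.exit(2);
        }
        boolean ok = matches(Paths.get(args[0]), args[1]);
        System.out.println(ok ? "OK: checksum matches" : "MISMATCH: do not use this file");
        System.exit(ok ? 0 : 1);
    }
}
```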

Maven staging repo:
https://repository.apache.org/content/repositories/orgapachepulsar-1203/

The tag to be voted upon:
v2.11.0-candidate-5 (97ee1a114aa1e5df05d86a9dd47f7fe7a2194211)
https://github.com/apache/pulsar/releases/tag/v2.11.0-candidate-5

Pulsar's KEYS file containing PGP keys we use to sign the release:
https://dist.apache.org/repos/dist/dev/pulsar/KEYS

Docker images:
https://hub.docker.com/layers/mattison/pulsar-all/2.11.0-rc5/images/sha256-548a518bb18890ddf8e48f0bad62695cfc4f743b9e1223ca45b2e03dba6e9149
https://hub.docker.com/layers/mattison/pulsar/2.11.0-rc5/images/sha256-3b755fb67d49abeb7ab6a76b7123cc474375e3881526db26f43c8cfccdaa3cf6

Please download the source package and follow the
release-candidate-validation doc to build
and run the Pulsar standalone service:
https://pulsar.apache.org/contribute/validate-release-candidate

Since the standalone metadata store has changed from ZK to RocksDB, verifying
`stateful functions` requires setting the environment variable
"export PULSAR_STANDALONE_USE_ZOOKEEPER=1".



Regards
Jiwei Guo (Tboy)


Re: [DISCUSS] PIP-234: Support using shared thread pool across multiple Pulsar client instance

2023-01-02 Thread Lari Hotari
+1 to Enrico's suggestion about the API to allow sharing resources among 
multiple clients.

I think this could be a good way to hide the implementation details about event 
loop groups and prevent exposing Netty classes in the API. 

By the way, besides the event loop instances, the timer instance would also
need to be shared (the shared timer feature was added to PulsarClientImpl in
https://github.com/apache/pulsar/pull/9802). This could also be handled as an
internal detail in the PulsarClientGroup shared resources solution.
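To make Enrico's opaque `PulsarClientGroup` idea (quoted below) and the shared-timer point concrete, here is a minimal JDK-only sketch of the ownership model. It does not use the Pulsar client API: `SharedClientResources` and `SketchClient` are hypothetical names, plain executors stand in for the Netty event loop group and the internal/external executor providers, and a `ScheduledExecutorService` stands in for the shared timer. Clients borrow the group's resources and never shut them down; only closing the group releases them. Because callers never see the individual pools, pools could be added or removed later without breaking users.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

// Hypothetical opaque group of shareable client resources. Users only pass the
// group around; the individual pools stay package-private implementation details.
final class SharedClientResources implements AutoCloseable {
    final ExecutorService ioThreads = Executors.newFixedThreadPool(2);       // stand-in for the IO event loop group
    final ExecutorService internalThreads = Executors.newFixedThreadPool(2); // internal client tasks
    final ExecutorService externalThreads = Executors.newFixedThreadPool(2); // message listener callbacks
    final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(); // shared timer

    boolean isClosed() {
        return timer.isShutdown();
    }

    @Override
    public void close() {
        ioThreads.shutdown();
        internalThreads.shutdown();
        externalThreads.shutdown();
        timer.shutdown();
    }
}

// Hypothetical client: it either owns a private group or borrows a shared one.
final class SketchClient implements AutoCloseable {
    private final SharedClientResources resources;
    private final boolean ownsResources;

    private SketchClient(SharedClientResources resources, boolean ownsResources) {
        this.resources = resources;
        this.ownsResources = ownsResources;
    }

    // Current behavior: every client creates its own pools.
    static SketchClient create() {
        return new SketchClient(new SharedClientResources(), true);
    }

    // Proposed behavior: many clients reuse one group.
    static SketchClient withSharedResources(SharedClientResources group) {
        return new SketchClient(group, false);
    }

    SharedClientResources resources() {
        return resources;
    }

    @Override
    public void close() {
        if (ownsResources) {
            resources.close(); // never shut down pools that other clients may still use
        }
    }
}
```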

-Lari


On 2022/12/27 11:11:34 Enrico Olivelli wrote:
> I generally support this proposal.
> This is a problem we have in the Proxy, and I have also seen it in
> applications that need to connect to multiple different tenants
> with different authentication parameters, so they have to
> create many PulsarClient instances.
> 
> I have a suggestion:
> 
> Exposing all the internals is a good idea for very advanced users,
> but I believe that we should provide some simpler support.
> 
> We should have an API to allow sharing resources among multiple
> clients without dealing with the details.
> 
> interface PulsarClientGroup {
>    ... put here all the sharable things in the current version ...
> }
>
> PulsarClient client = newClient()
>  .withSharedResources(pulsarClientGroup)
>  ...
> 
> 
> I think that having a PulsarClientGroup is a good choice for future evolution
> because the internal thread pools may change: they may be removed, added,
> or repurposed.
> 
> If we require users to deal with all the possible sharable resources,
> then we have a few risks:
> - people can "forget" to share some resources
> - after upgrading the client, new "sharable" resources may not be
> taken into account
> 
> This is why I believe that we should provide an opaque
> 
> Enrico
> 
> On Tue, Dec 27, 2022 at 11:07 AM PengHui Li  wrote:
> >
> > Hi all,
> >
> > As discussed at
> > https://lists.apache.org/thread/5obfm17g58n3dnbzyxg57vokgmwyp6hx
> > I have created this proposal to support sharing thread pools across
> > multiple client instances.
> > Here is the proposal link: https://github.com/apache/pulsar/issues/19074
> >
> > Please take a look; I look forward to your suggestions.
> >
> > Thanks,
> > Penghui
> >
> > --
> >
> > ### Motivation
> >
> > The Pulsar client mainly has three thread pools that cooperate with each
> > other to handle message publishing and consumption.
> >
> > - IO threads - Used for handling network packets from the broker
> > - Internal threads - Used for handling internal tasks, such as moving
> > received messages to the internal receiver queue and pulling messages
> > out of the receiver queue to return them to users. The Java client is
> > designed to be lock-free: each consumer uses a pinned internal thread
> > to reduce locking overhead.
> > - External threads - Used by the message listener
> >
> > All the above thread pools will be created automatically after a Pulsar
> > client instance has been created.
> >
> > In some cases, however, users need to create multiple Pulsar client
> > instances in one JVM process, for example because of different
> > authentication settings. Each client then has its own exclusive thread
> > pools, which leads to excessive thread usage, wasted memory, and
> > potential performance degradation.
> >
> > With the default configuration, this was not a serious problem in previous
> > releases, because each thread pool has only 1 thread by default. However,
> > having only one thread per thread pool doesn't make sense either. We have
> > discussed this part in this [thread](
> > https://lists.apache.org/thread/5obfm17g58n3dnbzyxg57vokgmwyp6hx)
> >
> > This proposal therefore lets users who require multiple Pulsar client
> > instances in one JVM process share thread pools across those instances.
> >
> > ### Goal
> >
> > Provide a public API for sharing the following thread pools across
> > multiple Pulsar client instances in one JVM process:
> >
> > - IO threads
> > - Internal threads
> > - External threads
> >
> > BTW, we already have this ability internally; it is just hidden from
> > users. Please take a look at #12037 and #13839 for more details.
> >
> > ### API Changes
> >
> > The following APIs will be introduced to the Java client builder for
> > creating a client instance:
> >
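> > ```java
> > // Proposed builder methods from this PIP; ioEventLoopGroup and the
> > // shared*Provider instances are created once and passed to every client.
> > PulsarClient client = PulsarClient.builder()
> >   .serviceUrl("pulsar://localhost:6650") // example service URL
> >   .eventLoopGroup(ioEventLoopGroup)
> >   .internalExecutorProvider(sharedInternalExecutorProvider)
> >   .externalExecutorProvider(sharedExternalExecutorProvider)
> >   .scheduledExecutorProvider(sharedScheduledExecutorProvider)
> >   .build();
> > ```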
>
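The motivation above notes that each consumer is pinned to one internal thread to reduce locking. Here is a minimal sketch of that pattern, assuming a fixed array of single-threaded executors and a hash-based choice per consumer key; `PinnedExecutors` and `executorFor` are hypothetical names, not the Java client's internal API. Since one key always maps to the same single-threaded executor, that consumer's tasks run one at a time and in submission order. If one `PinnedExecutors` instance were shared, consumers from several client instances would spread over the same fixed set of threads instead of each client creating its own.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical pool of single-threaded executors that pins each consumer key to one thread.
final class PinnedExecutors implements AutoCloseable {
    private final ExecutorService[] executors;

    PinnedExecutors(int threads) {
        if (threads <= 0) {
            throw new IllegalArgumentException("threads must be positive");
        }
        executors = new ExecutorService[threads];
        for (int i = 0; i < threads; i++) {
            executors[i] = Executors.newSingleThreadExecutor();
        }
    }

    // The same key always maps to the same single-threaded executor, so tasks for one
    // consumer never run concurrently, and state touched only from those tasks needs no lock.
    ExecutorService executorFor(Object consumerKey) {
        return executors[Math.floorMod(consumerKey.hashCode(), executors.length)];
    }

    @Override
    public void close() {
        for (ExecutorService executor : executors) {
            executor.shutdown();
        }
    }
}
```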