heesung-sn opened a new pull request, #19341:
URL: https://github.com/apache/pulsar/pull/19341
<!--
### Contribution Checklist
- PR title format should be *[type][component] summary*. For details, see
*[Guideline - Pulsar PR Naming
Convention](https://docs.google.com/document/d/1d8Pw6ZbWk-_pCKdOmdvx9rnhPiyuxwq60_TrD68d7BA/edit#heading=h.trs9rsex3xom)*.
- Fill out the template below to describe the changes contributed by the
pull request. That will give reviewers the context they need to do the review.
- Each pull request should address only one issue, not mix up code from
multiple issues.
- Each commit in the pull request has a meaningful commit message
- Once all items of the checklist are addressed, remove the above text and
this checklist, leaving only the filled out template below.
-->
<!-- Details of when a PIP is required and how the PIP process work, please
see: https://github.com/apache/pulsar/blob/master/wiki/proposals/PIP.md -->
### Motivation
<!-- Explain here the context, and why you're making that change. What is
the problem you're trying to solve. -->
When there are many concurrent subscriptions in a topic, broker performance
degrades because many io-threads are waiting for the lock, `synchronized
(activeCursors)` while calling `checkBackloggedCursors`.
```
"pulsar-io-4-2" #77 prio=5 os_prio=0 cpu=17289.90ms elapsed=170.09s
tid=0x00007f85b0085800 nid=0xa2 runnable [0x00007f859d05e000]
java.lang.Thread.State: RUNNABLE
at
org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl.deactivateCursor(ManagedLedgerImpl.java:3526)
- locked <0x00007f8921b24c58> (a
org.apache.bookkeeper.mledger.impl.ManagedCursorContainer)
at
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl.setInactive(ManagedCursorImpl.java:1116)
at
org.apache.pulsar.broker.service.persistent.PersistentTopic.lambda$checkBackloggedCursors$95(PersistentTopic.java:2363)
at
org.apache.pulsar.broker.service.persistent.PersistentTopic$$Lambda$1346/0x00007f8465c23cb0.accept(Unknown
Source)
at
org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:544)
at
org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:272)
--
at java.lang.Thread.run([email protected]/Thread.java:829)
"pulsar-io-4-3" #78 prio=5 os_prio=0 cpu=17224.94ms elapsed=170.09s
tid=0x00007f85b0087000 nid=0xa3 waiting for monitor entry [0x00007f859cf5d000]
java.lang.Thread.State: BLOCKED (on object monitor)
at
org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl.deactivateCursor(ManagedLedgerImpl.java:3526)
- waiting to lock <0x00007f8921b24c58> (a
org.apache.bookkeeper.mledger.impl.ManagedCursorContainer)
at
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl.setInactive(ManagedCursorImpl.java:1116)
at
org.apache.pulsar.broker.service.persistent.PersistentTopic.lambda$checkBackloggedCursors$95(PersistentTopic.java:2363)
at
org.apache.pulsar.broker.service.persistent.PersistentTopic$$Lambda$1346/0x00007f8465c23cb0.accept(Unknown
Source)
at
org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:544)
at
org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:272)
--
at java.lang.Thread.run([email protected]/Thread.java:829)
```
### Modifications
Added the `isActve` var in ManagedCursorImpl in order to minimize the
access to `activeCursors` in `ManagedLedgerImpl`
<!-- Describe the modifications you've done. -->
### Verifying this change
- [x] Make sure that the change passes the CI checks.
This change can be verified as follows:
**simulate many concurrent subscriptions**
```
#!/bin/sh
while :
do
bin/pulsar-perf consume my-topic -r 10 -ns 10000 -st Exclusive -sp
Earliest -time 3 -ss a &
bin/pulsar-perf consume my-topic -r 10 -ns 10000 -st Exclusive -sp
Earliest -time 3 -ss b &
bin/pulsar-perf consume my-topic -r 10 -ns 10000 -st Exclusive -sp
Earliest -time 3 -ss c &
bin/pulsar-perf consume my-topic -r 10 -ns 10000 -st Exclusive -sp
Earliest -time 3 -ss d &
bin/pulsar-perf consume my-topic -r 10 -ns 10000 -st Exclusive -sp
Earliest -time 3 -ss e &
bin/pulsar-perf consume my-topic -r 10 -ns 10000 -st Exclusive -sp
Earliest -time 3 -ss f &
bin/pulsar-perf consume my-topic -r 10 -ns 10000 -st Exclusive -sp
Earliest -time 3 -ss g &
bin/pulsar-perf consume my-topic -r 10 -ns 10000 -st Exclusive -sp
Earliest -time 3 -ss h &
bin/pulsar-perf consume my-topic -r 10 -ns 10000 -st Exclusive -sp
Earliest -time 3 -ss i &
bin/pulsar-perf consume my-topic -r 10 -ns 10000 -st Exclusive -sp
Earliest -time 3 -ss j &
bin/pulsar-perf consume my-topic -r 10 -ns 10000 -st Exclusive -sp
Earliest -time 3 -ss k &
bin/pulsar-perf consume my-topic -r 10 -ns 10000 -st Exclusive -sp
Earliest -time 3 -ss l &
bin/pulsar-perf consume my-topic -r 10 -ns 10000 -st Exclusive -sp
Earliest -time 3 -ss m &
for job in `jobs -p`
do
echo $job
wait $job
done
done
```
**before this PR**
```
hsohn@HeesungSohns-MacBook-Pro pulsar % bin/pulsar-perf produce my-topic -r
10 -m 1000
2023-01-27T18:47:30,430-0800 [main] INFO
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 0
msg --- 0.0 msg/s --- 0.0 Mbit/s --- failure 0.0 msg/s ---
Latency: mean: 0.000 ms - med: 0.000 - 95pct: 0.000 - 99pct: 0.000 -
99.9pct: 0.000 - 99.99pct: 0.000 - Max: 0.000
2023-01-27T18:47:40,510-0800 [main] INFO
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 119
msg --- 11.8 msg/s --- 0.1 Mbit/s --- failure 0.0 msg/s ---
Latency: mean: 6834.356 ms - med: 6834.815 - 95pct: 12128.831 - 99pct:
12629.887 - 99.9pct: 12728.447 - 99.99pct: 12728.447 - Max: 12728.447
2023-01-27T18:47:50,531-0800 [main] INFO
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 119
msg --- 0.0 msg/s --- 0.0 Mbit/s --- failure 0.0 msg/s ---
Latency: mean: 0.000 ms - med: 0.000 - 95pct: 0.000 - 99pct: 0.000 -
99.9pct: 0.000 - 99.99pct: 0.000 - Max: 0.000
2023-01-27T18:48:00,555-0800 [main] INFO
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 119
msg --- 0.0 msg/s --- 0.0 Mbit/s --- failure 0.0 msg/s ---
Latency: mean: 0.000 ms - med: 0.000 - 95pct: 0.000 - 99pct: 0.000 -
99.9pct: 0.000 - 99.99pct: 0.000 - Max: 0.000
2023-01-27T18:48:10,583-0800 [main] INFO
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 119
msg --- 0.0 msg/s --- 0.0 Mbit/s --- failure 0.0 msg/s ---
Latency: mean: 0.000 ms - med: 0.000 - 95pct: 0.000 - 99pct: 0.000 -
99.9pct: 0.000 - 99.99pct: 0.000 - Max: 0.000
2023-01-27T18:48:20,616-0800 [main] INFO
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 119
msg --- 0.0 msg/s --- 0.0 Mbit/s --- failure 0.0 msg/s ---
Latency: mean: 0.000 ms - med: 0.000 - 95pct: 0.000 - 99pct: 0.000 -
99.9pct: 0.000 - 99.99pct: 0.000 - Max: 0.000
2023-01-27T18:48:30,656-0800 [main] INFO
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 119
msg --- 0.0 msg/s --- 0.0 Mbit/s --- failure 0.0 msg/s ---
Latency: mean: 0.000 ms - med: 0.000 - 95pct: 0.000 - 99pct: 0.000 -
99.9pct: 0.000 - 99.99pct: 0.000 - Max: 0.000
2023-01-27T18:48:40,683-0800 [main] INFO
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 119
msg --- 0.0 msg/s --- 0.0 Mbit/s --- failure 0.0 msg/s ---
Latency: mean: 0.000 ms - med: 0.000 - 95pct: 0.000 - 99pct: 0.000 -
99.9pct: 0.000 - 99.99pct: 0.000 - Max: 0.000
2023-01-27T18:48:50,711-0800 [main] INFO
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 119
msg --- 0.0 msg/s --- 0.0 Mbit/s --- failure 0.0 msg/s ---
Latency: mean: 0.000 ms - med: 0.000 - 95pct: 0.000 - 99pct: 0.000 -
99.9pct: 0.000 - 99.99pct: 0.000 - Max: 0.000
2023-01-27T18:49:00,734-0800 [main] INFO
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 119
msg --- 0.0 msg/s --- 0.0 Mbit/s --- failure 0.0 msg/s ---
Latency: mean: 0.000 ms - med: 0.000 - 95pct: 0.000 - 99pct: 0.000 -
99.9pct: 0.000 - 99.99pct: 0.000 - Max: 0.000
2023-01-27T18:49:01,300-0800 [pulsar-perf-producer-exec-1-1] INFO
org.apache.pulsar.testclient.PerformanceProducer - ------------- DONE (reached
the maximum number: 1000 of production) --------------
2023-01-27T18:49:06,306-0800 [Thread-0] INFO
org.apache.pulsar.testclient.PerformanceProducer - Aggregated throughput stats
--- 119 records sent --- 1.123 msg/s --- 0.009 Mbit/s
2023-01-27T18:49:06,337-0800 [Thread-0] INFO
org.apache.pulsar.testclient.PerformanceProducer - Aggregated latency stats ---
Latency: mean: 6834.356 ms - med: 6834.815 - 95pct: 12128.831 - 99pct:
12629.887 - 99.9pct: 12728.447 - 99.99pct: 12728.447 - 99.999pct: 12728.447 -
Max: 12728.447
```
**after this PR**
```
hsohn@HeesungSohns-MacBook-Pro pulsar % bin/pulsar-perf produce my-topic -r
10 -m 1000
org.apache.pulsar.testclient.PerformanceProducer - Created 1 producers
2023-01-27T18:57:42,190-0800 [main] INFO
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 31
msg --- 3.1 msg/s --- 0.0 Mbit/s --- failure 0.0 msg/s ---
Latency: mean: 2848.717 ms - med: 2265.311 - 95pct: 5757.055 - 99pct: 5822.847
- 99.9pct: 5822.847 - 99.99pct: 5822.847 - Max: 5822.847
2023-01-27T18:57:52,230-0800 [main] INFO
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 49
msg --- 1.8 msg/s --- 0.0 Mbit/s --- failure 0.0 msg/s ---
Latency: mean: 10118.172 ms - med: 9954.495 - 95pct: 12453.503 - 99pct:
12554.047 - 99.9pct: 12554.047 - 99.99pct: 12554.047 - Max: 12554.047
2023-01-27T18:58:02,254-0800 [main] INFO
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 87
msg --- 3.8 msg/s --- 0.0 Mbit/s --- failure 0.0 msg/s ---
Latency: mean: 17079.874 ms - med: 16370.687 - 95pct: 20056.703 - 99pct:
20304.127 - 99.9pct: 20304.127 - 99.99pct: 20304.127 - Max: 20304.127
2023-01-27T18:58:12,277-0800 [main] INFO
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 126
msg --- 3.9 msg/s --- 0.0 Mbit/s --- failure 0.0 msg/s ---
Latency: mean: 25037.297 ms - med: 25613.823 - 95pct: 26452.607 - 99pct:
26574.591 - 99.9pct: 26574.591 - 99.99pct: 26574.591 - Max: 26574.591
2023-01-27T18:58:22,299-0800 [main] INFO
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 174
msg --- 4.8 msg/s --- 0.0 Mbit/s --- failure 0.0 msg/s ---
Latency: mean: 30616.232 ms - med: 31042.431 - 95pct: 32272.895 - 99pct:
32440.447 - 99.9pct: 32440.447 - 99.99pct: 32440.447 - Max: 32440.447
2023-01-27T18:58:32,325-0800 [main] INFO
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 240
msg --- 6.6 msg/s --- 0.1 Mbit/s --- failure 0.0 msg/s ---
Latency: mean: 35219.516 ms - med: 35478.527 - 95pct: 36584.447 - 99pct:
36717.823 - 99.9pct: 36817.663 - 99.99pct: 36817.663 - Max: 36817.663
2023-01-27T18:58:42,354-0800 [main] INFO
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 314
msg --- 7.4 msg/s --- 0.1 Mbit/s --- failure 0.0 msg/s ---
Latency: mean: 36440.735 ms - med: 36492.543 - 95pct: 37402.623 - 99pct:
37561.087 - 99.9pct: 37659.391 - 99.99pct: 37659.391 - Max: 37659.391
2023-01-27T18:58:52,377-0800 [main] INFO
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 397
msg --- 8.3 msg/s --- 0.1 Mbit/s --- failure 0.0 msg/s ---
Latency: mean: 40985.450 ms - med: 41395.967 - 95pct: 42582.783 - 99pct:
42684.927 - 99.9pct: 42727.167 - 99.99pct: 42727.167 - Max: 42727.167
2023-01-27T18:59:02,406-0800 [main] INFO
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 519
msg --- 12.2 msg/s --- 0.1 Mbit/s --- failure 0.0 msg/s ---
Latency: mean: 38750.025 ms - med: 38156.543 - 95pct: 41149.439 - 99pct:
41394.175 - 99.9pct: 41492.479 - 99.99pct: 41492.479 - Max: 41492.479
2023-01-27T18:59:12,436-0800 [main] INFO
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 645
msg --- 12.6 msg/s --- 0.1 Mbit/s --- failure 0.0 msg/s ---
Latency: mean: 36111.466 ms - med: 35880.959 - 95pct: 37313.535 - 99pct:
37454.335 - 99.9pct: 37470.719 - 99.99pct: 37470.719 - Max: 37470.719
2023-01-27T18:59:12,993-0800 [pulsar-perf-producer-exec-1-1] INFO
org.apache.pulsar.testclient.PerformanceProducer - ------------- DONE (reached
the maximum number: 1000 of production) --------------
2023-01-27T18:59:17,999-0800 [Thread-0] INFO
org.apache.pulsar.testclient.PerformanceProducer - Aggregated throughput stats
--- 723 records sent --- 6.826 msg/s --- 0.053 Mbit/s
2023-01-27T18:59:18,027-0800 [Thread-0] INFO
org.apache.pulsar.testclient.PerformanceProducer - Aggregated latency stats ---
Latency: mean: 32857.773 ms - med: 35737.599 - 95pct: 41621.759 - 99pct:
42467.839 - 99.9pct: 42684.927 - 99.99pct: 42727.167 - 99.999pct: 42727.167 -
Max: 42727.167
```
### Does this pull request potentially affect one of the following parts:
<!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
*If the box was checked, please highlight the changes*
- [ ] Dependencies (add or upgrade a dependency)
- [ ] The public API
- [ ] The schema
- [ ] The default values of configurations
- [ ] The threading model
- [ ] The binary protocol
- [ ] The REST endpoints
- [ ] The admin CLI options
- [ ] The metrics
- [ ] Anything that affects deployment
### Documentation
<!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
- [ ] `doc` <!-- Your PR contains doc changes. Please attach the local
preview screenshots (run `sh start.sh` at `pulsar/site2/website`) to your PR
description, or else your PR might not get merged. -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update
later -->
- [x] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->
### Matching PR in forked repository
PR in forked repository: https://github.com/heesung-sn/pulsar/pull/23
<!--
After opening this PR, the build in apache/pulsar will fail and instructions
will
be provided for opening a PR in the PR author's forked repository.
apache/pulsar pull requests should be first tested in your own fork since
the
apache/pulsar CI based on GitHub Actions has constrained resources and quota.
GitHub Actions provides separate quota for pull requests that are executed
in
a forked repository.
The tests will be run in the forked repository until all PR review comments
have
been handled, the tests pass and the PR is approved by a reviewer.
-->
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]