ankitsultana opened a new issue, #10140:
URL: https://github.com/apache/pinot/issues/10140
**Context**
When a thread leaves the `Monitor` in `RoundRobinScheduler`, the monitor calls
`Monitor#signalNextWaiter`, which in turn calls `RoundRobinScheduler#hasNext`,
because `signalNextWaiter` iterates over all the guards of the monitor and
checks whether each one is satisfied.
However, `hasNext` recomputes the `_ready` queue every time it is called.
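To make the cost concrete, here is a minimal, JDK-only model of the pattern described above. It is not Pinot's code: `RecomputeOnEveryCheck`, `addWaitingChain`, `markHasData`, and the rule "a waiting chain is ready if any of its mailboxes has data" are hypothetical simplifications, and `leave()` stands in for the guard evaluation that `signalNextWaiter` performs. Because `hasNext()` rescans every waiting chain, the work per `leave()` grows with (waiting chains × mailboxes per chain), and the total grows again with the number of leaves.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical model of the reported pattern, not Pinot's actual code:
 * a guard check that rebuilds readiness from scratch on every call.
 */
public class RecomputeOnEveryCheck {
    private final List<Set<String>> waitingChains = new ArrayList<>(); // mailboxes each chain waits on
    private final Set<String> mailboxesWithData = new HashSet<>();
    private long lookups = 0; // total set lookups performed by hasNext()

    void addWaitingChain(Set<String> mailboxes) {
        waitingChains.add(mailboxes);
    }

    void markHasData(String mailbox) {
        mailboxesWithData.add(mailbox);
    }

    /** Stand-in for hasNext(): rescans every waiting chain on each call. */
    boolean hasNext() {
        for (Set<String> chain : waitingChains) {
            for (String mailbox : chain) {
                lookups++;
                if (mailboxesWithData.contains(mailbox)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Stand-in for leaving the monitor: signalling the next waiter evaluates the guard. */
    void leave() {
        hasNext();
    }

    long lookups() {
        return lookups;
    }

    public static void main(String[] args) {
        RecomputeOnEveryCheck scheduler = new RecomputeOnEveryCheck();
        for (int i = 0; i < 1_000; i++) {
            Set<String> mailboxes = new HashSet<>();
            for (int j = 0; j < 10; j++) {
                mailboxes.add("chain" + i + "-mailbox" + j);
            }
            scheduler.addWaitingChain(mailboxes);
        }
        // No mailbox has data, so every scan visits all 1,000 x 10 mailboxes.
        for (int k = 0; k < 100; k++) {
            scheduler.leave();
        }
        System.out.println(scheduler.lookups()); // prints 1000000
    }
}
```

In this model, 100 leaves over 1,000 idle chains already cost a million lookups. If, as we understand it, the guard is evaluated before the monitor's lock is released, every other thread trying to enter the monitor waits behind that scan.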
**What we are seeing internally**
We looked at one of our servers that was having memory issues and took thread
dumps over a period of a few minutes. In every dump, all the QueryWorker
threads were waiting to acquire the Guard:
```
"query_worker_on_8421_port-1-thread-1" #2871 prio=5 os_prio=0 cpu=135674.67ms elapsed=69632.90s tid=0x00007ee398016000 nid=0xd24 waiting on condition [0x00007ef0a8758000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
--
"query_worker_on_8421_port-1-thread-2" #2882 prio=5 os_prio=0 cpu=101003.62ms elapsed=69629.03s tid=0x00007ee398017000 nid=0xd30 waiting on condition [0x00007e9974377000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
--
...
```
There is always one thread that is trying to compute the ready queue:
```
"grpc-default-executor-37238" #54551 daemon prio=5 os_prio=0 cpu=2507.80ms elapsed=14701.97s tid=0x00007edbb0234000 nid=0xd955 runnable [0x00007edcbddb1000]
java.lang.Thread.State: RUNNABLE
at java.util.HashMap.hash([email protected]/HashMap.java:340)
at java.util.HashMap.containsKey([email protected]/HashMap.java:592)
at java.util.HashSet.contains([email protected]/HashSet.java:204)
at java.util.Collections.disjoint([email protected]/Collections.java:5465)
at com.google.common.collect.Sets$2.isEmpty(Sets.java:871)
at org.apache.pinot.query.runtime.executor.RoundRobinScheduler.computeReady(RoundRobinScheduler.java:147)
```
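The hot frames in this trace show `Sets$2.isEmpty` (an anonymous Guava `Sets` view, most likely the one returned by `Sets.intersection`) delegating to `Collections.disjoint`, which loops over one collection and calls `HashSet.contains` on the other. The JDK-only sketch below counts the `contains` calls a single emptiness check makes when the sets do not overlap. `DisjointCost`, `CountingSet`, and `intersectionIsEmpty` are hypothetical names, and `intersectionIsEmpty` is a stand-in for an `intersection(...).isEmpty()` check, not Guava's implementation.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical illustration of the work behind an "is the intersection empty?" check. */
public class DisjointCost {
    /** HashSet that counts how often contains() is called on it. */
    static final class CountingSet<E> extends HashSet<E> {
        long containsCalls = 0;

        @Override
        public boolean contains(Object o) {
            containsCalls++;
            return super.contains(o);
        }
    }

    /** JDK stand-in for an intersection(a, b).isEmpty() check. */
    static boolean intersectionIsEmpty(Set<?> a, Set<?> b) {
        return Collections.disjoint(a, b);
    }

    public static void main(String[] args) {
        CountingSet<String> mailboxesWithData = new CountingSet<>();
        mailboxesWithData.add("some-other-mailbox");
        Set<String> chainMailboxes = new HashSet<>();
        for (int i = 0; i < 500; i++) {
            chainMailboxes.add("mailbox-" + i);
        }
        // With a Set as the first argument, disjoint() iterates the second argument
        // and calls contains() on the first for each element until it finds a match.
        boolean empty = intersectionIsEmpty(mailboxesWithData, chainMailboxes);
        System.out.println(empty + " " + mailboxesWithData.containsCalls); // prints "true 500"
    }
}
```

Each non-overlapping check costs one `contains` call per element of the iterated set; `computeReady` presumably repeats such a check per candidate on every `hasNext` call, which matches this frame staying hot across dumps.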
The number of these threads can grow very large, because the callback is
invoked from `MailboxContentStreamObserver`, which runs on the gRPC default
executor (which appears to be unbounded):
```
❯❯❯ cat 2.thdump| grep "Monitor.enter" | wc -l
4227
...
❯❯❯ cat 6.thdump| grep "Monitor.enter" | wc -l
10658
...
❯❯❯ cat 7.thdump| grep "Monitor.enter" | wc -l
13802
...
❯❯❯ cat 7.thdump| grep "grpc-default-executor" | wc -l
13712 <<== number of grpc-default-executor threads
```
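These counts are consistent with an executor that creates a new thread whenever no idle thread is available. The threads in the dumps are named `grpc-default-executor-N`, and to our understanding grpc-java builds that shared default executor with `Executors.newCachedThreadPool`, which has no upper bound on pool size. The JDK-only sketch below (`UnboundedExecutorDemo` and `threadsAfterBlockingTasks` are hypothetical names) submits callbacks that block, standing in for callbacks stuck in `Monitor.enter`, and compares how many threads a cached pool and a fixed pool of 4 end up creating.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo: blocking callbacks on a cached pool vs. a fixed pool. */
public class UnboundedExecutorDemo {
    /** Runs {@code tasks} blocking callbacks on {@code pool} and returns how many threads it created. */
    static int threadsAfterBlockingTasks(ExecutorService pool, int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // stands in for a callback stuck in Monitor.enter
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // All tasks are still blocked, so no thread has become idle and been reused.
            return ((ThreadPoolExecutor) pool).getPoolSize();
        } finally {
            release.countDown(); // unblock every task
            pool.shutdown();
            try {
                pool.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        // Cached pool: one new thread per blocked callback, with no upper bound.
        System.out.println(threadsAfterBlockingTasks(Executors.newCachedThreadPool(), 200)); // prints 200
        // Fixed pool: at most 4 threads; the remaining callbacks wait in the queue.
        System.out.println(threadsAfterBlockingTasks(Executors.newFixedThreadPool(4), 200)); // prints 4
    }
}
```

With the cached pool, every blocked callback pins its own thread, which mirrors the near one-to-one match between `Monitor.enter` frames and `grpc-default-executor` threads above; the fixed pool caps the thread count and queues the remaining work instead.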
cc: @agavra @walterddr