Hi Maciej,

On Fri, Mar 04, 2022 at 01:10:37PM +0100, Maciej Zdeb wrote:
> Hi,
> 
> I'm experiencing high CPU usage on a single core, idle drops below 40%
> while other cores are at 80% idle. Peers cluster is quite big (12 servers,
> each server running 12 threads) and is used for synchronization of 3
> stick-tables of 1 million entries size.
> 
> Is peers protocol single threaded and high usage on single core is expected
> or am I experiencing some kind of bug? I'll keep digging to be sure nothing
> else from my configuration is causing the issue.

Almost. More precisely, each peers connection runs on a single thread
at a time (like any connection). Some of these connections may involve
heavy protocol parsing, so you may sometimes end up with an unbalanced
number of connections across threads. It's tricky though, because we
could imagine a mechanism to spread the outgoing peers connections
evenly across threads, but the incoming ones will land where they land.

That's something you can check with "show sess" and/or "show fd", looking
for those related to your peers and checking their thread_mask. If you
find two on the same thread (same thread_mask), you can shut one of them
down using "shutdown session <id>" and it will reconnect, possibly on
another thread. That could confirm that it's the root cause of the
problem you're experiencing.
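
As a rough sketch of that check, assuming you have already extracted
(session id, thread_mask) pairs for your peers sessions from the CLI
output, by hand or with your own parser (the output format depends on
the version, so no parsing is attempted here; the function name and the
sample ids below are made up for illustration):

```python
from collections import defaultdict


def find_shared_threads(sessions):
    """Group sessions by thread_mask and keep only the masks used by 2+ sessions.

    `sessions` is an iterable of (session_id, thread_mask) pairs that the
    caller collected from the CLI output (hypothetical input format).
    """
    by_mask = defaultdict(list)
    for sess_id, mask in sessions:
        by_mask[mask].append(sess_id)
    # Only masks carrying more than one peers session are of interest.
    return {mask: ids for mask, ids in by_mask.items() if len(ids) > 1}


# Hypothetical sample data: two peers sessions share thread_mask 0x1.
sample = [("0x7f01", 0x1), ("0x7f02", 0x4), ("0x7f03", 0x1)]

for mask, ids in find_shared_threads(sample).items():
    print(f"thread_mask {mask:#x} is shared by sessions: {', '.join(ids)}")
```

Any session listed there is a candidate for "shutdown session <id>".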

I'm wondering if we shouldn't introduce a notion of "heavy connection"
like we already have heavy tasks for the scheduler. These would be
balanced differently from the others. Usually they're long-lived and
can eat massive amounts of CPU, so it would make sense. The only ones
I'm thinking about are the peers ones, but the concept would be more
general than focusing on peers. Such long-lived connections are usually
not re-established often, so we'd probably rather spend time carefully
picking the best thread than save 200ns of processing and pay for a
poor choice for the connection's whole lifetime.
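
To illustrate the idea only (this is not existing code, and all names
below are hypothetical), a minimal sketch could keep a per-thread count
of heavy connections and always place a new one on the thread that
currently holds the fewest:

```python
def pick_thread_for_heavy_conn(heavy_per_thread):
    """Return the index of the thread holding the fewest heavy connections.

    Ties are resolved in favour of the lowest thread index.
    """
    return min(range(len(heavy_per_thread)), key=heavy_per_thread.__getitem__)


def place_heavy_conns(nb_threads, nb_conns):
    """Place nb_conns heavy connections one by one and return per-thread counts."""
    heavy_per_thread = [0] * nb_threads
    for _ in range(nb_conns):
        tid = pick_thread_for_heavy_conn(heavy_per_thread)
        heavy_per_thread[tid] += 1
    return heavy_per_thread


# 12 threads and 11 remote peers, as in the setup described above:
# no thread ends up with more than one heavy connection.
print(place_heavy_conns(12, 11))
```

The scan over all threads costs a bit more than taking whatever thread
comes first, but it is paid once per connection, which fits connections
that stay up for a long time.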

Willy
