Re: [PATCH master 1/4] Master core scalability design doc

Guido Trotter Tue, 18 May 2010 09:07:40 -0700

On Tue, May 18, 2010 at 4:59 PM, Iustin Pop <[email protected]> wrote:
> On Tue, May 18, 2010 at 04:44:15PM +0100, Guido Trotter wrote:
>> Signed-off-by: Guido Trotter <[email protected]>
>> ---
>>  doc/design-2.2.rst |   62 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 files changed, 62 insertions(+), 0 deletions(-)
>>
>> diff --git a/doc/design-2.2.rst b/doc/design-2.2.rst
>> index ab0a8bd..c18e7a7 100644
>> --- a/doc/design-2.2.rst
>> +++ b/doc/design-2.2.rst
>> @@ -33,6 +33,68 @@ As for 2.1 we divide the 2.2 design into three areas:
>>  Core changes
>>  ------------
>>
>> +Master Daemon Scaling improvements
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +Current state and shortcomings
>> +++++++++++++++++++++++++++++++
>> +
>> +Currently the Ganeti master daemon is based on four sets of threads:
>> +
>> +- The main thread (1) just accepts connections on the master socket
>> +- The client worker pool (16 threads) handles those connections,
>> +  one thread per connected socket, parses luxi requests, and sends data
>> +  back to the clients
>> +- The job queue worker pool (25) executes the actual jobs submitted by
>> +  the clients
>> +- The rpc worker pool (10) interacts with the nodes via http-based-rpc
>> +
>> +This means that every masterd currently runs 52 threads to do its job.
>> +Being able to reduce this number would make the master a lot simpler.
>
> I think you mean reducing the number of thread *sets*, not threads,
> would simplify the architecture.
>


Reducing the number of thread sets would simplify the architecture, but I
also think that running fewer than 52 threads would make masterd less
heavyweight.
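To make the numbers concrete, here is the arithmetic behind the 52 figure,
plus what it would become if the client pool shrank to the ~4/5 threads
the doc mentions further down. The value 5 is just an assumption picked
from that range, and the other pools are assumed unchanged:

```python
# Thread counts per set, as listed in the design doc.
THREAD_SETS = {
    "main": 1,
    "client_workers": 16,
    "job_queue_workers": 25,
    "rpc_workers": 10,
}

def total_threads(sets):
    """Sum the threads across all thread sets."""
    return sum(sets.values())

current = total_threads(THREAD_SETS)

# Assumption: client pool reduced to 5 threads, everything else unchanged.
proposed = total_threads(dict(THREAD_SETS, client_workers=5))
```

That gives 52 today and 41 with the smaller client pool; moving the rpc
client to async would remove most of the remaining rpc workers too.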

>> +Also, even with this big number of threads masterd suffers from quite a
>> +few scalability issues:
>> +
>> +- Since the 16 client worker threads handle one connection each, it's
>> +  very easy to exhaust them just by connecting to masterd 16 times.
>> +  While we could perhaps make those pools resizable, increasing the
>> +  number of threads won't help with lock contention.
>> +- Some luxi operations (in particular REQ_WAIT_FOR_JOB_CHANGE) make the
>> +  relevant client thread block on its job for a relatively long time.
>> +  This makes it easier to finish the 16 client threads.
>
> s/finish/exhaust/
>

Yes.
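For what it's worth, the exhaustion is easy to reproduce outside masterd
with any fixed-size, one-thread-per-connection pool. This is only a
minimal sketch: it uses concurrent.futures instead of our own worker pool,
and handle_connection() is a hypothetical handler standing in for a client
thread that sits on an idle connection (or on a long
REQ_WAIT_FOR_JOB_CHANGE):

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 16  # size of the client worker pool from the design doc

release = threading.Event()  # stands in for "the client finally does something"
started = []                 # ids of connections that got a worker thread
started_lock = threading.Lock()

def handle_connection(conn_id):
    """Hypothetical per-connection handler: keeps its worker until released."""
    with started_lock:
        started.append(conn_id)
    release.wait()  # an idle client keeps this worker thread busy

pool = ThreadPoolExecutor(max_workers=POOL_SIZE)
# One more connection than there are workers.
futures = [pool.submit(handle_connection, i) for i in range(POOL_SIZE + 1)]

# Wait (bounded) until all workers have picked up a connection.
deadline = time.monotonic() + 5
while time.monotonic() < deadline:
    with started_lock:
        if len(started) == POOL_SIZE:
            break
    time.sleep(0.01)

with started_lock:
    # The extra connection is still queued: no worker is left for it.
    waiting_conn_started = POOL_SIZE in started
extra = futures[POOL_SIZE]
still_pending = not extra.running() and not extra.done()

release.set()             # let the idle clients go
pool.shutdown(wait=True)  # the queued connection finally gets served
```

The 17th connection only gets served once one of the first 16 lets go,
which is exactly what a 17th luxi client sees today.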

>> +- The luxi lock is quite heavily contended, and certain easily
>
> Hmm, what luxi lock?
>

s/luxi/job queue/

>> +  reproducible worklogs show that it's very easy to put masterd in
>> +  trouble: for example running ~15 background instance reinstall jobs
>> +  results in a master daemon that, even without having exhausted the
>> +  client worker threads, can't answer simple job list requests, or
>> +  submit more jobs.
>
> I'd like to understand better how this happens. Do you have more info?
>

My tests show that it's contention on the job queue lock, which at times
is held for as long as 40-50 seconds.
This makes it impossible to submit jobs, and can easily make any type of
request hang a client request thread, because that thread will be
waiting for the job queue.
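A scaled-down illustration of the effect (0.5s of lock holding instead of
40-50s). queue_lock, long_job_update() and list_jobs() are hypothetical
stand-ins, not the real jobqueue code; the timeout on acquire() is only
there so the sketch can show the request would still be stuck, whereas the
real client thread simply blocks:

```python
import threading
import time

queue_lock = threading.Lock()  # stands in for the job queue lock
holder_ready = threading.Event()

def long_job_update(hold_seconds):
    """Hypothetical job-side operation holding the lock for a long time."""
    with queue_lock:
        holder_ready.set()
        time.sleep(hold_seconds)

def list_jobs(timeout):
    """Hypothetical client request: needs the lock only briefly."""
    if not queue_lock.acquire(timeout=timeout):
        return None  # a real client thread would still be hanging here
    try:
        return ["job-1", "job-2"]
    finally:
        queue_lock.release()

holder = threading.Thread(target=long_job_update, args=(0.5,))
holder.start()
holder_ready.wait()

blocked_result = list_jobs(timeout=0.1)  # lock busy: gives up after 0.1s
holder.join()
free_result = list_jobs(timeout=0.1)     # lock free again: answers at once
```

The list request itself is cheap; all of its latency comes from whoever is
sitting on the lock, so shortening the hold time matters more than adding
threads.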

>> +Proposed changes
>> +++++++++++++++++
>> +
>> +In order to fix the above issues, for Ganeti 2.2, we propose the
>> +following core changes:
>> +
>> +- The main thread of masterd is moved to asyncore (so it can share the
>> +  mainloop code with all other ganeti daemons) and handles all client
>> +  connections.
>> +- The REQ_WAIT_FOR_JOB_CHANGE luxi request is changed to be
>> +  subscription-based, so that the executing thread doesn't have to be
>> +  hogged while changes arrive.
>
> What do you mean "subscription-based"?
>

It means that the request is recorded and associated with the
connection, and when new data is available it can be sent there, without
needing a thread to wait for that data.
Basically each job would have a list of subscribers, and the thread that
updates the job would also be responsible for notifying the subscribers
of the change (by queueing data to be sent to them, which asyncore will
then send from the main thread at its convenience).
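Roughly, something like the sketch below. All the names (Job,
ClientConnection, subscribe, update, flush) are made up for illustration,
and flush() just collects into a list what the asyncore main loop would
actually write to the sockets; unsubscribing on disconnect is left out:

```python
import collections
import threading

class ClientConnection:
    """Hypothetical luxi connection; only the main loop drains ``outbox``."""

    def __init__(self, name):
        self.name = name
        self.outbox = collections.deque()  # append/popleft are thread-safe

class Job:
    def __init__(self, job_id):
        self.job_id = job_id
        self.status = "queued"
        self._subscribers = []
        self._lock = threading.Lock()

    def subscribe(self, conn):
        """Record a wait-for-change request; no thread blocks on it."""
        with self._lock:
            self._subscribers.append(conn)

    def update(self, status):
        """Called by the thread executing the job, on every change."""
        with self._lock:
            self.status = status
            subscribers = list(self._subscribers)
        for conn in subscribers:
            # Only queue the data; sending is left to the main loop.
            conn.outbox.append((self.job_id, status))

def flush(connections):
    """Stand-in for the main loop: drain each outbox into a list."""
    sent = []
    for conn in connections:
        while conn.outbox:
            sent.append((conn.name, conn.outbox.popleft()))
    return sent

job = Job(42)
a, b = ClientConnection("a"), ClientConnection("b")
job.subscribe(a)
job.subscribe(b)

worker = threading.Thread(target=job.update, args=("running",))
worker.start()
worker.join()

sent = flush([a, b])
```

The job thread never touches a socket, and no client thread is parked
waiting for the change: both subscribers get the update on the next pass
of the main loop.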

>> +- The job queue lock is reviewed to decrease its contention, making the
>> +  job queue more interactive.
>> +
>> +With these changes it should be possible to interact with the master
>> +daemon even when it's under heavy load, and it will also be simpler to
>> +add core functionality such as: asynchronous rpc client, internal timers
>> +to avoid master client timeouts (luxi level keepalives).
>> +
>> +Only the first two changes should be enough to reduce the size of the
>> +client worker pool from 16 to ~4/5 threads maximum (although the perfect
>> +number needs to be tested in practice) and if the rpc client can be
>> +moved to be asynchronous as well, masterd should become a lot smaller in
>> +number of threads, and thus also easier to understand, debug, and scale.
>
> Hmm… What gain is there in reducing this number?
>

Well, memory usage should decrease, and perhaps so should lock
contention. Debugging should also be easier with fewer threads
interleaving their output in the log.
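As for why the client pool can shrink at all once connections move into
the main loop: a single thread can serve far more than 16 clients, since
none of them owns a thread. A minimal sketch, using the selectors module
rather than asyncore (asyncore has since been removed from Python, in
3.12), with socketpair() standing in for accepting on the master socket
and a trivial "ack" in place of real luxi handling:

```python
import selectors
import socket

NUM_CLIENTS = 20  # more connections than the 16-thread client pool allows

sel = selectors.DefaultSelector()
clients, server_ends = [], []
for i in range(NUM_CLIENTS):
    # socketpair() stands in for accepting a client on the master socket.
    client_end, server_end = socket.socketpair()
    server_end.setblocking(False)
    sel.register(server_end, selectors.EVENT_READ, data=i)
    clients.append(client_end)
    server_ends.append(server_end)

# Every client sends a request; none of them ties up a thread.
for i, c in enumerate(clients):
    c.sendall(b"req%d" % i)

# One thread, one loop: serve whichever connections are ready.
handled = set()
for _ in range(100):  # bounded so the sketch can't spin forever
    if len(handled) == NUM_CLIENTS:
        break
    for key, _events in sel.select(timeout=1):
        conn = key.fileobj
        data = conn.recv(1024)
        if data:
            conn.sendall(b"ack:" + data)
            handled.add(key.data)
            sel.unregister(conn)  # one request per client in this sketch

replies = [c.recv(1024) for c in clients]

for s in clients + server_ends:
    s.close()
sel.close()
```

This only addresses the connection-handling side; the job queue lock
contention still needs its own fix.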

Thanks,

Guido