Re: [Drizzle-discuss] [GSoC ideas] Rewrite of the Pool of Threads Scheduler

Jay Pipes Tue, 31 Mar 2009 08:27:51 -0700

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Biping MENG wrote:
> Hi,
>     The source code is new to me, so today I spent some time reviewing the
> code path from main() to handle_one_connection(THD) and drew a flow chart
> (attached to this mail) to make the process clear.
>     I have not found any document that explains the structure of the source
> code, and I am a little confused by some details. For example, "priv"
> often appears in the names of header files: does it stand for private,
> privilege, or something else?  Reading cross-platform code is such a pain,
> because #ifdef blocks appear here and there.


Hmmm, it seems you are referring to the MySQL source code and not the
Drizzle source code... I'd suggest grabbing the source code of Drizzle
using Bazaar.  You can find instructions for developing Drizzle using
standard open source tools on my blog and on the Drizzle wiki:

http://jpipes.com/index.php?/archives/248-Getting-a-Working-CC++-Development-Environment-for-Developing-Drizzle.html
http://jpipes.com/index.php?/archives/249-A-Contributors-Guide-to-Launchpad.net-Part-1-Getting-Started.html
http://jpipes.com/index.php?/archives/250-A-Contributors-Guide-to-Launchpad.net-Part-2-Code-Management.html
http://drizzle.org/wiki/Compiling
http://drizzle.org/wiki/Debugging_Drizzle_with_GDB

Also, Doxygen documentation is always available on the Drizzle website here:

http://drizzle.org/doxygen/

You will find that the structure of the Drizzle source code, especially
regarding the include header ordering and directory structure, is much
simpler and more straightforward to navigate than the main MySQL sources.

Note that the THD class which you cite above is different in Drizzle.
In Drizzle, we have a Session class (defined in /drizzled/session.h) and
it is smaller and simpler than the THD class in MySQL.  The Session
class in Drizzle is not necessarily related to an operating system
thread, but instead represents a client session.

>     Let's come straight to the point.
>     Only after the connection is established and some prioritizing
> information is received from the client can we decide whether this session
> is prioritized. So we can never refuse to serve a non-prioritized client
> request before a thread is assigned. Under the current process model, then,
> a session scheduler can only be implemented as an enhanced wrapper around
> the thread scheduler. I don't think that is an elegant way of doing this.

No, it's not, and there are some complex things which happen in the
mysys/ library in the MySQL source tree which are simpler in the Drizzle
code base.  In addition, as Drizzle is based on the MySQL 6.0 source
code, we have a more modular way of dealing with scheduling threads.  We
have a plugin system, currently under quite a bit of day-to-day
modification and refactoring, which allows us to plug in a session
scheduler module.  See /plugin/pool_of_threads/, /plugin/single_thread/
and /plugin/multi_thread/ for examples of the three current session
scheduler plugins we have.
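
To make the plugin idea concrete, here is a minimal C++ sketch of how a
pluggable session scheduler could be shaped. Every name in it (Session,
SessionScheduler, addSession, InlineScheduler, QueuedScheduler) is
hypothetical and is not taken from the Drizzle plugin API; the plugin
directories listed above are the place to look for the real interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <vector>

// Hypothetical stand-in for a client session; NOT Drizzle's Session class.
struct Session {
  int id;
  std::function<void(Session&)> handler;  // work to run for this session
};

// Hypothetical scheduler interface: the kernel hands over each new session
// and the plugged-in scheduler decides where and when it runs.
class SessionScheduler {
 public:
  virtual ~SessionScheduler() {}
  virtual void addSession(Session* session) = 0;
};

// Runs every session immediately on the calling thread.
class InlineScheduler : public SessionScheduler {
 public:
  void addSession(Session* session) override { session->handler(*session); }
};

// Defers sessions to a queue that a worker drains later.
class QueuedScheduler : public SessionScheduler {
 public:
  void addSession(Session* session) override { pending_.push_back(session); }

  // Runs all queued sessions in arrival order; returns how many were run.
  std::size_t runPending() {
    std::size_t count = 0;
    while (!pending_.empty()) {
      Session* session = pending_.front();
      pending_.pop_front();
      session->handler(*session);
      ++count;
    }
    return count;
  }

 private:
  std::deque<Session*> pending_;
};
```

The point of the sketch is only that the caller talks to the abstract
interface, so swapping one scheduling policy for another does not touch
the code that accepts connections.
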

>      I have not studied the packet contents during the life cycle of a
> client connection. I guess it goes something like this:
>      START PACKET (opens a client connection and includes some
> authentication info) + QUERY HEADER (identifies the following request as a
> query) + QUERY SENTENCE + UPDATE HEADER (identifies the following request
> as an update) + UPDATE SENTENCE + ... + END PACKET (a request to close the
> connection politely)
> 
>      I'm thinking of the following solution:
>      We can divide the whole process into 4 phases (also explained in the
> attachment new_session_scheduler.png):
>      A: the START PACKET is being received, or it has been received and
> the response is being sent.
>      B: connection validation is done; the SQL request packet is being
> received until one full SQL request packet has arrived.
>      C: the request is being processed by the subsequent components, such
> as LEX, until the response packet is produced.
>      D: the response packet is being sent.

Agreed.  Separating out the phases in the above way is good.  Basically,
we are in the process of refactoring the kernel to better delineate a
session's lifetime and the events which occur during a session's
lifetime.  Eric Day is currently working on the protocol-centric pieces
of this and Monty Taylor, myself and Brian Aker are working on
refactoring the Session object into a much more class-oriented approach
than the MySQL source code.
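
The four phases quoted above can be read as a small state machine. The
sketch below is one possible encoding in C++, not code from Drizzle. The
phase and event names are made up for illustration, and it assumes that
a session returns to phase B after phase D so that one connection can
carry several requests, as the guessed packet sequence suggests.

```cpp
#include <cassert>

// Illustrative labels for phases A-D of the quoted proposal.
enum class Phase {
  Handshake,         // A: START PACKET exchange
  ReceivingRequest,  // B: reading one full SQL request packet
  Processing,        // C: parsing and executing the request
  SendingResponse,   // D: writing the response packet
  Closed             // session has ended
};

// Hypothetical events that move a session from one phase to the next.
enum class Event {
  HandshakeDone,
  RequestComplete,
  ResponseReady,
  ResponseSent,
  Disconnect
};

// Returns the phase that follows `current` when `event` occurs.
// Events that do not apply to the current phase leave it unchanged.
Phase nextPhase(Phase current, Event event) {
  if (event == Event::Disconnect) return Phase::Closed;
  switch (current) {
    case Phase::Handshake:
      return event == Event::HandshakeDone ? Phase::ReceivingRequest : current;
    case Phase::ReceivingRequest:
      return event == Event::RequestComplete ? Phase::Processing : current;
    case Phase::Processing:
      return event == Event::ResponseReady ? Phase::SendingResponse : current;
    case Phase::SendingResponse:
      // Assumption: the session loops back to wait for the next request.
      return event == Event::ResponseSent ? Phase::ReceivingRequest : current;
    case Phase::Closed:
      return Phase::Closed;
  }
  return current;
}
```

With the phases explicit, a scheduler only needs a worker while a session
is in phase C; the other phases are network I/O.
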

>      A request queue is a nice way to separate client connections from
> request processing. We can set up n epoll instances at the front end, m
> queues in the middle, and l workers at the back end. In this way the whole
> process is divided into pieces and can be configured very flexibly.

Agreed, though I'd suggest approaching things in a slow and steady
manner; otherwise you risk going down code paths that get very complex.
To get you started thinking, see the code in the pastebin links below.
In thinking about your queue solution, I hacked up some ideas into a
sample scheduler plugin that uses a queue-based approach sometimes
called a scoreboard.

I haven't tested the code, as it's just proof-of-concept stuff, but I
figured it might give you some ideas...

Here is the interface file,
plugin/scoreboard_scheduler/scoreboard_scheduler.h:

http://rafb.net/p/fKOGtt20.html

And here is a prototype implementation file,
plugin/scoreboard_scheduler/scoreboard_scheduler.cc:

http://rafb.net/p/pGJvKw89.html

Hopefully, the code above gives you some more ideas. :)

Cheers!

Jay

>      Prioritizing can be implemented in either of the following ways:
>      1. Designate some of the request queues as prioritized queues. Only
> prioritized requests are pushed into them. Workers should always try to
> fetch a request from a prioritized queue first, and from non-prioritized
> queues only when the prioritized queues are empty.
>      2. Divide workers into two groups: one for prioritized requests, the
> other for all requests. Workers in the prioritized group should process
> only prioritized requests and ignore non-prioritized requests.
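
Option 1 can be sketched in a few lines of C++. This is an illustration
only: Request, TwoLevelQueue, push and pop are hypothetical names, a
single prioritized queue and a single normal queue are assumed, and the
class is not thread-safe (a real scheduler would guard it with a mutex).

```cpp
#include <cassert>
#include <deque>
#include <string>

// Hypothetical request record; `prioritized` is decided by the front end.
struct Request {
  std::string sql;
  bool prioritized;
};

// One prioritized queue plus one normal queue.
class TwoLevelQueue {
 public:
  // Routes the request to the queue that matches its priority flag.
  void push(const Request& request) {
    (request.prioritized ? high_ : normal_).push_back(request);
  }

  // Fetches from the prioritized queue first and falls back to the normal
  // queue only when the prioritized queue is empty.
  // Returns false when both queues are empty.
  bool pop(Request& out) {
    std::deque<Request>& source = !high_.empty() ? high_ : normal_;
    if (source.empty()) return false;
    out = source.front();
    source.pop_front();
    return true;
  }

 private:
  std::deque<Request> high_;
  std::deque<Request> normal_;
};
```

One consequence of strict priority is that normal requests can wait
indefinitely while prioritized requests keep arriving; option 2 avoids
that by keeping some workers available for all requests.
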
> 
>      Workers are threads that are created when the MySQL server starts
> and are never killed until the server shuts down. They work like this:
> 
> uint8_t responseBuffer[BUFFER_LEN];
> while (true)
> {
>   /* Re-check in a loop: condition waits can wake up spuriously. */
>   while (requestQueue.empty())
>   {
>     pthread_cond_wait on request-entered signals;
>   }
>   request* pRequest = requestQueue.Dequeue();
>   if (pRequest == NULL)
>   {
>     continue;
>   }
>   /* Skip requests whose session has already been closed. */
>   if (!validateContext(pRequest->client_context))
>   {
>     continue;
>   }
>
>   /* parse the SQL statement, access the db ... */
>   PROCESS_REQUEST(pRequest, responseBuffer, BUFFER_LEN);
>
>   sendResponse(pRequest->client_context->client_socket, responseBuffer,
>                BUFFER_LEN);
> }
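
A compilable version of the worker loop might look like the sketch below.
It swaps the pthread calls for std::mutex and std::condition_variable and
adds a close() operation so that workers can exit cleanly, which the
quoted pseudocode does not have. QueuedRequest, RequestQueue and
runWorker are hypothetical names, and the request handling itself is
left as a placeholder.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>

// Hypothetical request record.
struct QueuedRequest {
  std::string sql;
};

// Blocking queue shared by the front end (producers) and the workers.
class RequestQueue {
 public:
  void enqueue(const QueuedRequest& request) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      items_.push_back(request);
    }
    ready_.notify_one();  // wake one waiting worker
  }

  // Marks the queue closed and wakes every waiting worker.
  void close() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      closed_ = true;
    }
    ready_.notify_all();
  }

  // Blocks until a request is available. Returns false only when the
  // queue has been closed and fully drained.
  bool dequeue(QueuedRequest& out) {
    std::unique_lock<std::mutex> lock(mutex_);
    // Loop rather than `if`: the wait may wake up spuriously.
    while (items_.empty() && !closed_) ready_.wait(lock);
    if (items_.empty()) return false;
    out = items_.front();
    items_.pop_front();
    return true;
  }

 private:
  std::mutex mutex_;
  std::condition_variable ready_;
  std::deque<QueuedRequest> items_;
  bool closed_ = false;
};

// Worker body: handles requests until the queue is closed and drained.
// Returns the number of requests handled.
std::size_t runWorker(RequestQueue& queue) {
  std::size_t handled = 0;
  QueuedRequest request;
  while (queue.dequeue(request)) {
    // Placeholder: validate the session, parse the SQL, send the response.
    ++handled;
  }
  return handled;
}
```

A server would typically run runWorker on several std::thread instances
and call close() at shutdown. The emptiness check and the dequeue happen
under the same lock, so two workers cannot both claim the last request.
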
> 
>         Interrupting seems very easy in this model. We simply discard the
> request!
> 
>         Workers and clients are separated by queues. Each request is
> assigned to a worker independently, so re-scheduling happens on every
> request.
> 
>         Surely, we need to maintain some context info for each
> connection/session. When workers decide to close a session, they simply
> unregister the socket from epoll, close the socket, and destroy the related
> context. Some requests started by the closing session may still be in the
> queue. Don't worry: they will be discarded by the call to
> validateContext(pRequest->client_context).
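
One wrinkle with discarding stale requests this way: once a context has
been destroyed, a raw pointer to it can no longer be inspected safely.
The sketch below sidesteps that by having queued requests carry a
session id and checking the id against a registry of live sessions. It
is a hedged illustration with hypothetical names (SessionRegistry,
PendingRequest, shouldProcess), it never reuses ids, and it is not
thread-safe as written.

```cpp
#include <cassert>
#include <set>

// Tracks which sessions are still open. Ids are never reused, so a stale
// request cannot be mistaken for one from a newer session.
class SessionRegistry {
 public:
  // Registers a new session and returns its id.
  int open() {
    int id = next_id_++;
    live_.insert(id);
    return id;
  }

  // Forgets the session; requests still queued for it become stale.
  void close(int id) { live_.erase(id); }

  bool isLive(int id) const { return live_.count(id) != 0; }

 private:
  std::set<int> live_;
  int next_id_ = 1;
};

// Hypothetical queued request: refers to its session by id, not pointer.
struct PendingRequest {
  int session_id;
};

// Plays the role of validateContext(): returns true only when the
// request's session is still open.
bool shouldProcess(const SessionRegistry& registry,
                   const PendingRequest& request) {
  return registry.isLive(request.session_id);
}
```

A worker would call shouldProcess right after dequeuing and simply drop
the request when it returns false.
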
> 
> 
>          What do you think of this way of scheduling?
> 
> On Mon, Mar 30, 2009 at 8:47 PM, Jay Pipes <[email protected]> wrote:
> 
> Biping MENG wrote:
>>>> Hello Sir, my name is Biping MENG. I am a Chinese student from the
>>>> Department of Computer Science and Technology, Nanjing University. I am
>>>> very interested in refining the thread scheduling strategy of MySQL
>>>> listed on the ideas page, and I have a strong desire to contribute to
>>>> MySQL open source projects by participating in GSoC this summer.
>>>> I've already submitted the proposal on the official GSoC website. I hope
>>>> the mentor of this project can take a look at my proposal and give some
>>>> suggestions on it.
>>>>
>>>> I posted this mail to the MySQL GSoC discussion group and got no reply,
>>>> so I turned to this mailing list and kindly request a reply from the
>>>> mentor of this project.
> Hi Meng!  As we discussed on IRC, there are a number of students
> interested in the session scheduling functionality.  Like I mentioned on
> IRC, it would be cool if all interested students could work as a team on
> this, to make it easier for mentors to handle the proposal and also to
> make it more likely that many different ideas around the scheduler are
> examined.
> 
> You had mentioned online that you'd be interested in working on a team
> of students.
> 
> Other students: please respond here if you're interested in working with
> Meng on this project! :)
> 
> Cheers!
> 
> -jay
> 
>>>> Best regards.
>>>>
>>>> Biping MENG
>>>> Department of Computer Science and Technology
>>>> Nanjing University
>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>> _______________________________________________
>>>> Mailing list: https://launchpad.net/~drizzle-discuss
>>>> Post to     : [email protected]
>>>> Unsubscribe : https://launchpad.net/~drizzle-discuss
>>>> More help   : https://help.launchpad.net/ListHelp
>>

> ------------------------------------------------------------------------


> ------------------------------------------------------------------------


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAknSNlEACgkQ2upbWsB4UtF2IwCfcYpzFm1WKSOC1iBodo/4ZXz6
bTQAnR/Y9ThyXxNghz/zvo6gw8jpp9D9
=DFSn
-----END PGP SIGNATURE-----

