Re: [Drizzle-discuss] [GSoC ideas] Rewrite of the Pool of Threads Scheduler

Biping MENG, Sun, 05 Apr 2009 03:57:05 -0700

Hi Jay, it has been a while since I last wrote to you. Time to say hi :)


On Tue, Mar 31, 2009 at 11:27 PM, Jay Pipes <[email protected]> wrote:

> Biping MENG wrote:
> > Hi,
> >     The source code is new to me, so today I spent some time reviewing the
> > code path from main() to handle_one_connection(THD) and drew a flow chart,
> > attached to this mail, to make the process clear.
> >     I have not found any document that explains the structure of the
> > source code, and I am a little confused by some details of the code. For
> > example, "priv" appears in the names of many header files: does it stand
> > for private, privilege, or something else?  Reading cross-platform code is
> > such a pain, since #ifdefs appear here and there.
>
> Hmmm, it seems you are referring to the MySQL source code and not the
> Drizzle source code... I'd suggest grabbing the source code of Drizzle
> using Bazaar.  You can find instructions for developing Drizzle using
> standard open source tools on my blog and on the Drizzle wiki:
>
>
> http://jpipes.com/index.php?/archives/248-Getting-a-Working-CC++-Development-Environment-for-Developing-Drizzle.html
>
> http://jpipes.com/index.php?/archives/249-A-Contributors-Guide-to-Launchpad.net-Part-1-Getting-Started.html
>
> http://jpipes.com/index.php?/archives/250-A-Contributors-Guide-to-Launchpad.net-Part-2-Code-Management.html
> http://drizzle.org/wiki/Compiling
> http://drizzle.org/wiki/Debugging_Drizzle_with_GDB
>
> Also, Doxygen documentation is always available on the Drizzle website
> here:
>
> http://drizzle.org/doxygen/
>
> You will find that the structure of the Drizzle source code, especially
> regarding the include header ordering and directory structure, is much
> simpler and more straightforward to navigate than the main MySQL sources.
>
> Note that the THD class which you cite above is different in Drizzle.
> In Drizzle, we have a Session class (defined in /drizzled/session.h) and
> it is smaller and simpler than the THD class in MySQL.  The Session
> class in Drizzle is not necessarily related to an operating system
> thread, but instead represents a client session.


Thank you for the guidance. When I first learned about Drizzle, I did not
clearly distinguish it from MySQL. I spent several days setting up the Drizzle
development environment. I suffered from a very low download rate (less than
1 KB/s) when trying to get a local Bazaar repository; it took almost a whole
day to finish the download.


> >     Let's come straight to the point.
> >     Only after the connection is established and some prioritizing
> > information is received from the client can we decide whether this session
> > is prioritized. So we can never refuse to serve a non-prioritized client
> > request before a thread has been assigned. Under the current process
> > model, a session scheduler can therefore only be implemented as an enhanced
> > wrapper around the thread scheduler. I don't think that is an elegant way
> > of doing this.
>
> No, it's not, and there are some complex things which happen in the
> mysys/ library in the MySQL source tree which are simpler in the Drizzle
> code base.  In addition, as Drizzle is based on the MySQL 6.0 source
> code, we have a more modular way of dealing with scheduling threads.  We
> have a plugin system, currently under quite a bit of day-to-day
> modification and refactoring, which allows us to plug in a session
> scheduler module.  See /plugin/pool_of_threads/, /plugin/single_thread/
> and /plugin/multi_thread/ for examples of the three current session
> scheduler plugins we have.


> >      I have not studied the packet contents during the life cycle of a
> > client connection. I guess it goes as follows, or something like that:
> >      START PACKET (opens a client connection, including some authentication
> > info) + QUERY HEADER (identifies the following request as a query) + QUERY
> > STATEMENT + UPDATE HEADER (identifies the following request as an update) +
> > UPDATE STATEMENT + ... + END PACKET (requests to close the connection
> > politely)
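For comparison with the guessed sequence above: in the MySQL client/server protocol that the Drizzle code base inherited, every packet begins with a 4-byte header (a 3-byte little-endian payload length followed by a 1-byte sequence id), and after the handshake the first payload byte of a client packet is a command code. Queries and updates are not distinguished at this level; both are sent as COM_QUERY (0x03) with the SQL text as the rest of the payload, and COM_QUIT (0x01) closes the connection politely. The sketch below only parses that framing; `CommandPacket` and `parse_command_packet` are illustrative names, not Drizzle's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Command codes from the MySQL client/server protocol (small subset).
enum class Command : uint8_t {
  Quit = 0x01,   // COM_QUIT: client closes the connection politely
  Query = 0x03,  // COM_QUERY: SQL text follows (SELECT, UPDATE, ...)
};

// One decoded command packet (hypothetical type for this sketch).
struct CommandPacket {
  uint8_t sequence_id;
  uint8_t command;  // raw command byte; compare against Command values
  std::string sql;  // remaining payload bytes (the SQL text for COM_QUERY)
};

// Parses one packet from buf. Returns std::nullopt if the buffer does not yet
// hold a complete packet (the caller keeps reading) or if the payload is empty.
// Payloads of 16 MB or more are split across several packets; this sketch
// ignores that case.
std::optional<CommandPacket> parse_command_packet(const uint8_t* buf, size_t len) {
  if (len < 4) return std::nullopt;  // header not complete yet
  // 3-byte little-endian payload length.
  size_t payload_len = static_cast<size_t>(buf[0]) |
                       (static_cast<size_t>(buf[1]) << 8) |
                       (static_cast<size_t>(buf[2]) << 16);
  if (payload_len == 0 || len < 4 + payload_len) return std::nullopt;
  CommandPacket pkt;
  pkt.sequence_id = buf[3];
  pkt.command = buf[4];
  pkt.sql.assign(reinterpret_cast<const char*>(buf + 5), payload_len - 1);
  return pkt;
}
```

A reader loop would call this on its receive buffer after each read and only hand the request on once a complete packet comes back.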
> >
> >      I'm thinking of the following solution:
> >      We can divide the whole process into 4 phases (also explained in the
> > attachment new_session_scheduler.png):
> >      A: the START PACKET is being received, or it has been received and the
> > response is being sent.
> >      B: connection validation is done; SQL request packets are being
> > received until one full SQL request has arrived.
> >      C: the request is being processed by subsequent components, such as
> > LEX, until the response packet is produced.
> >      D: the response packet is being sent.
>
> Agreed.  Separating out the phases in the above way is good.  Basically,
> we are in the process of refactoring the kernel to better delineate a
> session's lifetime and the events which occur during a session's
> lifetime.  Eric Day is currently working on the protocol-centric pieces
> of this and Monty Taylor, myself and Brian Aker are working on
> refactoring the Session object into a much more class-oriented approach
> than the MySQL source code.


I've reviewed the implementation of the scheduling plugins and found that
currently a Session is the atomic scheduling unit. So finer-grained
scheduling cannot be done until one big session is split into relatively
small phases at the kernel level, and that would be a big job. More flexible
scheduling requires smaller scheduling units.

>
>
> >      A request queue is a nice way to decouple client connections from
> > request processing. We can set up n epoll instances at the front end, m
> > queues in the middle and l workers at the back end. In this way the whole
> > process is divided into pieces and can be configured with great flexibility.
>
> Agreed, though I'd suggest approaching things in a slow and steady
> manner; otherwise you risk going down code paths that get very complex.
> To get you started thinking, see the code in the pastebin links below.
> In thinking about your queue solution, I hacked up some ideas into a
> sample scheduler plugin that uses a queue-based approach sometimes
> called a scoreboard.
>
> I haven't tested the code, as it's just proof-of-concept stuff, but I
> figured it might give you some ideas...
>
> Here is the interface file,
> plugin/scoreboard_scheduler/scoreboard_scheduler.h:
>
> http://rafb.net/p/fKOGtt20.html
>
> And here is a prototype implementation file,
> plugin/scoreboard_scheduler/scoreboard_scheduler.cc:
>
> http://rafb.net/p/pGJvKw89.html
>
> Hopefully, the code in the above might give you some more ideas. :)
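As an independent sketch of the "n epolls / m queues / l workers" layout from the quoted proposal, the following shows one way to wire the three tiers together. `PipelineConfig`, the modulo routing and the `l >= m` check are assumptions for illustration, not taken from the scoreboard code linked above. The design point it encodes: if there are fewer workers than queues, some queue has no consumer and its requests starve.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical configuration for the n/m/l layout.
struct PipelineConfig {
  size_t pollers;  // n: front-end event loops (e.g. one epoll instance each)
  size_t queues;   // m: request queues in the middle
  size_t workers;  // l: worker threads at the back end
};

// Rejects layouts in which some queue would have no worker: requests routed
// to such a queue would never be served.
void validate(const PipelineConfig& c) {
  if (c.pollers == 0 || c.queues == 0 || c.workers < c.queues)
    throw std::invalid_argument("need n >= 1, m >= 1 and l >= m");
}

// A poller routes each session to a fixed queue, so one session's requests
// never sit in two different queues.
size_t queue_for_session(uint64_t session_id, const PipelineConfig& c) {
  return static_cast<size_t>(session_id % c.queues);
}

// Worker w consumes from queue w % m; with l >= m every queue gets a worker.
size_t queue_for_worker(size_t worker, const PipelineConfig& c) {
  return worker % c.queues;
}

// Number of workers attached to each queue.
std::vector<size_t> workers_per_queue(const PipelineConfig& c) {
  std::vector<size_t> counts(c.queues, 0);
  for (size_t w = 0; w < c.workers; ++w) ++counts[queue_for_worker(w, c)];
  return counts;
}
```

For example, with n = 2, m = 3 and l = 7, queue 0 gets three workers and queues 1 and 2 get two each.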


I'm sorry, but I was not at my own workstation the day I received the mail,
and I only took a very brief look at the code without copying it. Could you
send me another copy of the code, please?

I think Drizzle's pool of threads scheduling plugin is a much better
implementation than that of MySQL 5.1, because it treats sessions that need
processing differently from sessions that are waiting for I/O, and each
processing step may be assigned to any one of the threads in the pool. This
process-level scheduling bypasses the higher logic layer and divides the
session life cycle into small steps separated by network I/O. It's a very
nice way of doing it.
On the GSoC wiki page (http://drizzle.org/wiki/Soc#Pool_of_Threads_Scheduler)
it says "it can have very poor performance on other workloads". I wonder
whether the details of the investigation that showed this poor performance
are available, e.g. under which conditions "poor" performance is most likely
to show up.

By the way, is exception handling recommended in Drizzle? I found very little
code that throws exceptions in the project, and this is not mentioned on the
coding style wiki page.

I am sorry that I made some mistakes in the "brief list of deliverables"
part of my proposal on the GSoC website, and even more sorry that I missed
the deadline to fix it for some reason :(
I have posted the corrected part as a comment and am also sending it to the
Drizzle discussion list here:

Brief list of deliverables
1st April - 22nd April: Get ready to contribute. Review the Drizzle source
code. Collect possible solutions to the idea.
23rd April - 22nd May: Keep in touch with my mentor and discuss the
practicability of each possible solution. Find the best one and refine it.
23rd May - 1st August: Test-driven coding. Discuss with my mentor how to
overcome unexpected problems.
2nd August - 24th August: Improve comments and documentation. Fix the
remaining minor bugs.


Best regards.

Biping MENG

Department of Computer Science and Technology
Nanjing University


>
> Cheers!
>
> Jay
>
> >      Prioritizing can be implemented in either of the following ways:
> >      1. Designate some of the request queues as prioritized queues. Only
> > prioritized requests are pushed into them. Workers should always try to
> > fetch a request from a prioritized queue first, and from non-prioritized
> > queues only when the prioritized queues are empty.
> >      2. Divide workers into two groups: one for prioritized requests, the
> > other for all requests. Workers in the prioritized group should only
> > process prioritized requests and ignore non-prioritized ones.
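Both prioritizing options above can be expressed with one pair of queues, sketched here under the assumption that a request already carries a "prioritized" flag by the time it is queued. `Request` and `PriorityRequestQueues` are hypothetical names, and the class is deliberately single-threaded; a real scheduler would guard it with a mutex.

```cpp
#include <cassert>
#include <deque>
#include <optional>

// Hypothetical request type; only the session id and the flag matter here.
struct Request {
  int session_id;
  bool prioritized;
};

// Two FIFO queues: one for prioritized requests, one for the rest.
// Not thread-safe; callers must add their own locking.
class PriorityRequestQueues {
 public:
  void push(const Request& r) {
    (r.prioritized ? high_ : normal_).push_back(r);
  }

  // Option 1: any worker calls pop(false), taking from the prioritized queue
  // first and from the normal queue only when the prioritized one is empty.
  // Option 2: workers in the dedicated prioritized group call pop(true) and
  // never touch non-prioritized requests.
  std::optional<Request> pop(bool priority_only) {
    if (!high_.empty()) return take(high_);
    if (!priority_only && !normal_.empty()) return take(normal_);
    return std::nullopt;
  }

 private:
  static Request take(std::deque<Request>& q) {
    Request r = q.front();
    q.pop_front();
    return r;
  }

  std::deque<Request> high_;
  std::deque<Request> normal_;
};
```

Note that strict priority, as in option 1, can starve non-prioritized requests while prioritized traffic never lets up; option 2 avoids this as long as the general group keeps at least one worker.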
> >
> >      Workers are threads that are created when the MySQL server starts and
> > are never killed until the server itself is killed. They work like this:
> >
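> > uint8_t responseBuffer[BUFFER_LEN];
> > while (true)
> > {
> >     pthread_mutex_lock(&queueMutex);
> >     /* Loop, not if: pthread_cond_wait() may wake up spuriously. */
> >     while (requestQueue.empty())
> >     {
> >         pthread_cond_wait(&requestEntered, &queueMutex);
> >     }
> >     request* pRequest = requestQueue.Dequeue();
> >     pthread_mutex_unlock(&queueMutex);
> >
> >     if (pRequest == NULL)
> >     {
> >         continue;
> >     }
> >     /* Drop requests that belong to an already closed session. */
> >     if (!validateContext(pRequest->client_context))
> >     {
> >         continue;
> >     }
> >
> >     /* Parse the SQL statement, access the database, ... */
> >     size_t responseLen = PROCESS_REQUEST(pRequest, responseBuffer, BUFFER_LEN);
> >
> >     /* Send only the bytes actually produced, not the whole buffer. */
> >     sendResponse(pRequest->client_context->client_socket, responseBuffer,
> >                  responseLen);
> > }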
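A runnable translation of the worker loop above into standard C++11 threads, offered as a sketch: `Request`, `RequestQueue`, `worker_loop` and the shutdown flag are assumptions, not Drizzle code, and the `process` callback stands in for parsing, executing and sending the response. Two details differ from the pseudocode on purpose: the wait uses the predicate overload of `std::condition_variable::wait`, which re-checks the condition after every wakeup, and `shutdown()` lets workers exit once the queue has drained instead of looping forever.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical request; in a server it would also carry the client context.
struct Request {
  int session_id;
  std::string sql;
};

// Blocking FIFO shared by the front end (producers) and workers (consumers).
class RequestQueue {
 public:
  void push(Request r) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      q_.push(std::move(r));
    }
    cv_.notify_one();  // wake one waiting worker
  }

  // Blocks until a request is available or the queue is shut down.
  // Returns false only after shutdown, once the queue is empty.
  bool pop(Request& out) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return !q_.empty() || stopped_; });
    if (q_.empty()) return false;
    out = std::move(q_.front());
    q_.pop();
    return true;
  }

  void shutdown() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stopped_ = true;
    }
    cv_.notify_all();  // wake every worker so each can observe the flag
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<Request> q_;
  bool stopped_ = false;
};

// Worker loop: take a request and hand it to process() until shutdown.
template <typename Process>
void worker_loop(RequestQueue& queue, Process process) {
  Request r;
  while (queue.pop(r)) process(r);
}
```

In use, the server would start l threads running `worker_loop` at startup, and the front end would call `push()` whenever a full request packet has been read.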
> >
> >         Interrupting seems very easy in this model. We simply discard the
> > request!
> >
> >         Workers and clients are separated by queues. Each request is
> > assigned to a worker independently, so rescheduling happens on every
> > request.
> >
> >         Surely, we need to maintain some context info for each
> > connection/session. When a worker decides to close a session, it simply
> > unregisters the socket from epoll, closes the socket and destroys the
> > related context. Some requests from the closing session may still be in
> > the queue. Don't worry: they will be discarded by the call to
> > validateContext(pRequest->client_context).
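One way to implement the `validateContext()` check without risking a use-after-free, sketched under the assumption that session contexts are managed with `std::shared_ptr`: queued requests hold only a `std::weak_ptr`, and closing a session drops the last owning pointer. `ClientContext`, `validate_context` and `drain` are hypothetical names. Unlike checking an "is closed" flag through a raw pointer to a context that may already be destroyed, `weak_ptr::lock()` both checks and pins the context, so another worker closing the session cannot free it while this request is being handled.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical per-session context, owned by the session registry.
struct ClientContext {
  int socket_fd;
};

// A queued request refers to its session only weakly, so it does not keep a
// closed session alive.
struct Request {
  std::weak_ptr<ClientContext> context;
  std::string sql;
};

// Stand-in for validateContext(): returns an owning pointer while the session
// exists, or an empty pointer if it has already been closed.
std::shared_ptr<ClientContext> validate_context(const Request& r) {
  return r.context.lock();
}

// Processes queued requests, silently dropping those whose session is gone.
// Returns how many requests were actually handled.
int drain(std::vector<Request>& queue) {
  int handled = 0;
  for (const Request& r : queue) {
    std::shared_ptr<ClientContext> ctx = validate_context(r);
    if (!ctx) continue;  // session closed: discard the request
    ++handled;           // real code would execute r.sql and reply on ctx->socket_fd
  }
  queue.clear();
  return handled;
}
```

Closing a session then amounts to unregistering and closing the socket and erasing the registry's `shared_ptr`; stale requests expire on their own.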
> >
> >
> >          What do you think of this way of scheduling?
> >
> > On Mon, Mar 30, 2009 at 8:47 PM, Jay Pipes <[email protected]> wrote:
> >
> > Biping MENG wrote:
> >>>> Hello Sir, my name is Biping MENG. I am a Chinese student from the
> >>>> Department of Computer Science and Technology, Nanjing University. I am
> >>>> very interested in refining the thread scheduling strategy of MySQL
> >>>> listed on the ideas page and have a strong desire to contribute to MySQL
> >>>> open source projects by participating in GSoC this summer.
> >>>> I've already submitted the proposal on the official GSoC website. I hope
> >>>> the mentor of this project can take a look at my proposal and give some
> >>>> suggestions on it.
> >>>>
> >>>> I've posted this mail to the MySQL GSoC discussion group and got no
> >>>> reply, so I turned to this mailing list and kindly request a reply from
> >>>> the mentor of this project.
> > Hi Meng!  As we discussed on IRC, there are a number of students
> > interested in the session scheduling functionality.  Like I mentioned on
> > IRC, it would be cool if all interested students could work as a team on
> > this, to make it easier for mentors to handle the proposal and also to
> > make it more likely that many different ideas around the scheduler are
> > examined.
> >
> > You had mentioned online that you'd be interested in working on a team
> > of students.
> >
> > Other students: please respond here if you're interested in working with
> > Meng on this project! :)
> >
> > Cheers!
> >
> > -jay
> >
> >>>> Best regards.
> >>>>
> >>>> Biping MENG
> >>>> Department of Computer Science and Technology
> >>>> Nanjing University
> >>>>
> >>>> _______________________________________________
> >>>> Mailing list: https://launchpad.net/~drizzle-discuss
> >>>> Post to     : [email protected]
> >>>> Unsubscribe : https://launchpad.net/~drizzle-discuss
> >>>> More help   : https://help.launchpad.net/ListHelp



