Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Wed, Mar 07, 2007 at 03:21:19PM -0300, Kirk Kuchov ([EMAIL PROTECTED]) wrote: > On 3/7/07, Ingo Molnar <[EMAIL PROTECTED]> wrote: > > > >* Kirk Kuchov <[EMAIL PROTECTED]> wrote: > > > >> I don't believe I'm wasting my time explaining this. They don't exist > >> as /dev/null, they are just fucking _LINKS_. > >[...] > >> > Either stop flaming kernel developers or become one. It is that > >> > simple. > >> > >> If I were to become a kernel developer I would stick with FreeBSD. > >> [...] > > > >Hey, really, this is an excellent idea: what a boon you could become to > >FreeBSD, again! How much they must be longing for your insightful > >feedback, how much they must be missing your charming style and tactful > >approach! I bet they'll want to print your mails out, frame them and > >hang them over their fireplace, to remember the good old days on cold > >snowy winter days, with warmth in their hearts! Please? > > > > http://www.totallytom.com/thecureforgayness.html

Fonts are a bit bad in my browser :)

Kirk, I understand your frustration - yes, Linux is not the perfect place for startup ideas, and yes, it lacks some features that modern (or old) systems have supported for years, but things change with time. I posted a patch which allows polling for signals; it can be trivially adapted to support timers and essentially any other events. Kevent did that too, but some things are just too radical for immediate support, especially when the majority of users do not require the additional functionality.

People do work, and a lot of them do really good work, so there is no need for rude talk about how bad things are. Things change - even I support that, although the kevent ignorance should put me in the first line with you :)

Be good, and be cool.

> -- > Kirk Kuchov

-- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On 3/7/07, Ingo Molnar <[EMAIL PROTECTED]> wrote: * Kirk Kuchov <[EMAIL PROTECTED]> wrote: > I don't believe I'm wasting my time explaining this. They don't exist > as /dev/null, they are just fucking _LINKS_. [...] > > Either stop flaming kernel developers or become one. It is that > > simple. > > If I were to become a kernel developer I would stick with FreeBSD. > [...] Hey, really, this is an excellent idea: what a boon you could become to FreeBSD, again! How much they must be longing for your insightful feedback, how much they must be missing your charming style and tactful approach! I bet they'll want to print your mails out, frame them and hang them over their fireplace, to remember the good old days on cold snowy winter days, with warmth in their hearts! Please? http://www.totallytom.com/thecureforgayness.html -- Kirk Kuchov - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Wed, Mar 07 2007, Kirk Kuchov wrote: > On 3/7/07, Ingo Molnar <[EMAIL PROTECTED]> wrote: > > > >* Kirk Kuchov <[EMAIL PROTECTED]> wrote: > > > >> I don't believe I'm wasting my time explaining this. They don't exist > >> as /dev/null, they are just fucking _LINKS_. > >[...] > >> > Either stop flaming kernel developers or become one. It is that > >> > simple. > >> > >> If I were to become a kernel developer I would stick with FreeBSD. > >> [...] > > > >Hey, really, this is an excellent idea: what a boon you could become to > >FreeBSD, again! How much they must be longing for your insightful > >feedback, how much they must be missing your charming style and tactful > >approach! I bet they'll want to print your mails out, frame them and > >hang them over their fireplace, to remember the good old days on cold > >snowy winter days, with warmth in their hearts! Please? > > > > http://www.totallytom.com/thecureforgayness.html

Dude, get a life. But more importantly, go waste somebody else's time instead of lkml's.

-- Jens Axboe, updating killfile - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
* Kirk Kuchov <[EMAIL PROTECTED]> wrote: > I don't believe I'm wasting my time explaining this. They don't exist > as /dev/null, they are just fucking _LINKS_. [...] > > Either stop flaming kernel developers or become one. It is that > > simple. > > If I were to become a kernel developer I would stick with FreeBSD. > [...] Hey, really, this is an excellent idea: what a boon you could become to FreeBSD, again! How much they must be longing for your insightful feedback, how much they must be missing your charming style and tactful approach! I bet they'll want to print your mails out, frame them and hang them over their fireplace, to remember the good old days on cold snowy winter days, with warmth in their hearts! Please? Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Wed, 7 Mar 2007, Kirk Kuchov wrote: > > I don't believe I'm wasting my time explaining this. They don't exist > as /dev/null, they are just fucking _LINKS_. I could even "ln -s > /proc/self/fd/0 sucker". A real /dev/stdout can/could even exist, but > that's not the point! Actually, one large reason for /proc/self/ existing is exactly /dev/stdin and friends. And yes, /proc/self looks like a link too, but that doesn't change the fact that it's a very special file. No different from /dev/null or friends. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Trading Places (was: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
On 3/7/07, Al Boldi <[EMAIL PROTECTED]> wrote: Kirk Kuchov wrote: > > Either stop flaming kernel developers or become one. It is that > > simple. > > If I were to become a kernel developer I would stick with FreeBSD. At > least they have kqueue for about seven years now. I have been playing with this thought for quite some time. The question is, can I just use FreeBSD as a drop-in kernel replacement for Linux, or do I have to leave all the GNU/Linux distributions behind as well? http://www.debian.org/ports/kfreebsd-gnu/ -- Kirk Kuchov - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Trading Places (was: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3)
Kirk Kuchov wrote: > > Either stop flaming kernel developers or become one. It is that > > simple. > > If I were to become a kernel developer I would stick with FreeBSD. At > least they have kqueue for about seven years now. I have been playing with this thought for quite some time. The question is, can I just use FreeBSD as a drop-in kernel replacement for Linux, or do I have to leave all the GNU/Linux distributions behind as well? Thanks! -- Al - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On 3/6/07, Pavel Machek <[EMAIL PROTECTED]> wrote: > >As for why common abstractions like file are a good thing, think about why > >having "/dev/null" is cleaner than having a special plug DEVNULL_FD fd > >value to be plugged everywhere, > > This is a stupid comparison. By your logic we should also have /dev/stdin, > /dev/stdout and /dev/stderr.

> Bzzt, wrong. We have them.
>
> [EMAIL PROTECTED]:~$ ls -al /dev/std*
> lrwxrwxrwx 1 root root 4 Nov 12 2003 /dev/stderr -> fd/2
> lrwxrwxrwx 1 root root 4 Nov 12 2003 /dev/stdin -> fd/0
> lrwxrwxrwx 1 root root 4 Nov 12 2003 /dev/stdout -> fd/1
> [EMAIL PROTECTED]:~$ ls -al /proc/self/fd
> total 0
> dr-x------ 2 pavel users 0 Mar 6 09:18 .
> dr-xr-xr-x 4 pavel users 0 Mar 6 09:18 ..
> lrwx------ 1 pavel users 64 Mar 6 09:18 0 -> /dev/ttyp2
> lrwx------ 1 pavel users 64 Mar 6 09:18 1 -> /dev/ttyp2
> lrwx------ 1 pavel users 64 Mar 6 09:18 2 -> /dev/ttyp2
> lr-x------ 1 pavel users 64 Mar 6 09:18 3 -> /proc/2299/fd
> [EMAIL PROTECTED]:~$

I don't believe I'm wasting my time explaining this. They don't exist as /dev/null, they are just fucking _LINKS_. I could even "ln -s /proc/self/fd/0 sucker". A real /dev/stdout can/could even exist, but that's not the point! It remains a stupid comparison because /dev/stdin/stderr/whatever "must" be plugged, else how could a process write to stdout/stderr if it couldn't open them? The way things are is not because it's cleaner to have it as a file, but because it's the only sane way. /dev/null is not a must-have, it's mainly used for redirecting purposes. A sys_nullify(fileno(stdout)) would rule out almost any use of /dev/null.

> >As for why common abstractions like file are a good thing, think about why > >having "/dev/null" is cleaner than having a special plug DEVNULL_FD fd > >value to be plugged everywhere, > >But here the list could be almost endless. > >And please don't start the "they don't scale" or "they need heavy file > >binding" tossfeast. They scale as well as the interface that will receive > >them (poll, select, epoll). Heavy file binding what? 100 or so bytes for > >the struct file? How many signal/timer fds are you gonna have? Like 100K? > >Really a moot argument when opposed to the benefit of being compatible with > >existing POSIX interfaces and being more Unix friendly. > > So why the HELL don't we have those yet? Why haven't you designed > epoll with those in mind? Why don't you back your claims with patches? > (I'm not a kernel developer.)

> Either stop flaming kernel developers or become one. It is that simple.

If I were to become a kernel developer I would stick with FreeBSD. At least they've had kqueue for about seven years now.

-- Kirk Kuchov - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
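(For reference: the "redirecting purposes" Kirk mentions boil down to the classic daemon idiom below. A minimal sketch assuming only POSIX open()/dup2(); sys_nullify() exists only as his hypothetical above.)

    #include <fcntl.h>
    #include <unistd.h>

    /* Discard a process's stdout the Unix way: point fd 1 at /dev/null.
     * This is the slot a hypothetical sys_nullify(STDOUT_FILENO) would fill. */
    static int nullify_stdout(void)
    {
            int fd = open("/dev/null", O_WRONLY);

            if (fd < 0)
                    return -1;
            if (dup2(fd, STDOUT_FILENO) < 0) {
                    close(fd);
                    return -1;
            }
            close(fd);      /* fd 1 keeps /dev/null open */
            return 0;
    }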
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
> >As for why common abstractions like file are a good thing, think about why > >having "/dev/null" is cleaner that having a special plug DEVNULL_FD fd > >value to be plugged everywhere, > > This is a stupid comparaison. By your logic we should also have /dev/stdin, > /dev/stdout and /dev/stderr. Bzzt, wrong. We have them. [EMAIL PROTECTED]:~$ ls -al /dev/std* lrwxrwxrwx 1 root root 4 Nov 12 2003 /dev/stderr -> fd/2 lrwxrwxrwx 1 root root 4 Nov 12 2003 /dev/stdin -> fd/0 lrwxrwxrwx 1 root root 4 Nov 12 2003 /dev/stdout -> fd/1 [EMAIL PROTECTED]:~$ ls -al /proc/self/fd total 0 dr-x-- 2 pavel users 0 Mar 6 09:18 . dr-xr-xr-x 4 pavel users 0 Mar 6 09:18 .. lrwx-- 1 pavel users 64 Mar 6 09:18 0 -> /dev/ttyp2 lrwx-- 1 pavel users 64 Mar 6 09:18 1 -> /dev/ttyp2 lrwx-- 1 pavel users 64 Mar 6 09:18 2 -> /dev/ttyp2 lr-x-- 1 pavel users 64 Mar 6 09:18 3 -> /proc/2299/fd [EMAIL PROTECTED]:~$ > >But here the list could be almost endless. > >And please don't start the, they don't scale or they need heavy file > >binding tossfeast. They scale as well as the interface that will receive > >them (poll, select, epoll). Heavy file binding what? 100 or so bytes for > >the struct file? How many signal/timer fd are you gonna have? Like 100K? > >Really moot argument when opposed to the benefit of being compatible with > >existing POSIX interfaces and being more Unix friendly. > > So why the HELL don't we have those yet? Why haven't you designed > epoll with those in mind? Why don't you back your claims with patches? > (I'm not a kernel developer.) Either stop flaming kernel developers or become one. It is that simple. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On 3/4/07, Kyle Moffett <[EMAIL PROTECTED]> wrote: Well, even this far into 2.6, Linus' patch from 2003 still (mostly) applies; the maintenance cost for this kind of code is virtually zilch. If it matters that much to you clean it up and make it apply; add an alarmfd() syscall (another 100 lines of code at most?) and make a "read" return an architecture-independent siginfo-like structure and submit it for inclusion. Adding epoll() support for random objects is as simple as a 75-line object-filesystem and a 25- line syscall to return an FD to a new inode. Have fun! Go wild! Something this trivially simple could probably spend a week in -mm and go to linus for 2.6.22. Or, if you want to do slightly more work and produce something a great deal more useful, you could implement additional netlink address families for additional "event" sources. The socket - setsockopt - bind - sendmsg/recvmsg sequence is a well understood and well documented UNIX paradigm for multiplexing non-blocking I/O to many destinations over one socket. Everyone who has read Stevens is familiar with the basic UDP and "fd open server" techniques, and if you look at Linux's IP_PKTINFO and NETLINK_W1 (bravo, Evgeniy!) you'll see how easily they could be extended to file AIO and other kinds of event sources. For file AIO, you might have the application open one AIO socket per mount point, open files indirectly via the SCM_RIGHTS mechanism, and submit/retire read/write requests via sendmsg/recvmsg with ancillary data consisting of an lseek64 tuple and a user-provided cookie. Although the process still has to have one fd open per actual open file (because trying to authenticate file accesses without opening fds is madness), the only fds it has to manipulate directly are those representing entire pools of outstanding requests. This is usually a small enough set that select() will do just fine, if you're careful with fd allocation. (You can simply punt indirectly opened fds up to a high numerical range, where they can't be accessed directly from userspace but still make fine cookies for use in lseek64 tuples within cmsg headers). The same basic approach will work for timers, signals, and just about any other event source. Userspace is of course still stuck doing its own state machines / thread scheduling / however you choose to think of it. But all the important activity goes through socketcall(), and the data and control parameters are all packaged up into a struct msghdr instead of the bare buffer pointers of read/write. So if someone else does come along later and design an ultralight threading mechanism that isn't a total botch, the actual data paths won't need much rework; the exception handling will just get a lot simpler. Cheers, - Michael - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
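(The "fd open server" technique Michael refers to is plain Stevens material; below is a minimal sketch of passing an open descriptor as SCM_RIGHTS ancillary data over an AF_UNIX socket. The function name is illustrative, not from any patch in this thread.)

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Send an open file descriptor over a connected AF_UNIX socket as
     * SCM_RIGHTS ancillary data, alongside one byte of normal payload. */
    static int send_fd(int sock, int fd_to_pass)
    {
            char byte = 0;
            struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
            char cbuf[CMSG_SPACE(sizeof(int))];
            struct msghdr msg = {
                    .msg_iov = &iov, .msg_iovlen = 1,
                    .msg_control = cbuf, .msg_controllen = sizeof(cbuf),
            };
            struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

            cmsg->cmsg_level = SOL_SOCKET;
            cmsg->cmsg_type = SCM_RIGHTS;
            cmsg->cmsg_len = CMSG_LEN(sizeof(int));
            memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));

            return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
    }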
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
Kirk Kuchov wrote: [snip] This is a stupid comparaison. By your logic we should also have /dev/stdin, /dev/stdout and /dev/stderr. Well, as a matter of fact (on my system): # ls -l /dev/std* lrwxrwxrwx 1 root root 4 Feb 1 2006 /dev/stderr -> fd/2 lrwxrwxrwx 1 root root 4 Feb 1 2006 /dev/stdin -> fd/0 lrwxrwxrwx 1 root root 4 Feb 1 2006 /dev/stdout -> fd/1 Please don't bother to respond to this mail, I just saw that you apparently needed the info. Magnus P.S.: *PLONK* - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Discussing LKML community [OT from the Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3]
> From: "Michael K. Edwards" <[EMAIL PROTECTED]> > Newsgroups: gmane.linux.kernel > Subject: Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3 > Date: Wed, 28 Feb 2007 09:01:07 -0800 Michael, [] > In this instance, there didn't seem to be any harm in sending my > thoughts to LKML as I wrote them, on the off chance that Ingo or > Davide would get some value out of them in this design cycle (which > any code I eventually get around to producing will miss). So far, > I've gotten some rather dismissive pushback from Ingo and Alan (who > seem to have no interest outside x86 and less understanding than I > would have thought of what real userspace code looks like), a "why > preach to people who know more than you do" from Davide, this may be sad, unless you've spent time and effort to make a Patch, i.e. read source, understand why it's written so, why it's being used now that way, and why it has to be updated on new cycle of kernel development. > a brief aside on the dominance of x86 from Oleg, I didn't have a chance, and probably i will not have one, to communicate with people like you to learn from your wisdom personally. That's why i've replied to your, after you've mentioned transputers. And i've got rather different opinion, than i expected. That shows my test-tube being, little experience etc. As discussion was about CPUs, it was technical, thus on-topic for LKML. > and one off-list "keep up the good work". Not a very rich harvest from > (IMHO) pretty good seeds. Offlist message was my share of view about things, that were offtopic, and clarifying about lkml thing, and it wasn't on-topic for LKML. I'm pretty sure, that there libraries of books, written on every single bit of things Linux currently *implements* in asm/C. (1) Thus, `return -ENOPATCH', man, regardless what you are saying in lkml. That's why prominent people, you've joined me with (:, replied in go-to-kernelnewbie style. > In short, so far the "Linux kernel community" is upholding its > reputation for insularity, arrogance, coding without prior design, > lack of interest in userspace problems, and inability to learn from > the mistakes of others. (None of these characterizations depends on > there being any real insight in anything I have written.) You, as a person, who have right to be personally wrong, may think that way. But do not forget, as i've wrote you offlist and in (1), this is development community, sometimes development of development one, etc; educated, enthusiastic, wise, Open Source, poor on time (and money :). > Happy hacking, > - Michael And you too. LKML *can* (sometimes may) show how useful this hacking is. > P. S. I do think "threadlets" are brilliant, though, and reading > Ingo's patches gave me a much better idea of what would be involved in > prototyping Asynchronously Executed I/O Unit opcodes. You are discussing on-topic thing in the P.S. And this is IMHO wrong approach. Also, note, that i've changed subject, stripped cc list, please note, that i can be young and naive boy barking up the wrong tree. Kind regards. -- -o--=O`C /. .\ #oo'L O o <___=E M^-- (Wuuf) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Sun, 4 Mar 2007, Kirk Kuchov wrote: > I don't give a shit. Here's another good use of /dev/null: *PLONK* - Davide - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On 3/4/07, Davide Libenzi wrote:
> On Sun, 4 Mar 2007, Kirk Kuchov wrote:
> > On 3/3/07, Davide Libenzi wrote:
> > > Those *other* (tons?!?) interfaces can be created *when* the need comes (see Linus signalfd [1] example to show how urgent that was). *When* the need comes, they will work with existing POSIX interfaces, without requiring your own just-another event interface. Those other interfaces could also be more easily adopted by other Unix cousins, because of the fact that they rely on existing POSIX interfaces.
> >
> > Please stop with this crap, this chicken or the egg argument of yours is utter BULLSHIT!
>
> Wow, wow, fella! You _definitely_ cannot afford rudeness here.

I don't give a shit.

> You started bad, and you end even worse, by listing some APIs that will work only with epoll. As I said already, and as it was listed in the thread I posted the link to, something like:
>
> int signalfd(...); // Linus initial interface would be perfectly fine
> int timerfd(...); // Open ...
> int eventfd(...); // [1]
>
> will work *even* with standard POSIX select/poll. 95% or more of the software does not have scalability issues, and select/poll are more portable and easy to use for simple stuff. On top of that, as I already said, they are *confined* interfaces that could be more easily adopted by other Unixes (if they are 100-200 lines on Linux, don't expect them to be a lot more on other Unixes) [2]. We *already* have the infrastructure inside Linux to deliver events (the f_op->poll subsystem), how about we use that instead of just-another way? [3]

Man, you're so full of shit your eyes are brown. NOBODY cares about select/poll or that the interfaces are going to be adopted by other Unixes. This issue was already solved by them YEARS ago. What I want (and a ton of other users) is a SIMPLE and generic way to receive events from _MULTIPLE_ sources. I don't care about kernel-level portability, easiness or whatever; the Linux kernel developers are good at not knowing what their users want.

> As for why common abstractions like file are a good thing, think about why having "/dev/null" is cleaner than having a special plug DEVNULL_FD fd value to be plugged everywhere,

This is a stupid comparison. By your logic we should also have /dev/stdin, /dev/stdout and /dev/stderr.

> or why I can use find/grep/cat/echo/... to look at/edit my configuration inside /proc, instead of using a frigging registry editor.

Yet another stupid comparison: /proc is a MESS! Almost as bad as the registry. Linux now has three pieces of crap for configuration/information: /proc, sysfs and sysctl. Nobody knows exactly what should go into each one of those. Crap design at its best.

> But here the list could be almost endless. And please don't start the "they don't scale" or "they need heavy file binding" tossfeast. They scale as well as the interface that will receive them (poll, select, epoll). Heavy file binding what? 100 or so bytes for the struct file? How many signal/timer fds are you gonna have? Like 100K? Really a moot argument when opposed to the benefit of being compatible with existing POSIX interfaces and being more Unix friendly.

So why the HELL don't we have those yet? Why haven't you designed epoll with those in mind? Why don't you back your claims with patches? (I'm not a kernel developer.)

> As for the AIO stuff, if threadlets/syslets prove effective, you can host an epoll_wait over a syslet/threadlet. Or, if the 3 lines of userspace code needed to do that fall inside your definition of "kludge", we can even find a way to bridge the two.

I don't care about threadlets in this context, I just want to wait for EVENTS from MULTIPLE sources WITHOUT mixing signals and other crap. Your arrogance is amusing; stop pushing narrow-minded beliefs down the throats of all Linux users. Kqueue, event ports, WaitForMultipleObjects, epoll with multiple sources. That's what users want, not yet another syscall/whatever hack.

> Now, how about we focus on the topic of this thread?
>
> [1] This could be an idea. People already use pipes for this, but pipes have some memory overhead inside the kernel (plus use two fds) that could, if really felt necessary, be avoided.

Yet another hack!! 64kiB of space just to push some user events around. Great idea!

> [2] This is how those kinds of interfaces should be designed. Modular, re-usable, file-based interfaces, whose acceptance is not linked into slurping-in a whole new interface with tens of sub, interface-only, objects. And from this POV, epoll is the friendliest.

Who said I want yet another interface? I just fucking want to receive events from MULTIPLE sources through epoll. With or without an fd! My anger and frustration is that we can't get past this SIMPLE need!

> [3] Notice the similarity between threadlets/syslets and epoll? They enable pretty darn good scalability, with *existing* infrastructure, and w/out special ad-hoc code to be plugged everywhere. This translates directly into easier-to-maintain code.
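(Davide's footnote [1] is essentially what later shipped as eventfd(2) in 2.6.22: one fd wrapping a 64-bit counter instead of a pipe pair. A minimal usage sketch, assuming the post-2.6.22 API:)

    #include <stdint.h>
    #include <sys/eventfd.h>
    #include <unistd.h>

    /* One fd, one 64-bit counter: write() adds to it, read() returns and
     * resets it.  The fd is pollable, so it plugs into select/poll/epoll. */
    static int efd;

    static void post_event(void)
    {
            uint64_t one = 1;

            write(efd, &one, sizeof(one));          /* counter += 1 */
    }

    static uint64_t drain_events(void)
    {
            uint64_t n = 0;

            read(efd, &n, sizeof(n));               /* fetch and reset */
            return n;
    }

    /* setup: efd = eventfd(0, 0);  then hand efd to poll/epoll as usual */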
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Sun, 4 Mar 2007, Kirk Kuchov wrote: > On 3/3/07, Davide Libenzi wrote: > > > > > > Those *other* (tons?!?) interfaces can be created *when* the need comes > > (see Linus signalfd [1] example to show how urgent that was). *When* > > the need comes, they will work with existing POSIX interfaces, without > > requiring your own just-another event interface. Those other interfaces > > could also be more easily adopted by other Unix cousins, because of > > the fact that they rely on existing POSIX interfaces. > > Please stop with this crap, this chicken or the egg argument of yours is utter > BULLSHIT!

Wow, wow, fella! You _definitely_ cannot afford rudeness here. You started bad, and you end even worse, by listing some APIs that will work only with epoll. As I said already, and as it was listed in the thread I posted the link to, something like:

int signalfd(...); // Linus initial interface would be perfectly fine
int timerfd(...); // Open ...
int eventfd(...); // [1]

will work *even* with standard POSIX select/poll. 95% or more of the software does not have scalability issues, and select/poll are more portable and easy to use for simple stuff. On top of that, as I already said, they are *confined* interfaces that could be more easily adopted by other Unixes (if they are 100-200 lines on Linux, don't expect them to be a lot more on other Unixes) [2]. We *already* have the infrastructure inside Linux to deliver events (the f_op->poll subsystem), how about we use that instead of just-another way? [3]

As for why common abstractions like file are a good thing, think about why having "/dev/null" is cleaner than having a special plug DEVNULL_FD fd value to be plugged everywhere, or why I can use find/grep/cat/echo/... to look at/edit my configuration inside /proc, instead of using a frigging registry editor. But here the list could be almost endless.

And please don't start the "they don't scale" or "they need heavy file binding" tossfeast. They scale as well as the interface that will receive them (poll, select, epoll). Heavy file binding what? 100 or so bytes for the struct file? How many signal/timer fds are you gonna have? Like 100K? Really a moot argument when opposed to the benefit of being compatible with existing POSIX interfaces and being more Unix friendly.

As for the AIO stuff: if threadlets/syslets prove effective, you can host an epoll_wait over a syslet/threadlet. Or, if the 3 lines of userspace code needed to do that fall inside your definition of "kludge", we can even find a way to bridge the two.

Now, how about we focus on the topic of this thread?

[1] This could be an idea. People already use pipes for this, but pipes have some memory overhead inside the kernel (plus use two fds) that could, if really felt necessary, be avoided.

[2] This is how those kinds of interfaces should be designed. Modular, re-usable, file-based interfaces, whose acceptance is not linked into slurping-in a whole new interface with tens of sub, interface-only, objects. And from this POV, epoll is the friendliest.

[3] Notice the similarity between threadlets/syslets and epoll? They enable pretty darn good scalability, with *existing* infrastructure, and w/out special ad-hoc code to be plugged everywhere. This translates directly into easier-to-maintain code.

- Davide - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
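(To make Davide's point concrete: because each of those calls just returns a pollable fd, they compose with plain POSIX poll() with no new wait primitive. A sketch using signalfd() as it eventually landed in 2.6.22 - the API shown is the merged one, not Linus' 2003 draft:)

    #include <poll.h>
    #include <signal.h>
    #include <sys/signalfd.h>

    /* Route SIGINT through a file descriptor: no async handler, no
     * self-pipe trick, just another fd in the ordinary event loop. */
    static int make_sigint_fd(void)
    {
            sigset_t mask;

            sigemptyset(&mask);
            sigaddset(&mask, SIGINT);
            sigprocmask(SIG_BLOCK, &mask, NULL);    /* disable async delivery */
            return signalfd(-1, &mask, 0);          /* -1 creates a new fd */
    }

    /* struct pollfd pfd = { .fd = make_sigint_fd(), .events = POLLIN };
     * poll(&pfd, 1, -1);  then read() a struct signalfd_siginfo from it. */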
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Mar 04, 2007, at 11:23:37, Kirk Kuchov wrote:
> So here we are, 2007. epoll() works with files, pipes, sockets, inotify and anything pollable (file descriptors), but not aio, timers, signals or user-defined events. Can we please get those working with epoll? Something as simple as: [code snipped] Would this be acceptable? Can we finally move on?

Well, even this far into 2.6, Linus' patch from 2003 still (mostly) applies; the maintenance cost for this kind of code is virtually zilch. If it matters that much to you, clean it up and make it apply; add an alarmfd() syscall (another 100 lines of code at most?), make a "read" return an architecture-independent siginfo-like structure, and submit it for inclusion. Adding epoll() support for random objects is as simple as a 75-line object-filesystem and a 25-line syscall to return an FD to a new inode. Have fun! Go wild! Something this trivially simple could probably spend a week in -mm and go to Linus for 2.6.22.

Cheers, Kyle Moffett - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
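(Kyle's alarmfd() stayed hypothetical, but something very close did land later as timerfd_create() in 2.6.25; a usage sketch of that merged interface, for comparison:)

    #include <stdint.h>
    #include <sys/timerfd.h>

    /* What Kyle sketches as alarmfd() is close to what later landed as
     * timerfd_create(): an fd that becomes readable on timer expiry. */
    static int make_tick_fd(long period_ms)
    {
            struct itimerspec its = {
                    .it_interval = { period_ms / 1000, (period_ms % 1000) * 1000000L },
                    .it_value    = { period_ms / 1000, (period_ms % 1000) * 1000000L },
            };
            int fd = timerfd_create(CLOCK_MONOTONIC, 0);

            if (fd >= 0)
                    timerfd_settime(fd, 0, &its, NULL);
            /* read() later returns a uint64_t count of expirations */
            return fd;
    }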
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On 3/3/07, Davide Libenzi wrote:
> Those *other* (tons?!?) interfaces can be created *when* the need comes (see Linus signalfd [1] example to show how urgent that was). *When* the need comes, they will work with existing POSIX interfaces, without requiring your own just-another event interface. Those other interfaces could also be more easily adopted by other Unix cousins, because of the fact that they rely on existing POSIX interfaces.

Please stop with this crap, this chicken or the egg argument of yours is utter BULLSHIT! Just because Linux doesn't have a decent kernel event notification mechanism, it does not mean that users don't need one. Nobody cared about Linus's signalfd because it wasn't mainline. Look at any of the event notification libraries out there; it makes me sick how many kludges they have to go through to get near the same functionality as kqueue on Linux.

Solaris has had the Event Ports mechanism since 2003. FreeBSD, NetBSD, OpenBSD and Mac OS X have supported kqueue since around 2000. Windows has had event notification for ages now. These _facilities_ are all widely used, given the platforms' popularity.

So here we are, 2007. epoll() works with files, pipes, sockets, inotify and anything pollable (file descriptors), but not aio, timers, signals or user-defined events. Can we please get those working with epoll? Something as simple as:

struct epoll_event ev;
ev.events = EV_TIMER | EPOLLONESHOT;
ev.data.u64 = 1000; /* timeout */
epoll_ctl(epfd, EPOLL_CTL_ADD, 0 /* ignored */, &ev);

or

struct sigevent ev;
ev.sigev_notify = SIGEV_EPOLL;
ev.sigev_signo = epfd;
ev.sigev_value = &ev;
timer_create(CLOCK_MONOTONIC, &ev, &timerid);

AIO:

struct sigevent ev;
int fd = io_setup(..); /* oh boy, I wish... but it works */
ev.events = EV_AIO | EPOLLONESHOT;
/* event.data.ptr returns pointer to the iocb */
epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);

or

struct iocb iocb;
iocb.aio_fildes = fileno(stdin);
iocb.aio_lio_opcode = IO_CMD_PREAD;
iocb.c.notify = IO_NOTIFY_EPOLL; /* __pad3/4 */

Would this be acceptable? Can we finally move on?

-- Kirk Kuchov - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
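(For comparison, the kqueue registration Kirk is measuring Linux against - the FreeBSD-family equivalent of his EV_TIMER sketch, using the EVFILT_TIMER filter that has shipped since FreeBSD 4.1:)

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>

    /* Register a 1000 ms one-shot timer and block until it fires:
     * timers, sockets and signals all funnel through the same queue. */
    static void wait_one_second(void)
    {
            int kq = kqueue();
            struct kevent ev, out;

            EV_SET(&ev, 1, EVFILT_TIMER, EV_ADD | EV_ONESHOT, 0, 1000, NULL);
            kevent(kq, &ev, 1, NULL, 0, NULL);      /* submit the change */
            kevent(kq, NULL, 0, &out, 1, NULL);     /* wait for the event */
    }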
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
Please don't take this the wrong way, Ray, but I don't think _you_ understand the problem space that people are (or should be) trying to address here. Servers want to always, always block. Not on a socket, not on a stat, not on any _one_ thing, but in a condition where the optimum number of concurrent I/O requests are outstanding (generally of several kinds with widely varying expected latencies). I have an embedded server I wrote that avoids forking internally for any reason, although it watches the damn serial port signals in parallel with handling network I/O, audio, and child processes that handle VoIP signaling protocols (which are separate processes because it was more practical to write them in a different language with mediocre embeddability). There's a lot of things that can block out there, not just disk I/O, but the only thing a genuinely scalable server process ever blocks on (apart from the odd spinlock) is a wait-for-IO-from-somewhere mechanism like select or epoll or kqueue (or even sleep() while awaiting SIGRT+n, or if it genuinely doesn't suck, the thread scheduler). Furthermore, not only do servers want to block rather than shove more I/O into the plumbing than it can handle without backing up, they also want to throttle the concurrency of requests at the kernel level *for the kernel's benefit*. In particular, a server wants to submit to the kernel a ton of stats and I/O in parallel, far more than it makes sense to actually issue concurrently, so that efficient sequencing of these requests can be left to the kernel. But the server wants to guide the kernel with regard to the ratios of concurrency appropriate to the various classes and the relative urgency of the individual requests within each class. The server also wants to be able to reprioritize groups of requests or cancel them altogether based on new information about hardware status and user behavior. Finally, the biggest argument against syslets/threadlets AFAICS is that -- if done incorrectly, as currently proposed -- they would unify the AIO and normal IO paths in the kernel. This would shackle AIO to the current semantics of synchronous syscalls, in which buffers are passed as bare pointers and exceptional results are tangled up with programming errors. This would, in turn, make it quite impossible for future hardware to pipeline and speculatively execute chains of AIO operations, leaving "syslets" to a few RDBMS programmers with time to burn. The unimproved ease of long term maintenance on the kernel (not to mention the complete failure to make the writing of _correct_, performant server code any easier) makes them unworthy of consideration for inclusion. So, while everybody has been talking about cached and non-cached cases, those are really total irrelevancies. The principal problem that needs solving is to model the process's pool of in-flight I/O requests, together with a much larger number of submitted but not yet issued requests whose results are foreseeably likely to be needed soon, using a data structure that efficiently supports _all_ of the operations needed, including bulk cancellation, reprioritization, and batch migration based on affinities among requests and locality to the correct I/O resources. Memory footprint and gentle-on-real-hardware scheduling are secondary, but also important, considerations. If you happen to be able to service certain things directly from cache, that's gravy -- but it's not very smart IMHO to put that central to your design process. 
Cheers, - Michael - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
Ihar `Philips` Filipau wrote: > On 3/3/07, Ray Lee <[EMAIL PROTECTED]> wrote: >> On 3/3/07, Ihar `Philips` Filipau <[EMAIL PROTECTED]> wrote: >> > What I'm trying to get to: keep things simple. The proposed >> > optimization by Ingo does nothing else but allowing AIO to probe file >> > cache - if data there to go with fast path. So why not to implement >> > what the people want - probing of cache? Because it sounds bad? But >> > they are in fact proposing precisely that just masked with "fast >> > threads". >> >> >> Servers want to never, ever block. Not on a socket, not on a stat, not >> on anything. (I have an embedded server I wrote that has to fork >> internally just to watch the damn serial port signals in parallel with >> handling network I/O, audio, and child processes that handle H323.) >> There's a lot of things that can block out there, and it's not just >> disk I/O. >> > > Why select/poll/epoll/friends do not work? I have programmed on both > sides - user-space network servers and in-kernel network protocols - > and "never blocking" thing was implemented in *nix in the times I was > walking under table. > Then you've never had to write something that watches serial port signals. Google on TIOCMIWAIT to see what I'm talking about. The only option for a userspace programmer to deal with that is to fork() or poll the signals every so many milliseconds. There are probably more easy examples, but that's the one off the top of my head that affected me. In short, this isn't just about network IO, this isn't just about file IO. > One can poll() more or less *any* device in system. With frigging > exception of - right - files. The problem is the "more or less." Say you're right, and 95% of the system calls are either already asynchronous or non-blocking/poll()able. One of the questions on the table is how to extend it to the last 5%. > User-space-wise, check how squid (caching http proxy) does it: you > have several (forked) instances to serve network requests and you have > one/several disk I/O daemons. (So called "diskd storeio") Why? Because > you cannot poll() file descriptors, but you can poll unix socket > connected to diskd. If diskd blocks, squid still can serve requests. > How threadlets are better then pool of diskd instances? All nastiness > of shared memory set loose... Samba/lighttpd/git want to issue dozens of stats in parallel so that the kernel can have an opportunity to sort them better. Are you saying they should fork() a process per stat that they want to issue in parallel? > What I'm trying to get to. Threadlets wouldn't help existing > single-threaded applications - what is about 95% of all applications. Eh, I don't think that's right. Part of the reason threadlets and syslets are on the table because it may be a more efficient way to do AIO. And the differences between the syslet API and the current kernel Async IO API can be abstracted away by glibc, so that today's apps that do AIO would immediately benefit. > What's more, as having some limited experience of kernel programming, > I fail to see what threadlets would simplify on kernel side. You can yank the entire separate AIO path, and just treat them as another blocking API that syslets makes nonblocking. Immediate reduction of code, and everybody is now using the same code paths, which means higher test coverage and reduced maintenance cost. This last point is really important. 
Even if no extra functionality eventually makes it to userspace, this last point would still be enough to make the powers that be consider inclusion. Ray - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
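(The call Ray is pointing at, for anyone who hasn't met it: TIOCMIWAIT sleeps inside the serial driver until a modem-control line changes, and there is no pollable form of it - hence the fork. A minimal sketch:)

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    /* Block until carrier-detect or ring changes on a serial port,
     * then return the current carrier state. */
    static int wait_for_carrier_change(const char *dev)
    {
            int status, fd = open(dev, O_RDONLY | O_NOCTTY);

            if (fd < 0)
                    return -1;
            ioctl(fd, TIOCMIWAIT, TIOCM_CD | TIOCM_RNG); /* sleeps in the driver */
            ioctl(fd, TIOCMGET, &status);                /* bitmask of lines */
            close(fd);
            return status & TIOCM_CD;
    }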
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On 3/3/07, Ray Lee <[EMAIL PROTECTED]> wrote:
> On 3/3/07, Ihar `Philips` Filipau <[EMAIL PROTECTED]> wrote:
> > What I'm trying to get to: keep things simple. The proposed optimization by Ingo does nothing else but allowing AIO to probe file cache - if data there to go with fast path. So why not to implement what the people want - probing of cache? Because it sounds bad? But they are in fact proposing precisely that just masked with "fast threads".
>
> Servers want to never, ever block. Not on a socket, not on a stat, not on anything. (I have an embedded server I wrote that has to fork internally just to watch the damn serial port signals in parallel with handling network I/O, audio, and child processes that handle H323.) There's a lot of things that can block out there, and it's not just disk I/O.

Why do select/poll/epoll/friends not work? I have programmed on both sides - user-space network servers and in-kernel network protocols - and the "never blocking" thing was implemented in *nix in the times when I was walking under the table.

One can poll() more or less *any* device in the system. With the frigging exception of - right - files. IOW, for 75% of I/O the problem doesn't exist, since there is a proper interface - e.g. sockets - in place.

User-space-wise, check how squid (a caching http proxy) does it: you have several (forked) instances to serve network requests and you have one/several disk I/O daemons. (So called "diskd storeio") Why? Because you cannot poll() file descriptors, but you can poll a unix socket connected to diskd. If diskd blocks, squid can still serve requests. How are threadlets better than a pool of diskd instances? All the nastiness of shared memory set loose...

What I'm trying to get to. Threadlets wouldn't help existing single-threaded applications - which are about 95% of all applications. And multi-threaded applications would gain little, because few real applications create threads dynamically: creation needs resources and can fail, uncontrollable thread spawning hurts overall manageability, and additional care is needed to proof against deadlocks/lock contention. (The category of applications which want the performance gain are also the applications which need to ensure greater stability over long non-stop runs. Uncontrollable dynamism helps nothing.)

Having implemented several "file servers" - daemons serving file I/O to other daemons - I honestly hardly see any improvements. Now people configure such file servers to issue e.g. 10 file operations simultaneously - using a pool of 10 threads. What do threadlets change? In the end, just to keep the threadlets in check I would need to issue pthread_join() after some number of threadlets created. And the latter number is the former "e.g. 10". IOW, programmer-wise the implementation remains the same - and all the limitations remain the same. And all the overhead of user-space locking remains the same. (*)

What's more, having some limited experience of kernel programming, I fail to see what threadlets would simplify on the kernel side. End result as I see it: user space becomes a bit more complicated because of dynamic multi-threading, and kernel space becomes also more complicated because of the same added dynamism.

(*) Hm... On the other side, if an application were able to tell the kernel to limit the number of issued threadlets to N, then it might simplify the job. An application could tell the kernel "I need at most 10 blocking threadlets, block me if there are more" and then dumbly throw I/O threadlets at the kernel as they come in. And the kernel would then put the process to sleep if N+1 threadlets are blocking. That would definitely simplify the job in user-space: it wouldn't need to call pthread_join(). But it is still no replacement for a poll()able file descriptor or truly async mmap().

-- Don't walk behind me, I may not lead. Don't walk in front of me, I may not follow. Just walk beside me and be my friend. -- Albert Camus (attributed to) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
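(Ihar's (*) throttle maps directly onto a counting semaphore in user space; a minimal sketch, where do_one_io() stands in for whatever the threadlet body would be - both names are illustrative:)

    #include <pthread.h>
    #include <semaphore.h>

    #define MAX_IN_FLIGHT 10

    extern void do_one_io(void *req);   /* placeholder for the blocking work */

    static sem_t slots;                 /* sem_init(&slots, 0, MAX_IN_FLIGHT) */

    static void *worker(void *req)
    {
            do_one_io(req);
            sem_post(&slots);           /* completion frees a slot */
            return NULL;
    }

    static void submit(void *req)
    {
            pthread_t t;

            sem_wait(&slots);           /* submitter sleeps at N in flight */
            pthread_create(&t, NULL, worker, req);
            pthread_detach(t);          /* no pthread_join() bookkeeping */
    }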
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Sat, 3 Mar 2007, Davide Libenzi wrote: > Those *other* (tons?!?) interfaces can be created *when* the need comes > (see Linus signalfd [1] example to show how urgent that was). *When* > the need comes, they will work with existing POSIX interfaces, without > requiring your own just-another event interface. Those other interfaces > could also be more easily adopted by other Unix cousins, because of > the fact that they rely on existing POSIX interfaces. One of the reason > about the Unix file abstraction interfaces, is that you do *not* have to > plan and bloat interfaces before. As long as your new abstraction behave > in a file-fashion, it can be automatically used with existing interfaces. > And you create them *when* the need comes. Now, if you don't mind, my spare time is really limited and I prefer to spend it looking at stuff the topic of this thread talks about. Even because the whole epoll/kevent discussion is heavily dependent on the fact that syslets/threadlets will or will not result a viable method for generic AIO. Savvy? - Davide - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Sat, 3 Mar 2007, Evgeniy Polyakov wrote: > > I was referring to dropping an event directly to a userspace buffer, from > > the poll callback. If pages are not there, you might sleep, and you can't > > since the wakeup function holds a spinlock on the waitqueue head while > > looping through the waiters to issue the wakeup. Also, you don't know from > > where the poll wakeup is called. > > Ugh, no, that is a very limited solution - memory must either be pinned > (which leads to DoS and a limited ring buffer), or the callback must sleep. > Actually, in any case there _must_ exist a queue - if the ring buffer is full, > an event is not allowed to be dropped - it must be stored in some other > place, for example in a queue from which entries will be read (copied) > as the ring buffer frees entries (that is how it is implemented in > kevent at least).

I was not advocating for that, if you read carefully. The fact that epoll does not do that should be a clear hint. The old /dev/epoll IIRC was only 10% faster than the current epoll under a *heavy* event frequency micro-bench like pipetest (and that version of epoll did not have the single pass over the ready set optimization). And /dev/epoll was delivering events *directly* into userspace-visible (mmaped) memory in a zero-copy fashion.

> > BTW, Linus made a signalfd code sketch some time ago, to deliver signals to an > > fd. The code remained there and nobody cared. Question: was it because > > 1) it had file bindings or 2) because nobody really cared to deliver > > signals to an event collector? > > And *if* later requirements come, you don't need to change the API by > > adding an XXEVENT_SIGNAL_ADD or XXEVENT_TIMER_ADD, or creating a new > > XXEVENT-only submission structure. You create an API that automatically > > makes that new abstraction work with POSIX poll/select, and you get epoll > > support for free. Without even changing a bit in the epoll API. > > Well, we get epoll support for free, but we need to create tons of other > interfaces and infrastructure for kernel users, and we need to change > userspace anyway.

Those *other* (tons?!?) interfaces can be created *when* the need comes (see Linus signalfd [1] example to show how urgent that was). *When* the need comes, they will work with existing POSIX interfaces, without requiring your own just-another event interface. Those other interfaces could also be more easily adopted by other Unix cousins, because of the fact that they rely on existing POSIX interfaces. One of the reasons for the Unix file abstraction interfaces is that you do *not* have to plan and bloat interfaces beforehand. As long as your new abstraction behaves in a file fashion, it can be automatically used with existing interfaces. And you create them *when* the need comes.

[1] That was like 100 lines of code or so. See here: http://tinyurl.com/3yuna5

- Davide - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On 3/3/07, Ihar `Philips` Filipau <[EMAIL PROTECTED]> wrote: What I'm trying to get to: keep things simple. The proposed optimization by Ingo does nothing else but allowing AIO to probe file cache - if data there to go with fast path. So why not to implement what the people want - probing of cache? Because it sounds bad? But they are in fact proposing precisely that just masked with "fast threads". Please don't take this the wrong way, but I don't think you understand the problem space that people are trying to address here. Servers want to never, ever block. Not on a socket, not on a stat, not on anything. (I have an embedded server I wrote that has to fork internally just to watch the damn serial port signals in parallel with handling network I/O, audio, and child processes that handle H323.) There's a lot of things that can block out there, and it's not just disk I/O. Further, not only do servers not want to block, they also want to cram a lot more requests into the kernel at once *for the kernel's benefit*. In particular, a server wants to issue a ton of stats and I/O in parallel so that the kernel can optimize which order to handle the requests. Finally, the biggest argument in favor of syslets/threadlets AFAICS is that -- if done correctly -- it would unify the AIO and normal IO paths in the kernel. The improved ease of long term maintenance on the kernel (and more test coverage, and more directed optimization, etc...) just for this point alone makes them worth considering for inclusion. So, while everybody has been talking about cached and non-cached cases, those are really special cases of the entire package that the rest of us want. Ray - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Sat, Mar 03, 2007 at 10:46:59AM -0800, Davide Libenzi (davidel@xmailserver.org) wrote: > On Sat, 3 Mar 2007, Evgeniy Polyakov wrote: > > > > You've to excuse me if my memory is bad, but IIRC the whole discussion > > > and loong benchmark feast born with you throwing a benchmark at Ingo > > > (with kevent showing a 1.9x performance boost WRT epoll), not with you > > > making any other point. > > > > So, how does it sound? > > "Threadlets are bad for IO because kevent is 2 times faster than epoll?" > > > > I said threadlets are bad for IO (and we agreed that both approaches > > should be used for the maximum performance) because of rescheduling overhead - > > tasks are quite heavy structures to move around - even a pt_regs copy > > takes more than an event structure, but not because there is something in other > > galaxy which might work faster than another something in another galaxy. > > That was stupid even to think about that. > > Evgeniy, other folks on this thread read what you said, so let's not drag > this over.

Sure, I was wrong to start this again, but try to get my position - I'm really tired of trying to prove that I'm not a camel just because we had some misunderstanding at the start. I do think that threadlets are a really cool solution and indeed a very good approach for the majority of parallel processing, but my point is still that they are not a perfect solution for all tasks. Just to draw a line: the kevent example is an extrapolation of what can be achieved with an event-driven model, but that does not mean that it must be used _only_ for the AIO model - threadlets _and_ an event-driven model (yes, I accepted Ingo's point about its declining) is the best solution.

> > And if you really feel raw about the single O(nready) loop that epoll > > currently does, a new epoll_wait2 (or whatever) API could be used to > > deliver the event directly into a userspace buffer [1], directly from the > > poll callback, w/out extra delivery loops > > (IRQ/event->epoll_callback->event_buffer). > > > [1] From the epoll callback, we cannot sleep, so it's gonna be either an > > mlocked userspace buffer, or some kernel pages mapped to userspace. > > Callbacks never sleep - they add the event into a list just like the current > implementation (maybe some lock must be changed from mutex to spinlock, > I do not remember), the main problem is binding to the file structure, > which is heavy. > > I was referring to dropping an event directly to a userspace buffer, from > the poll callback. If pages are not there, you might sleep, and you can't > since the wakeup function holds a spinlock on the waitqueue head while > looping through the waiters to issue the wakeup. Also, you don't know from > where the poll wakeup is called.

Ugh, no, that is a very limited solution - memory must either be pinned (which leads to DoS and a limited ring buffer), or the callback must sleep. Actually, in any case there _must_ exist a queue - if the ring buffer is full, an event is not allowed to be dropped - it must be stored in some other place, for example in a queue from which entries will be read (copied) as the ring buffer frees entries (that is how it is implemented in kevent at least).

> File binding heavy? The first, and by *far* biggest, source of events > inside an event collector, of someone that cares about scalability, are > sockets. And those are already files. Second would be AIO, and those (if > performance figures agree) can be hosted inside syslets/threadlets. > Then you fall into the no-care category, where the extra 100 bytes do not > make a case against the ability of using it with an existing POSIX > infrastructure (poll/select).

Well, sockets are files indeed, and sockets are already perfectly handled by epoll - but there are other users of the potential interface - and it must be designed to scale very well in _any_ situation. Even if we do not have problems with some types of events right now, we must scale with any new one.

> BTW, Linus made a signalfd code sketch some time ago, to deliver signals to an > fd. The code remained there and nobody cared. Question: was it because > 1) it had file bindings or 2) because nobody really cared to deliver > signals to an event collector? > And *if* later requirements come, you don't need to change the API by > adding an XXEVENT_SIGNAL_ADD or XXEVENT_TIMER_ADD, or creating a new > XXEVENT-only submission structure. You create an API that automatically > makes that new abstraction work with POSIX poll/select, and you get epoll > support for free. Without even changing a bit in the epoll API.

Well, we get epoll support for free, but we need to create tons of other interfaces and infrastructure for kernel users, and we need to change userspace anyway. But epoll support requires quite heavy bindings to the file structure, so why don't we want to design the new interface (since we need to change userspace anyway) so that it can scale and be very memory-optimized?
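(The no-drop scheme Evgeniy describes - a fixed ring for the fast path plus an overflow queue drained as the consumer frees slots - looks roughly like the sketch below. This is an illustration only, not the actual kevent code, and locking is omitted.)

    /* Fixed ring for the fast path; a linked overflow list for the rest. */
    #define RING_SIZE 256

    struct event { struct event *next; /* payload... */ };

    static struct event *ring[RING_SIZE];
    static unsigned head, tail;                  /* producer/consumer indexes */
    static struct event *overflow, **overflow_tail = &overflow;

    static void enqueue_event(struct event *ev)
    {
            if (head - tail < RING_SIZE) {       /* room in the ring */
                    ring[head++ % RING_SIZE] = ev;
            } else {                             /* full: queue, never drop */
                    ev->next = NULL;
                    *overflow_tail = ev;
                    overflow_tail = &ev->next;
            }
    }

    static struct event *dequeue_event(void)
    {
            struct event *ev = (head == tail) ? NULL : ring[tail++ % RING_SIZE];

            if (ev && overflow) {                /* a slot opened: refill */
                    struct event *o = overflow;

                    overflow = o->next;
                    if (!overflow)
                            overflow_tail = &overflow;
                    enqueue_event(o);
            }
            return ev;
    }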
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Sat, 3 Mar 2007, Evgeniy Polyakov wrote: > > You've to excuse me if my memory is bad, but IIRC the whole discussion > > and loong benchmark feast born with you throwing a benchmark at Ingo > > (with kevent showing a 1.9x performance boost WRT epoll), not with you > > making any other point. > > So, how does it sound? > "Threadlets are bad for IO because kevent is 2 times faster than epoll?" > > I said threadlets are bad for IO (and we agreed that both approaches > should be used for the maximum performance) because of rescheduling overhead - > tasks are quite heavy structures to move around - even a pt_regs copy > takes more than an event structure, but not because there is something in other > galaxy which might work faster than another something in another galaxy. > That was stupid even to think about that.

Evgeniy, other folks on this thread read what you said, so let's not drag this over.

> > And if you really feel raw about the single O(nready) loop that epoll > > currently does, a new epoll_wait2 (or whatever) API could be used to > > deliver the event directly into a userspace buffer [1], directly from the > > poll callback, w/out extra delivery loops > > (IRQ/event->epoll_callback->event_buffer). > > > [1] From the epoll callback, we cannot sleep, so it's gonna be either an > > mlocked userspace buffer, or some kernel pages mapped to userspace. > > Callbacks never sleep - they add the event into a list just like the current > implementation (maybe some lock must be changed from mutex to spinlock, > I do not remember), the main problem is binding to the file structure, > which is heavy.

I was referring to dropping an event directly into a userspace buffer, from the poll callback. If the pages are not there, you might sleep, and you can't, since the wakeup function holds a spinlock on the waitqueue head while looping through the waiters to issue the wakeup. Also, you don't know from where the poll wakeup is called.

File binding heavy? The first, and by *far* biggest, source of events inside an event collector, for someone that cares about scalability, are sockets. And those are already files. Second would be AIO, and those (if the performance figures agree) can be hosted inside syslets/threadlets. Then you fall into the no-care category, where the extra 100 bytes do not make a case against the ability to use it with the existing POSIX infrastructure (poll/select).

BTW, Linus made a signalfd code sketch some time ago, to deliver signals to an fd. The code remained there and nobody cared. Question: was it because 1) it had file bindings, or 2) because nobody really cared to deliver signals to an event collector?

And *if* later requirements come, you don't need to change the API by adding an XXEVENT_SIGNAL_ADD or XXEVENT_TIMER_ADD, or creating a new XXEVENT-only submission structure. You create an API that automatically makes the new abstraction work with POSIX poll/select, and you get epoll support for free. Without even changing a bit in the epoll API.

- Davide - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On 3/3/07, Evgeniy Polyakov <[EMAIL PROTECTED]> wrote: > >Threadlets can work with any function as a base - if it were > >recv-like it would limit possible cases for parallel programming, so you > >can code anything in threadlets - it is not only about IO. > > What I'm trying to get to: keep things simple. The proposed > optimization by Ingo does nothing else but allow AIO to probe the file > cache - if the data is there, go with the fast path. So why not implement > what people want - probing of the cache? Because it sounds bad? But > they are in fact proposing precisely that, just masked with "fast > threads". > There can be other parts than just plain recv/read syscalls - you can > create a logical processing entity and if it blocks (as a whole, no matter > where), the whole processing will continue as a new thread. And having a > different syscall to warm the cache can end up in a cache flush in between > the warming and the processing itself. I'm not talking about cache warm-up. And if we do - and that's the whole freaking point of AIO - Linux IIRC pins freshly loaded clean pages anyway. So there would be a problem, but only under memory pressure. If you are under memory pressure - you have already lost the game and do not care about performance/what threads you are using. It is just that the whole "threadlets to threads on blocking" thing doesn't sound convincing. It sounds more like "premature optimization". But anyway, not that I'm an AIO specialist. For networking it is totally unnecessary, since most applications which care already have rate control and buffer management built in. Network connections/sockets allow a greater level of application control over what and how they do it. Compared to blockdev's plain dumb read()/write() going through the global cache. And not that (judging from the interface) AIO changes that much - it is still a dumb read(), which IMHO makes no sense whatsoever for mmap()-oriented Linux. -- Don't walk behind me, I may not lead. Don't walk in front of me, I may not follow. Just walk beside me and be my friend. -- Albert Camus (attributed to) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
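For reference, the "different syscall to warm cache" pattern criticized here maps onto Linux's existing readahead(2); a minimal sketch of why the window between warming and use is the weak spot:

    /* Sketch of the separate warm-then-read pattern under discussion.
     * readahead(2) is a real Linux syscall; the point being made above
     * is the race between the two steps. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>

    ssize_t warmed_read(int fd, void *buf, size_t len, off_t off)
    {
        readahead(fd, off, len);  /* populate the page cache */
        /* Under memory pressure the pages may be reclaimed right
         * here, before the copy below - the "cache flush in between
         * warming and processing" mentioned above. */
        return pread(fd, buf, len, off);
    }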
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Sat, Mar 03, 2007 at 11:58:17AM +0100, Ihar `Philips` Filipau ([EMAIL PROTECTED]) wrote: > On 3/3/07, Evgeniy Polyakov <[EMAIL PROTECTED]> wrote: > >On Fri, Mar 02, 2007 at 08:20:26PM +0100, Ihar `Philips` Filipau > >([EMAIL PROTECTED]) wrote: > >> I'm not well versed in modern kernel development discussions, and > >> since you have put the thing into the networked context anyway, can > >> you please ask on lkml why (if they want threadlets solely for AIO) > >> not to implement an analogue of recv(*filedes*, b, l, MSG_DONTWAIT). > >> Developers already know the interface, the socket infrastructure is already > >> in the kernel, etc. And it might do precisely what they want: access a file > >> in the disk cache - just like in the case of a socket it accesses the recv buffer > >> of the socket. Why bother with implicit threads/waiting/etc - if all they > >> want is some way to probe the cache? > > > >Threadlets can work with any function as a base - if it were > >recv-like it would limit possible cases for parallel programming, so you > >can code anything in threadlets - it is not only about IO. > > > > Ingo defined them as "plain function calls as long as they do not block". > > But when/what function could block? > > (1) File descriptors. Read. If the data is in the cache it wouldn't block. > Otherwise it would. Write. If there is space in the cache it wouldn't > block. Otherwise it would. > > (2) Network sockets. Recv. If data is in the buffer it wouldn't block. > Otherwise it would. Send. If there is space in the send buffer it > wouldn't block. Otherwise it would. > > (3) Pipes, fifos & unix sockets. Unfortunately these gain nothing, since > reliable local communication is used mostly for control information > passing. If you have to block on such a socket it is most likely important > information anyway (e.g. X server communication or an SQL query to an SQL > server; or, even less important here, the case of shell pipes). And most > users here are all single threaded and I/O bound: they would gain > nothing from multi-threading - only the PITA of added locking. > > What I'm trying to get to: keep things simple. The proposed > optimization by Ingo does nothing else but allow AIO to probe the file > cache - if the data is there, go with the fast path. So why not implement > what people want - probing of the cache? Because it sounds bad? But > they are in fact proposing precisely that, just masked with "fast > threads". There can be other parts than just plain recv/read syscalls - you can create a logical processing entity and if it blocks (as a whole, no matter where), the whole processing will continue as a new thread. And having a different syscall to warm the cache can end up in a cache flush in between the warming and the processing itself. > -- > Don't walk behind me, I may not lead. > Don't walk in front of me, I may not follow. > Just walk beside me and be my friend. >-- Albert Camus (attributed to) -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On 3/3/07, Evgeniy Polyakov <[EMAIL PROTECTED]> wrote: On Fri, Mar 02, 2007 at 08:20:26PM +0100, Ihar `Philips` Filipau ([EMAIL PROTECTED]) wrote: > I'm not well versed in modern kernel development discussions, and > since you have put the thing into the networked context anyway, can > you please ask on lkml why (if they want threadlets solely for AIO) > not to implement an analogue of recv(*filedes*, b, l, MSG_DONTWAIT). > Developers already know the interface, the socket infrastructure is already > in the kernel, etc. And it might do precisely what they want: access a file > in the disk cache - just like in the case of a socket it accesses the recv buffer > of the socket. Why bother with implicit threads/waiting/etc - if all they > want is some way to probe the cache? Threadlets can work with any function as a base - if it were recv-like it would limit possible cases for parallel programming, so you can code anything in threadlets - it is not only about IO. Ingo defined them as "plain function calls as long as they do not block". But when/what function could block? (1) File descriptors. Read. If the data is in the cache it wouldn't block. Otherwise it would. Write. If there is space in the cache it wouldn't block. Otherwise it would. (2) Network sockets. Recv. If data is in the buffer it wouldn't block. Otherwise it would. Send. If there is space in the send buffer it wouldn't block. Otherwise it would. (3) Pipes, fifos & unix sockets. Unfortunately these gain nothing, since reliable local communication is used mostly for control information passing. If you have to block on such a socket it is most likely important information anyway (e.g. X server communication or an SQL query to an SQL server; or, even less important here, the case of shell pipes). And most users here are all single threaded and I/O bound: they would gain nothing from multi-threading - only the PITA of added locking. What I'm trying to get to: keep things simple. The proposed optimization by Ingo does nothing else but allow AIO to probe the file cache - if the data is there, go with the fast path. So why not implement what people want - probing of the cache? Because it sounds bad? But they are in fact proposing precisely that, just masked with "fast threads". -- Don't walk behind me, I may not lead. Don't walk in front of me, I may not follow. Just walk beside me and be my friend. -- Albert Camus (attributed to) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
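The socket-side idiom Ihar refers to already exists; a minimal sketch of the probe-then-fallback pattern (the file-descriptor analogue is the hypothetical part of his proposal):

    /* recv() with MSG_DONTWAIT: probe the receive buffer without
     * blocking, fall back to a slow path on EAGAIN. This works for
     * sockets today; the proposal above is the same idiom for files
     * probing the page cache. */
    #include <errno.h>
    #include <sys/socket.h>

    ssize_t try_fast_recv(int sock, void *buf, size_t len)
    {
        ssize_t n = recv(sock, buf, len, MSG_DONTWAIT);
        if (n >= 0)
            return n;       /* fast path: data was already buffered */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return -2;      /* would block: hand off to the slow path */
        return -1;          /* real error */
    }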
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Fri, Mar 02, 2007 at 09:28:10AM -0800, Davide Libenzi (davidel@xmailserver.org) wrote: > On Fri, 2 Mar 2007, Evgeniy Polyakov wrote: > > > do we really want to have per process signalfs, timerfs and so on - each > > simple structure must be bound to a file, which becomes too costly. > > I may be old school, but if you ask me, and if you *really* want those > events, yes. Reason? Unix's everything-is-a-file rule, and being able to > use them with *existing* POSIX poll/select. Remember, not every app > requires huge scalability efforts, so working with simpler and familiar > APIs is always welcome. > The *only* thing that was not practical to have as an fd was block requests. > But maybe threadlets/syslets will handle those just fine, and close the gap. That means that we bind a very small object like a timer or signal to the whole file structure - yes, as I stated, it is doable, but do we really have to create a file each time create_timer() or signal() is called? Signals as a filesystem are limited in the regard that we need to create additional structures to hold the signal number<->private data relations. I designed kevent to be as small as possible, so I dropped the file binding idea first. I do not say it is wrong or that epoll (and threadlets) are broken (fsck, I hope people do understand that), but as is it can not handle that scenario, so it must be extended and/or a lot of other stuff written to be compatible with the epoll design. Kevent has a different design (which allows to work with the old one though - there is a patch to implement epoll over kevent). > - Davide > -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Fri, Mar 02, 2007 at 09:13:40AM -0800, Davide Libenzi (davidel@xmailserver.org) wrote: > On Fri, 2 Mar 2007, Evgeniy Polyakov wrote: > > > On Thu, Mar 01, 2007 at 11:31:14AM -0800, Davide Libenzi > > (davidel@xmailserver.org) wrote: > > > On Thu, 1 Mar 2007, Evgeniy Polyakov wrote: > > > > > > > Ingo, do you really think I will send mails with faked benchmarks? :)) > > > > > > I don't think he ever implied that. He was only suggesting that when you > > > post benchmarks, and even more when you make claims based on benchmarks, > > > you need to be extra careful about what you measure. Otherwise the > > > external view that you give to others does not look good. > > > Kevent can be really faster than epoll, but if you post broken benchmarks > > > (that can be, unreliable HTTP loaders, broken server implementations, > > > etc..) and make claims based on that, the only effect that you have is to > > > lose your point. > > > > So, I only said that kevent is superior compared to epoll because (and > > it is the _main_ issue) of its ability to handle essentially any kind of > > events with very small overhead (the same as epoll has in struct file - > > a list and a spinlock) and without the significant price of binding a struct file > > to an event. > > You've to excuse me if my memory is bad, but IIRC the whole discussion > and long benchmark feast was born with you throwing a benchmark at Ingo > (with kevent showing a 1.9x performance boost WRT epoll), not with you > making any other point. So, how does it sound? "Threadlets are bad for IO because kevent is 2 times faster than epoll?" I said threadlets are bad for IO (and we agreed that both approaches should be used for the maximum performance) because of rescheduling overhead - tasks are quite heavy structures to move around - even the pt_regs copy takes more than the event structure, but not because there is something in one galaxy which might work faster than another something in another galaxy. That was stupid even to think about. > As far as epoll not being able to handle other events. Said who? Of > course, with zero modifications, you can handle zero additional events. > With modifications, you can handle other events. But let's talk about those > other events. The *only* kind of event that ppl (and being the epoll > maintainer I tend to receive those requests) missed in epoll was AIO > events. That's the *only* thing that was missed by real life application > developers. And if something like threadlets/syslets proves effective, > the gap is closed WRT that requirement. > Epoll already handles the whole class of pollable devices inside the > kernel, and if you exclude block AIO, that's a pretty wide class already. > The *existing* f_op->poll subsystem can be used to deliver events at the > poll-head wakeup time (by using the "key" member of the poll callback), so > that you don't even need the extra f_op->poll call to fetch events. > And if you really feel raw about the single O(nready) loop that epoll > currently does, a new epoll_wait2 (or whatever) API could be used to > deliver the event directly into a userspace buffer [1], directly from the > poll callback, w/out extra delivery loops > (IRQ/event->epoll_callback->event_buffer). Signals, futexes, timers and userspace events I was requested to add into kevent; so far only futexes are missing, because I was asked to freeze development so other hackers could check the project.
> > [1] From the epoll callback, we cannot sleep, so it's gonna be either an > mlocked userspace buffer, or some kernel pages mapped to userspace. Callbacks never sleep - they add the event into a list just like the current implementation does (maybe some lock must be changed from a mutex to a spinlock, I do not remember); the main problem is the binding to the file structure, which is heavy. > - Davide > -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Sat, 3 Mar 2007, Ingo Molnar wrote: > * Davide Libenzi wrote: > > > [...] Status word and control bits should not be changed from > > underneath userspace AFAIK. [...] > > Note that the control bits do not just magically change during normal > FPU use. It's a bit like sys_setsid()/iopl/etc., it makes little sense > to change those per-thread anyway. This is a non-issue anyway - what is > important is that the big bulk of 512 (or more) bytes of FPU state /are/ > callee-saved (both on 32-bit and on 64-bit), hence there's no need to > unlazy anything or to do expensive FPU state saves or other FPU juggling > around threadlet (or even syslet) use. Well, the unlazy/sync happens in any case later, when we switch (given TS_USEDFPU is set). We'd avoid a copy of it given the above conditions are true. Wouldn't it make sense to carry over only the status word and the control bits eventually? Also, if the caller saves the whole context, and if we're scheduled while inside a system call (not a totally infrequent case), can't we implement a smarter unlazy_fpu that avoids the fxsave during schedule-out and the fxrstor after schedule-in (do not do stts on this condition, so the newly scheduled task doesn't get a fault at all)? If the above conditions are true (no need to copy the context for the new head in async_exec), this should be possible too. - Davide - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
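A rough sketch of the "smarter unlazy_fpu" half of this suggestion, in 2.6-era i386 terms - TS_USEDFPU, unlazy_fpu() and stts() are real names from that code base, while blocked_in_syscall() is assumed purely for illustration:

    /* Hypothetical sketch only: skip the expensive fxsave on
     * schedule-out when the task blocked inside a syscall and the
     * ABI lets us treat the FPU state as caller-saved. */
    #include <linux/sched.h>
    #include <asm/i387.h>

    static inline void smart_unlazy_fpu(struct task_struct *tsk)
    {
        if (!(task_thread_info(tsk)->status & TS_USEDFPU))
            return;             /* FPU untouched: nothing to sync */

        if (blocked_in_syscall(tsk)) {  /* assumed predicate */
            /* Treat the state as clobbered per ABI: drop it instead
             * of saving 512 bytes, and let the next FPU use fault
             * in a fresh context. */
            task_thread_info(tsk)->status &= ~TS_USEDFPU;
            stts();
            return;
        }
        unlazy_fpu(tsk);        /* preempted in user code: must save */
    }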
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
* Ingo Molnar <[EMAIL PROTECTED]> wrote: > Note that the control bits do not just magically change during normal > FPU use. It's a bit like sys_setsid()/iopl/etc., it makes little sense > to change those per-thread anyway. This is a non-issue anyway - what is > important is that the big bulk of 512 (or more) bytes of FPU state /are/ > callee-saved (both on 32-bit and on 64-bit), hence there's no need to ^ caller-saved > unlazy anything or to do expensive FPU state saves or other FPU juggling > around threadlet (or even syslet) use. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
* Davide Libenzi wrote: > [...] Status word and control bits should not be changed from > underneath userspace AFAIK. [...] Note that the control bits do not just magically change during normal FPU use. It's a bit like sys_setsid()/iopl/etc., it makes little sense to change those per-thread anyway. This is a non-issue anyway - what is important is that the big bulk of 512 (or more) bytes of FPU state /are/ callee-saved (both on 32-bit and on 64-bit), hence there's no need to unlazy anything or to do expensive FPU state saves or other FPU juggling around threadlet (or even syslet) use. Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Fri, 2 Mar 2007, Nicholas Miell wrote: > On Fri, 2007-03-02 at 16:52 -0800, Davide Libenzi wrote: > > On Fri, 2 Mar 2007, Nicholas Miell wrote: > > > > > The point Ingo was making is that the x86 ABI already requires the FPU > > > context to be saved before *all* function calls. > > > > I've not seen that among Ingo's points, but yeah, some state is caller > > saved. But, aren't things like the status word and control bits callee saved? > > If that's the case, it might require proper handling. > > > > Ingo mentioned it in one of the parts you cut out of your reply: > > > and here is where thinking about threadlets as a function call and not > > as an asynchronous context helps a lot: the classic gcc convention for > > FPU use & function calls should apply: gcc does not call an external > > function with an in-use FPU stack/register, it always neatly unuses it, > > as no FPU register is callee-saved, all are caller-saved. > > The i386 psABI is ancient (i.e. it predates SSE, so no mention of the > XMM or MXCSR registers) and a bit vague (no mention at all of the FP > status word), but I'm fairly certain that Ingo is right. I'm not sure that's the case. I'd be happy if it was, but I'm afraid it's not. The status word and control bits should not be changed from underneath userspace AFAIK. The ABI I remember tells me that those are callee saved. A quick gcc asm test tells me that too. And assuming that's the case, why don't we have a smarter unlazy_fpu() then, that avoids the FPU context sync if we're scheduled while inside a syscall (this is no different than an enter inside sys_async_exec - userspace should have taken care of it)? IMO a syscall enter should not assume that userspace took care of saving the whole FPU context. - Davide - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
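The "quick gcc asm test" mentioned can be as small as this (GNU C, x87); fnstcw stores the current FPU control word:

    /* Check whether the x87 control word survives an external call
     * that uses the FPU - a minimal version of the test mentioned. */
    #include <stdio.h>

    static unsigned short fpu_cw(void)
    {
        unsigned short cw;
        __asm__ __volatile__("fnstcw %0" : "=m"(cw));
        return cw;
    }

    static void fpu_heavy_call(void)
    {
        volatile double x = 1.0;
        x /= 3.0;               /* touch the FPU */
        (void)x;
    }

    int main(void)
    {
        unsigned short before = fpu_cw();
        fpu_heavy_call();
        unsigned short after = fpu_cw();
        printf("cw before=%#x after=%#x (%s)\n", before, after,
               before == after ? "preserved" : "clobbered");
        return 0;
    }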
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Fri, Mar 02, 2007 at 05:36:01PM -0800, Nicholas Miell wrote: > > as an asynchronous context helps a lot: the classic gcc convention for > > FPU use & function calls should apply: gcc does not call an external > > function with an in-use FPU stack/register, it always neatly unuses it, > > as no FPU register is callee-saved, all are caller-saved. > > The i386 psABI is ancient (i.e. it predates SSE, so no mention of the > XMM or MXCSR registers) and a bit vague (no mention at all of the FP > status word), but I'm fairly certain that Ingo is right. The FPU control word *must* be saved, as the rounding behaviour and error mode bits are assumed to be preserved. IOW, yes, there is state which is required. -ben -- "Time is of no importance, Mr. President, only life is important." Don't Email: <[EMAIL PROTECTED]>. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
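What "rounding behaviour ... assumed to be preserved" means in practice, via the standard C99 fenv.h interface:

    /* Why those control bits matter: code may change the rounding
     * mode and assumes it stays put across calls and context
     * switches. Losing the bits silently changes results. */
    #include <fenv.h>
    #include <stdio.h>

    int main(void)
    {
        volatile double x = 1.0, y = 3.0;
        fesetround(FE_DOWNWARD);            /* round toward -inf */
        printf("down:    %.20f\n", x / y);
        fesetround(FE_TONEAREST);           /* default mode */
        printf("nearest: %.20f\n", x / y);
        return 0;
    }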
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Fri, 2007-03-02 at 16:52 -0800, Davide Libenzi wrote: > On Fri, 2 Mar 2007, Nicholas Miell wrote: > > > The point Ingo was making is that the x86 ABI already requires the FPU > > context to be saved before *all* function calls. > > I've not seen that among Ingo's points, but yeah, some state is caller > saved. But, aren't things like the status word and control bits callee saved? > If that's the case, it might require proper handling. > Ingo mentioned it in one of the parts you cut out of your reply: > and here is where thinking about threadlets as a function call and not > as an asynchronous context helps a lot: the classic gcc convention for > FPU use & function calls should apply: gcc does not call an external > function with an in-use FPU stack/register, it always neatly unuses it, > as no FPU register is callee-saved, all are caller-saved. The i386 psABI is ancient (i.e. it predates SSE, so no mention of the XMM or MXCSR registers) and a bit vague (no mention at all of the FP status word), but I'm fairly certain that Ingo is right. -- Nicholas Miell <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Fri, 2 Mar 2007, Nicholas Miell wrote: > The point Ingo was making is that the x86 ABI already requires the FPU > context to be saved before *all* function calls. I've not seen that among Ingo's points, but yeah, some state is caller saved. But, aren't things like the status word and control bits callee saved? If that's the case, it might require proper handling. - Davide - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Fri, 2007-03-02 at 12:53 -0800, Davide Libenzi wrote: > On Fri, 2 Mar 2007, Ingo Molnar wrote: > > > > > * Davide Libenzi wrote: > > > > > I think that the "dirty" FPU context must, at least, follow the new > > > head. That's what the userspace sees, and you don't want an async_exec > > > to re-emerge with a different FPU context. > > > > well. I think there's some confusion about terminology, so please let me > > describe everything in detail. This is how execution goes: > > > > outer loop() { > > call_threadlet(); > > } > > > > this all runs in the 'head' context. call_threadlet() always switches to > > the 'threadlet stack'. The 'outer context' runs in the 'head stack'. If, > > while executing the threadlet function, we block, then the > > threadlet-thread gets to keep the task (the threadlet stack and also the > > FPU), and blocks - and we pick a 'new head' from the thread pool and > > continue executing in that context - right after the call_threadlet() > > function, in the 'old' head's stack. I.e. it's as if we returned > > immediately from call_threadlet(), with a return code that signals that > > the 'threadlet went async'. > > > > now, the FPU state that existed when the threadlet blocked is totally > > meaningless to the 'new head' - that FPU state is from the middle of the > > threadlet execution. > > For threadlets, it might be. Now think about a task wanting to dispatch N > parallel AIO requests as N independent syslets. > Think about this task having USEDFPU set, so the FPU context is dirty. > When it returns from async_exec, with one of the requests having become > sleepy, it needs to have the same FPU context it had when it entered, > otherwise it won't prolly be happy. > For the same reason a schedule() must preserve/sync the "prev" FPU > context, to be reloaded at the next FPU fault. The point Ingo was making is that the x86 ABI already requires the FPU context to be saved before *all* function calls. Unfortunately, this isn't true of other ABIs -- looking over the psABI specs I have lying around, AMD64, PPC64, and MIPS require at least part of the FPU state to be preserved across function calls, and I'm sure this is also true of others. Then there are the other nasty details of new thread creation -- thankfully, the contents of the TLS isn't inherited from the parent thread, but it still needs to be initialized; not to mention all the other details involved in pthread creation and destruction. I don't see any way around the pthread issues other than making a libc upcall on return from the first system call that blocked. -- Nicholas Miell <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On 3/2/07, Davide Libenzi wrote: For threadlets, it might be. Now think about a task wanting to dispatch N parallel AIO requests as N independent syslets. Think about this task having USEDFPU set, so the FPU context is dirty. When it returns from async_exec, with one of the requests having become sleepy, it needs to have the same FPU context it had when it entered, otherwise it won't prolly be happy. For the same reason a schedule() must preserve/sync the "prev" FPU context, to be reloaded at the next FPU fault. And if you actually think this through, I think you will arrive at (a subset of) the conclusions I did a week ago: to keep the threadlets lightweight enough to schedule and migrate cheaply, they can't be allowed to "own" their own FPU and TLS context. They have to be allowed to _use_ the FPU (or they're useless) and to _use_ TLS (or they can't use any glibc wrapper around a syscall, since they practically all set the thread-local errno). But they have to "quiesce" the FPU and stash any thread-local state they want to keep on their stack before entering the next syscall, or else it'll get clobbered. Keep thinking, especially about FPU flags, and you'll see why threadlets spawned from the _same_ threadlet entrypoint should all run in the same pool of threads, one per CPU, while threadlets from _different_ entrypoints should never run in the same thread (FPU/TLS context). You'll see why threadlets in the same pool shouldn't be permitted to preempt one another except at syscalls that block, and the cost of preempting the real thread associated with one threadlet pool with another real thread associated with a different threadlet pool is the same as any other thread switch. At which point, threadlet pools are themselves first-class objects (to use the snake oil phrase), and might as well be enhanced to a data structure that has efficient operations for reprioritization, bulk cancellation, and all that jazz. Did I mention that there is actually quite a bit of prior art in this area, which makes a much better guide to the design of round wheels than micro-benchmarks do? Cheers, - Michael - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Fri, 2 Mar 2007, Ingo Molnar wrote: > > * Davide Libenzi wrote: > > > I think that the "dirty" FPU context must, at least, follow the new > > head. That's what the userspace sees, and you don't want an async_exec > > to re-emerge with a different FPU context. > > well. I think there's some confusion about terminology, so please let me > describe everything in detail. This is how execution goes: > > outer loop() { > call_threadlet(); > } > > this all runs in the 'head' context. call_threadlet() always switches to > the 'threadlet stack'. The 'outer context' runs in the 'head stack'. If, > while executing the threadlet function, we block, then the > threadlet-thread gets to keep the task (the threadlet stack and also the > FPU), and blocks - and we pick a 'new head' from the thread pool and > continue executing in that context - right after the call_threadlet() > function, in the 'old' head's stack. I.e. it's as if we returned > immediately from call_threadlet(), with a return code that signals that > the 'threadlet went async'. > > now, the FPU state that existed when the threadlet blocked is totally > meaningless to the 'new head' - that FPU state is from the middle of the > threadlet execution. For threadlets, it might be. Now think about a task wanting to dispatch N parallel AIO requests as N independent syslets. Think about this task having USEDFPU set, so the FPU context is dirty. When it returns from async_exec, with one of the requests having become sleepy, it needs to have the same FPU context it had when it entered, otherwise it won't prolly be happy. For the same reason a schedule() must preserve/sync the "prev" FPU context, to be reloaded at the next FPU fault. > > So, IMO, if the USEDFPU bit is set, we need to sync the dirty FPU > > context with an early unlazy_fpu(), *and* copy the sync'd FPU context > > to the new head. This should really be a fork of the dirty FPU context > > IMO, and should only happen if the USEDFPU bit is set. > > why? The only effect this will have is a slowdown :) The FPU context > from the middle of the threadlet function is totally meaningless to the > 'new head'. It might be anything. (although in practice system calls are > almost never called with a truly in-use FPU.) See above ;) - Davide - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
* Davide Libenzi wrote: > I think that the "dirty" FPU context must, at least, follow the new > head. That's what the userspace sees, and you don't want an async_exec > to re-emerge with a different FPU context. well. I think there's some confusion about terminology, so please let me describe everything in detail. This is how execution goes: outer loop() { call_threadlet(); } this all runs in the 'head' context. call_threadlet() always switches to the 'threadlet stack'. The 'outer context' runs in the 'head stack'. If, while executing the threadlet function, we block, then the threadlet-thread gets to keep the task (the threadlet stack and also the FPU), and blocks - and we pick a 'new head' from the thread pool and continue executing in that context - right after the call_threadlet() function, in the 'old' head's stack. I.e. it's as if we returned immediately from call_threadlet(), with a return code that signals that the 'threadlet went async'. now, the FPU state that existed when the threadlet blocked is totally meaningless to the 'new head' - that FPU state is from the middle of the threadlet execution. and here is where thinking about threadlets as a function call and not as an asynchronous context helps a lot: the classic gcc convention for FPU use & function calls should apply: gcc does not call an external function with an in-use FPU stack/register, it always neatly unuses it, as no FPU register is callee-saved, all are caller-saved. > So, IMO, if the USEDFPU bit is set, we need to sync the dirty FPU > context with an early unlazy_fpu(), *and* copy the sync'd FPU context > to the new head. This should really be a fork of the dirty FPU context > IMO, and should only happen if the USEDFPU bit is set. why? The only effect this will have is a slowdown :) The FPU context from the middle of the threadlet function is totally meaningless to the 'new head'. It might be anything. (although in practice system calls are almost never called with a truly in-use FPU.) Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
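Ingo's pseudo-code, expanded into a slightly fuller shape for readers following along - call_threadlet() and the ASYNC return code are hypothetical names taken from this discussion, not a merged kernel API:

    /* Hypothetical sketch of the execution model described above.
     * call_threadlet()/ASYNC are assumed names; the helpers below
     * are placeholders for application code. */
    extern long call_threadlet(long (*fn)(void *), void *arg);
    #define ASYNC 1                     /* "threadlet went async" */

    extern void *get_next_request(void);
    extern long process(void *req);
    extern void complete_request(void *req, long ret);

    static long handle_request(void *req) /* the threadlet function */
    {
        return process(req);            /* may block anywhere */
    }

    void outer_loop(void)
    {
        for (;;) {
            void *req = get_next_request();
            long ret = call_threadlet(handle_request, req);
            if (ret == ASYNC)
                continue;   /* threadlet blocked and kept its stack
                             * and FPU state; we now run as the "new
                             * head", right after call_threadlet() */
            complete_request(req, ret); /* synchronous fast path */
        }
    }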
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Fri, 2 Mar 2007, Ingo Molnar wrote: > > * Davide Libenzi wrote: > > > [...] We're still missing proper FPU context switch in the > > move_user_context(). [...] > > yeah - i'm starting to be of the opinion that the FPU context should > stay with the threadlet, exclusively. I.e. when calling a threadlet, the > 'outer loop' (the event loop) should not leak FPU context into the > threadlet and then expect it to be replicated from whatever random point > the threadlet ended up sleeping at. It would be possible, but it just > makes no sense. What makes most sense is to just keep the FPU context > with the threadlet, and to let the 'new head' use an initial (unused) > FPU context. And it's in fact the threadlet that will most likely have > an active FPU context across a system call, not the outer loop. In other > words: no special FPU support needed at all for threadlets (i.e. no > flipping needed even) - this behavior just naturally happens in the > current implementation. Hm? I think that the "dirty" FPU context must, at least, follow the new head. That's what the userspace sees, and you don't want an async_exec to re-emerge with a different FPU context. I think it should also follow the async thread (the old, going-to-sleep thread), since a threadlet might have dirtied it, and as a consequence it'll want to find it back when it's re-scheduled. So, IMO, if the USEDFPU bit is set, we need to sync the dirty FPU context with an early unlazy_fpu(), *and* copy the sync'd FPU context to the new head. This should really be a fork of the dirty FPU context IMO, and should only happen if the USEDFPU bit is set. - Davide - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
* Davide Libenzi wrote: > [...] We're still missing proper FPU context switch in the > move_user_context(). [...] yeah - i'm starting to be of the opinion that the FPU context should stay with the threadlet, exclusively. I.e. when calling a threadlet, the 'outer loop' (the event loop) should not leak FPU context into the threadlet and then expect it to be replicated from whatever random point the threadlet ended up sleeping at. It would be possible, but it just makes no sense. What makes most sense is to just keep the FPU context with the threadlet, and to let the 'new head' use an initial (unused) FPU context. And it's in fact the threadlet that will most likely have an active FPU context across a system call, not the outer loop. In other words: no special FPU support needed at all for threadlets (i.e. no flipping needed even) - this behavior just naturally happens in the current implementation. Hm? Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Fri, 2 Mar 2007, Davide Libenzi wrote: > And if you really feel raw about the single O(nready) loop that epoll > currently does, a new epoll_wait2 (or whatever) API could be used to > deliver the event directly into a userspace buffer [1], directly from the > poll callback, w/out extra delivery loops > (IRQ/event->epoll_callback->event_buffer). And if you ever wonder where the "epoll" name came from, it came from the old /dev/epoll. The epoll predecessor, /dev/epoll, was adding plugs wherever events were needed and was delivering those events in O(1) *directly* on a user visible (mmap'd) buffer, in a zero-copy fashion. The old /dev/epoll was faster than the current epoll, but the latter was chosen because, despite being slightly slower, it had support for every pollable device, *without* adding more plugs into the existing code. Performance and code maintenance are not to be taken disjointly whenever you evaluate a solution. That's the reason I got excited about this new generic AIO solution. - Davide - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Fri, 2 Mar 2007, Ingo Molnar wrote: > > After your changes epoll increased to 5k. > > Can we please stop this pointless episode of benchmarketing, where every > mail of yours shows different results and you even deny having said > something which you clearly said just a few days ago? At this point i > simply cannot trust the numbers you are posting, nor is the discussion > style you are following productive in any way in my opinion. Agreed. Can we focus on the topic here? We're still missing a proper FPU context switch in move_user_context(). In v6? - Davide - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Fri, 2 Mar 2007, Evgeniy Polyakov wrote: > do we really want to have per process signalfs, timerfs and so on - each > simple structure must be bound to a file, which becomes too costly. I may be old school, but if you ask me, and if you *really* want those events, yes. Reason? Unix's everything-is-a-file rule, and being able to use them with *existing* POSIX poll/select. Remember, not every app requires huge scalability efforts, so working with simpler and familiar APIs is always welcome. The *only* thing that was not practical to have as an fd was block requests. But maybe threadlets/syslets will handle those just fine, and close the gap. - Davide - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Fri, 2 Mar 2007, Evgeniy Polyakov wrote: > On Thu, Mar 01, 2007 at 11:31:14AM -0800, Davide Libenzi > (davidel@xmailserver.org) wrote: > > On Thu, 1 Mar 2007, Evgeniy Polyakov wrote: > > > > > Ingo, do you really think I will send mails with faked benchmarks? :)) > > > > I don't think he ever implied that. He was only suggesting that when you > > post benchmarks, and even more when you make claims based on benchmarks, > > you need to be extra careful about what you measure. Otherwise the > > external view that you give to others does not look good. > > Kevent can be really faster than epoll, but if you post broken benchmarks > > (that can be, unreliable HTTP loaders, broken server implementations, > > etc..) and make claims based on that, the only effect that you have is to > > lose your point. > > So, I only said that kevent is superior compared to epoll because (and > it is the _main_ issue) of its ability to handle essentially any kind of > events with very small overhead (the same as epoll has in struct file - > a list and a spinlock) and without the significant price of binding a struct file > to an event. You've to excuse me if my memory is bad, but IIRC the whole discussion and long benchmark feast was born with you throwing a benchmark at Ingo (with kevent showing a 1.9x performance boost WRT epoll), not with you making any other point. As far as epoll not being able to handle other events. Said who? Of course, with zero modifications, you can handle zero additional events. With modifications, you can handle other events. But let's talk about those other events. The *only* kind of event that ppl (and being the epoll maintainer I tend to receive those requests) missed in epoll was AIO events. That's the *only* thing that was missed by real life application developers. And if something like threadlets/syslets proves effective, the gap is closed WRT that requirement. Epoll already handles the whole class of pollable devices inside the kernel, and if you exclude block AIO, that's a pretty wide class already. The *existing* f_op->poll subsystem can be used to deliver events at the poll-head wakeup time (by using the "key" member of the poll callback), so that you don't even need the extra f_op->poll call to fetch events. And if you really feel raw about the single O(nready) loop that epoll currently does, a new epoll_wait2 (or whatever) API could be used to deliver the event directly into a userspace buffer [1], directly from the poll callback, w/out extra delivery loops (IRQ/event->epoll_callback->event_buffer). [1] From the epoll callback, we cannot sleep, so it's gonna be either an mlocked userspace buffer, or some kernel pages mapped to userspace. - Davide - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
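The epoll_wait2 idea from the footnote, sketched as one possible (purely hypothetical) shape - the only hard requirement being that the buffer is pinned, since the poll callback cannot sleep:

    /* Purely hypothetical sketch of the epoll_wait2 idea: the kernel
     * writes events straight into a user buffer from the poll
     * callback, so the buffer is mlock'ed to guarantee the pages
     * stay resident. epoll_wait2() is an assumed syscall. */
    #include <stdlib.h>
    #include <sys/epoll.h>
    #include <sys/mman.h>

    extern int epoll_wait2(int epfd, struct epoll_event *buf,
                           int maxevents, int timeout); /* assumed */

    struct epoll_event *make_event_buffer(int maxevents)
    {
        size_t sz = maxevents * sizeof(struct epoll_event);
        struct epoll_event *buf = malloc(sz);
        if (buf && mlock(buf, sz) != 0) { /* pin: callback can't fault */
            free(buf);
            return NULL;
        }
        return buf;
    }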
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Fri, Mar 02, 2007 at 11:57:13AM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote: > > * Evgeniy Polyakov <[EMAIL PROTECTED]> wrote: > > > > > > [...] The numbers are still highly suspect - and we are already > > > > > down from the prior claim of kevent being almost twice as fast > > > > > to a 25% difference. > > > > > > > > Btw, there was never an almost twofold performance increase - epoll in > > > > my tests always showed 4-5 thousand requests per second, kevent - > > > > up to 7 thousand. > > > > > > i'm referring to your claim in this mail of yours from 4 days ago > > > for example: > > > > > > http://lkml.org/lkml/2007/2/25/116 > > > > > > "But note, that on my athlon64 3500 test machine kevent is about 7900 > > > requests per second compared to 4000+ epoll, so expect a challenge." > > > > > > no matter how i look at it, but 7900 is 1.9 times 4000 - which is > > > "almost twice". > > > > After your changes epoll increased to 5k. > > Can we please stop this pointless episode of benchmarketing, where every > mail of yours shows different results and you even deny having said > something which you clearly said just a few days ago? At this point i > simply cannot trust the numbers you are posting, nor is the discussion > style you are following productive in any way in my opinion. I just show what I see in tests - I do not perform a deep analysis of that, since I do not see why it should be done - it is not fake, it is not fantasy - it is real behaviour observed on my test machine, and if it suddenly changes I will report it. Btw, I showed cases when epoll behaved better than kevent and performance was an unbeatable 9k requests per second - I do not know why it happened - maybe some cache related issues, other processes all slept at once, increased radiation or a strong wind blew away my bad aura - and it is not reproducible on demand either. > (you are never ever wrong, and if you are proven wrong on topic A you > claim it is an irrelevant topic (without even admitting you were wrong > about it) and you point to topic B claiming it's the /real/ topic you > talked about all along. And along the way you are slandering other > projects like epoll and threadlets, distorting the discussion. This kind > of keep-the-ball-moving discussion style is effective in politics but > IMO it's a waste of time when developing a kernel.) Heh - that is why I'm not subscribed to lkml@ - it too frequently ends up in politics :) What are we talking about - are we trying to insult each other over something that was only said as an assumption in a theoretical mental exercise? I can only laugh at that :) Ingo, I never ever tried to show that something is broken - that is fantasy based on bare words, not on the real intention. I never said epoll is broken. Absolutely. I never said threadlets are broken. Absolutely. I just showed that it is not (in my opinion) the right decision to use threadlets for an IO-based model instead of an event-driven one - it is not based on kevent performance (I _never_ stated it as a main factor - kevent was only an example of the event-driven model, you confused it with kevent AIO, which is a different beast), but instead on experience with nptl threads and linuxthreads, and the related rescheduling overhead compared to the userspace one. I showed kevent as a possible usage scenario - since it does support its own AIO.
And you started to fight against it in every detail, since you think kevent is not a good idea to handle the AIO model - well, that can be perfectly correct; I showed kevent AIO (please do not think that kevent and kevent AIO are the same - the latter is just one of the possible users I implemented, it only uses kevent to deliver completion events to userspace) as a possible AIO implementation, but not _kevent_ itself. But somehow we ended up with words bound to me that I never said and ideas I never based my assumptions on... I do not really think you even remotely wanted to make anything personal out of what we had discussed. We even concluded that a perfect IO model should use both approaches to really scale - both threadlets with their on-demand-only rescheduling, and an event driven ring. You stated your opinion on kevents - well, I cannot agree with it, but that is your right not to like something. Let's not continue the bad practice of kicking each other just because there were some problematic roots which no one even remembers correctly - let's not make the mistake of making something personal out of trivial bits - if you are in Russia or around any time soon I will happily buy you a beer or whatever you prefer :) So, let's just draw a line: kevent was shown to people, and its performance, although flaky, is a bit better than epoll's. Threadlets bound to any event driven ring do not show any performance degradation in a network driven setup with a small number of reschedulings, with all the advantages of simpler programming. So, repeating myself, both models (not kevent and threadlets specifically, but event-driven and thread-based designs in general) should be used to achieve the maximum performance.
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Fri, Mar 02, 2007 at 11:56:18AM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote: > > * Evgeniy Polyakov <[EMAIL PROTECTED]> wrote: > > > Even if kevent has the same speed, it still allows to handle _any_ > > kind of events without any major surgery - a very tiny structure of > > a lock and a list head and you can process your own kernel event in > > userspace with timers, signals, io events, private userspace events > > and others without races and the invention of different hacks for > > different types - _this_ is the main point. > > did it ever occur to you to ... extend epoll? To speed it up? To add a > new wait syscall to it? Instead of introducing a whole new parallel > framework? Yes, I thought about extending it more than a year ago, before I started kevent, but epoll() is absolutely based on the file structure and its file_operations with the poll method, so it is quite impossible to work with sockets to implement network AIO. Eventually it would have gathered a lot of other subsystems - do we really want to have per process signalfs, timerfs and so on - each simple structure must be bound to a file, which becomes too costly. > Ingo -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
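What "based on the file structure and its file_operations with the poll method" refers to, in 2.6-era kernel terms - every pollable object carries roughly this much plumbing:

    /* The per-object binding under discussion: to be visible to
     * epoll/poll/select, an event source must be a struct file
     * whose file_operations expose ->poll(). Minimal sketch for an
     * imaginary device, using 2.6-era APIs. */
    #include <linux/fs.h>
    #include <linux/poll.h>
    #include <linux/wait.h>

    static DECLARE_WAIT_QUEUE_HEAD(mydev_wq);
    static int mydev_ready;

    static unsigned int mydev_poll(struct file *file, poll_table *wait)
    {
        poll_wait(file, &mydev_wq, wait); /* hook into the wakeup list */
        return mydev_ready ? (POLLIN | POLLRDNORM) : 0;
    }

    static const struct file_operations mydev_fops = {
        .poll = mydev_poll,
        /* a timer or signal source would have to drag a whole
         * struct file around just to become pollable - the per-event
         * cost being argued about. */
    };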
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
* Evgeniy Polyakov <[EMAIL PROTECTED]> wrote: > > > > [...] The numbers are still highly suspect - and we are already > > > > down from the prior claim of kevent being almost twice as fast > > > > to a 25% difference. > > > > > > Btw, there were never almost twice perfromance increase - epoll in > > > my tests always showed 4-5 thousands requests per second, kevent - > > > up to 7 thausands. > > > > i'm referring to your claim in this mail of yours from 4 days ago > > for example: > > > > http://lkml.org/lkml/2007/2/25/116 > > > > "But note, that on my athlon64 3500 test machine kevent is about 7900 > > requests per second compared to 4000+ epoll, so expect a challenge." > > > > no matter how i look at it, but 7900 is 1.9 times 4000 - which is > > "almost twice". > > After your changes epoll increased to 5k. Can we please stop this pointless episode of benchmarketing, where every mail of yours shows different results and you even deny having said something which you clearly said just a few days ago? At this point i simply cannot trust the numbers you are posting, nor is the discussion style you are following productive in any way in my opinion. (you are never ever wrong, and if you are proven wrong on topic A you claim it is an irrelevant topic (without even admitting you were wrong about it) and you point to topic B claiming it's the /real/ topic you talked about all along. And along the way you are slandering other projects like epoll and threadlets, distorting the discussion. This kind of keep-the-ball-moving discussion style is effective in politics but IMO it's a waste of time when developing a kernel.) Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
* Evgeniy Polyakov <[EMAIL PROTECTED]> wrote: > Even if kevent has the same speed, it still allows to handle _any_ > kind of events without any major surgery - a very tiny structure of > a lock and a list head and you can process your own kernel event in > userspace with timers, signals, io events, private userspace events > and others without races and the invention of different hacks for > different types - _this_ is the main point. did it ever occur to you to ... extend epoll? To speed it up? To add a new wait syscall to it? Instead of introducing a whole new parallel framework? Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Fri, Mar 02, 2007 at 11:27:14AM +0100, Pavel Machek ([EMAIL PROTECTED]) wrote: > Maybe. It is not up to me to decide. But "it is faster" is _not_ the > only merge criterion. Of course not! Even if kevent has the same speed, it still allows to handle _any_ kind of events without any major surgery - a very tiny structure of a lock and a list head and you can process your own kernel event in userspace with timers, signals, io events, private userspace events and others without races and the invention of different hacks for different types - _this_ is the main point. > Pavel > -- > (english) http://www.livejournal.com/~pavelmachek > (cesky, pictures) > http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
Hi! > > > > If you can replace them with something simpler, and no worse than 10% > > > > slower in the worst case, then go ahead. (We actually tried to do that at > > > > some point, only to realize that efence stresses the vm subsystem in a very > > > > unexpected/unfriendly way). > > > > > > Agh, only 10% in the worst case. > > > I think you can not even imagine what tricks networking uses to get at > > > least an additional 1% out of the box. > > > > Yep? Feel free to rewrite networking in assembly on Eugenix. That > > should get you a 1% improvement. If you reserve a few registers to be only > > used by the kernel (not allowed by userspace), you can speed up networking > > 5%, too. Ouch, and you could turn off the MMU, that is a sure way to get a few > > more percent improvement in your networking case. > > It is not _my_ networking, but the one you use every day in every Linux > box. Notice which tricks are used to remove a single byte from > sk_buff. Ok, so the tricks were worth it in the sk_buff case. > It is called optimization, and if it gives us a single plus it must be > implemented. Not all people have a magical fear of new things. But that does not mean "every optimization must be implemented". Only optimizations that are "worth it" are... > > > Using such logic you can just abandon any further development, since it > > > works as is right now. > > > > Stop trying to pervert my logic. > > Ugh? :) > I just said in simple words your 'we do not need something if it adds 10% > but is complex to understand'. Yes... but that does not mean "stop development". You are still free to clean up the code _while_ making it faster. > > If your code is so complex that it is almost impossible to use from > > userspace, that is a good enough reason not to be merged. "But it is 3% > > faster if..." is not a good-enough argument. > > Is it enough for you? > > epoll 4794.23 req/sec > kevent 6468.95 req/sec Maybe. It is not up to me to decide. But "it is faster" is _not_ the only merge criterion. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thu, Mar 01, 2007 at 11:31:14AM -0800, Davide Libenzi (davidel@xmailserver.org) wrote: > On Thu, 1 Mar 2007, Evgeniy Polyakov wrote: > > > Ingo, do you really think I will send mails with faked benchmarks? :)) > > I don't think he ever implied that. He was only suggesting that when you > post benchmarks, and even more when you make claims based on benchmarks, > you need to be extra careful about what you measure. Otherwise the > external view that you give to others does not look good. > Kevent can be really faster than epoll, but if you post broken benchmarks > (that can be, unreliable HTTP loaders, broken server implementations, > etc..) and make claims based on that, the only effect that you have is to > lose your point. We seem to have moved far away from the original topic - I never built any assumptions on top of kevent _performance_ - kevent is a logical extrapolation of epoll, I only showed that the event-driven model can be fast and that it outperforms the threadlet one - after we changed topic we were unable to actually test threadlets in a networking environment, since the only test I ran showed that threadlets do not reschedule at all, and Ingo's tests showed a small number of reschedulings. So, I only said that kevent is superior compared to epoll because (and it is the _main_ issue) of its ability to handle essentially any kind of events with very small overhead (the same as epoll has in struct file - a list and a spinlock) and without the significant price of binding a struct file to an event. I did not want and do not want to hurt anyone (even Ingo, although he is against kevent :), but my opinion is that the thread moved from a nice discussion about threads and events with jokes and fun into quite angry word-throwing, and that is not good - let's make it fun again. I'm not a native English speaker (and do not use a dictionary), so it is quite possible that some of my phrases were not exactly nice, but it was unintentional (at least not very) :) Peace? > - Davide > -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thu, 1 Mar 2007, Ingo Molnar wrote:
>
> wrt. one-shot syscalls, the user-space stack footprint would still
> probably be there, because even async contexts that only do single-shot
> processing need to drop out of kernel mode to handle signals.

Why? The easiest thing to do with signals is to just not pick them up.
If the signal was to that *particular* threadlet (ie a "cancel"), then we
just want to kill the threadlet. And if the signal was to the thread
group, there is no reason why the threadlet should pick it up.

In neither case is there *any* reason to handle the signal in the
threadlet, afaik.

And having to have a stack allocation for each threadlet certainly means
that you complicate things a lot. Suddenly you have allocations that
can't just go away. Again, I'm pointing to the problems I already pointed
out with the allocations of the atom structures - quite often you do
*not* want to keep track of anything specific for completion time, and
that means that you MUST NOT have to de-allocate anything either.

Again, think aio_read(). With the *exact* current binary interface.
PLEASE. If you cannot emulate that with threadlets, then threadlets are
*pointless*. One of the major reasons for the whole exercise was to get
rid of the special code in fs/aio.c.

So I repeat: if you cannot do that, and remain binary compatible, don't
even bother.

		Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
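For illustration of the binary contract in question: aio_read() takes a
struct aiocb, returns immediately, and the result is collected later via
aio_error()/aio_return(). Below is a minimal sketch of how the blocking
half can be pushed into an async context, with a plain detached pthread
standing in for a threadlet - this is not the patchset's code,
my_aio_read is a hypothetical name, and the __error_code/__return_value
fields are an assumption about glibc's particular aiocb layout:

#include <aio.h>
#include <errno.h>
#include <pthread.h>
#include <unistd.h>

static void *aio_worker(void *arg)
{
	struct aiocb *cb = arg;

	/* the part that may block runs here, off the submitter's path */
	ssize_t ret = pread(cb->aio_fildes, (void *)cb->aio_buf,
			    cb->aio_nbytes, cb->aio_offset);

	cb->__return_value = ret;		/* what aio_return() reads */
	cb->__error_code = ret < 0 ? errno : 0;	/* what aio_error() reads */
	return NULL;
}

/* submission returns without blocking, as aio_read() must */
int my_aio_read(struct aiocb *cb)
{
	pthread_t t;

	if (pthread_create(&t, NULL, aio_worker, cb))
		return -1;
	pthread_detach(t);
	return 0;
}

The point of threadlets is precisely that the pthread_create() and stack
allocation above go away: the submitting context itself keeps running and
only splits off an async thread if the read actually blocks - which is
also why the per-threadlet stack allocation Linus objects to matters.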
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
David Lang wrote:
> On Thu, 1 Mar 2007, Johann Borck wrote:
>
> > I reported this a while ago and suggested to have the number of pending
> > accepts reported with the event to save that last syscall. I created an
> > ab replacement based on kevent, and at least with my machines, which
> > are comparable to each other, the load on the client dropped from 100%
> > to 2% or something. ab just doesn't give meaningful results (if the
> > client is not way more powerful). With that new client I get very
> > similar results for epoll and kevent, from 1000 through to 26000
> > concurrent requests. The results have been posted on the kevent
> > homepage in October; I just checked it with the new version, but
> > there's no significant difference.
> > this is the benchmark with the kevent-based client:
> > http://tservice.net.ru/~s0mbre/blog/2006/10/11#2006_10_11
> > btw, each result is an average over 1,000,000 requests
> > and just for comparison, this is on the same machines using ab:
> > http://tservice.net.ru/~s0mbre/blog/2006/10/08#2006_10_08
>
> is this client available? and what patches need to be added to the
> kernel to use it?

It's based on an older version of kevent, so I'll have to adapt it a bit
for use with the recent patch; no patch other than kevent is necessary.
I'll post a link when it's cleaned up, if you want.

Johann
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thu, 1 Mar 2007, Johann Borck wrote:

> I reported this a while ago and suggested to have the number of pending
> accepts reported with the event to save that last syscall. I created an
> ab replacement based on kevent, and at least with my machines, which are
> comparable to each other, the load on the client dropped from 100% to 2%
> or something. ab just doesn't give meaningful results (if the client is
> not way more powerful). With that new client I get very similar results
> for epoll and kevent, from 1000 through to 26000 concurrent requests.
> The results have been posted on the kevent homepage in October; I just
> checked it with the new version, but there's no significant difference.
> this is the benchmark with the kevent-based client:
> http://tservice.net.ru/~s0mbre/blog/2006/10/11#2006_10_11
> btw, each result is an average over 1,000,000 requests
> and just for comparison, this is on the same machines using ab:
> http://tservice.net.ru/~s0mbre/blog/2006/10/08#2006_10_08

is this client available? and what patches need to be added to the kernel
to use it?

David Lang
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
Oh boy, wasn't this thread supposed to focus on the syslets/threadlets ... :)

On Thu, 1 Mar 2007, Eric Dumazet wrote:

> On Thursday 01 March 2007 16:23, Evgeniy Polyakov wrote:
> > They are there, since ab runs only 50k requests.
> > If I change it to something noticeably more than 50/80k, ab crashes:
> > # ab -c8000 -t 600 -n8 http://192.168.0.48/
> > This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
> > Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
> > Copyright 2006 The Apache Software Foundation, http://www.apache.org/
> >
> > Benchmarking 192.168.0.48 (be patient)
> > Segmentation fault
> >
> > Are there any other tools suitable for such loads?
> > I only tested httperf (which is worse, since it uses poll/select) and
> > 'ab'.
> >
> > Btw, host machine runs 100% too, so it is possible that client side is
> > broken (too).
>
> I have similar problems here, ab test just doesn't complete...
>
> I am still investigating with strace and tcpdump.
>
> In the meantime you could just rewrite it (based on epoll please :) ), since
> it should be quite easy to do this (reverse of evserver_epoll)

I have a simple one based on coroutines and epoll. You need libpcl and
coronet. Debian has a package named libpcl1-dev for libpcl, otherwise:

http://www.xmailserver.org/libpcl.html

and 'configure --prefix=/usr && sudo make install'. Coronet is here:

http://www.xmailserver.org/coronet-lib.html

here just 'configure && make'. Inside the "test" directory there is a
simple loader named cnhttpload:

cnhttpload -s HOST -n NCON [-p PORT (80)] [-r NREQS (1)] [-S STKSIZE (8192)]
	[-M MAXCONNS] [-t TMUPD (1000)] [-a NACTIVE] [-T TMSAMP (200)] [-h] URL ...

HOST     = Target host
PORT     = Target host port
NCON     = Number of connections to the server
NACTIVE  = Number of active (live) connections
STKSIZE  = Stack size for coroutines
NREQS    = Number of requests done for each connection (better be 1 if your
           server does not support keep-alive)
MAXCONNS = Maximum number of total connections done to the server. If not
           set, the test will continue forever (well, till a ^C)
TMUPD    = Millisec time of stats update
TMSAMP   = Millisec internal average-update time
URL      = Target doc (not http:// or host, just doc path)

So for the particular test my inbox was flooded with :), you'd use:

cnhttpload -s HOST -n 80000 -a 8000 -S 4096 ...

- Davide
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thu, 1 Mar 2007, Evgeniy Polyakov wrote:

> Ingo, do you really think I will send mails with faked benchmarks? :))

I don't think he ever implied that. He was only suggesting that when you
post benchmarks, and even more when you make claims based on benchmarks,
you need to be extra careful about what you measure. Otherwise the
external view that you give to others does not look good.
Kevent can be really faster than epoll, but if you post broken benchmarks
(that can be, unreliable HTTP loaders, broken server implementations,
etc..) and make claims based on that, the only effect that you have is to
lose your point.

- Davide
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thu, 1 Mar 2007, Ingo Molnar wrote:

> * Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
>
> > I posted kevent/epoll benchmarks and related design issues too many
> > times both with handmade applications (which might be broken as hell)
> > and popular open-source servers to repeat them again.
>
> numbers are crucial here - and given the epoll bugs in the evserver code
> that we found, do you have updated evserver benchmark results that
> compare epoll to kevent? I'm wondering why epoll has half the speed of
> kevent in those measurements - i suspect some possible benchmarking bug.
> The queueing model of epoll and kevent is roughly comparable, both do
> only a constant number of steps to serve one particular request,
> regardless of how many pending connections/requests there are. What is
> the CPU utilization of the server system during an epoll test, and what
> is the CPU utilization during a kevent test? 100% utilized in both
> cases?

With 8K concurrent (live) connections, we may also want to try with the
v3 version of the epoll-event-loops-diet patch ;)

- Davide
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thu, Mar 01, 2007 at 04:41:27PM +0100, Eric Dumazet wrote:
> I had to loop on accept() :
>
> for (i=0; i<n; i++) { ... }

On Thu, 1 Mar 2007, Evgeniy Polyakov wrote:
> The same here - I would just enable a debug to find it.

I reported this a while ago and suggested to have the number of pending
accepts reported with the event to save that last syscall. I created an
ab replacement based on kevent, and at least with my machines, which are
comparable to each other, the load on the client dropped from 100% to 2%
or something. ab just doesn't give meaningful results (if the client is
not way more powerful). With that new client I get very similar results
for epoll and kevent, from 1000 through to 26000 concurrent requests.
The results have been posted on the kevent homepage in October; I just
checked it with the new version, but there's no significant difference.
this is the benchmark with the kevent-based client:
http://tservice.net.ru/~s0mbre/blog/2006/10/11#2006_10_11
btw, each result is an average over 1,000,000 requests
and just for comparison, this is on the same machines using ab:
http://tservice.net.ru/~s0mbre/blog/2006/10/08#2006_10_08
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thu, 1 Mar 2007, Evgeniy Polyakov wrote:

> On Thu, Mar 01, 2007 at 08:56:28AM -0800, David Lang ([EMAIL PROTECTED]) wrote:
> > the ab numbers below do not seem that impressive to me, especially for
> > such stripped down server processes.
> ...
> > client and server are dual opteron 252 with 8G of ram, running debian
> > in 64 bit mode
>
> Decrease your hardware setup by 2-4 times, leave only one apache process
> and try to get the same - we are not talking about how to create a
> perfect web server, instead we try to focus on possible problems in
> epoll/kevent event driven logic.

for apache I agree that the target box was maxed out, so if you only had
a single core on your AMD64 box that would be about half, however the
thttpd is only using ~1 of the CPUs (OS overhead is using just a smidge
of the second), but overall the box is 45-48% idle

if the amount of ram is an issue then you are swapping in your tests (or
at least throwing out cache that you need) and so would not be testing
what you think you are.

> Vanilla (epoll) lighttpd shows 4000-5000 requests per second in my setup
> (no logs).
> Default mpm-apache2 with bunch of threads - about 8k req/s.
> Default thttpd (disabled logging) - about 2k req/s
>
> Btw, all your tests are network bound, try to decrease html page size to
> get actual event processing speed out of that machines.

same test retrieving a ~128b file. the server never gets below 51% idle
(so it's only using one CPU)

Server Software:        thttpd/2.23beta1
Server Hostname:        208.2.188.5
Server Port:            81

Document Path:          /128b
Document Length:        136 bytes

Concurrency Level:      8000
Time taken for tests:   9.372902 seconds
Complete requests:      80000
Failed requests:        0
Write errors:           0
Total transferred:      30762842 bytes
HTML transferred:       10952216 bytes
Requests per second:    8535.24 [#/sec] (mean)
Time per request:       937.290 [ms] (mean)
Time per request:       0.117 [ms] (mean, across all concurrent requests)
Transfer rate:          3205.09 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       36  287 1125.6     73    9109
Processing:    49   89   19.8     87     339
Waiting:       17   62   16.4     62     292
Total:         92  376 1137.4    159    9262

Percentage of the requests served within a certain time (ms)
  50%    159
  66%    164
  75%    165
  80%    165
  90%    203
  95%    260
  98%   3233
  99%   9201
 100%   9262 (longest request)

note that this is showing the slowdown from the large concurrency level;
if I reduce the concurrency level to 500 I get

Document Path:          /128b
Document Length:        136 bytes

Concurrency Level:      500
Time taken for tests:   4.215025 seconds
Complete requests:      80000
Failed requests:        0
Write errors:           0
Total transferred:      30565348 bytes
HTML transferred:       10881904 bytes
Requests per second:    18979.72 [#/sec] (mean)
Time per request:       26.344 [ms] (mean)
Time per request:       0.053 [ms] (mean, across all concurrent requests)
Transfer rate:          7081.33 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   15  206.3      1    3006
Processing:     2    7    6.4      6     224
Waiting:        1    6    6.4      5     224
Total:          3   22  208.4      6    3229

Percentage of the requests served within a certain time (ms)
  50%      6
  66%      8
  75%     10
  80%     12
  90%     16
  95%     17
  98%     21
  99%     24
 100%   3229 (longest request)
loadtest2:/proc/sys#

again with >50% idle on the server box

also, ab appears to only use a single cpu so the fact that there are two
on the client box should not make a difference. I will reboot these boxes
into a UP kernel if you think that this is still a significant difference.
based on what I'm seeing I don't think it will make much of a difference (except for the apache test) David Lang - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thu, Mar 01, 2007 at 08:56:28AM -0800, David Lang ([EMAIL PROTECTED]) wrote:
> the ab numbers below do not seem that impressive to me, especially for such
> stripped down server processes.
...
> client and server are dual opteron 252 with 8G of ram, running debian in 64
> bit mode

Decrease your hardware setup by 2-4 times, leave only one apache process
and try to get the same - we are not talking about how to create a
perfect web server, instead we try to focus on possible problems in
epoll/kevent event driven logic.

Vanilla (epoll) lighttpd shows 4000-5000 requests per second in my setup
(no logs).
Default mpm-apache2 with a bunch of threads - about 8k req/s.
Default thttpd (disabled logging) - about 2k req/s.

Btw, all your tests are network bound; try to decrease the html page size
to get the actual event processing speed out of those machines.

--
	Evgeniy Polyakov
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
the ab numbers below do not seem that impressive to me, especially for
such stripped down server processes.

here are some numbers from a set of test boxes I've got in my lab. I've
been using them to test firewalls, and I've been getting throughput
similar to what is listed below when going through a proxy that does a
full fork for each connection, and then makes a new connection to the
webserver on the other side!

the first few sets of numbers are going directly from test client to test
server, the final set is going through the proxy.

client and server are dual opteron 252 with 8G of ram, running debian in
64 bit mode

this is with apache2 MPM as the destination (relatively untuned except
for tinkering with the child count settings). this should be about as bad
as you can get for a server

loadtest2:/proc/sys# ab -c 8000 -n 80000 http://208.2.188.5:80/4k
This is ApacheBench, Version 2.0.41-dev <$Revision: 1.141 $> apache-2.0
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-2002 The Apache Software Foundation, http://www.apache.org/

Benchmarking 208.2.188.5 (be patient)
Completed 8000 requests
Completed 16000 requests
Completed 24000 requests
Completed 32000 requests
Completed 40000 requests
Completed 48000 requests
Completed 56000 requests
Completed 64000 requests
Completed 72000 requests
Finished 80000 requests

Server Software:        Apache/1.3.33
Server Hostname:        208.2.188.5
Server Port:            80

Document Path:          /4k
Document Length:        4352 bytes

Concurrency Level:      8000
Time taken for tests:   10.992838 seconds
Complete requests:      80000
Failed requests:        0
Write errors:           0
Total transferred:      386192835 bytes
HTML transferred:       362612992 bytes
Requests per second:    7277.47 [#/sec] (mean)
Time per request:       1099.284 [ms] (mean)
Time per request:       0.137 [ms] (mean, across all concurrent requests)
Transfer rate:          34307.88 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        8  497 1398.3     71    9072
Processing:    17  236  346.9    103    2995
Waiting:        8   91  131.6     65    1692
Total:         26  734 1435.5    187    9786

Percentage of the requests served within a certain time (ms)
  50%    187
  66%    288
  75%    564
  80%    754
  90%   3085
  95%   3163
  98%   4316
  99%   9186
 100%   9786 (longest request)

loadtest2:/proc/sys# ab -c 8000 -n 80000 http://208.2.188.5:80/8k
This is ApacheBench, Version 2.0.41-dev <$Revision: 1.141 $> apache-2.0
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-2002 The Apache Software Foundation, http://www.apache.org/

Benchmarking 208.2.188.5 (be patient)
Completed 8000 requests
Completed 16000 requests
Completed 24000 requests
Completed 32000 requests
Completed 40000 requests
Completed 48000 requests
Completed 56000 requests
Completed 64000 requests
Completed 72000 requests
Finished 80000 requests

Server Software:        Apache/1.3.33
Server Hostname:        208.2.188.5
Server Port:            80

Document Path:          /8k
Document Length:        8704 bytes

Concurrency Level:      8000
Time taken for tests:   11.355031 seconds
Complete requests:      80000
Failed requests:        0
Write errors:           0
Total transferred:      736949141 bytes
HTML transferred:       713733802 bytes
Requests per second:    7045.34 [#/sec] (mean)
Time per request:       1135.503 [ms] (mean)
Time per request:       0.142 [ms] (mean, across all concurrent requests)
Transfer rate:          63379.48 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       36  495 1297.1     76    9056
Processing:    81  317  529.5    161    3448
Waiting:       25   89   75.1     76    1610
Total:        124  812 1401.5    250   11011

Percentage of the requests served within a certain time (ms)
  50%    250
  66%    304
  75%    497
  80%    705
  90%   3171
  95%   3251
  98%   3455
  99%   9160
 100%  11011 (longest request)

for both of these tests the server had its cpu maxed out (<5% idle)

switching to thttpd instead of apache and I get

loadtest2:/proc/sys# ab -c 8000 -n 80000 http://208.2.188.5:81/4k
This is ApacheBench, Version 2.0.41-dev <$Revision: 1.141 $> apache-2.0
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-2002 The Apache Software Foundation, http://www.apache.org/

Benchmarking 208.2.188.5 (be patient)
Completed 8000 requests
Completed 16000 requests
Completed 24000 requests
Completed 32000 requests
Completed 40000 requests
Completed 48000 requests
Completed 56000 requests
Completed 64000 requests
Completed 72000 requests
Finished 80000 requests

Server Software:        thttpd/2.23beta1
Server Hostname:        208.2.188.5
Server Port:            81

Document Path:          /4k
Document Length:        4352 bytes

Concurrency Level:
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thu, Mar 01, 2007 at 04:41:27PM +0100, Eric Dumazet ([EMAIL PROTECTED]) wrote:
> On Thursday 01 March 2007 16:32, Eric Dumazet wrote:
> > On Thursday 01 March 2007 16:23, Evgeniy Polyakov wrote:
> > > They are there, since ab runs only 50k requests.
> > > If I change it to something noticeably more than 50/80k, ab crashes:
> > > # ab -c8000 -t 600 -n8 http://192.168.0.48/
> > > This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
> > > Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
> > > Copyright 2006 The Apache Software Foundation, http://www.apache.org/
> > >
> > > Benchmarking 192.168.0.48 (be patient)
> > > Segmentation fault
> > >
> > > Are there any other tools suitable for such loads?
> > > I only tested httperf (which is worse, since it uses poll/select) and
> > > 'ab'.
> > >
> > > Btw, host machine runs 100% too, so it is possible that client side is
> > > broken (too).
> >
> > I have similar problems here, ab test just doesn't complete...
> >
> > I am still investigating with strace and tcpdump.
>
> OK... I found it.
>
> I had to loop on accept() :
>
> for (i=0; i<n; i++) {
>         if (event[i].data.fd == main_server_s) {
>                 do {
>                         err = evtest_callback_main(event[i].data.fd);
>                 } while (err != -1);
>         }
>         else
>                 err = evtest_callback_client(event[i].data.fd);
> }
>
> Or else we can miss an event forever...

The same here - I would just enable a debug to find it.

# ab -c8000 -n80000 http://192.168.0.48/
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.0.48 (be patient)
Completed 8000 requests
Completed 16000 requests
Completed 24000 requests
Completed 32000 requests
Completed 40000 requests
Completed 48000 requests
Completed 56000 requests
Completed 64000 requests
Completed 72000 requests
Finished 80000 requests

Server Software:        Apache/1.3.27
Server Hostname:        192.168.0.48
Server Port:            80

Document Path:          /
Document Length:        3521 bytes

Concurrency Level:      8000
Time taken for tests:   18.250921 seconds
Complete requests:      80000
Failed requests:        0
Write errors:           0
Total transferred:      315691904 bytes
HTML transferred:       287074172 bytes
Requests per second:    4383.34 [#/sec] (mean)
Time per request:       1825.092 [ms] (mean)
Time per request:       0.228 [ms] (mean, across all concurrent requests)
Transfer rate:          16891.86 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      137  884  481.1    920    3602
Processing:   567  888  163.6    985     997
Waiting:       47  455  238.2    439     921
Total:        765 1772  566.6   1911    4556

Percentage of the requests served within a certain time (ms)
  50%   1911
  66%   1911
  75%   1912
  80%   1913
  90%   1913
  95%   1914
  98%   4438
  99%   4497
 100%   4556 (longest request)
kano:~#

--
	Evgeniy Polyakov
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thu, Mar 01, 2007 at 04:32:37PM +0100, Eric Dumazet ([EMAIL PROTECTED]) wrote:
> On Thursday 01 March 2007 16:23, Evgeniy Polyakov wrote:
> > They are there, since ab runs only 50k requests.
> > If I change it to something noticeably more than 50/80k, ab crashes:
> > # ab -c8000 -t 600 -n8 http://192.168.0.48/
> > This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
> > Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
> > Copyright 2006 The Apache Software Foundation, http://www.apache.org/
> >
> > Benchmarking 192.168.0.48 (be patient)
> > Segmentation fault
> >
> > Are there any other tools suitable for such loads?
> > I only tested httperf (which is worse, since it uses poll/select) and
> > 'ab'.
> >
> > Btw, host machine runs 100% too, so it is possible that client side is
> > broken (too).
>
> I have similar problems here, ab test just doesn't complete...
>
> I am still investigating with strace and tcpdump.
>
> In the meantime you could just rewrite it (based on epoll please :) ), since
> it should be quite easy to do this (reverse of evserver_epoll)

Rewriting 'ab' with pure epoll instead of the APR lib is like dandruff
treatment on a guillotine. I will try to cook up something of my own - a
simple client (based on epoll) - tomorrow/weekend; now I need to work for
money :)

--
	Evgeniy Polyakov
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thursday 01 March 2007 16:32, Eric Dumazet wrote:
> On Thursday 01 March 2007 16:23, Evgeniy Polyakov wrote:
> > They are there, since ab runs only 50k requests.
> > If I change it to something noticeably more than 50/80k, ab crashes:
> > # ab -c8000 -t 600 -n8 http://192.168.0.48/
> > This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
> > Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
> > Copyright 2006 The Apache Software Foundation, http://www.apache.org/
> >
> > Benchmarking 192.168.0.48 (be patient)
> > Segmentation fault
> >
> > Are there any other tools suitable for such loads?
> > I only tested httperf (which is worse, since it uses poll/select) and
> > 'ab'.
> >
> > Btw, host machine runs 100% too, so it is possible that client side is
> > broken (too).
>
> I have similar problems here, ab test just doesn't complete...
>
> I am still investigating with strace and tcpdump.

OK... I found it.

I had to loop on accept() :

for (i=0; i<n; i++) {
        if (event[i].data.fd == main_server_s) {
                do {
                        err = evtest_callback_main(event[i].data.fd);
                } while (err != -1);
        }
        else
                err = evtest_callback_client(event[i].data.fd);
}

Or else we can miss an event forever...
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
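The miss happens because the listening socket is registered
edge-triggered: one epoll_wait() wakeup may stand for several queued
connections, so a single accept() leaves the rest stranded with no
further event coming. A self-contained sketch of the drain pattern (this
is not Eric's actual evserver code; error handling is trimmed and the
listening socket is assumed to be non-blocking):

#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>

/* accept until the backlog is empty - with EPOLLET, readiness is only
 * reported on the empty->non-empty transition, so all queued
 * connections must be taken now */
static void drain_accept(int epfd, int listen_fd)
{
	for (;;) {
		int fd = accept(listen_fd, NULL, NULL);

		if (fd == -1) {
			/* EAGAIN means the backlog is drained; a real
			 * error also ends the loop in this sketch */
			break;
		}

		struct epoll_event ev;
		ev.events = EPOLLIN | EPOLLET;
		ev.data.fd = fd;
		epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
	}
}

This is exactly what the do { } while (err != -1) loop above achieves
for the main server socket.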
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thu, Mar 01, 2007 at 04:09:42PM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote:
>
> * Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
>
> > > > I can tell you that the problem (at least on my machine) comes from :
> > > >
> > > > gettimeofday(&tm, NULL);
> > > >
> > > > in evserver_epoll.c
> > >
> > > yeah, that's another difference - especially if it's something like
> > > an Athlon64 and gettimeofday falls back to pm-timer, that could
> > > explain the performance difference. That's why i repeatedly asked
> > > Evgeniy to use the /very same/ client function for both the epoll
> > > and the kevent test and redo the measurements. The numbers are still
> > > highly suspect - and we are already down from the prior claim of
> > > kevent being almost twice as fast to a 25% difference.
> >
> > There is no gettimeofday() in the running code anymore, and it was
> > not placed in common server processing code btw.
> >
> > Ingo, do you really think I will send mails with faked benchmarks? :))
>
> no, i'd not be in this discussion anymore if i thought that. But i do
> think that your benchmark results are extremely sloppy, and that makes
> your conclusions based on them essentially useless.
>
> you were hurling quite colorful and strong assertions into this
> discussion, backed up by these numbers, so you should expect at least
> some minimal amount of scrutiny of those numbers.

This discussion was about event driven vs. thread driven IO models, and
threadlets only behaved like event driven because in my tests there was
exactly one threadlet rescheduling per several thousands of clients.
Kevent is just a logical extrapolation of the performance of the event
driven model. My assumptions were based not on kevent performance, but
on the fact that event delivery is much faster and simpler than thread
handling.

Ugh, I'm starting that stupid talk again - let's just jump to the end -
I agree that in real-life high-performance systems both models must be
used.

Peace? :)

> > > [...] The numbers are still highly suspect - and we are already down
> > > from the prior claim of kevent being almost twice as fast to a 25%
> > > difference.
> >
> > Btw, there was never an almost twice performance increase - epoll in
> > my tests always showed 4-5 thousand requests per second, kevent - up
> > to 7 thousand.
>
> i'm referring to your claim in this mail of yours from 4 days ago for
> example:
>
>   http://lkml.org/lkml/2007/2/25/116
>
> "But note, that on my athlon64 3500 test machine kevent is about 7900
> requests per second compared to 4000+ epoll, so expect a challenge."
>
> no matter how i look at it, but 7900 is 1.9 times 4000 - which is
> "almost twice".

After your changes epoll increased to 5k.
I can easily reproduce a 6300/4300 split, but can not get more than 7k
for kevent (with oprofile/idle=poll at least).

I've completed an 800k-request run:

kevent	4800
epoll	4450

with tons of overflows in 'ab':

Write errors:           0
Total transferred:      -1197367296 bytes
HTML transferred:       -1478167296 bytes
Requests per second:    4440.67 [#/sec] (mean)
Time per request:       1801.529 [ms] (mean)
Time per request:       0.225 [ms] (mean, across all concurrent requests)
Transfer rate:          -6490.62 [Kbytes/sec] received

Any other bench?

> 	Ingo

--
	Evgeniy Polyakov
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
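Those negative totals are the signature of ab summing bytes in a signed
32-bit counter: 800,000 replies of 3,872 bytes is 3,097,600,000 bytes,
which wraps to exactly the -1197367296 printed above (and -1478167296 is
800,000 x 3521 HTML bytes wrapped the same way). A tiny demonstration of
the arithmetic (my own illustration, not ab's source):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	int64_t total = 800000LL * 3872;	/* 3097600000 bytes really sent */
	int64_t html  = 800000LL * 3521;	/* 2816800000 bytes of body */

	/* squeezing these into 32 bits wraps them negative (the signed
	 * conversion is implementation-defined, but two's complement
	 * everywhere Linux runs) */
	printf("total: %d\n", (int32_t)(uint32_t)total);	/* -1197367296 */
	printf("html:  %d\n", (int32_t)(uint32_t)html);		/* -1478167296 */
	return 0;
}

So the rate figures are still usable; only the byte totals overflow once
a run passes roughly 550k requests of this size.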
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thursday 01 March 2007 16:23, Evgeniy Polyakov wrote:
> They are there, since ab runs only 50k requests.
> If I change it to something noticeably more than 50/80k, ab crashes:
> # ab -c8000 -t 600 -n8 http://192.168.0.48/
> This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
> Copyright 2006 The Apache Software Foundation, http://www.apache.org/
>
> Benchmarking 192.168.0.48 (be patient)
> Segmentation fault
>
> Are there any other tools suitable for such loads?
> I only tested httperf (which is worse, since it uses poll/select) and
> 'ab'.
>
> Btw, host machine runs 100% too, so it is possible that client side is
> broken (too).

I have similar problems here, ab test just doesn't complete...

I am still investigating with strace and tcpdump.

In the meantime you could just rewrite it (based on epoll please :) ),
since it should be quite easy to do this (reverse of evserver_epoll)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thu, Mar 01, 2007 at 03:47:17PM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote: > > * Evgeniy Polyakov <[EMAIL PROTECTED]> wrote: > > > CPU: AMD64 processors, speed 2210.08 MHz (estimated) > > Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit > > mask of 0x00 (No unit mask) count 10 > > samples %symbol name > > 195750 67.3097 cpu_idle > > 14111 4.8521 enter_idle > > 4979 1.7121 IRQ0x51_interrupt > > 4765 1.6385 tcp_v4_rcv > > the pretty much only meaningful way to measure this is to: > > - start a really long 'ab' testrun. Something like "ab -c 8000 -t 600". > - let the system get into 'steady state': i.e. CPU load at 100% > - reset the oprofile counters, then start an oprofile run for 60 > seconds. > - stop the oprofile run. > - stop the test. > > this way there wont be that many 'cpu_idle' entries in your profiles, > and the profiles between the two event delivery mechanisms will be > directly comparable. They are there, since ab runs only 50k requests. If I change it to something noticebly more than 50/80k, ab crashes: # ab -c8000 -t 600 -n8 http://192.168.0.48/ This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0 Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Copyright 2006 The Apache Software Foundation, http://www.apache.org/ Benchmarking 192.168.0.48 (be patient) Segmentation fault Are there any other tool suitable for such loads? I only tested httperf (which is worse, since it uses poll/select) and 'ab'. Btw, host machine runs 100% too, so it is possible that client side is broken (too). > > In that tests I got epoll perf about 4400 req/s, kevent was about > > 5300. > > So we are now up to epoll being 83% of kevent's performance - while the > noise of numbers seen today alone is around 100% ... Could you update > the files two URLs that you posted before, with the code that you used > for the above numbers: And in a couple of moments I resent profile with 6100 r/s, and now attached with 6300. 
>http://tservice.net.ru/~s0mbre/archive/kevent/evserver_epoll.c >http://tservice.net.ru/~s0mbre/archive/kevent/evserver_kevent.c Plus http://tservice.net.ru/~s0mbre/archive/kevent/evserver_common.c which contains common request handling function > thanks, > > Ingo -- Evgeniy Polyakov CPU: AMD64 processors, speed 2210.08 MHz (estimated) Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 10 samples %symbol name 168753 65.1189 cpu_idle 12451 4.8046 enter_idle 4814 1.8576 tcp_v4_rcv 3980 1.5358 IRQ0x51_interrupt 3142 1.2124 tcp_ack 2738 1.0565 kmem_cache_free 2346 0.9053 kfree 2341 0.9034 memset_c 1927 0.7436 csum_partial_copy_generic 1723 0.6649 ip_route_input 1650 0.6367 dev_queue_xmit 1452 0.5603 ip_output 1416 0.5464 handle_IRQ_event 1335 0.5152 ip_rcv 1326 0.5117 tcp_rcv_state_process 1069 0.4125 schedule 960 0.3704 __do_softirq 943 0.3639 tcp_sendmsg 915 0.3531 ip_queue_xmit 907 0.3500 tcp_v4_do_rcv 897 0.3461 fget 894 0.3450 system_call 890 0.3434 csum_partial 877 0.3384 tcp_transmit_skb 845 0.3261 netif_receive_skb 822 0.3172 ip_local_deliver 812 0.3133 kmem_cache_alloc 788 0.3041 local_bh_enable 773 0.2983 __alloc_skb 771 0.2975 kfree_skbmem 764 0.2948 __d_lookup 757 0.2921 __tcp_push_pending_frames 734 0.2832 pfifo_fast_enqueue 720 0.2778 copy_user_generic_string 627 0.2419 net_rx_action 603 0.2327 pfifo_fast_dequeue 586 0.2261 ret_from_intr 562 0.2169 __link_path_walk 561 0.2165 sock_wfree 549 0.2118 __fput 547 0.2111 __kfree_skb 543 0.2095 get_unused_fd 534 0.2061 number 527 0.2034 sysret_check 516 0.1991 preempt_schedule 508 0.1960 skb_clone 496 0.1914 tcp_parse_options 487 0.1879 _atomic_dec_and_lock 470 0.1814 tcp_poll 469 0.1810 __ip_route_output_key 466 0.1798 rt_hash_code 464 0.1790 tcp_recvmsg 421 0.1625 dput 420 0.1621 tcp_rcv_established 412 0.1590 __tcp_select_window 407 0.1571 exit_idle 394 0.1520 rb_erase 381 0.1470 sys_close 375 0.1447 __mod_timer 365 0.1408 d_alloc 363 0.1401 mask_and_ack_8259A 335 0.1293 lock_timer_base 315 0.1216 cache_alloc_refill 307 0.1185 ret_from_sys_call 300 0.1158 do_path_lookup 299 0.1154 eth_type_trans 298 0.1150 find_next_zero_bit 294 0.1134 tcp_data_queue 286 0.1104 dentry_iput 285 0.1100 ip_append_data 263 0.1015 thread_return 257 0.0992 __dentry_open 255 0.0984 sock_recvmsg 255 0.0984 tcp_rtt_estimator 252 0.0972 sys_fcntl 250
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
* Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

> > > I can tell you that the problem (at least on my machine) comes from :
> > >
> > > gettimeofday(&tm, NULL);
> > >
> > > in evserver_epoll.c
> >
> > yeah, that's another difference - especially if it's something like
> > an Athlon64 and gettimeofday falls back to pm-timer, that could
> > explain the performance difference. That's why i repeatedly asked
> > Evgeniy to use the /very same/ client function for both the epoll
> > and the kevent test and redo the measurements. The numbers are still
> > highly suspect - and we are already down from the prior claim of
> > kevent being almost twice as fast to a 25% difference.
>
> There is no gettimeofday() in the running code anymore, and it was
> not placed in common server processing code btw.
>
> Ingo, do you really think I will send mails with faked benchmarks? :))

no, i'd not be in this discussion anymore if i thought that. But i do
think that your benchmark results are extremely sloppy, and that makes
your conclusions based on them essentially useless.

you were hurling quite colorful and strong assertions into this
discussion, backed up by these numbers, so you should expect at least
some minimal amount of scrutiny of those numbers.

> > [...] The numbers are still highly suspect - and we are already down
> > from the prior claim of kevent being almost twice as fast to a 25%
> > difference.
>
> Btw, there was never an almost twice performance increase - epoll in my
> tests always showed 4-5 thousand requests per second, kevent - up to
> 7 thousand.

i'm referring to your claim in this mail of yours from 4 days ago for
example:

  http://lkml.org/lkml/2007/2/25/116

"But note, that on my athlon64 3500 test machine kevent is about 7900
requests per second compared to 4000+ epoll, so expect a challenge."

no matter how i look at it, but 7900 is 1.9 times 4000 - which is
"almost twice".

	Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thu, Mar 01, 2007 at 05:43:50PM +0300, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote: > On Thu, Mar 01, 2007 at 02:12:50PM +0100, Eric Dumazet ([EMAIL PROTECTED]) > wrote: > > On Thursday 01 March 2007 12:47, Evgeniy Polyakov wrote: > > > > > > Could you provide at least remote way to find it? > > > > > > > Sure :) > > > > > I only found the same problem at > > > http://lkml.org/lkml/2006/10/27/3 > > > > > > but without any hits to solve the problem. > > > > > > I will try CVS oprofile, if it works I will provide details of course. > > > > > > > # cat CVS/Root > > CVS/Root::pserver:[EMAIL PROTECTED]:/cvsroot/oprofile > > > > # cvs diff >/tmp/oprofile.diff > > > > Hope it helps > > One of the hunks failed, since it was in CVS already. > After setting up some mirrors, I've installed all bits needed for > oprofile. > Attached kevent and epoll profiles. > > In that tests I got epoll perf about 4400 req/s, kevent was about 5300. Attached kevent profile with 6100 req/sec. They all look exactly the same for me - there no kevent or epoll functions in profiles at all. -- Evgeniy Polyakov CPU: AMD64 processors, speed 2210.08 MHz (estimated) Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 10 samples %symbol name 103425 55.0868 cpu_idle 8214 4.3750 enter_idle 4712 2.5097 tcp_v4_rcv 3805 2.0266 IRQ0x51_interrupt 3154 1.6799 tcp_ack 2777 1.4791 kmem_cache_free 2286 1.2176 kfree 2155 1.1478 memset_c 1747 0.9305 csum_partial_copy_generic 1710 0.9108 ip_output 1620 0.8629 dev_queue_xmit 1551 0.8261 handle_IRQ_event 1391 0.7409 schedule 1373 0.7313 tcp_rcv_state_process 1337 0.7121 ip_rcv 1100 0.5859 ip_queue_xmit 965 0.5140 ip_route_input 939 0.5001 tcp_sendmsg 935 0.4980 __do_softirq 923 0.4916 ip_local_deliver 916 0.4879 csum_partial 905 0.4820 system_call 889 0.4735 tcp_transmit_skb 884 0.4708 tcp_v4_do_rcv 812 0.4325 netif_receive_skb 778 0.4144 __d_lookup 760 0.4048 __alloc_skb 747 0.3979 local_bh_enable 737 0.3925 __tcp_push_pending_frames 702 0.3739 kfree_skbmem 698 0.3718 pfifo_fast_enqueue 678 0.3611 kmem_cache_alloc 651 0.3467 fget 640 0.3409 pfifo_fast_dequeue 637 0.3393 net_rx_action 629 0.3350 __link_path_walk 602 0.3206 preempt_schedule 599 0.3190 __fput 594 0.3164 sock_wfree 589 0.3137 copy_user_generic_string 579 0.3084 ret_from_intr 559 0.2977 _atomic_dec_and_lock 552 0.2940 __kfree_skb 549 0.2924 skb_clone 514 0.2738 number 494 0.2631 rt_hash_code 473 0.2519 dput 466 0.2482 tcp_parse_options 446 0.2376 tcp_rcv_established 433 0.2306 tcp_recvmsg 431 0.2296 tcp_poll 417 0.2221 get_unused_fd 417 0.2221 sysret_check 377 0.2008 rb_erase 364 0.1939 __tcp_select_window 363 0.1933 lock_timer_base 347 0.1848 __mod_timer 329 0.1752 ip_append_data 326 0.1736 exit_idle 325 0.1731 ret_from_sys_call 317 0.1688 d_alloc 302 0.1609 do_path_lookup 295 0.1571 __ip_route_output_key 290 0.1545 eth_type_trans 285 0.1518 sys_close 283 0.1507 cache_alloc_refill 282 0.1502 mask_and_ack_8259A 275 0.1465 thread_return 267 0.1422 call_softirq 265 0.1411 tcp_rtt_estimator 260 0.1385 tcp_data_queue 258 0.1374 __dentry_open 258 0.1374 vsnprintf 255 0.1358 dentry_iput 255 0.1358 tcp_current_mss 250 0.1332 sk_stream_mem_schedule 239 0.1273 find_next_zero_bit 233 0.1241 cache_grow 233 0.1241 tcp_send_fin 222 0.1182 try_to_wake_up 219 0.1166 sock_recvmsg 216 0.1150 do_generic_mapping_read 211 0.1124 sys_fcntl 209 0.1113 get_empty_filp 207 0.1103 call_rcu 206 0.1097 strncpy_from_user 195 0.1039 sock_def_readable 190 0.1012 generic_drop_inode 190 0.1012 
restore_args 184 0.0980 get_page_from_freelist 182 0.0969 sys_recvfrom 176 0.0937 do_lookup 174 0.0927 common_interrupt 171 0.0911 fget_light 167 0.0889 new_inode 167 0.0889 percpu_counter_mod 166 0.0884 link_path_walk 166 0.0884 skb_checksum 160 0.0852 fput 160 0.0852 release_sock 159 0.0847 memcpy_c 158 0.0842 memcmp 157 0.0836 __skb_checksum_complete 157 0.0836 tcp_init_tso_segs 148 0.0788 half_md4_transform 144 0.0767 tcp_v4_send_check 142 0.0756 del_timer 139 0.0740 current_fs_time 135 0.0719 update_send_head 129 0.0687 do_sys_open 126 0.0671 rb_insert_color 125 0.0666 bictcp_cong_avoid 124 0.0660 __put_unused_fd 123 0.0655 schedu
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thu, Mar 01, 2007 at 03:16:37PM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote:
>
> * Eric Dumazet <[EMAIL PROTECTED]> wrote:
>
> > I can tell you that the problem (at least on my machine) comes from :
> >
> > gettimeofday(&tm, NULL);
> >
> > in evserver_epoll.c
>
> yeah, that's another difference - especially if it's something like an
> Athlon64 and gettimeofday falls back to pm-timer, that could explain the
> performance difference. That's why i repeatedly asked Evgeniy to use the
> /very same/ client function for both the epoll and the kevent test and
> redo the measurements. The numbers are still highly suspect - and we are
> already down from the prior claim of kevent being almost twice as fast
> to a 25% difference.

There is no gettimeofday() in the running code anymore, and it was not
placed in the common server processing code, btw.

Ingo, do you really think I will send mails with faked benchmarks? :))

Btw, there was never an almost twice performance increase - epoll in my
tests always showed 4-5 thousand requests per second, kevent - up to 7
thousand.

This starts to look like ghost hunting. Ingo, you already said that you
do not see any need for kevent - have you changed your opinion on that?

> 	Ingo

--
	Evgeniy Polyakov
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
* Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

> CPU: AMD64 processors, speed 2210.08 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
> mask of 0x00 (No unit mask) count 10
> samples  %        symbol name
> 195750   67.3097  cpu_idle
> 14111     4.8521  enter_idle
> 4979      1.7121  IRQ0x51_interrupt
> 4765      1.6385  tcp_v4_rcv

the pretty much only meaningful way to measure this is to:

 - start a really long 'ab' testrun. Something like "ab -c 8000 -t 600".
 - let the system get into 'steady state': i.e. CPU load at 100%
 - reset the oprofile counters, then start an oprofile run for 60
   seconds.
 - stop the oprofile run.
 - stop the test.

this way there won't be that many 'cpu_idle' entries in your profiles,
and the profiles between the two event delivery mechanisms will be
directly comparable.

> In those tests I got epoll perf about 4400 req/s, kevent was about
> 5300.

So we are now up to epoll being 83% of kevent's performance - while the
noise of numbers seen today alone is around 100% ... Could you update
the files at the two URLs that you posted before, with the code that you
used for the above numbers:

  http://tservice.net.ru/~s0mbre/archive/kevent/evserver_epoll.c
  http://tservice.net.ru/~s0mbre/archive/kevent/evserver_kevent.c

thanks,

	Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thu, Mar 01, 2007 at 02:12:50PM +0100, Eric Dumazet ([EMAIL PROTECTED]) wrote: > On Thursday 01 March 2007 12:47, Evgeniy Polyakov wrote: > > > > Could you provide at least remote way to find it? > > > > Sure :) > > > I only found the same problem at > > http://lkml.org/lkml/2006/10/27/3 > > > > but without any hits to solve the problem. > > > > I will try CVS oprofile, if it works I will provide details of course. > > > > # cat CVS/Root > CVS/Root::pserver:[EMAIL PROTECTED]:/cvsroot/oprofile > > # cvs diff >/tmp/oprofile.diff > > Hope it helps One of the hunks failed, since it was in CVS already. After setting up some mirrors, I've installed all bits needed for oprofile. Attached kevent and epoll profiles. In that tests I got epoll perf about 4400 req/s, kevent was about 5300. epoll does not contain gettimeofday() call. -- Evgeniy Polyakov CPU: AMD64 processors, speed 2210.08 MHz (estimated) Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 10 samples %symbol name 195750 67.3097 cpu_idle 14111 4.8521 enter_idle 4979 1.7121 IRQ0x51_interrupt 4765 1.6385 tcp_v4_rcv 3316 1.1402 tcp_ack 3138 1.0790 kmem_cache_free 2619 0.9006 kfree 2323 0.7988 memset_c 1747 0.6007 schedule 1682 0.5784 csum_partial_copy_generic 1646 0.5660 ip_output 1563 0.5374 tcp_rcv_state_process 1506 0.5178 dev_queue_xmit 1412 0.4855 handle_IRQ_event 1266 0.4353 ip_rcv 1026 0.3528 ip_queue_xmit 1004 0.3452 __do_softirq 1001 0.3442 ip_local_deliver 906 0.3115 csum_partial 902 0.3102 ip_route_input 889 0.3057 __d_lookup 880 0.3026 kmem_cache_alloc 847 0.2912 tcp_v4_do_rcv 841 0.2892 netif_receive_skb 830 0.2854 tcp_sendmsg 819 0.2816 system_call 788 0.2710 kfree_skbmem 780 0.2682 tcp_transmit_skb 742 0.2551 preempt_schedule 731 0.2514 __tcp_push_pending_frames 699 0.2404 __link_path_walk 687 0.2362 pfifo_fast_dequeue 672 0.2311 local_bh_enable 657 0.2259 __alloc_skb 650 0.2235 net_rx_action 627 0.2156 pfifo_fast_enqueue 583 0.2005 sock_wfree 571 0.1963 get_unused_fd 547 0.1881 tcp_parse_options 546 0.1877 copy_user_generic_string 533 0.1833 _atomic_dec_and_lock 529 0.1819 number 524 0.1802 ret_from_intr 509 0.1750 skb_clone 507 0.1743 fget 507 0.1743 tcp_rcv_established 498 0.1712 __kfree_skb 492 0.1692 tcp_poll 481 0.1654 rt_hash_code 471 0.1620 dput 454 0.1561 sock_def_readable 422 0.1451 mask_and_ack_8259A 421 0.1448 sysret_check 419 0.1441 __fput 413 0.1420 exit_idle 396 0.1362 ip_append_data 374 0.1286 sock_poll 371 0.1276 tcp_data_queue 359 0.1234 __tcp_select_window 356 0.1224 tcp_recvmsg 348 0.1197 lock_timer_base 340 0.1169 cache_alloc_refill 338 0.1162 thread_return 319 0.1097 sys_close 318 0.1093 do_path_lookup 318 0.1093 ret_from_sys_call 311 0.1069 vsnprintf 307 0.1056 eth_type_trans 303 0.1042 find_next_zero_bit 302 0.1038 __mod_timer 298 0.1025 d_alloc 296 0.1018 rb_erase 293 0.1007 call_softirq 290 0.0997 __dentry_open 276 0.0949 cache_grow 274 0.0942 __ip_route_output_key 273 0.0939 try_to_wake_up 258 0.0887 dentry_iput 258 0.0887 sk_stream_mem_schedule 257 0.0884 do_lookup 244 0.0839 strncpy_from_user 234 0.0805 do_generic_mapping_read 231 0.0794 memcmp 229 0.0787 tcp_current_mss 228 0.0784 tcp_rtt_estimator 214 0.0736 restore_args 205 0.0705 sys_recvfrom 204 0.0701 fput 203 0.0698 tcp_send_fin 200 0.0688 release_sock 193 0.0664 memcpy_c 191 0.0657 common_interrupt 189 0.0650 fget_light 185 0.0636 skb_checksum 182 0.0626 generic_drop_inode 180 0.0619 do_sys_open 174 0.0598 get_page_from_freelist 168 0.0578 link_path_walk 165 0.0567 
schedule_timeout 163 0.0560 del_timer 162 0.0557 rb_insert_color 160 0.0550 percpu_counter_mod 159 0.0547 __up_read 155 0.0533 expand_files 154 0.0530 sys_fcntl 150 0.0516 tcp_v4_send_check 146 0.0502 fd_install 145 0.0499 bictcp_cong_avoid 143 0.0492 call_rcu 141 0.0485 __down_read 141 0.0485 sock_close 140 0.0481 copy_page_c 138 0.0475 __skb_checksum_complete 138 0.0475 lookup_mnt 137 0.0471 getname 132 0.0454 generic_permission 131 0.0450 find_get_page 130 0.0447 __do_page_cache_readahead 130 0.0447 update_send_head 127 0.0437 get_empty_filp 126 0.0433 __path_lookup_intent_open 124 0.0
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
* Eric Dumazet <[EMAIL PROTECTED]> wrote: > On my machines (again ...), ab is the slow thing... not the 'server' Evgeniy said that both in the epoll and the kevent case the server side CPU was 98%-100% busy - so inefficiencies on the client side do not matter that much - the server is saturated. It's that "kevent is 25% faster than epoll" claim that i'm probing mainly. Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thursday 01 March 2007 15:16, Ingo Molnar wrote:
> * Eric Dumazet <[EMAIL PROTECTED]> wrote:
> > I can tell you that the problem (at least on my machine) comes from :
> >
> > gettimeofday(&tm, NULL);
> >
> > in evserver_epoll.c
>
> yeah, that's another difference - especially if it's something like an
> Athlon64 and gettimeofday falls back to pm-timer, that could explain the
> performance difference. That's why i repeatedly asked Evgeniy to use the
> /very same/ client function for both the epoll and the kevent test and
> redo the measurements. The numbers are still highly suspect - and we are
> already down from the prior claim of kevent being almost twice as fast
> to a 25% difference.

Also, ab is quite lame... Maybe we could use an epoll based 'stresser'.

On my machines (again ...), ab is the slow thing... not the 'server'.

Some small differences in behavior could have a big impact on ab (and
you could think there is a problem on the remote side)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thu, Mar 01, 2007 at 02:32:42PM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote:
>
> * Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
>
> > [...] that is why the number for kevent is higher - it uses an
> > edge-triggered handler (which you asked to remove from epoll), [...]
>
> no - i did not 'ask' it to be removed from epoll, i only pointed out
> that with edge-triggered the results were highly unreliable here and
> that with level-triggered it worked better. Just to make sure: if you
> put back edge-triggered into evserver_epoll.c, do you get the same
> numbers, and is CPU utilization still the same 98-100%?

No. _Now_ it is about 1500-2000 req/sec with 10-20% CPU utilization.
The 'Total transferred' and 'HTML transferred' numbers do not equal
80000 multiplied by the size of the page.

Those are strange tests actually - I managed to get 9000 requests per
second from the epoll server (only once!) and 8900 from kevent (two
times only); sometimes they both drop down to 2300-2700 req/s.

> 	Ingo

--
	Evgeniy Polyakov
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
* Eric Dumazet <[EMAIL PROTECTED]> wrote: > I can tell you that the problem (at least on my machine) comes from : > > gettimeofday(&tm, NULL); > > in evserver_epoll.c yeah, that's another difference - especially if it's something like an Athlon64 and gettimeofday falls back to pm-timer, that could explain the performance difference. That's why i repeatedly asked Evgeniy to use the /very same/ client function for both the epoll and the kevent test and redo the measurements. The numbers are still highly suspect - and we are already down from the prior claim of kevent being almost twice as fast to a 25% difference. Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thursday 01 March 2007 14:30, Evgeniy Polyakov wrote:
> On Thu, Mar 01, 2007 at 02:11:18PM +0100, Ingo Molnar ([EMAIL PROTECTED])
> wrote:
> > ok?
>
> I understood you a couple of mails ago.
> No problem, I can put the processing into the same function called from
> different servers :)
>
> > Btw., am i correct that in this particular 'ab' test, the 'immediately'
> > flag is always zero, i.e. kweb_kevent_remove() is always called?
>
> Yes.
>
> > 	Ingo

I can tell you that the problem (at least on my machine) comes from :

gettimeofday(&tm, NULL);

in evserver_epoll.c
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
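The per-call cost of gettimeofday() varies enormously with the clock
source, which is why a stray call can distort a tight event loop: with a
TSC-backed implementation it costs tens of nanoseconds, but when the
kernel falls back to the ACPI pm-timer each call does a slow hardware
read. A quick standalone check (my own sketch, nothing to do with the
evserver sources):

#include <stdio.h>
#include <sys/time.h>

int main(void)
{
	struct timeval start, end, tmp;
	long i, n = 1000000;

	gettimeofday(&start, NULL);
	for (i = 0; i < n; i++)
		gettimeofday(&tmp, NULL);
	gettimeofday(&end, NULL);

	long us = (end.tv_sec - start.tv_sec) * 1000000L
		+ (end.tv_usec - start.tv_usec);
	printf("%.1f ns per gettimeofday() call\n", us * 1000.0 / n);
	return 0;
}

If one server makes the call per request and the other does not, that
alone adds avoidable per-request overhead to only one side of an
epoll-vs-kevent comparison.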
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
* Evgeniy Polyakov <[EMAIL PROTECTED]> wrote: > [...] that is why number for kevent is higher - it uses edge-triggered > handler (which you asked to remove from epoll), [...] no - i did not 'ask' it to be removed from epoll, i only pointed out that with edge-triggered the results were highly unreliable here and that with level-triggered it worked better. Just to make sure: if you put back edge-triggered into evserver_epoll.c, do you get the same numbers, and is CPU utilization still the same 98-100%? Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
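For reference, the edge/level choice being discussed here is a single
flag on the epoll registration; everything else follows from it. A small
sketch (hypothetical helper, not from evserver_epoll.c):

#include <sys/epoll.h>

/* level-triggered: the fd is reported by every epoll_wait() while it
 * stays readable, so a handler may read partially and get woken again.
 * edge-triggered (EPOLLET): the fd is reported once per new-readiness
 * transition, so the handler must read/accept until EAGAIN or it may
 * wait forever on data that already arrived. */
static int set_edge_triggered(int epfd, int fd, int edge)
{
	struct epoll_event ev;

	ev.events = EPOLLIN | (edge ? EPOLLET : 0);
	ev.data.fd = fd;
	return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}

Edge-triggering saves redundant wakeups under load, but it is also the
mode in which a missed drain (like the accept() bug found earlier in
this thread) shows up as exactly the unreliability described above.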
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thu, Mar 01, 2007 at 02:11:18PM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote:
> ok?

I understood you a couple of mails ago.
No problem, I can put the processing into the same function called from
different servers :)

> Btw., am i correct that in this particular 'ab' test, the 'immediately'
> flag is always zero, i.e. kweb_kevent_remove() is always called?

Yes.

> 	Ingo

--
	Evgeniy Polyakov
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thu, Mar 01, 2007 at 01:34:23PM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote:
>
> * Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
>
> > Document Length:        3521 bytes
> >
> > Concurrency Level:      8000
> > Time taken for tests:   16.686737 seconds
> > Complete requests:      80000
> > Failed requests:        0
> > Write errors:           0
> > Total transferred:      309760000 bytes
> > HTML transferred:       281680000 bytes
> > Requests per second:    4794.23 [#/sec] (mean)
>
> > Concurrency Level:      8000
> > Time taken for tests:   12.366775 seconds
> > Complete requests:      80000
> > Failed requests:        0
> > Write errors:           0
> > Total transferred:      317047104 bytes
> > HTML transferred:       288306522 bytes
> > Requests per second:    6468.95 [#/sec] (mean)
>
> i'm wondering - how can the 'Total transferred' and 'HTML transferred'
> numbers be different?
>
> Since document length is 3521, and the number of requests is 80000, the
> correct 'HTML transferred' is 281680000 - which is the epoll result. The
> kevent result shows more bytes transferred, which suggests that the
> kevent loop is probably incorrect somewhere.
>
> this might be some benign thing, but the /first/ thing you /have to/ do
> before claiming that 'kevent is 25% faster than epoll' is to make sure
> the results are totally reliable.

Kevent sent an additional 525 pages ((311792800-309760000)/3872) - that
is why the number for kevent is higher - it uses an edge-triggered
handler (which you asked to remove from epoll), which can produce false
positives; for an absolute result in that case ret_data must be checked,
where the poll flags were stored (before). 'ab' does not count the
additional data as new requests and does not count it in 'requests per
second'. Even if it could do so, an additional 500 requests can not
provide a 35% higher rate.

For example, lighttpd results are the same for kevent and epoll, and the
'Total transferred' and 'HTML transferred' numbers change between runs
both for epoll and kevent.

> 	Ingo

--
	Evgeniy Polyakov
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
* Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

> > i dont care whether they are separate or not - but you have not
> > replied to the request that there be a handle_web_request() function
> > in /both/ files, which is precisely the same function. I didnt ask
> > you to merge the two files - i only asked for the two web handling
> > functions to be one and the same function.
>
> They are not the same in general - if kevent is ready immediately, it
> will not be removed from the kevent tree, but the current kevent server
> always takes the not-immediately path for the lighttpd tests - so the
> functions are the same:
>
> 	open()
> 	sendfile()
> 	cork_off
> 	close(fd)
> 	close(s)
> 	remove_event_from_the_kernel
>
> with the same parameters.

you /STILL/ dont understand. I'm only talking about evserver_epoll.c and evserver_kevent.c. Not about lighttpd. Not about historic reasons. I simply suggested a common-sense change:

| Would it be so hard to introduce a single handle_web_request()
| function that is exactly the same in the two tests? All the queueing
| details (which are of course different in the epoll and the kevent
| case) should be in the client function, which calls
| handle_web_request().

i.e. put remove_event_from_the_kernel() (kweb_kevent_remove() and evtest_remove()) into a SEPARATE client function, which calls the /common/ handle_web_request(sock) function. You can do the immediate-removal in that separate, kevent-specific client function - but the socket function, handle_web_request(sock), should be /perfectly identical/ in the two files. I.e.:

        static inline int handle_web_request(int s)
        {
                int err, fd, on = 0;
                off_t offset = 0;
                int count = 40960;
                char path[] = "/tmp/index.html";
                char buf[4096];

                err = recv(s, buf, sizeof(buf), 0);
                if (err <= 0)
                        return err;

                fd = open(path, O_RDONLY);
                if (fd == -1)
                        return fd;

                err = sendfile(s, fd, &offset, count);
                if (err < 0) {
                        ulog_err("Failed send %d bytes: fd=%d.\n", count, s);
                        close(fd);
                        return err;
                }

                setsockopt(s, SOL_TCP, TCP_CORK, &on, sizeof(on));

                close(fd);
                close(s); /* No keepalive */

                return 0;
        }

And in evserver_epoll.c do this:

        static int evtest_callback_client(int s)
        {
                int err = handle_web_request(s);

                if (err)
                        evtest_remove(s);

                return err;
        }

and in evserver_kevent.c do this:

        static int kweb_callback_client(struct ukevent *e, int im)
        {
                int err = handle_web_request(e->id.raw[0]);

                if (err || !im)
                        kweb_kevent_remove(e);

                return err;
        }

ok? Btw., am i correct that in this particular 'ab' test, the 'immediately' flag is always zero, i.e. kweb_kevent_remove() is always called?

	Ingo
- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
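A side note on the TCP_CORK line: the handler above only ever passes on = 0, i.e. it uncorks the socket to flush whatever sendfile() queued. A sketch of the full cork/uncork pairing, assuming the Linux-specific SOL_TCP/TCP_CORK options already used in the original code (the helper name is hypothetical):

        #include <fcntl.h>
        #include <unistd.h>
        #include <sys/types.h>
        #include <sys/socket.h>
        #include <sys/sendfile.h>
        #include <netinet/in.h>
        #include <netinet/tcp.h>

        static int send_file_corked(int s, const char *path, size_t count)
        {
                int fd, on = 1, off = 0;
                off_t offset = 0;
                ssize_t err;

                fd = open(path, O_RDONLY);
                if (fd == -1)
                        return -1;

                /* cork: hold back partial frames until the response is queued */
                setsockopt(s, SOL_TCP, TCP_CORK, &on, sizeof(on));
                err = sendfile(s, fd, &offset, count);
                /* uncork: flush whatever sendfile() queued */
                setsockopt(s, SOL_TCP, TCP_CORK, &off, sizeof(off));

                close(fd);
                return err < 0 ? -1 : 0;
        }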
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thursday 01 March 2007 12:47, Evgeniy Polyakov wrote:
> Could you provide at least a remote way to find it?

Sure :)

> I only found the same problem at
> http://lkml.org/lkml/2006/10/27/3
> but without any hints to solve the problem.
>
> I will try the CVS oprofile; if it works I will provide details, of course.

# cat CVS/Root
:pserver:[EMAIL PROTECTED]:/cvsroot/oprofile
# cvs diff >/tmp/oprofile.diff

Hope it helps

Index: libop/op_alloc_counter.c
===================================================================
RCS file: /cvsroot/oprofile/oprofile/libop/op_alloc_counter.c,v
retrieving revision 1.8
diff -r1.8 op_alloc_counter.c
14a15,16
> #include <dirent.h>
> #include <ctype.h>
133c135
< 		return 0;
---
> 		continue;
145a148,183
> /* determine which directories are counter directories */
> static int perfcounterdir(const struct dirent * entry)
> {
> 	return (isdigit(entry->d_name[0]));
> }
>
>
> /**
>  * @param mask pointer where to place bit mask of unavailable counters
>  *
>  * return >= 0 number of counters that are available
>  *        < 0  could not determine number of counters
>  *
>  */
> static int op_get_counter_mask(u32 * mask)
> {
> 	struct dirent **counterlist;
> 	int count, i;
> 	/* assume nothing is available */
> 	u32 available=0;
>
> 	count = scandir("/dev/oprofile", &counterlist, perfcounterdir,
> 			alphasort);
> 	if (count < 0)
> 		/* unable to determine bit mask */
> 		return -1;
> 	/* convert to bit map (0 where counter exists) */
> 	for (i=0; i<count; i++) {
> 		available |= 1 << atoi(counterlist[i]->d_name);
> 		free(counterlist[i]);
> 	}
> 	*mask=~available;
> 	free(counterlist);
> 	return count;
> }
152a191
> 	u32 unavailable_counters = 0;
154c193,195
< 	nr_counters = op_get_nr_counters(cpu_type);
---
> 	nr_counters = op_get_counter_mask(&unavailable_counters);
> 	if (nr_counters < 0)
> 		nr_counters = op_get_nr_counters(cpu_type);
162c203,204
< 	if (!allocate_counter(ctr_arc, nr_events, 0, 0, counter_map)) {
---
> 	if (!allocate_counter(ctr_arc, nr_events, 0, unavailable_counters,
> 			counter_map)) {
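For readers who find the context diff hard to follow, the detection logic it adds can be restated as a self-contained sketch (u32 is typedef'ed locally here; in oprofile it comes from the project's own headers):

        #include <ctype.h>
        #include <dirent.h>
        #include <stdlib.h>

        typedef unsigned int u32;

        /* filter: keep only the numeric entries under /dev/oprofile */
        static int perfcounterdir(const struct dirent *entry)
        {
                return isdigit((unsigned char)entry->d_name[0]);
        }

        /* Build a mask of the counters that do NOT exist; returns the
         * number of available counters, or < 0 if it cannot tell. */
        static int op_get_counter_mask(u32 *mask)
        {
                struct dirent **counterlist;
                u32 available = 0;
                int count, i;

                count = scandir("/dev/oprofile", &counterlist,
                                perfcounterdir, alphasort);
                if (count < 0)
                        return -1;

                for (i = 0; i < count; i++) {
                        available |= 1u << atoi(counterlist[i]->d_name);
                        free(counterlist[i]);
                }
                free(counterlist);

                *mask = ~available;
                return count;
        }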
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thu, Mar 01, 2007 at 01:43:36PM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote:
> * Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
>
> > I separated the epoll and kevent servers, since originally the kevent
> > server included additional kevent features, but then new ones were
> > added and I slowly moved it towards the epoll case.
>
> i dont care whether they are separate or not - but you have not replied
> to the request that there be a handle_web_request() function in /both/
> files, which is precisely the same function. I didnt ask you to merge
> the two files - i only asked for the two web handling functions to be
> one and the same function.

They are not the same in general - if kevent is ready immediately, it will not be removed from the kevent tree, but the current kevent server always takes the not-immediately path for the lighttpd tests - so the functions are the same:

	open()
	sendfile()
	cork_off
	close(fd)
	close(s)
	remove_event_from_the_kernel

with the same parameters.

> 	Ingo

--
	Evgeniy Polyakov
- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
* Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

> I separated the epoll and kevent servers, since originally the kevent
> server included additional kevent features, but then new ones were
> added and I slowly moved it towards the epoll case.

i dont care whether they are separate or not - but you have not replied to the request that there be a handle_web_request() function in /both/ files, which is precisely the same function. I didnt ask you to merge the two files - i only asked for the two web handling functions to be one and the same function.

	Ingo
- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
* Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

> Document Length:      3521 bytes
>
> Concurrency Level:    8000
> Time taken for tests: 16.686737 seconds
> Complete requests:    80000
> Failed requests:      0
> Write errors:         0
> Total transferred:    309760000 bytes
> HTML transferred:     281680000 bytes
> Requests per second:  4794.23 [#/sec] (mean)

> Concurrency Level:    8000
> Time taken for tests: 12.366775 seconds
> Complete requests:    80000
> Failed requests:      0
> Write errors:         0
> Total transferred:    317047104 bytes
> HTML transferred:     288306522 bytes
> Requests per second:  6468.95 [#/sec] (mean)

i'm wondering - how can the 'Total transferred' and 'HTML transferred' numbers be different?

Since the document length is 3521, and the number of requests is 80000, the correct 'HTML transferred' is 281680000 - which is the epoll result. The kevent result shows more bytes transferred, which suggests that the kevent loop is probably incorrect somewhere.

this might be some benign thing, but the /first/ thing you /have to/ do before claiming that 'kevent is 25% faster than epoll' is to make sure the results are totally reliable.

	Ingo
- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
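A quick back-of-the-envelope check of the accounting argument above, using the figures from the quoted runs:

        #include <stdio.h>

        int main(void)
        {
                long requests = 80000, doc_len = 3521;

                /* every complete response should carry exactly one document */
                printf("expected HTML transferred: %ld bytes\n",
                       requests * doc_len);
                /* prints 281680000 - the epoll figure; the kevent run
                 * reports 288306522, i.e. extra data was sent somewhere */
                return 0;
        }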
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thu, Mar 01, 2007 at 12:28:00PM +0100, Eric Dumazet ([EMAIL PROTECTED]) wrote:
> I used the CVS version of oprofile plus a patch you can find in the mailing
> list archives. Dont remember exactly, since I hit this some months ago

Ugh, I have started - but the CVS compilation requires about 40mb of additional libs (according to debian testing dependencies on my very light installation), so with my miserable 1-1.6 kb/sec link do not expect it today :)

--
	Evgeniy Polyakov
- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thu, Mar 01, 2007 at 12:47:35PM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote:
> * Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> > > I also changed client socket to nonblocking mode with the same result
> > > in epoll server. If you find it broken, please send me a corrected
> > > version to test too.
> >
> > this line in evserver_kevent.c looks a bit fishy:
>
> this one in evserver_kevent.c looks problematic too:
>
> 	shutdown(s, SHUT_RDWR);
> 	close(s);
>
> as evserver_epoll.c only does:
>
> 	close(s);
>
> again, there might be TCP control flow differences due to this. [ Or the
> removal of this shutdown() call might be a small speedup for the kevent
> case ;) ]

:)

> Also, the order of fd and socket close() is different in the two cases.
> It shouldnt make any difference - but that too just makes the results
> harder to trust. Would it be so hard to introduce a single
> handle_web_request() function that is exactly the same in the two tests?
> All the queueing details (which are of course different in the epoll and
> the kevent case) should be in the client function, which calls
> handle_web_request().

I've removed the shutdown - things are the same. Sometimes kevent performance drops to lower numbers, and its graph of the times needed to handle events has high plateaus (with and without shutdown - it was always there), like this:

Percentage of the requests served within a certain time (ms)
  50%    128
  66%    486
  75%    505
  80%    507
  90%    732
  95%   3087	// something is wrong at this point
  98%   9058
  99%   9072
 100%  15562 (longest request)

It is possible that there are some other bugs in the server though, which prevent sockets from being quickly closed and thus increase the processing time - I do not know for sure the root cause of that behaviour.

I separated the epoll and kevent servers, since originally the kevent server included additional kevent features, but then new ones were added and I slowly moved it towards the epoll case. The current version of the server was a pre-test one for the lighttpd patches, so essentially it should be like epoll except for minor details.

> 	Ingo

--
	Evgeniy Polyakov
- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thu, Mar 01, 2007 at 12:41:37PM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote:
> * Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
>
> > I also changed client socket to nonblocking mode with the same result
> > in epoll server. If you find it broken, please send me a corrected
> > version to test too.
>
> this line in evserver_kevent.c looks a bit fishy:
>
> 	err = recv(s, buf, 100, 0);
>
> because on the evserver_epoll.c side the following is done:
>
> 	err = recv(s, buf, 4096, 0);
>
> now, for 'ab', the request size is 76 bytes, so it should fit fine
> functionality-wise. But, the TCP stack might decide differently whether
> to return with a partial packet depending on how much data is requested.
> I dont know whether it actually makes a difference in the TCP flow
> decisions, and whether it makes a performance difference in your test,
> but safest would be to use 4096 in both cases.

Well, that would be quite strange - as far as I know the Linux network stack (for which kevent was originally created to support network AIO), there should not be any difference. Anyway, I've re-run the test with the same values:

# ab -c8000 -n80000 http://192.168.0.48/
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.0.48 (be patient)
Completed 8000 requests
Completed 16000 requests
Completed 24000 requests
Completed 32000 requests
Completed 40000 requests
Completed 48000 requests
Completed 56000 requests
Completed 64000 requests
Completed 72000 requests
Finished 80000 requests

Server Software:        Apache/1.3.27
Server Hostname:        192.168.0.48
Server Port:            80

Document Path:          /
Document Length:        3521 bytes

Concurrency Level:      8000
Time taken for tests:   18.398381 seconds
Complete requests:      80000
Failed requests:        0
Write errors:           0
Total transferred:      338738048 bytes
HTML transferred:       308031164 bytes
Requests per second:    4348.21 [#/sec] (mean)
Time per request:       1839.838 [ms] (mean)
Time per request:       0.230 [ms] (mean, across all concurrent requests)
Transfer rate:          17979.73 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      148   795  196.9    808   3599
Processing:   824   882   39.7    878    986
Waiting:       59   426  212.6    423    914
Total:       1073  1678  200.8   1673   4579

Percentage of the requests served within a certain time (ms)
  50%   1673
  66%   1674
  75%   1678
  80%   1686
  90%   1852
  95%   1861
  98%   1864
  99%   1865
 100%   4579 (longest request)

Essentially the same result (within the limits of some inaccuracy).

> in general, please make sure the exact same system calls are done in the
> client function. (except of course for the event queueing syscalls
> themselves)

Yes, that should be done of course. I even have a plan to create the same binary for both, and I also plan to turn on some kevent optimizations (mainly readiness-on-submit: when the requested event (recv/send/anything) is ready immediately, kevent supports returning that event from the submission syscall, without the additional overhead of reading it from the ring or queue).

> 	Ingo

--
	Evgeniy Polyakov
- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> > I also changed client socket to nonblocking mode with the same result
> > in epoll server. If you find it broken, please send me a corrected
> > version to test too.
>
> this line in evserver_kevent.c looks a bit fishy:

this one in evserver_kevent.c looks problematic too:

	shutdown(s, SHUT_RDWR);
	close(s);

as evserver_epoll.c only does:

	close(s);

again, there might be TCP control flow differences due to this. [ Or the removal of this shutdown() call might be a small speedup for the kevent case ;) ]

Also, the order of fd and socket close() is different in the two cases. It shouldnt make any difference - but that too just makes the results harder to trust. Would it be so hard to introduce a single handle_web_request() function that is exactly the same in the two tests? All the queueing details (which are of course different in the epoll and the kevent case) should be in the client function, which calls handle_web_request().

	Ingo
- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
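The two teardown paths being compared, side by side (a sketch; the function names are illustrative):

        #include <unistd.h>
        #include <sys/socket.h>

        static void teardown_kevent_style(int s)
        {
                shutdown(s, SHUT_RDWR); /* send FIN explicitly, both directions */
                close(s);
        }

        static void teardown_epoll_style(int s)
        {
                close(s);               /* FIN goes out when the last reference drops */
        }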
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thu, Mar 01, 2007 at 12:28:00PM +0100, Eric Dumazet ([EMAIL PROTECTED]) wrote:
> On Thursday 01 March 2007 12:20, Evgeniy Polyakov wrote:
> > On Thu, Mar 01, 2007 at 12:14:44PM +0100, Eric Dumazet ([EMAIL PROTECTED]) wrote:
> > > On Thursday 01 March 2007 11:59, Evgeniy Polyakov wrote:
> > > > Yes, it is about 98-100% in both cases.
> > > > I've just re-run tests on my amd64 test machine without debug options:
> > > >
> > > > epoll   4794.23
> > > > kevent  6468.95
> > >
> > > It would be valuable if you could post oprofile results
> > > (CPU_CLK_UNHALTED) for both tests.
> >
> > I can't - oprofile does not work on this x86_64 machine:
>
> Yes, this is a known problem, but you can make it work, as I did.
>
> Please :)

I can not resist :)

> I used the CVS version of oprofile plus a patch you can find in the mailing
> list archives. Dont remember exactly, since I hit this some months ago

Could you provide at least a remote way to find it? I only found the same problem at http://lkml.org/lkml/2006/10/27/3 but without any hints to solve the problem.

I will try the CVS oprofile; if it works I will provide details, of course.

My tree is based on rc1 and has this latest commit:

commit b5bf28cde894b3bb3bd25c13a7647020562f9ea0
Author: Linus Torvalds <[EMAIL PROTECTED]>
Date:   Wed Feb 21 11:21:44 2007 -0800

There are no commits after that date with the word 'oprofile' in git-whatchanged, at least.

--
	Evgeniy Polyakov
- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
* Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

> I also changed client socket to nonblocking mode with the same result
> in epoll server. If you find it broken, please send me a corrected
> version to test too.

this line in evserver_kevent.c looks a bit fishy:

	err = recv(s, buf, 100, 0);

because on the evserver_epoll.c side the following is done:

	err = recv(s, buf, 4096, 0);

now, for 'ab', the request size is 76 bytes, so it should fit fine functionality-wise. But, the TCP stack might decide differently whether to return with a partial packet depending on how much data is requested. I dont know whether it actually makes a difference in the TCP flow decisions, and whether it makes a performance difference in your test, but safest would be to use 4096 in both cases.

in general, please make sure the exact same system calls are done in the client function. (except of course for the event queueing syscalls themselves)

	Ingo
- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
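The suggested parity fix, sketched (the helper name is hypothetical; the point is simply that both servers should issue identical recv() calls):

        #include <sys/types.h>
        #include <sys/socket.h>

        static ssize_t read_request(int s)
        {
                char buf[4096]; /* evserver_kevent.c used a 100-byte read here */

                return recv(s, buf, sizeof(buf), 0);
        }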
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thu, Mar 01, 2007 at 12:27:00PM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote:
> * Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
>
> > I've uploaded them to:
> >
> > http://tservice.net.ru/~s0mbre/archive/kevent/evserver_epoll.c
> > http://tservice.net.ru/~s0mbre/archive/kevent/evserver_kevent.c
>
> thanks.
>
> > I also changed client socket to nonblocking mode with the same result
> > in epoll server. [...]
>
> what does this mean exactly? Did you change this line in
> evserver_epoll.c:
>
> 	//fcntl(cs, F_SETFL, O_NONBLOCK);
>
> to:
>
> 	fcntl(cs, F_SETFL, O_NONBLOCK);
>
> and the result was the same?

Yep.

> 	Ingo

--
	Evgeniy Polyakov
- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
* Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

> I've uploaded them to:
>
> http://tservice.net.ru/~s0mbre/archive/kevent/evserver_epoll.c
> http://tservice.net.ru/~s0mbre/archive/kevent/evserver_kevent.c

thanks.

> I also changed client socket to nonblocking mode with the same result
> in epoll server. [...]

what does this mean exactly? Did you change this line in evserver_epoll.c:

	//fcntl(cs, F_SETFL, O_NONBLOCK);

to:

	fcntl(cs, F_SETFL, O_NONBLOCK);

and the result was the same?

	Ingo
- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
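One caveat worth noting: the one-liner above replaces the file-status flags wholesale. The common idiom preserves whatever flags are already set (a sketch; the helper name is hypothetical):

        #include <fcntl.h>

        static int set_nonblocking(int fd)
        {
                int flags = fcntl(fd, F_GETFL, 0);

                if (flags == -1)
                        return -1;
                return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
        }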
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thursday 01 March 2007 12:20, Evgeniy Polyakov wrote:
> On Thu, Mar 01, 2007 at 12:14:44PM +0100, Eric Dumazet ([EMAIL PROTECTED]) wrote:
> > On Thursday 01 March 2007 11:59, Evgeniy Polyakov wrote:
> > > Yes, it is about 98-100% in both cases.
> > > I've just re-run tests on my amd64 test machine without debug options:
> > >
> > > epoll   4794.23
> > > kevent  6468.95
> >
> > It would be valuable if you could post oprofile results
> > (CPU_CLK_UNHALTED) for both tests.
>
> I can't - oprofile does not work on this x86_64 machine:

Yes, this is a known problem, but you can make it work, as I did.

Please :)

I used the CVS version of oprofile plus a patch you can find in the mailing list archives. Dont remember exactly, since I hit this some months ago
- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thu, Mar 01, 2007 at 12:14:44PM +0100, Eric Dumazet ([EMAIL PROTECTED]) wrote:
> On Thursday 01 March 2007 11:59, Evgeniy Polyakov wrote:
>
> > Yes, it is about 98-100% in both cases.
> > I've just re-run tests on my amd64 test machine without debug options:
> >
> > epoll   4794.23
> > kevent  6468.95
>
> It would be valuable if you could post oprofile results (CPU_CLK_UNHALTED)
> for both tests.

I can't - oprofile does not work on this x86_64 machine:

# opcontrol --setup --vmlinux=/home/s0mbre/aWork/git/linux-2.6.kevent/vmlinux
# opcontrol --start
Using default event: CPU_CLK_UNHALTED:10:0:1:1
/usr/bin/opcontrol: line 994: /dev/oprofile/0/enabled: No such file or directory
/usr/bin/opcontrol: line 994: /dev/oprofile/0/event: No such file or directory
/usr/bin/opcontrol: line 994: /dev/oprofile/0/count: No such file or directory
/usr/bin/opcontrol: line 994: /dev/oprofile/0/kernel: No such file or directory
/usr/bin/opcontrol: line 994: /dev/oprofile/0/user: No such file or directory
/usr/bin/opcontrol: line 994: /dev/oprofile/0/unit_mask: No such file or directory

# ls -l /dev/oprofile/
total 0
drwxr-xr-x 1 root root 0 2007-03-01 09:41 1
drwxr-xr-x 1 root root 0 2007-03-01 09:41 2
drwxr-xr-x 1 root root 0 2007-03-01 09:41 3
-rw-r--r-- 1 root root 0 2007-03-01 09:41 backtrace_depth
-rw-r--r-- 1 root root 0 2007-03-01 09:41 buffer
-rw-r--r-- 1 root root 0 2007-03-01 09:41 buffer_size
-rw-r--r-- 1 root root 0 2007-03-01 09:41 buffer_watershed
-rw-r--r-- 1 root root 0 2007-03-01 09:41 cpu_buffer_size
-rw-r--r-- 1 root root 0 2007-03-01 09:41 cpu_type
-rw-rw-rw- 1 root root 0 2007-03-01 09:41 dump
-rw-r--r-- 1 root root 0 2007-03-01 09:41 enable
-rw-r--r-- 1 root root 0 2007-03-01 09:41 pointer_size
drwxr-xr-x 1 root root 0 2007-03-01 09:41 stats

> Thank you

--
	Evgeniy Polyakov
- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thu, Mar 01, 2007 at 11:11:02AM +0100, Pavel Machek ([EMAIL PROTECTED]) wrote:
> > > > > 10% gain in speed is NOT worth major complexity increase.
> > > >
> > > > Should I create a patch to remove rb-tree implementation?
> > >
> > > If you can replace them with something simpler, and no worse than 10%
> > > slower in worst case, then go ahead. (We actually tried to do that at
> > > some point, only to realize that efence stresses vm subsystem in very
> > > unexpected/unfriendly way).
> >
> > Agh, only 10% in the worst case.
> > I think you can not even imagine what tricks the network stack uses to
> > get at least an additional 1% out of the box.
>
> Yep? Feel free to rewrite networking to assembly on Eugenix. That
> should get you 1% improvement. If you reserve few registers to be only
> used by kernel (not allowed by userspace), you can speedup networking
> 5%, too. Ouch and you could turn off MMU, that is sure way to get few
> more percent improvement in your networking case.

It is not _my_ networking, but the one you use every day in every Linux box. Notice which tricks are used to remove a single byte from sk_buff. It is called optimization, and if it gives us even a single plus, it must be implemented. Not all people have a magical fear of new things.

> > Using such logic you can just abandon any further development, since it
> > works as-is right now.
>
> Stop trying to pervert my logic.

Ugh? :) I just said in simple words your 'we do not need something if it adds 10% but is complex to understand'.

> > > > That practice is stupid IMO.
> > >
> > > Too bad. Now you can start Linux fork called Eugenix.
> > >
> > > (But really, Linux is not "maximum performance at any cost". Linux is
> > > "how fast can we get that while keeping it maintainable?").
> >
> > Should I read it like: we do not understand what it is and thus we do
> > not want it?
>
> Actually, yes, that's a concern. If your code is so crappy that we
> can't understand it, guess what, it is not going to be merged. Notice
> that someone will have to maintain your code if you get hit by bus.
>
> If your code is so complex that it is almost impossible to use from
> userspace, that is good enough reason not to be merged. "But it is 3%
> faster if..." is not a good-enough argument.

Is it enough for you?

epoll   4794.23 req/sec
kevent  6468.95 req/sec

And we are not even talking about other kevent features, like the ability to deliver essentially any event through its queue or shared ring (and some of its ideas are slowly being implemented in the syslet/threadlet code, btw). Even if kevent were only as fast as epoll, it allows working with any kind of event (signals, timers, AIO completions, IO events and any other you like) through one queue/ring, which removes races and does _simplify_ development, since there is no need to create different models to handle different events.

> > > That is why, while arguing syslets vs. kevents, you need to argue
> > > not "kevents are faster because they avoid context switch overhead",
> > > but "kevents are _so much_ faster that it is worth the added
> > > complexity". And Ingo seems to be showing you they are not _so much_
> > > faster.
> >
> > Threadlets behave much worse without an event-driven model, and events
> > can behave worse without backing threads - they are mutually
> > compensating.
>
> I think Ingo demonstrated unoptimized threadlets to be within 5% of the
> speed of kevent. Demonstrate that kevents are twice faster than syslets
> on a reasonable test case, and I guess we'll listen...

That was compared to epoll, not kevent.
But I repeat again - kevent is not only epoll, it can do a lot of other things which do improve performance and simplify development - did you see the terrible hacks in libevent to handle signals without a race in the polling loop? None of that is needed anymore - one event loop, one event structure, a completely unified interface for all operations. Some kevent features are slowly being implemented in the syslet/threadlet async code too, and it looks like I can see where things will end up :). But I likely do not care about a new 'kevent' - I just wanted this to have been said half a year ago, when I started resending it again, but Ingo already said his definitive word :)

> Pavel
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

--
	Evgeniy Polyakov
- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
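For context, the libevent "hack" alluded to is essentially the self-pipe trick: a signal handler writes one byte to a pipe whose read end sits in the same poll set as the sockets, so signals become ordinary readiness events. A minimal sketch (error handling mostly elided):

        #include <signal.h>
        #include <unistd.h>

        static int sigpipe_fds[2];      /* [0] goes into the event loop */

        static void sig_handler(int signo)
        {
                unsigned char b = (unsigned char)signo;

                /* write() is async-signal-safe */
                (void)write(sigpipe_fds[1], &b, 1);
        }

        static int setup_signal_events(int signo)
        {
                struct sigaction sa;

                if (pipe(sigpipe_fds) == -1)
                        return -1;

                sa.sa_handler = sig_handler;
                sa.sa_flags = 0;
                sigemptyset(&sa.sa_mask);
                return sigaction(signo, &sa, NULL);
        }

An event-queue design like kevent folds this into the kernel instead, delivering signals through the same queue as every other event.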
Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
On Thu, Mar 01, 2007 at 12:00:22PM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote:
> * Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
>
> > I've just re-run tests on my amd64 test machine without debug options:
> >
> > epoll   4794.23
> > kevent  6468.95
>
> could you please post the two URLs for the exact evserver code used for
> these measurements? (even if you did so already in the past - best to
> have them always together with the numbers) Thanks!

I've uploaded them to:

http://tservice.net.ru/~s0mbre/archive/kevent/evserver_epoll.c
http://tservice.net.ru/~s0mbre/archive/kevent/evserver_kevent.c

I also changed the client socket to nonblocking mode with the same result in the epoll server. If you find it broken, please send me a corrected version to test too.

> 	Ingo

--
	Evgeniy Polyakov
- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/