Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-07 Thread Evgeniy Polyakov
On Wed, Mar 07, 2007 at 03:21:19PM -0300, Kirk Kuchov ([EMAIL PROTECTED]) wrote:
> On 3/7/07, Ingo Molnar <[EMAIL PROTECTED]> wrote:
> >
> >* Kirk Kuchov <[EMAIL PROTECTED]> wrote:
> >
> >> I don't believe I'm wasting my time explaining this. They don't exist
> >> as /dev/null, they are just fucking _LINKS_.
> >[...]
> >> > Either stop flaming kernel developers or become one. It is that
> >> > simple.
> >>
> >> If I were to become a kernel developer I would stick with FreeBSD.
> >> [...]
> >
> >Hey, really, this is an excellent idea: what a boon you could become to
> >FreeBSD, again! How much they must be longing for your insightful
> >feedback, how much they must be missing your charming style and tactful
> >approach! I bet they'll want to print your mails out, frame them and
> >hang them over their fireplace, to remember the good old days on cold
> >snowy winter days, with warmth in their hearts! Please?
> >
> 
> http://www.totallytom.com/thecureforgayness.html

Fonts are a bit bad in my browser :)

Kirk, I understand your frustration - yes, Linux is not the perfect
place for startup ideas, and yes, it lacks some features that modern
(or old) systems have supported for years, but things change with time.

I posted a patch which allows polling for signals; it can be trivially
adapted to support timers and essentially any other event.
Kevent did that too, but some things are just too radical for immediate
inclusion, especially when the majority of users do not require the
additional functionality.
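
A minimal usage sketch (the names signalfd() and struct
signalfd_siginfo below are assumptions in the signalfd style, not
necessarily the API of the posted patch):

/* sketch: consume signals through a pollable descriptor, so they can
 * be multiplexed with sockets in a single poll/epoll set */
#include <sys/signalfd.h>
#include <signal.h>
#include <poll.h>
#include <unistd.h>

int wait_for_sigint(void)
{
	sigset_t mask;

	sigemptyset(&mask);
	sigaddset(&mask, SIGINT);
	sigprocmask(SIG_BLOCK, &mask, NULL);	/* stop async delivery */

	int sfd = signalfd(-1, &mask, 0);	/* pollable signal source */
	struct pollfd pfd = { .fd = sfd, .events = POLLIN };

	poll(&pfd, 1, -1);	/* signals now mix with any other fds */

	struct signalfd_siginfo si;
	read(sfd, &si, sizeof(si));	/* which signal fired, and from whom */
	return sfd;
}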

People do work, and a lot of them do really good work, so there is no
need for rude talk about how bad things are. Things change - even I
support that, although the way kevent was ignored should put me in the
front line with you :)

Be good, and be cool.

> --
> Kirk Kuchov

-- 
Evgeniy Polyakov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-07 Thread Kirk Kuchov

On 3/7/07, Ingo Molnar <[EMAIL PROTECTED]> wrote:


* Kirk Kuchov <[EMAIL PROTECTED]> wrote:

> I don't believe I'm wasting my time explaining this. They don't exist
> as /dev/null, they are just fucking _LINKS_.
[...]
> > Either stop flaming kernel developers or become one. It is that
> > simple.
>
> If I were to become a kernel developer I would stick with FreeBSD.
> [...]

Hey, really, this is an excellent idea: what a boon you could become to
FreeBSD, again! How much they must be longing for your insightful
feedback, how much they must be missing your charming style and tactful
approach! I bet they'll want to print your mails out, frame them and
hang them over their fireplace, to remember the good old days on cold
snowy winter days, with warmth in their hearts! Please?



http://www.totallytom.com/thecureforgayness.html

--
Kirk Kuchov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-07 Thread Jens Axboe
On Wed, Mar 07 2007, Kirk Kuchov wrote:
> On 3/7/07, Ingo Molnar <[EMAIL PROTECTED]> wrote:
> >
> >* Kirk Kuchov <[EMAIL PROTECTED]> wrote:
> >
> >> I don't believe I'm wasting my time explaining this. They don't exist
> >> as /dev/null, they are just fucking _LINKS_.
> >[...]
> >> > Either stop flaming kernel developers or become one. It is that
> >> > simple.
> >>
> >> If I were to become a kernel developer I would stick with FreeBSD.
> >> [...]
> >
> >Hey, really, this is an excellent idea: what a boon you could become to
> >FreeBSD, again! How much they must be longing for your insightful
> >feedback, how much they must be missing your charming style and tactful
> >approach! I bet they'll want to print your mails out, frame them and
> >hang them over their fireplace, to remember the good old days on cold
> >snowy winter days, with warmth in their hearts! Please?
> >
> 
> http://www.totallytom.com/thecureforgayness.html

Dude, get a life. But more importantly, go waste somebody else's time
instead of lkml's.

-- 
Jens Axboe, updating killfile



Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-07 Thread Ingo Molnar

* Kirk Kuchov <[EMAIL PROTECTED]> wrote:

> I don't believe I'm wasting my time explaining this. They don't exist 
> as /dev/null, they are just fucking _LINKS_.
[...]
> > Either stop flaming kernel developers or become one. It is that 
> > simple.
> 
> If I were to become a kernel developer I would stick with FreeBSD. 
> [...]

Hey, really, this is an excellent idea: what a boon you could become to 
FreeBSD, again! How much they must be longing for your insightful 
feedback, how much they must be missing your charming style and tactful 
approach! I bet they'll want to print your mails out, frame them and 
hang them over their fireplace, to remember the good old days on cold 
snowy winter days, with warmth in their hearts! Please?

Ingo


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-07 Thread Linus Torvalds


On Wed, 7 Mar 2007, Kirk Kuchov wrote:
> 
> I don't believe I'm wasting my time explaining this. They don't exist
> as /dev/null, they are just fucking _LINKS_. I could even "ln -s
> /proc/self/fd/0 sucker". A real /dev/stdout can/could even exist, but
> that's not the point!

Actually, one large reason for /proc/self/ existing is exactly /dev/stdin 
and friends.

And yes, /proc/self looks like a link too, but that doesn't change the 
fact that it's a very special file. No different from /dev/null or 
friends.

Linus


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-07 Thread Kirk Kuchov

On 3/6/07, Pavel Machek <[EMAIL PROTECTED]> wrote:

> >As for why common abstractions like file are a good thing, think about why
> >having "/dev/null" is cleaner than having a special plug DEVNULL_FD fd
> >value to be plugged everywhere,
>
> This is a stupid comparison. By your logic we should also have /dev/stdin,
> /dev/stdout and /dev/stderr.

Bzzt, wrong. We have them.

[EMAIL PROTECTED]:~$ ls -al /dev/std*
lrwxrwxrwx 1 root root 4 Nov 12  2003 /dev/stderr -> fd/2
lrwxrwxrwx 1 root root 4 Nov 12  2003 /dev/stdin -> fd/0
lrwxrwxrwx 1 root root 4 Nov 12  2003 /dev/stdout -> fd/1
[EMAIL PROTECTED]:~$ ls -al /proc/self/fd
total 0
dr-x------ 2 pavel users  0 Mar  6 09:18 .
dr-xr-xr-x 4 pavel users  0 Mar  6 09:18 ..
lrwx------ 1 pavel users 64 Mar  6 09:18 0 -> /dev/ttyp2
lrwx------ 1 pavel users 64 Mar  6 09:18 1 -> /dev/ttyp2
lrwx------ 1 pavel users 64 Mar  6 09:18 2 -> /dev/ttyp2
lr-x------ 1 pavel users 64 Mar  6 09:18 3 -> /proc/2299/fd
[EMAIL PROTECTED]:~$


I don't believe I'm wasting my time explaining this. They don't exist
as /dev/null, they are just fucking _LINKS_. I could even "ln -s
/proc/self/fd/0 sucker". A real /dev/stdout can/could even exist, but
that's not the point!

It remains a stupid comparison because /dev/stdin/stderr/whatever
"must" be plugged - how else could a process write to an stdout/stderr
that it couldn't open? Things are the way they are not because it's
cleaner to have them as files but because it's the only sane way.
/dev/null is not a must-have; it's mainly used for redirection
purposes. A sys_nullify(fileno(stdout)) would rule out almost any use
of /dev/null.
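
(For comparison, everything /dev/null buys for redirection today is
this trivial pattern - a sketch of what a sys_nullify() would replace:)

/* the standard /dev/null redirection that a sys_nullify(fd) could replace */
#include <fcntl.h>
#include <unistd.h>

int nullify_stdout(void)
{
	int fd = open("/dev/null", O_WRONLY);
	if (fd < 0)
		return -1;
	dup2(fd, STDOUT_FILENO);	/* writes to stdout now vanish */
	close(fd);
	return 0;
}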


> >But here the list could be almost endless.
> >And please don't start the they-don't-scale or they-need-heavy-file-binding
> >tossfeast. They scale as well as the interface that will receive
> >them (poll, select, epoll). Heavy file binding, what? 100 or so bytes for
> >the struct file? How many signal/timer fds are you gonna have? Like 100K?
> >A really moot argument when weighed against the benefit of being compatible
> >with existing POSIX interfaces and being more Unix friendly.
>
> So why the HELL don't we have those yet? Why haven't you designed
> epoll with those in mind? Why don't you back your claims with patches?
> (I'm not a kernel developer.)

Either stop flaming kernel developers or become one. It is that
simple.



If I were to become a kernel developer I would stick with FreeBSD. At
least they have had kqueue for about seven years now.

--
Kirk Kuchov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-06 Thread Pavel Machek
> >As for why common abstractions like file are a good thing, think about why
> >having "/dev/null" is cleaner than having a special plug DEVNULL_FD fd
> >value to be plugged everywhere,
> 
> This is a stupid comparison. By your logic we should also have /dev/stdin,
> /dev/stdout and /dev/stderr.

Bzzt, wrong. We have them.

[EMAIL PROTECTED]:~$ ls -al /dev/std*
lrwxrwxrwx 1 root root 4 Nov 12  2003 /dev/stderr -> fd/2
lrwxrwxrwx 1 root root 4 Nov 12  2003 /dev/stdin -> fd/0
lrwxrwxrwx 1 root root 4 Nov 12  2003 /dev/stdout -> fd/1
[EMAIL PROTECTED]:~$ ls -al /proc/self/fd
total 0
dr-x------ 2 pavel users  0 Mar  6 09:18 .
dr-xr-xr-x 4 pavel users  0 Mar  6 09:18 ..
lrwx------ 1 pavel users 64 Mar  6 09:18 0 -> /dev/ttyp2
lrwx------ 1 pavel users 64 Mar  6 09:18 1 -> /dev/ttyp2
lrwx------ 1 pavel users 64 Mar  6 09:18 2 -> /dev/ttyp2
lr-x------ 1 pavel users 64 Mar  6 09:18 3 -> /proc/2299/fd
[EMAIL PROTECTED]:~$

> >But here the list could be almost endless.
> >And please don't start the they-don't-scale or they-need-heavy-file-binding
> >tossfeast. They scale as well as the interface that will receive
> >them (poll, select, epoll). Heavy file binding, what? 100 or so bytes for
> >the struct file? How many signal/timer fds are you gonna have? Like 100K?
> >A really moot argument when weighed against the benefit of being compatible
> >with existing POSIX interfaces and being more Unix friendly.
> 
> So why the HELL don't we have those yet? Why haven't you designed
> epoll with those in mind? Why don't you back your claims with patches?
> (I'm not a kernel developer.)

Either stop flaming kernel developers or become one. It is that
simple.

Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-04 Thread Michael K. Edwards

On 3/4/07, Kyle Moffett <[EMAIL PROTECTED]> wrote:

Well, even this far into 2.6, Linus' patch from 2003 still (mostly)
applies; the maintenance cost for this kind of code is virtually
zilch.  If it matters that much to you, clean it up and make it apply;
add an alarmfd() syscall (another 100 lines of code at most?) and
make a "read" return an architecture-independent siginfo-like
structure and submit it for inclusion.  Adding epoll() support for
random objects is as simple as a 75-line object-filesystem and a
25-line syscall to return an FD to a new inode.  Have fun!  Go wild!
Something this trivially simple could probably spend a week in -mm
and go to Linus for 2.6.22.


Or, if you want to do slightly more work and produce something a great
deal more useful, you could implement additional netlink address
families for additional "event" sources.  The socket - setsockopt -
bind - sendmsg/recvmsg sequence is a well understood and well
documented UNIX paradigm for multiplexing non-blocking I/O to many
destinations over one socket.  Everyone who has read Stevens is
familiar with the basic UDP and "fd open server" techniques, and if
you look at Linux's IP_PKTINFO and NETLINK_W1 (bravo, Evgeniy!) you'll
see how easily they could be extended to file AIO and other kinds of
event sources.

For file AIO, you might have the application open one AIO socket per
mount point, open files indirectly via the SCM_RIGHTS mechanism, and
submit/retire read/write requests via sendmsg/recvmsg with ancillary
data consisting of an lseek64 tuple and a user-provided cookie.
Although the process still has to have one fd open per actual open
file (because trying to authenticate file accesses without opening fds
is madness), the only fds it has to manipulate directly are those
representing entire pools of outstanding requests.  This is usually a
small enough set that select() will do just fine, if you're careful
with fd allocation.  (You can simply punt indirectly opened fds up to
a high numerical range, where they can't be accessed directly from
userspace but still make fine cookies for use in lseek64 tuples within
cmsg headers).
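
(For reference, the SCM_RIGHTS passing this leans on looks roughly like
this - a minimal sketch of shipping one fd over a Unix socket, error
handling omitted:)

/* send an open fd to a peer over an AF_UNIX socket; the kernel installs
 * a duplicate of the descriptor in the receiving process */
#include <sys/socket.h>
#include <sys/uio.h>
#include <string.h>

int send_fd(int sock, int fd_to_pass)
{
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	char cbuf[CMSG_SPACE(sizeof(int))];
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = cbuf, .msg_controllen = sizeof(cbuf),
	};
	struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;	/* ancillary payload is an fd */
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));

	return sendmsg(sock, &msg, 0);
}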

The same basic approach will work for timers, signals, and just about
any other event source.  Userspace is of course still stuck doing its
own state machines / thread scheduling / however you choose to think
of it.  But all the important activity goes through socketcall(), and
the data and control parameters are all packaged up into a struct
msghdr instead of the bare buffer pointers of read/write.  So if
someone else does come along later and design an ultralight threading
mechanism that isn't a total botch, the actual data paths won't need
much rework; the exception handling will just get a lot simpler.

Cheers,
- Michael


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-04 Thread Magnus Naeslund(k)

Kirk Kuchov wrote:
[snip]


This is a stupid comparison. By your logic we should also have /dev/stdin,
/dev/stdout and /dev/stderr.



Well, as a matter of fact (on my system):

# ls -l /dev/std*
lrwxrwxrwx  1 root root 4 Feb  1  2006 /dev/stderr -> fd/2
lrwxrwxrwx  1 root root 4 Feb  1  2006 /dev/stdin -> fd/0
lrwxrwxrwx  1 root root 4 Feb  1  2006 /dev/stdout -> fd/1

Please don't bother to respond to this mail, I just saw that you 
apparently needed the info.


Magnus

P.S.: *PLONK*


Discussing LKML community [OT from the Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3]

2007-03-04 Thread Oleg Verych
> From: "Michael K. Edwards" <[EMAIL PROTECTED]>
> Newsgroups: gmane.linux.kernel
> Subject: Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3
> Date: Wed, 28 Feb 2007 09:01:07 -0800

Michael,

[]
> In this instance, there didn't seem to be any harm in sending my
> thoughts to LKML as I wrote them, on the off chance that Ingo or
> Davide would get some value out of them in this design cycle (which
> any code I eventually get around to producing will miss).  So far,
> I've gotten some rather dismissive pushback from Ingo and Alan (who
> seem to have no interest outside x86 and less understanding than I
> would have thought of what real userspace code looks like), a "why
> preach to people who know more than you do" from Davide,

this may be sad, unless you've spent the time and effort to make a Patch,
i.e. read the source, understood why it's written the way it is, why it's
being used that way now, and why it has to be updated in a new cycle of
kernel development.

> a brief aside on the dominance of x86 from Oleg,

I didn't have a chance, and probably will not have one, to communicate
with people like you and learn from your wisdom personally. That's why
i've replied to you, after you mentioned transputers. And i got a
rather different opinion than i expected. That shows my test-tube
existence, little experience etc. As the discussion was about CPUs, it
was technical, thus on-topic for LKML.

> and one off-list "keep up the good work".  Not a very rich harvest from
> (IMHO) pretty good seeds.

The off-list message was my share of views about things that were
off-topic, and a clarification about the lkml thing, and it wasn't
on-topic for LKML.

I'm pretty sure that there are libraries of books written on every single
bit of the things Linux currently *implements* in asm/C.

(1) Thus, `return -ENOPATCH', man, regardless of what you are saying on
lkml. That's why the prominent people you've joined me with (: replied
in go-to-kernelnewbies style.

> In short, so far the "Linux kernel community" is upholding its
> reputation for insularity, arrogance, coding without prior design,
> lack of interest in userspace problems, and inability to learn from
> the mistakes of others.  (None of these characterizations depends on
> there being any real insight in anything I have written.)

You, as a person who has the right to be personally wrong, may think that
way. But do not forget, as i've written to you off-list and in (1), this
is a development community, sometimes a development-of-development one,
etc; educated, enthusiastic, wise, Open Source, poor on time (and money :).

> Happy hacking,
> - Michael

And you too. LKML *can* (sometimes may) show how useful this hacking is.

> P. S.  I do think "threadlets" are brilliant, though, and reading
> Ingo's patches gave me a much better idea of what would be involved in
> prototyping Asynchronously Executed I/O Unit opcodes.

You are discussing an on-topic thing in the P.S., and this is IMHO the
wrong approach.

Also, note that i've changed the subject and stripped the cc list; please
note that i can be a young and naive boy barking up the wrong tree.

Kind regards.
--
-o--=O`C  /. .\
 #oo'L O  o
<___=E M^-- (Wuuf)


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-04 Thread Davide Libenzi
On Sun, 4 Mar 2007, Kirk Kuchov wrote:

> I don't give a shit.

Here's another good use of /dev/null:

*PLONK*



- Davide




Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-04 Thread Kirk Kuchov

On 3/4/07, Davide Libenzi  wrote:

On Sun, 4 Mar 2007, Kirk Kuchov wrote:

> On 3/3/07, Davide Libenzi  wrote:
> > 
> >
> > Those *other* (tons?!?) interfaces can be created *when* the need comes
> > (see Linus signalfd [1] example to show how urgent that was). *When*
> > the need comes, they will work with existing POSIX interfaces, without
> > requiring your own just-another event interface. Those other interfaces
> > could also be more easily adopted by other Unix cousins, because of
> > the fact that they rely on existing POSIX interfaces.
>
> Please stop with this crap, this chicken or the egg argument of yours is utter
> BULLSHIT!

Wow, wow, fella! You _definitely_ cannot afford rudeness here.


I don't give a shit.


You started bad, and you end even worse, by listing some APIs that will
work only with epoll. As I said already, and as it was listed in the
thread I posted the link to, something like:

int signalfd(...);  // Linus' initial interface would be perfectly fine
int timerfd(...);   // Open ...
int eventfd(...);   // [1]

Will work *even* with standard POSIX select/poll. 95% or more of the
software does not have scalability issues, and select/poll are more
portable and easy to use for simple stuff. On top of that, as I already
said, they are *confined* interfaces that could be more easily adopted by
other Unixes (if they are 100-200 lines on Linux, don't expect them to be
a lot more on other Unixes) [2]. We *already* have the infrastructure
inside Linux to deliver events (f_op->poll subsystem), how about we use
that instead of just-another way? [3]


Man you're so full of shit, your eyes are brown. NOBODY cares about
select/poll or whether the interfaces are going to be adopted by other
Unixes. This issue was already solved by them YEARS ago.

What I want (and a ton of other users) is a SIMPLE and generic way to
receive events from _MULTIPLE_ sources. I don't care about
kernel-level portability, easiness or whatever; the linux kernel
developers are good at not knowing what their users want.


As for why common abstractions like file are a good thing, think about why
having "/dev/null" is cleaner than having a special plug DEVNULL_FD fd
value to be plugged everywhere,


This is a stupid comparison. By your logic we should also have /dev/stdin,
/dev/stdout and /dev/stderr.


or why I can use find/grep/cat/echo/... to
look at/edit my configuration inside /proc, instead of using a frigging
registry editor.


Yet another stupid comparison - /proc is a MESS! Almost as bad as
the registry. Linux now has three pieces of crap for
configuration/information: /proc, sysfs and sysctl. Nobody knows
exactly what should go into each one of those. Crap design at its
best.


But here the list could be almost endless.
And please don't start the they-don't-scale or they-need-heavy-file-binding
tossfeast. They scale as well as the interface that will receive
them (poll, select, epoll). Heavy file binding, what? 100 or so bytes for
the struct file? How many signal/timer fds are you gonna have? Like 100K?
A really moot argument when weighed against the benefit of being compatible
with existing POSIX interfaces and being more Unix friendly.


So why the HELL don't we have those yet? Why haven't you designed
epoll with those in mind? Why don't you back your claims with patches?
(I'm not a kernel developer.)


As for the AIO stuff, if threadlets/syslets prove effective, you can
host an epoll_wait over a syslet/threadlet. Or, if the 3 lines of
userspace code needed to do that fall inside your definition of "kludge",
we can even find a way to bridge the two.


I don't care about threadlets in this context, I just want to wait for
EVENTS from MULTIPLE sources WITHOUT mixing signals and other crap.
Your arrogance is amusing, stop pushing narrow-minded beliefs down the
throats of all Linux users. Kqueue, event ports,
WaitForMultipleObjects, epoll with multiple sources. That's what users
want, not yet another syscall/whatever hack.


Now, how about we focus on the topic of this thread?

[1] This could be an idea. People already use pipes for this, but pipes
have some memory overhead inside the kernel (plus they use two fds) that
could, if really felt necessary, be avoided.


Yet another hack!! 64 KiB of space just to push some user events
around. Great idea!



[2] This is how those kinds of interfaces should be designed. Modular,
re-usable, file-based interfaces, whose acceptance is not tied to
slurping in a whole new interface with tens of sub, interface-only,
objects. And from this POV, epoll is the friendliest.


Who said I want yet another interface? I just fucking want to receive
events from MULTIPLE sources through epoll. With or without an fd! My
anger and frustration is that we can't get past this SIMPLE need!


[3] Notice the similarity between threadlets/syslets and epoll? They
enable pretty darn good scalability, with *existing* infrastructure,
and w/out special ad-hoc code to be plugged everywhere. This translates
directly into easier-to-maintain code.

Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-04 Thread Davide Libenzi
On Sun, 4 Mar 2007, Kirk Kuchov wrote:

> On 3/3/07, Davide Libenzi  wrote:
> > 
> > 
> > Those *other* (tons?!?) interfaces can be created *when* the need comes
> > (see Linus signalfd [1] example to show how urgent that was). *When*
> > the need comes, they will work with existing POSIX interfaces, without
> > requiring your own just-another event interface. Those other interfaces
> > could also be more easily adopted by other Unix cousins, because of
> > the fact that they rely on existing POSIX interfaces.
> 
> Please stop with this crap, this chicken or the egg argument of yours is utter
> BULLSHIT!

Wow, wow, fella! You _definitely_ cannot afford rudeness here.
You started bad, and you end even worse, by listing some APIs that will
work only with epoll. As I said already, and as it was listed in the
thread I posted the link to, something like:

int signalfd(...);  // Linus' initial interface would be perfectly fine
int timerfd(...);   // Open ...
int eventfd(...);   // [1]

Will work *even* with standard POSIX select/poll. 95% or more of the 
software does not have scalability issues, and select/poll are more 
portable and easy to use for simple stuff. On top of that, as I already 
said, they are *confined* interfaces that could be more easily adopted by 
other Unixes (if they are 100-200 lines on Linux, don't expect them to be 
a lot more on other Unixes) [2]. We *already* have the infrastructure 
inside Linux to deliver events (f_op->poll subsystem), how about we use 
that instead of just-another way? [3]
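
Concretely, that is all it takes for such a source to compose with
everything else. A sketch (timerfd() here stands for the proposed
confined interface above, not an existing syscall):

#include <poll.h>

/* one plain POSIX wait over an ordinary socket plus a timer fd */
void wait_once(int sock, int tfd)
{
	struct pollfd pfd[2] = {
		{ .fd = sock, .events = POLLIN },
		{ .fd = tfd,  .events = POLLIN },	/* fd from timerfd() */
	};

	poll(pfd, 2, -1);	/* one wait, many source types */
	if (pfd[1].revents & POLLIN) {
		/* timer expired: read() the expiration record, re-arm */
	}
	if (pfd[0].revents & POLLIN) {
		/* socket readable: plain old I/O */
	}
}
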
As for why common abstractions like file are a good thing, think about why
having "/dev/null" is cleaner than having a special plug DEVNULL_FD fd
value to be plugged everywhere, or why I can use find/grep/cat/echo/... to
look at/edit my configuration inside /proc, instead of using a frigging
registry editor. But here the list could be almost endless.
And please don't start the they-don't-scale or they-need-heavy-file-binding
tossfeast. They scale as well as the interface that will receive
them (poll, select, epoll). Heavy file binding, what? 100 or so bytes for
the struct file? How many signal/timer fds are you gonna have? Like 100K?
A really moot argument when weighed against the benefit of being compatible
with existing POSIX interfaces and being more Unix friendly.
As for the AIO stuff, if threadlets/syslets prove effective, you can
host an epoll_wait over a syslet/threadlet. Or, if the 3 lines of
userspace code needed to do that fall inside your definition of "kludge",
we can even find a way to bridge the two.
Now, how about we focus on the topic of this thread?




[1] This could be an idea. People already use pipes for this, but pipes
have some memory overhead inside the kernel (plus they use two fds) that
could, if really felt necessary, be avoided.

[2] This is how those kinds of interfaces should be designed. Modular,
re-usable, file-based interfaces, whose acceptance is not tied to
slurping in a whole new interface with tens of sub, interface-only,
objects. And from this POV, epoll is the friendliest.

[3] Notice the similarity between threadlets/syslets and epoll? They
enable pretty darn good scalability, with *existing* infrastructure,
and w/out special ad-hoc code to be plugged everywhere. This translates
directly into easier-to-maintain code.



- Davide




Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-04 Thread Kyle Moffett

On Mar 04, 2007, at 11:23:37, Kirk Kuchov wrote:
So here we are, 2007. epoll() works with files, pipes, sockets,
inotify and anything pollable (file descriptors) but not aio, timers,
signals and user-defined events. Can we please get those working
with epoll? Something as simple as:


[code snipped]

Would this be acceptable? Can we finally move on?


Well, even this far into 2.6, Linus' patch from 2003 still (mostly)
applies; the maintenance cost for this kind of code is virtually
zilch.  If it matters that much to you, clean it up and make it apply;
add an alarmfd() syscall (another 100 lines of code at most?) and
make a "read" return an architecture-independent siginfo-like
structure and submit it for inclusion.  Adding epoll() support for
random objects is as simple as a 75-line object-filesystem and a
25-line syscall to return an FD to a new inode.  Have fun!  Go wild!
Something this trivially simple could probably spend a week in -mm
and go to Linus for 2.6.22.
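
(Usage would be something like the following - purely hypothetical,
since alarmfd() and its record type are only suggested above; nothing
of the sort exists:)

/* hypothetical alarmfd() usage; the syscall and struct are assumed */
struct alarmfd_info info;	/* the siginfo-like record */
int afd = alarmfd(5 /* seconds */);

/* afd is an ordinary fd: hand it to epoll/poll/select like any other */
read(afd, &info, sizeof(info));	/* blocks until the alarm fires */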


Cheers,
Kyle Moffett



Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-04 Thread Kirk Kuchov

On 3/3/07, Davide Libenzi  wrote:



Those *other* (tons?!?) interfaces can be created *when* the need comes
(see Linus signalfd [1] example to show how urgent that was). *When*
the need comes, they will work with existing POSIX interfaces, without
requiring your own just-another event interface. Those other interfaces
could also be more easily adopted by other Unix cousins, because of
the fact that they rely on existing POSIX interfaces.


Please stop with this crap, this chicken-or-the-egg argument of yours is
utter BULLSHIT! Just because Linux doesn't have a decent kernel event
notification mechanism does not mean that users don't need one. Nobody
cared about Linus's signalfd because it wasn't mainline.

Look at any of the event notification libraries out there; it makes me
sick how much kludge they have to go through to get near the same
functionality as kqueue on Linux.

Solaris has had the Event Ports mechanism since 2003. FreeBSD, NetBSD,
OpenBSD and Mac OS X have supported kqueue since around 2000. Windows has
had event notification for ages now. These _facilities_ are all widely
used, given the platforms' popularity.

So here we are, 2007. epoll() works with files, pipes, sockets, inotify
and anything pollable (file descriptors) but not aio, timers, signals and
user-defined events. Can we please get those working with epoll?
Something as simple as:

struct epoll_event ev;

ev.events = EV_TIMER | EPOLLONESHOT;
ev.data.u64 = 1000; /* timeout */

epoll_ctl(epfd, EPOLL_CTL_ADD, 0 /* ignored */, &ev);

or

struct sigevent ev;

ev.sigev_notify = SIGEV_EPOLL;
ev.sigev_signo = epfd;
ev.sigev_value.sival_ptr = &ev;

timer_create(CLOCK_MONOTONIC, &ev, &timerid);

AIO:

struct epoll_event ev;
int fd = io_setup(..); /* oh boy, I wish... but it works */

ev.events = EV_AIO | EPOLLONESHOT;
/* the returned event's data.ptr would point to the iocb */
epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);

or

struct iocb iocb;

iocb.aio_fildes = fileno(stdin);
iocb.aio_lio_opcode = IO_CMD_PREAD;
iocb.c.notify = IO_NOTIFY_EPOLL; /* __pad3/4 */

Would this be acceptable? Can we finally move on?

--
Kirk Kuchov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-04 Thread Michael K. Edwards

Please don't take this the wrong way, Ray, but I don't think _you_
understand the problem space that people are (or should be) trying to
address here.

Servers want to always, always block.  Not on a socket, not on a stat,
not on any _one_ thing, but in a condition where the optimum number of
concurrent I/O requests are outstanding (generally of several kinds
with widely varying expected latencies).  I have an embedded server I
wrote that avoids forking internally for any reason, although it
watches the damn serial port signals in parallel with handling network
I/O, audio, and child processes that handle VoIP signaling protocols
(which are separate processes because it was more practical to write
them in a different language with mediocre embeddability).  There's a
lot of things that can block out there, not just disk I/O, but the
only thing a genuinely scalable server process ever blocks on (apart
from the odd spinlock) is a wait-for-IO-from-somewhere mechanism like
select or epoll or kqueue (or even sleep() while awaiting SIGRT+n, or
if it genuinely doesn't suck, the thread scheduler).

Furthermore, not only do servers want to block rather than shove more
I/O into the plumbing than it can handle without backing up, they also
want to throttle the concurrency of requests at the kernel level *for
the kernel's benefit*.  In particular, a server wants to submit to the
kernel a ton of stats and I/O in parallel, far more than it makes
sense to actually issue concurrently, so that efficient sequencing of
these requests can be left to the kernel.  But the server wants to
guide the kernel with regard to the ratios of concurrency appropriate
to the various classes and the relative urgency of the individual
requests within each class.  The server also wants to be able to
reprioritize groups of requests or cancel them altogether based on new
information about hardware status and user behavior.

Finally, the biggest argument against syslets/threadlets AFAICS is
that -- if done incorrectly, as currently proposed -- they would unify
the AIO and normal IO paths in the kernel.  This would shackle AIO to
the current semantics of synchronous syscalls, in which buffers are
passed as bare pointers and exceptional results are tangled up with
programming errors.  This would, in turn, make it quite impossible for
future hardware to pipeline and speculatively execute chains of AIO
operations, leaving "syslets" to a few RDBMS programmers with time to
burn.  The unimproved ease of long term maintenance on the kernel (not
to mention the complete failure to make the writing of _correct_,
performant server code any easier) makes them unworthy of
consideration for inclusion.

So, while everybody has been talking about cached and non-cached
cases, those are really total irrelevancies.  The principal problem
that needs solving is to model the process's pool of in-flight I/O
requests, together with a much larger number of submitted but not yet
issued requests whose results are foreseeably likely to be needed
soon, using a data structure that efficiently supports _all_ of the
operations needed, including bulk cancellation, reprioritization, and
batch migration based on affinities among requests and locality to the
correct I/O resources.  Memory footprint and gentle-on-real-hardware
scheduling are secondary, but also important, considerations.  If you
happen to be able to service certain things directly from cache,
that's gravy -- but it's not very smart IMHO to put that central to
your design process.

Cheers,
- Michael


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-03 Thread Ray Lee
Ihar `Philips` Filipau wrote:
> On 3/3/07, Ray Lee <[EMAIL PROTECTED]> wrote:
>> On 3/3/07, Ihar `Philips` Filipau <[EMAIL PROTECTED]> wrote:
>> > What I'm trying to get to: keep things simple. The proposed
>> > optimization by Ingo does nothing else but allow AIO to probe the file
>> > cache - if the data is there, go with the fast path. So why not implement
>> > what the people want - probing of the cache? Because it sounds bad? But
>> > they are in fact proposing precisely that, just masked with "fast
>> > threads".
>>
>>
>> Servers want to never, ever block. Not on a socket, not on a stat, not
>> on anything. (I have an embedded server I wrote that has to fork
>> internally just to watch the damn serial port signals in parallel with
>> handling network I/O, audio, and child processes that handle H323.)
>> There's a lot of things that can block out there, and it's not just
>> disk I/O.
>>
> 
> Why do select/poll/epoll and friends not work? I have programmed on both
> sides - user-space network servers and in-kernel network protocols -
> and the "never blocking" thing was implemented in *nix back when I was
> walking under the table.
> 

Then you've never had to write something that watches serial port
signals. Google on TIOCMIWAIT to see what I'm talking about. The only
option for a userspace programmer to deal with that is to fork() or poll
the signals every so many milliseconds. There are probably more easy
examples, but that's the one off the top of my head that affected me.
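
(For the curious, the call in question looks like this - a sketch with
error handling omitted:)

/* TIOCMIWAIT sleeps inside the ioctl until a modem-control line
 * changes; it cannot sit in a select()/poll() set, hence the
 * fork()-or-poll dance described above */
#include <sys/ioctl.h>

void wait_for_line_change(int serial_fd)
{
	int status;

	ioctl(serial_fd, TIOCMIWAIT, TIOCM_CTS | TIOCM_CD);
	ioctl(serial_fd, TIOCMGET, &status);	/* read the new line state */
}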

In short, this isn't just about network IO, this isn't just about file IO.

> One can poll() more or less *any* device in the system. With the frigging
> exception of - right - files.

The problem is the "more or less." Say you're right, and 95% of the
system calls are either already asynchronous or non-blocking/poll()able.
One of the questions on the table is how to extend it to the last 5%.

> User-space-wise, check how squid (a caching http proxy) does it: you
> have several (forked) instances to serve network requests and you have
> one/several disk I/O daemons (the so-called "diskd" storeio). Why? Because
> you cannot poll() file descriptors, but you can poll a unix socket
> connected to diskd. If diskd blocks, squid can still serve requests.
> How are threadlets better than a pool of diskd instances? All the nastiness
> of shared memory set loose...

Samba/lighttpd/git want to issue dozens of stats in parallel so that the
kernel can have an opportunity to sort them better. Are you saying they
should fork() a process per stat that they want to issue in parallel?

> What I'm trying to get to: threadlets wouldn't help existing
> single-threaded applications - which are about 95% of all applications.

Eh, I don't think that's right. Part of the reason threadlets and
syslets are on the table is that they may be a more efficient way to do
AIO. And the differences between the syslet API and the current kernel
Async IO API can be abstracted away by glibc, so that today's apps that
do AIO would immediately benefit.

> What's more, having some limited experience of kernel programming,
> I fail to see what threadlets would simplify on the kernel side.

You can yank the entire separate AIO path, and just treat them as
another blocking API that syslets makes nonblocking. Immediate reduction
of code, and everybody is now using the same code paths, which means
higher test coverage and reduced maintenance cost.

This last point is really important. Even if no extra functionality
eventually makes it to userspace, this last point would still be enough
to make the powers that be consider inclusion.

Ray


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-03 Thread Ihar `Philips` Filipau

On 3/3/07, Ray Lee <[EMAIL PROTECTED]> wrote:

On 3/3/07, Ihar `Philips` Filipau <[EMAIL PROTECTED]> wrote:
> What I'm trying to get to: keep things simple. The proposed
> optimization by Ingo does nothing else but allow AIO to probe the file
> cache - if the data is there, go with the fast path. So why not implement
> what the people want - probing of the cache? Because it sounds bad? But
> they are in fact proposing precisely that, just masked with "fast
> threads".


Servers want to never, ever block. Not on a socket, not on a stat, not
on anything. (I have an embedded server I wrote that has to fork
internally just to watch the damn serial port signals in parallel with
handling network I/O, audio, and child processes that handle H323.)
There's a lot of things that can block out there, and it's not just
disk I/O.



Why do select/poll/epoll and friends not work? I have programmed on both
sides - user-space network servers and in-kernel network protocols -
and the "never blocking" thing was implemented in *nix back when I was
walking under the table.

One can poll() more or less *any* device in the system. With the frigging
exception of - right - files. IOW, for 75% of I/O the problem doesn't
exist, since a proper interface - e.g. sockets - is in place.

User-space-wise, check how squid (a caching http proxy) does it: you
have several (forked) instances to serve network requests and you have
one/several disk I/O daemons (the so-called "diskd" storeio). Why? Because
you cannot poll() file descriptors, but you can poll a unix socket
connected to diskd. If diskd blocks, squid can still serve requests.
How are threadlets better than a pool of diskd instances? All the nastiness
of shared memory set loose...

What I'm trying to get to: threadlets wouldn't help existing
single-threaded applications - which are about 95% of all applications.
And multi-threaded applications would gain little, because few real
applications create threads dynamically: creation needs resources and
can fail, uncontrollable thread spawning hurts overall manageability,
and additional care is needed to guard against deadlocks and lock
contention. (The category of applications which want the performance
gain are also the applications which need to ensure greater stability
over long non-stop runs. Uncontrollable dynamism helps nothing.)

Having implemented several "file servers" - daemons serving file I/O
to other daemons - I honestly can hardly see any improvements. Today
people configure such file servers to issue e.g. 10 file operations
simultaneously - using a pool of 10 threads. What do threadlets change?
In the end, just to keep threadlets in check I would need to issue
pthread_join() after some number of threadlets have been created. And
the latter number is the former "e.g. 10". IOW, programmer-wise the
implementation remains the same - and all the limitations remain the
same. And all the overhead of user-space locking remains the same. (*)

What's more, having some limited experience of kernel programming,
I fail to see what threadlets would simplify on the kernel side. The end
result as I see it: user space becomes a bit more complicated because of
dynamic multi-threading, and kernel space also becomes more complicated
because of the same added dynamism.

(*) Hm... On the other hand, if an application were able to tell the
kernel to limit the number of issued threadlets to N, that might
simplify the job. The application could tell the kernel "I need at most
10 blocking threadlets, block me if there are more" and then dumbly
throw I/O threadlets at the kernel as they come in. The kernel would
then put the process to sleep if N+1 threadlets are blocking. That
would definitely simplify the job in user space: it wouldn't need to
call pthread_join(). But it is still no replacement for a poll()able
file descriptor or a truly async mmap().
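
(A userspace approximation of that throttling idea - a sketch; the
in-kernel N-threadlet limit described above does not exist, so a
counting semaphore plays its role, and detached threads avoid the
pthread_join() calls:)

#include <pthread.h>
#include <semaphore.h>

#define MAX_IN_FLIGHT 10	/* the "e.g. 10" above */

static sem_t slots;	/* counts free blocking-worker slots */

static void *io_worker(void *arg)
{
	/* ... blocking file I/O for one request goes here ... */
	sem_post(&slots);	/* free a slot on completion */
	return NULL;
}

void submit_requests(int nr_requests)
{
	pthread_attr_t attr;
	pthread_t t;
	int i;

	sem_init(&slots, 0, MAX_IN_FLIGHT);
	pthread_attr_init(&attr);
	pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);

	for (i = 0; i < nr_requests; i++) {
		sem_wait(&slots);	/* sleep if N workers already block */
		pthread_create(&t, &attr, io_worker, NULL);
	}
	for (i = 0; i < MAX_IN_FLIGHT; i++)
		sem_wait(&slots);	/* drain: wait for the last workers */
}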

--
Don't walk behind me, I may not lead.
Don't walk in front of me, I may not follow.
Just walk beside me and be my friend.
   -- Albert Camus (attributed to)


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-03 Thread Davide Libenzi
On Sat, 3 Mar 2007, Davide Libenzi wrote:

> Those *other* (tons?!?) interfaces can be created *when* the need comes 
> (see Linus signalfd [1] example to show how urgent that was). *When* 
> the need comes, they will work with existing POSIX interfaces, without 
> requiring your own just-another event interface. Those other interfaces 
> could also be more easily adopted by other Unix cousins, because of 
> the fact that they rely on existing POSIX interfaces. One of the reason 
> about the Unix file abstraction interfaces, is that you do *not* have to 
> plan and bloat interfaces before. As long as your new abstraction behave 
> in a file-fashion, it can be automatically used with existing interfaces. 
> And you create them *when* the need comes.

Now, if you don't mind, my spare time is really limited and I prefer to 
spend it looking at the stuff the topic of this thread talks about.
Especially because the whole epoll/kevent discussion is heavily dependent 
on whether syslets/threadlets will or will not turn out to be a viable 
method for generic AIO. Savvy?



- Davide




Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-03 Thread Davide Libenzi
On Sat, 3 Mar 2007, Evgeniy Polyakov wrote:

> > I was referring to dropping an event directly to a userspace buffer, from 
> > the poll callback. If pages are not there, you might sleep, and you can't 
> > since the wakeup function holds a spinlock on the waitqueue head while 
> > looping through the waiters to issue the wakeup. Also, you don't know from 
> > where the poll wakeup is called.
> 
> Ugh, no, that is a very limited solution - memory must be either pinned
> (which leads to DoS and a limited ring buffer), or the callback must sleep.
> Actually, either way there _must_ exist a queue - if the ring buffer is
> full an event is not allowed to be dropped; it must be stored in some
> other place, for example in a queue from which entries are copied into
> the ring buffer as it gains free entries (that is how it is implemented
> in kevent, at least).

I was not advocating for that, if you read carefully. The fact that epoll 
does not do that should be a clear hint. The old /dev/epoll IIRC was only 
10% faster than the current epoll under a *heavy* event frequency 
micro-bench like pipetest (and that version of epoll did not have the 
single pass over the ready set optimization). And /dev/epoll was 
delivering events *directly* on userspace visible (mmaped) memory in a 
zero-copy fashion.




> > BTW, Linus made a signalfd sketch some time ago, to deliver signals to an 
> > fd. The code remained there and nobody cared. Question: Was it because
> > 1) it had file bindings or 2) because nobody really cared to deliver 
> > signals to an event collector?
> > And *if* later requirements come, you don't need to change the API by 
> > adding an XXEVENT_SIGNAL_ADD or XXEVENT_TIMER_ADD, or creating a new 
> > XXEVENT-only submission structure. You create an API that automatically 
> > makes that new abstraction work with POSIX poll/select, and you get epoll 
> > support for free. Without even changing a bit in the epoll API.
> 
> Well, we get epoll support for free, but we need to create tons of other
> interfaces and infrastructure for kernel users, and we need to change 
> userspace anyway.

Those *other* (tons?!?) interfaces can be created *when* the need comes 
(see Linus signalfd [1] example to show how urgent that was). *When* 
the need comes, they will work with existing POSIX interfaces, without 
requiring your own just-another event interface. Those other interfaces 
could also be more easily adopted by other Unix cousins, because of 
the fact that they rely on existing POSIX interfaces. One of the reasons 
for the Unix file abstraction interfaces is that you do *not* have to 
plan and bloat interfaces beforehand. As long as your new abstraction behaves 
in a file fashion, it can be automatically used with existing interfaces. 
And you create them *when* the need comes.




[1] That was like 100 lines of code or so. See here:

http://tinyurl.com/3yuna5



- Davide




Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-03 Thread Ray Lee

On 3/3/07, Ihar `Philips` Filipau <[EMAIL PROTECTED]> wrote:

What I'm trying to get to: keep things simple. The proposed
optimization by Ingo does nothing else but allow AIO to probe the file
cache - if the data is there, go with the fast path. So why not implement
what the people want - probing of the cache? Because it sounds bad? But
they are in fact proposing precisely that, just masked with "fast
threads".


Please don't take this the wrong way, but I don't think you understand
the problem space that people are trying to address here.

Servers want to never, ever block. Not on a socket, not on a stat, not
on anything. (I have an embedded server I wrote that has to fork
internally just to watch the damn serial port signals in parallel with
handling network I/O, audio, and child processes that handle H323.)
There's a lot of things that can block out there, and it's not just
disk I/O.

Further, not only do servers not want to block, they also want to cram
a lot more requests into the kernel at once *for the kernel's
benefit*. In particular, a server wants to issue a ton of stats and
I/O in parallel so that the kernel can optimize which order to handle
the requests.

Finally, the biggest argument in favor of syslets/threadlets AFAICS is
that -- if done correctly -- it would unify the AIO and normal IO
paths in the kernel. The improved ease of long term maintenance on the
kernel (and more test coverage, and more directed optimization,
etc...) just for this point alone makes them worth considering for
inclusion.

So, while everybody has been talking about cached and non-cached
cases, those are really special cases of the entire package that the
rest of us want.

Ray


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-03 Thread Evgeniy Polyakov
On Sat, Mar 03, 2007 at 10:46:59AM -0800, Davide Libenzi 
(davidel@xmailserver.org) wrote:
> On Sat, 3 Mar 2007, Evgeniy Polyakov wrote:
> 
> > > You've to excuse me if my memory is bad, but IIRC the whole discussion 
> > > and loong benchmark feast born with you throwing a benchmark at Ingo 
> > > (with kevent showing a 1.9x performance boost WRT epoll), not with you 
> > > making any other point.
> > 
> > So, how does it sound?
> > "Threadlets are bad for IO because kevent is 2 times faster than epoll?"
> > 
> > I said threadlets are bad for IO (and we agreed that both approaches
> > should be used for the maximum performance) because of rescheduling overhead -
> > tasks are quite heavy structures to move around - even a pt_regs copy
> > takes more than an event structure - but not because there is something in one
> > galaxy which might work faster than another something in another galaxy.
> > That was stupid even to think about.
> 
> Evgeniy, other folks on this thread read what you said, so let's not drag 
> this out.
 
Sure, I was wrong to start this again, but try to understand my position -
I'm really tired of trying to prove that I'm not a camel just because we
had some misunderstanding at the start.

I do think that threadlets are a really cool solution and indeed a very
good approach for the majority of parallel processing, but my point is
still that they are not a perfect solution for all tasks.

Just to draw a line: the kevent example is an extrapolation of what can
be achieved with an event-driven model, but that does not mean it must
_only_ be used for the AIO model - threadlets _and_ the event-driven model
(yes, I accepted Ingo's point about its decline) are the best solution.
 
> > > And if you really feel raw about the single O(nready) loop that epoll
> > > currently does, a new epoll_wait2 (or whatever) API could be used to
> > > deliver the event directly into a userspace buffer [1], directly from the
> > > poll callback, w/out extra delivery loops 
> > > (IRQ/event->epoll_callback->event_buffer).
> >
> > > [1] From the epoll callback, we cannot sleep, so it's gonna be either an 
> > > mlocked userspace buffer, or some kernel pages mapped to userspace.
> > 
> Callbacks never sleep - they add the event into a list just like the current
> implementation (maybe some lock must be changed from a mutex to a spinlock,
> I do not remember); the main problem is the binding to the file structure,
> which is heavy.
> 
> I was referring to dropping an event directly to a userspace buffer, from 
> the poll callback. If pages are not there, you might sleep, and you can't 
> since the wakeup function holds a spinlock on the waitqueue head while 
> looping through the waiters to issue the wakeup. Also, you don't know from 
> where the poll wakeup is called.

Ugh, no, that is a very limited solution - memory must be either pinned
(which leads to DoS and a limited ring buffer), or the callback must sleep.
Actually, either way there _must_ exist a queue - if the ring buffer is
full an event is not allowed to be dropped; it must be stored in some
other place, for example in a queue from which entries are copied into
the ring buffer as it gains free entries (that is how it is implemented
in kevent, at least).

> File binding heavy? The first, and by *far* biggest, source of events 
> inside an event collector, of someone that cares about scalability, are 
> sockets. And those are already files. Second would be AIO, and those (if 
> performance figures agrees) can be hosted inside syslets/threadlets.
> Then you fall into the no-care category, where the extra 100 bytes do not 
> make a case against the ability of using it with an existing POSIX 
> infrastructure (poll/select).

Well, sockets are files indeed, and sockets are already perfectly
handled by epoll - but there are other users of the potential interface -
and it must be designed to scale very well in _any_ situation.
Even if right now we do not have problems with some types of events, we
must scale with any new ones.

> BTW, Linus made a signalfd sketch some time ago, to deliver signals to an 
> fd. The code remained there and nobody cared. Question: Was it because
> 1) it had file bindings or 2) because nobody really cared to deliver 
> signals to an event collector?
> And *if* later requirements come, you don't need to change the API by 
> adding an XXEVENT_SIGNAL_ADD or XXEVENT_TIMER_ADD, or creating a new 
> XXEVENT-only submission structure. You create an API that automatically 
> makes that new abstraction work with POSIX poll/select, and you get epoll 
> support for free. Without even changing a bit in the epoll API.

Well, we get epoll support for free, but we need to create tons of other
interfaces and infrastructure for kernel users, and we need to change
userspace anyway.
But epoll support requires quite heavy bindings to the file structure,
so why don't we design a new interface (since we need to change
userspace anyway) so that it can scale and be very memory-optimized?

Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-03 Thread Davide Libenzi
On Sat, 3 Mar 2007, Evgeniy Polyakov wrote:

> > You've to excuse me if my memory is bad, but IIRC the whole discussion 
> > and loong benchmark feast born with you throwing a benchmark at Ingo 
> > (with kevent showing a 1.9x performance boost WRT epoll), not with you 
> > making any other point.
> 
> So, how does it sound?
> "Threadlets are bad for IO because kevent is 2 times faster than epoll?"
> 
> I said threadlets are bad for IO (and we agreed that both approaches
> should be used for the maximum performance) because of rescheduling overhead -
> tasks are quite heavy structures to move around - even a pt_regs copy
> takes more than an event structure - but not because there is something in one
> galaxy which might work faster than another something in another galaxy.
> That was stupid even to think about.

Evgeniy, other folks on this thread read what you said, so let's not drag 
this out.



> > And if you really feel raw about the single O(nready) loop that epoll
> > currently does, a new epoll_wait2 (or whatever) API could be used to
> > deliver the event directly into a userspace buffer [1], directly from the
> > poll callback, w/out extra delivery loops 
> > (IRQ/event->epoll_callback->event_buffer).
>
> > [1] From the epoll callback, we cannot sleep, so it's gonna be either an 
> > mlocked userspace buffer, or some kernel pages mapped to userspace.
> 
> Callbacks never sleep - they add the event to a list just like the current
> implementation (maybe some lock must be changed from mutex to spinlock,
> I do not remember), the main problem is binding to the file structure,
> which is heavy.

I was referring to dropping an event directly into a userspace buffer, from 
the poll callback. If the pages are not there, you might sleep, and you can't, 
since the wakeup function holds a spinlock on the waitqueue head while 
looping through the waiters to issue the wakeup. Also, you don't know from 
where the poll wakeup is called.
File binding heavy? The first, and by *far* the biggest, source of events 
inside an event collector, for someone who cares about scalability, is 
sockets. And those are already files. Second would be AIO, and those (if 
the performance figures agree) can be hosted inside syslets/threadlets.
Then you fall into the no-care category, where the extra 100 bytes do not 
make a case against the ability to use it with the existing POSIX 
infrastructure (poll/select).
BTW, Linus made a signalfd code sketch some time ago, to deliver signals to an 
fd. The code remained there and nobody cared. Question: was it because
1) it had file bindings, or 2) because nobody really cared about delivering 
signals to an event collector?
And *if* later requirements come, you don't need to change the API by 
adding an XXEVENT_SIGNAL_ADD or XXEVENT_TIMER_ADD, or creating a new 
XXEVENT-only submission structure. You create an API that automatically 
makes that new abstraction work with POSIX poll/select, and you get epoll 
support for free. Without even changing a bit in the epoll API.
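
To make the point concrete, here is how an fd-based signal source would
drop into an ordinary POSIX loop. signal_fd() is a hypothetical syscall in
the spirit of the sketch mentioned above (nothing like it was merged at the
time); everything else is plain existing API:

  #include <poll.h>
  #include <signal.h>
  #include <unistd.h>

  /* hypothetical: returns an fd that becomes readable on a pending signal */
  extern int signal_fd(const sigset_t *mask);

  void serve(int sock)
  {
          sigset_t mask;

          sigemptyset(&mask);
          sigaddset(&mask, SIGUSR1);

          struct pollfd pfd[2] = {
                  { .fd = sock,             .events = POLLIN },
                  { .fd = signal_fd(&mask), .events = POLLIN },
          };

          for (;;) {
                  if (poll(pfd, 2, -1) <= 0)
                          continue;
                  if (pfd[0].revents & POLLIN)
                          ; /* accept/read the socket */
                  if (pfd[1].revents & POLLIN)
                          ; /* read the pending signal off the fd */
          }
  }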



- Davide




Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-03 Thread Ihar `Philips` Filipau

On 3/3/07, Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

> >Threadlets can work with any function as a base - if it were
> >recv-like it would limit the possible cases for parallel programming, so you
> >can code anything in threadlets - it is not only about IO.
>
> What I'm trying to get at: keep things simple. The proposed
> optimization by Ingo does nothing but allow AIO to probe the file
> cache - if the data is there, go with the fast path. So why not implement
> what people want - probing of the cache? Because it sounds bad? But
> they are in fact proposing precisely that, just masked as "fast
> threads".

There can be other parts than just plain recv/read syscalls - you can
create a logical processing entity, and if it blocks (as a whole, no
matter where), the whole processing will continue as a new thread.
And having a separate syscall to warm the cache can end up with a cache
flush between the warming and the processing itself.



I'm not talking about cache warm up. And if we do - and that is the whole
freaking point of AIO - Linux IIRC pins freshly loaded clean pages
anyway. So there would be a problem only under memory pressure. And if
you are under memory pressure, you have already lost the game and do not
care about performance or about what threads you are using.

It is the whole "threadlets become threads on blocking" thing that doesn't
sound convincing. It sounds more like "premature optimization". But
anyway, it's not that I'm an AIO specialist. For networking it is totally
unnecessary, since most applications which care already have rate
control and buffer management built in. Network connections/sockets
allow the application a greater level of control over what they do and how.
Compare that to blockdev's plain dumb read()/write() going through the
global cache. And it's not that (judging from the interface) AIO changes
that much - it is still a dumb read(), which IMHO makes no sense whatsoever
for mmap()-oriented Linux.

--
Don't walk behind me, I may not lead.
Don't walk in front of me, I may not follow.
Just walk beside me and be my friend.
   -- Albert Camus (attributed to)


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-03 Thread Evgeniy Polyakov
On Sat, Mar 03, 2007 at 11:58:17AM +0100, Ihar `Philips` Filipau ([EMAIL PROTECTED]) wrote:
> On 3/3/07, Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> >On Fri, Mar 02, 2007 at 08:20:26PM +0100, Ihar `Philips` Filipau 
> >([EMAIL PROTECTED]) wrote:
> >> I'm not well versed in modern kernel development discussions, and
> >> since you have put the thing into the networking context anyway, can
> >> you please ask on lkml why (if they want threadlets solely for AIO)
> >> not implement an analogue of recv(*filedes*, b, l, MSG_DONTWAIT).
> >> Developers already know the interface, the socket infrastructure is already
> >> in the kernel, etc. And it might do precisely what they want: access a file
> >> in the disk cache - just as for a socket it accesses the socket's recv
> >> buffer. Why bother with implicit threads/waiting/etc - if all they
> >> want is some way to probe the cache?
> >
> >Threadlets can work with any function as a base - if it were
> >recv-like it would limit the possible cases for parallel programming, so you
> >can code anything in threadlets - it is not only about IO.
> >
> 
> Ingo defined them as "plain function calls as long as they do not block".
> 
> But when/what function could block?
> 
> (1) File descriptors. Read. If data are in cache it wouldn't block.
> Otherwise it would. Write. If there is space in cache it wouldn't
> block. Otherwise it would.
> 
> (2) Network sockets. Recv. If data are in buffer they wouldn't block.
> Otherwise they would. Send. If there is space in send buffer it
> wouldn't block. Otherwise it would.
> 
> (3) Pipes, fifos & unix sockets. These unfortunately gain nothing, since
> reliable local communication is used mostly for passing control
> information. If you have to block on such a socket, it is most likely
> important information anyway (e.g. X server communication or an SQL
> query to an SQL server; or, less important here, the case of shell
> pipes). And most users here are single threaded and I/O bound: they
> would gain nothing from multi-threading - only the PITA of added locking.
> 
> What I'm trying to get at: keep things simple. The proposed
> optimization by Ingo does nothing but allow AIO to probe the file
> cache - if the data is there, go with the fast path. So why not implement
> what people want - probing of the cache? Because it sounds bad? But
> they are in fact proposing precisely that, just masked as "fast
> threads".

There can be other parts than just plain recv/read syscalls - you can
create a logical processing entity, and if it blocks (as a whole, no
matter where), the whole processing will continue as a new thread.
And having a separate syscall to warm the cache can end up with a cache
flush between the warming and the processing itself.
 
> -- 
> Don't walk behind me, I may not lead.
> Don't walk in front of me, I may not follow.
> Just walk beside me and be my friend.
>-- Albert Camus (attributed to)

-- 
Evgeniy Polyakov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-03 Thread Ihar `Philips` Filipau

On 3/3/07, Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

On Fri, Mar 02, 2007 at 08:20:26PM +0100, Ihar `Philips` Filipau ([EMAIL PROTECTED]) wrote:
> I'm not well versed in modern kernel development discussions, and
> since you have put the thing into the networking context anyway, can
> you please ask on lkml why (if they want threadlets solely for AIO)
> not implement an analogue of recv(*filedes*, b, l, MSG_DONTWAIT).
> Developers already know the interface, the socket infrastructure is already
> in the kernel, etc. And it might do precisely what they want: access a file
> in the disk cache - just as for a socket it accesses the socket's recv
> buffer. Why bother with implicit threads/waiting/etc - if all they
> want is some way to probe the cache?

Threadlets can work with any function as a base - if it were
recv-like it would limit the possible cases for parallel programming, so you
can code anything in threadlets - it is not only about IO.



Ingo defined them as "plain function calls as long as they do not block".

But when/what function could block?

(1) File descriptors. Read. If data are in cache it wouldn't block.
Otherwise it would. Write. If there is space in cache it wouldn't
block. Otherwise it would.

(2) Network sockets. Recv. If data are in buffer they wouldn't block.
Otherwise they would. Send. If there is space in send buffer it
wouldn't block. Otherwise it would.

(3) Pipes, fifos & unix sockets. These unfortunately gain nothing, since
reliable local communication is used mostly for passing control
information. If you have to block on such a socket, it is most likely
important information anyway (e.g. X server communication or an SQL
query to an SQL server; or, less important here, the case of shell
pipes). And most users here are single threaded and I/O bound: they
would gain nothing from multi-threading - only the PITA of added locking.

What I'm trying to get at: keep things simple. The proposed
optimization by Ingo does nothing but allow AIO to probe the file
cache - if the data is there, go with the fast path. So why not implement
what people want - probing of the cache? Because it sounds bad? But
they are in fact proposing precisely that, just masked as "fast
threads".


--
Don't walk behind me, I may not lead.
Don't walk in front of me, I may not follow.
Just walk beside me and be my friend.
   -- Albert Camus (attributed to)


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-03 Thread Evgeniy Polyakov
On Fri, Mar 02, 2007 at 09:28:10AM -0800, Davide Libenzi (davidel@xmailserver.org) wrote:
> On Fri, 2 Mar 2007, Evgeniy Polyakov wrote:
> 
> > do we really want to have per process signalfs, timerfs and so on - each 
> > simple structure must be bound to a file, which becomes too costly.
> 
> I may be old school, but if you ask me, and if you *really* want those 
> events, yes. Reason? Unix's everything-is-a-file rule, and being able to 
> use them with *existing* POSIX poll/select. Remember, not every app 
> requires huge scalability efforts, so working with simpler and familiar 
> APIs is always welcome.
> The *only* thing that was not practical to have as fd, was block requests. 
> But maybe threadlets/syslets will handle those just fine, and close the gap.
 
That means that we bind a very small object like a timer or a signal to the
whole file structure - yes, as I stated, it is doable, but do we really
have to create a file each time create_timer() or signal() is called?
Signals as a filesystem are limited in the regard that we need to
create additional structures to maintain signal number<->private data
relations.
I designed kevent to be as small as possible, so I dropped the file binding
idea first. I do not say it is wrong or that epoll (and threadlets) are broken 
(fsck, I hope people do understand that), but as is it cannot handle
that scenario, so it must be extended and/or a lot of other stuff
written to be compatible with the epoll design. Kevent has a different design
(which allows working with the old one though - there is a patch that
implements epoll over kevent).
 
> - Davide
> 

-- 
Evgeniy Polyakov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-03 Thread Evgeniy Polyakov
On Fri, Mar 02, 2007 at 09:13:40AM -0800, Davide Libenzi (davidel@xmailserver.org) wrote:
> On Fri, 2 Mar 2007, Evgeniy Polyakov wrote:
> 
> > On Thu, Mar 01, 2007 at 11:31:14AM -0800, Davide Libenzi 
> > (davidel@xmailserver.org) wrote:
> > > On Thu, 1 Mar 2007, Evgeniy Polyakov wrote:
> > > 
> > > > Ingo, do you really think I will send mails with faked benchmarks? :))
> > > 
> > > I don't think he ever implied that. He was only suggesting that when you 
> > > post benchmarks, and even more when you make claims based on benchmarks, 
> > > you need to be extra careful about what you measure. Otherwise the 
> > > external view that you give to others does not look good.
> > > Kevent can be really faster than epoll, but if you post broken benchmarks 
> > > (that can be, unreliable HTTP loaders, broken server implementations, 
> > > etc..) and make claims based on that, the only effect that you have is to 
> > > lose your point.
> >  
> > So, I only said that kevent is superior to epoll because (and
> > it is the _main_ issue) of its ability to handle essentially any kind of
> > events with very small overhead (the same as epoll has in struct file -
> > a list and a spinlock) and without the significant price of binding
> > struct file to the event.
> 
> You'll have to excuse me if my memory is bad, but IIRC the whole discussion 
> and long benchmark feast was born with you throwing a benchmark at Ingo 
> (with kevent showing a 1.9x performance boost WRT epoll), not with you 
> making any other point.

So, how does it sound?
"Threadlets are bad for IO because kevent is 2 times faster than epoll?"

I said threadlets are bad for IO (and we agreed that both approaches
should be used for the maximum performance) because of rescheduling overhead -
tasks are quite heavy structures to move around - even a pt_regs copy
takes more than an event structure - but not because there is something in
one galaxy which might work faster than something else in another galaxy.
It would be stupid even to think that.

> As far as epoll not being able to handle other events. Said who? Of 
> course, with zero modifications, you can handle zero additional events. 
> With modifications, you can handle other events. But let's talk about those 
> other events. The *only* kind of event that ppl (and being the epoll 
> maintainer I tend to receive those requests) missed in epoll, was AIO 
> events. That's the *only* thing that was missed by real life application 
> developers. And if something like threadlets/syslets will prove effective, 
> the gap is closed WRT that requirement.
> Epoll already handles the whole class of pollable devices inside the 
> kernel, and if you exclude block AIO, that's a pretty wide class already. 
> The *existing* f_op->poll subsystem can be used to deliver events at the 
> poll-head wakeup time (by using the "key" member of the poll callback), so 
> that you don't even need the extra f_op->poll call to fetch events.
> And if you really feel raw about the single O(nready) loop that epoll 
> currently does, a new epoll_wait2 (or whatever) API could be used to 
> deliver the event directly into a userspace buffer [1], directly from the 
> poll callback, w/out extra delivery loops 
> (IRQ/event->epoll_callback->event_buffer).

Signals, futexes, timers and userspace events are what I was asked to add to 
kevent; so far only futexes are missing, because I was asked to freeze
development so other hackers could review the project.

> 
> [1] From the epoll callback, we cannot sleep, so it's gonna be either an 
> mlocked userspace buffer, or some kernel pages mapped to userspace.

Callbacks never sleep - they add the event to a list just like the current
implementation (maybe some lock must be changed from mutex to spinlock,
I do not remember), the main problem is binding to the file structure,
which is heavy.

> - Davide
> 

-- 
Evgeniy Polyakov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-03 Thread Davide Libenzi
On Sat, 3 Mar 2007, Ingo Molnar wrote:

> * Davide Libenzi  wrote:
> 
> > [...] Status word and control bits should not be changed from 
> > underneath userspace AFAIK. [...]
> 
> Note that the control bits do not just magically change during normal 
> FPU use. It's a bit like sys_setsid()/iopl/etc., it makes little sense 
> to change those per-thread anyway. This is a non-issue anyway - what is 
> important is that the big bulk of 512 (or more) bytes of FPU state /are/ 
> callee-saved (both on 32-bit and on 64-bit), hence there's no need to 
> unlazy anything or to do expensive FPU state saves or other FPU juggling 
> around threadlet (or even syslet) use.

Well, the unlazy/sync happens in any case later when we switch (given 
TS_USEDFPU set). We'd avoid a copy of it given the above conditions are true. 
Wouldn't it make sense to carry over only the status word and the control 
bits eventually?
Also, if the caller saves the whole context, and if we're scheduled while 
inside a system call (not a totally infrequent case), can't we implement a 
smarter unlazy_fpu that avoids the fxsave during schedule-out and the frstor 
after schedule-in (and does not do stts on this condition, so the newly 
scheduled task doesn't get a fault at all)? If the above conditions are true 
(no context-copy needed for the new head in async_exec), this should be 
possible too.


- Davide




Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> Note that the control bits do not just magically change during normal 
> FPU use. It's a bit like sys_setsid()/iopl/etc., it makes little sense 
> to change those per-thread anyway. This is a non-issue anyway - what is 
> important is that the big bulk of 512 (or more) bytes of FPU state /are/ 
> callee-saved (both on 32-bit and on 64-bit), hence there's no need to 
 ^ caller-saved
> unlazy anything or to do expensive FPU state saves or other FPU juggling 
> around threadlet (or even syslet) use.



Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Ingo Molnar

* Davide Libenzi  wrote:

> [...] Status word and control bits should not be changed from 
> underneath userspace AFAIK. [...]

Note that the control bits do not just magically change during normal 
FPU use. It's a bit like sys_setsid()/iopl/etc., it makes little sense 
to change those per-thread anyway. This is a non-issue anyway - what is 
important is that the big bulk of 512 (or more) bytes of FPU state /are/ 
callee-saved (both on 32-bit and on 64-bit), hence there's no need to 
unlazy anything or to do expensive FPU state saves or other FPU juggling 
around threadlet (or even syslet) use.

Ingo


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Davide Libenzi
On Fri, 2 Mar 2007, Nicholas Miell wrote:

> On Fri, 2007-03-02 at 16:52 -0800, Davide Libenzi wrote:
> > On Fri, 2 Mar 2007, Nicholas Miell wrote:
> > 
> > > The point Ingo was making is that the x86 ABI already requires the FPU
> > > context to be saved before *all* function calls.
> > 
> > I've not seen that among Ingo's points, but yeah some status is caller 
> > saved. But, aren't things like status word and control bits callee saved? 
> > If that's the case, it might require proper handling.
> > 
> 
> Ingo mentioned it in one of the parts you cut out of your reply:
> 
> > and here is where thinking about threadlets as a function call and not 
> > as an asynchronous context helps a lot: the classic gcc convention for 
> > FPU use & function calls should apply: gcc does not call an external 
> > function with an in-use FPU stack/register, it always neatly unuses it, 
> > as no FPU register is callee-saved, all are caller-saved.
> 
> The i386 psABI is ancient (i.e. it predates SSE, so no mention of the
> XMM or MXCSR registers) and a bit vague (no mention at all of the FP
> status word), but I'm fairly certain that Ingo is right.

I'm not sure if that's the case. I'd be happy if it was, but I'm afraid 
it's not. Status word and control bits should not be changed from 
underneath userspace AFAIK. The ABI I remember tells me that those are 
callee saved. A quick gcc asm test tells me that too.
And assuming that's the case, why don't we have a smarter unlazy_fpu() 
then, one that avoids the FPU context sync if we're scheduled while inside a 
syscall (this is no different than an enter inside sys_async_exec - 
userspace should have taken care of it)?
IMO a syscall enter should not assume that userspace took care of saving 
the whole FPU context.
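
Along the lines of that "quick gcc asm test", here is a minimal x86
userspace probe (illustrative only) showing that a deliberately changed
control word survives an ordinary call chain - i.e. compiled code treats
it as preserved state, so the kernel must not clobber it behind
userspace's back either:

  /* build: gcc -O2 -std=c99 fpucw.c -lm */
  #include <fenv.h>
  #include <stdio.h>

  static unsigned short read_cw(void)
  {
          unsigned short cw;
          __asm__ __volatile__("fnstcw %0" : "=m"(cw));   /* x87 control word */
          return cw;
  }

  static void callee(double *x)
  {
          *x /= 3.0;      /* plain FPU use in the callee */
  }

  int main(void)
  {
          double x = 1.0;

          fesetround(FE_DOWNWARD);        /* deliberately non-default cw */
          unsigned short before = read_cw();
          callee(&x);                     /* an ordinary function call */
          unsigned short after = read_cw();

          printf("cw before=%#hx after=%#hx (%s)\n", before, after,
                 before == after ? "preserved" : "clobbered");
          return before != after;
  }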



- Davide




Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Benjamin LaHaise
On Fri, Mar 02, 2007 at 05:36:01PM -0800, Nicholas Miell wrote:
> > as an asynchronous context helps a lot: the classic gcc convention for 
> > FPU use & function calls should apply: gcc does not call an external 
> > function with an in-use FPU stack/register, it always neatly unuses it, 
> > as no FPU register is callee-saved, all are caller-saved.
> 
> The i386 psABI is ancient (i.e. it predates SSE, so no mention of the
> XMM or MXCSR registers) and a bit vague (no mention at all of the FP
> status word), but I'm fairly certain that Ingo is right.

The FPU status word *must* be saved, as the rounding behaviour and error mode 
bits are assumed to be preserved.  Iow, yes, there is state which is required.

-ben
-- 
"Time is of no importance, Mr. President, only life is important."
Don't Email: <[EMAIL PROTECTED]>.


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Nicholas Miell
On Fri, 2007-03-02 at 16:52 -0800, Davide Libenzi wrote:
> On Fri, 2 Mar 2007, Nicholas Miell wrote:
> 
> > The point Ingo was making is that the x86 ABI already requires the FPU
> > context to be saved before *all* function calls.
> 
> I've not seen that among Ingo's points, but yeah some status is caller 
> saved. But, aren't things like status word and control bits callee saved? 
> If that's the case, it might require proper handling.
> 

Ingo mentioned it in one of the parts you cut out of your reply:

> and here is where thinking about threadlets as a function call and not 
> as an asynchronous context helps a lot: the classic gcc convention for 
> FPU use & function calls should apply: gcc does not call an external 
> function with an in-use FPU stack/register, it always neatly unuses it, 
> as no FPU register is callee-saved, all are caller-saved.

The i386 psABI is ancient (i.e. it predates SSE, so no mention of the
XMM or MXCSR registers) and a bit vague (no mention at all of the FP
status word), but I'm fairly certain that Ingo is right.


-- 
Nicholas Miell <[EMAIL PROTECTED]>



Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Davide Libenzi
On Fri, 2 Mar 2007, Nicholas Miell wrote:

> The point Ingo was making is that the x86 ABI already requires the FPU
> context to be saved before *all* function calls.

I've not seen that among Ingo's points, but yeah some status is caller 
saved. But, aren't things like status word and control bits callee saved? 
If that's the case, it might require proper handling.



- Davide




Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Nicholas Miell
On Fri, 2007-03-02 at 12:53 -0800, Davide Libenzi wrote:
> On Fri, 2 Mar 2007, Ingo Molnar wrote:
> 
> > 
> > * Davide Libenzi  wrote:
> > 
> > > I think that the "dirty" FPU context must, at least, follow the new 
> > > head. That's what the userspace sees, and you don't want an async_exec 
> > > to re-emerge with a different FPU context.
> > 
> > well. I think there's some confusion about terminology, so please let me 
> > describe everything in detail. This is how execution goes:
> > 
> >   outer loop() {
> >   call_threadlet();
> >   }
> > 
> > this all runs in the 'head' context. call_threadlet() always switches to 
> > the 'threadlet stack'. The 'outer context' runs in the 'head stack'. If, 
> > while executing the threadlet function, we block, then the 
> > threadlet-thread gets to keep the task (the threadlet stack and also the 
> > FPU), and blocks - and we pick a 'new head' from the thread pool and 
> > continue executing in that context - right after the call_threadlet() 
> > function, in the 'old' head's stack. I.e. it's as if we returned 
> > immediately from call_threadlet(), with a return code that signals that 
> > the 'threadlet went async'.
> > 
> > now, the FPU state that was when the threadlet blocked is totally 
> > meaningless to the 'new head' - that FPU state is from the middle of the 
> > threadlet execution.
> 
> For threadlets, it might be. Now think about a task wanting to dispatch N 
> parallel AIO requests as N independent syslets.
> Think about this task having USEDFPU set, so the FPU context is dirty.
> When it returns from async_exec, with one of the requests having become 
> sleepy, it needs to have the same FPU context it had when it entered, 
> otherwise it probably won't be happy.
> For the same reason a schedule() must preserve/sync the "prev" FPU 
> context, to be reloaded at the next FPU fault.

The point Ingo was making is that the x86 ABI already requires the FPU
context to be saved before *all* function calls.

Unfortunately, this isn't true of other ABIs -- looking over the psABIs
specs I have lying around, AMD64, PPC64, and MIPS require at least part
of the FPU state to be preserved across function calls, and I'm sure
this is also true of others.

Then there's the other nasty details of new thread creation --
thankfully, the contents of the TLS isn't inherited from the parent
thread, but it still needs to be initialized; not to mention all the
other details involved in pthread creation and destruction.

I don't see any way around the pthread issues other than making a libc
upcall on return from the first system call that blocked.

-- 
Nicholas Miell <[EMAIL PROTECTED]>



Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Michael K. Edwards

On 3/2/07, Davide Libenzi  wrote:

For threadlets, it might be. Now think about a task wanting to dispatch N
parallel AIO requests as N independent syslets.
Think about this task having USEDFPU set, so the FPU context is dirty.
When it returns from async_exec, with one of the requests having become
sleepy, it needs to have the same FPU context it had when it entered,
otherwise it probably won't be happy.
For the same reason a schedule() must preserve/sync the "prev" FPU
context, to be reloaded at the next FPU fault.


And if you actually think this through, I think you will arrive at (a
subset of) the conclusions I did a week ago: to keep the threadlets
lightweight enough to schedule and migrate cheaply, they can't be
allowed to "own" their own FPU and TLS context.  They have to be
allowed to _use_ the FPU (or they're useless) and to _use_ TLS (or
they can't use any glibc wrapper around a syscall, since they
practically all set the thread-local errno).  But they have to
"quiesce" the FPU and stash any thread-local state they want to keep
on their stack before entering the next syscall, or else it'll get
clobbered.

Keep thinking, especially about FPU flags, and you'll see why
threadlets spawned from the _same_ threadlet entrypoint should all run
in the same pool of threads, one per CPU, while threadlets from
_different_ entrypoints should never run in the same thread (FPU/TLS
context).  You'll see why threadlets in the same pool shouldn't be
permitted to preempt one another except at syscalls that block, and
the cost of preempting the real thread associated with one threadlet
pool with another real thread associated with a different threadlet
pool is the same as any other thread switch.  At which point,
threadlet pools are themselves first-class objects (to use the snake
oil phrase), and might as well be enhanced to a data structure that
has efficient operations for reprioritization, bulk cancellation, and
all that jazz.

Did I mention that there is actually quite a bit of prior art in this
area, which makes a much better guide to the design of round wheels
than micro-benchmarks do?

Cheers,
- Michael


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Davide Libenzi
On Fri, 2 Mar 2007, Ingo Molnar wrote:

> 
> * Davide Libenzi  wrote:
> 
> > I think that the "dirty" FPU context must, at least, follow the new 
> > head. That's what the userspace sees, and you don't want an async_exec 
> > to re-emerge with a different FPU context.
> 
> well. I think there's some confusion about terminology, so please let me 
> describe everything in detail. This is how execution goes:
> 
>   outer loop() {
>   call_threadlet();
>   }
> 
> this all runs in the 'head' context. call_threadlet() always switches to 
> the 'threadlet stack'. The 'outer context' runs in the 'head stack'. If, 
> while executing the threadlet function, we block, then the 
> threadlet-thread gets to keep the task (the threadlet stack and also the 
> FPU), and blocks - and we pick a 'new head' from the thread pool and 
> continue executing in that context - right after the call_threadlet() 
> function, in the 'old' head's stack. I.e. it's as if we returned 
> immediately from call_threadlet(), with a return code that signals that 
> the 'threadlet went async'.
> 
> now, the FPU state that was when the threadlet blocked is totally 
> meaningless to the 'new head' - that FPU state is from the middle of the 
> threadlet execution.

For threadlets, it might be. Now think about a task wanting to dispatch N 
parallel AIO requests as N independent syslets.
Think about this task having USEDFPU set, so the FPU context is dirty.
When it returns from async_exec, with one of the requests having become 
sleepy, it needs to have the same FPU context it had when it entered, 
otherwise it probably won't be happy.
For the same reason a schedule() must preserve/sync the "prev" FPU 
context, to be reloaded at the next FPU fault.




> > So, IMO, if the USEDFPU bit is set, we need to sync the dirty FPU 
> > context with an early unlazy_fpu(), *and* copy the sync'd FPU context 
> > to the new head. This should really be a fork of the dirty FPU context 
> > IMO, and should only happen if the USEDFPU bit is set.
> 
> why? The only effect this will have is a slowdown :) The FPU context 
> from the middle of the threadlet function is totally meaningless to the 
> 'new head'. It might be anything. (although in practice system calls are 
> almost never called with a truly in-use FPU.)

See above ;)



- Davide




Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Ingo Molnar

* Davide Libenzi  wrote:

> I think that the "dirty" FPU context must, at least, follow the new 
> head. That's what the userspace sees, and you don't want an async_exec 
> to re-emerge with a different FPU context.

well. I think there's some confusion about terminology, so please let me 
describe everything in detail. This is how execution goes:

  outer loop() {
  call_threadlet();
  }

this all runs in the 'head' context. call_threadlet() always switches to 
the 'threadlet stack'. The 'outer context' runs in the 'head stack'. If, 
while executing the threadlet function, we block, then the 
threadlet-thread gets to keep the task (the threadlet stack and also the 
FPU), and blocks - and we pick a 'new head' from the thread pool and 
continue executing in that context - right after the call_threadlet() 
function, in the 'old' head's stack. I.e. it's as if we returned 
immediately from call_threadlet(), with a return code that signals that 
the 'threadlet went async'.

now, the FPU state that was when the threadlet blocked is totally 
meaningless to the 'new head' - that FPU state is from the middle of the 
threadlet execution.
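
In C, the flow just described might look like the sketch below.
call_threadlet(), THREADLET_ASYNC and the request helpers are illustrative
placeholders for the mechanism described above, not the actual v3 patchset
user API:

  #include <stddef.h>
  #include <unistd.h>

  struct req { int fd; char buf[4096]; };

  extern long call_threadlet(long (*fn)(void *), void *arg); /* placeholder */
  #define THREADLET_ASYNC 1                                  /* placeholder */
  extern struct req *next_request(void);
  extern void process(struct req *r);

  /* the threadlet function: written as plain, blocking, linear code */
  static long handle_request(void *arg)
  {
          struct req *r = arg;

          if (read(r->fd, r->buf, sizeof(r->buf)) > 0)    /* may block */
                  process(r);
          return 0;
  }

  /* the 'head' context: the outer event loop */
  static void event_loop(void)
  {
          struct req *r;

          while ((r = next_request()) != NULL) {
                  if (call_threadlet(handle_request, r) == THREADLET_ASYNC) {
                          /* the threadlet blocked and kept the old task;
                           * we are now running as the 'new head', right
                           * after the call - just take the next request */
                          continue;
                  }
                  /* completed synchronously - a plain function call */
          }
  }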

and here is where thinking about threadlets as a function call and not 
as an asynchronous context helps alot: the classic gcc convention for 
FPU use & function calls should apply: gcc does not call an external 
function with an in-use FPU stack/register, it always neatly unuses it, 
as no FPU register is callee-saved, all are caller-saved.

> So, IMO, if the USEDFPU bit is set, we need to sync the dirty FPU 
> context with an early unlazy_fpu(), *and* copy the sync'd FPU context 
> to the new head. This should really be a fork of the dirty FPU context 
> IMO, and should only happen if the USEDFPU bit is set.

why? The only effect this will have is a slowdown :) The FPU context 
from the middle of the threadlet function is totally meaningless to the 
'new head'. It might be anything. (although in practice system calls are 
almost never called with a truly in-use FPU.)

Ingo


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Davide Libenzi
On Fri, 2 Mar 2007, Ingo Molnar wrote:

> 
> * Davide Libenzi  wrote:
> 
> > [...] We're still missing proper FPU context switch in the 
> > move_user_context(). [...]
> 
> yeah - i'm starting to be of the opinion that the FPU context should 
> stay with the threadlet, exclusively. I.e. when calling a threadlet, the 
> 'outer loop' (the event loop) should not leak FPU context into the 
> threadlet and then expect it to be replicated from whatever random point 
> the threadlet ended up sleeping at. It would be possible, but it just 
> makes no sense. What makes most sense is to just keep the FPU context 
> with the threadlet, and to let the 'new head' use an initial (unused) 
> FPU context. And it's in fact the threadlet that will most likely have 
> an active FPU context across a system call, not the outer loop. In other 
> words: no special FPU support needed at all for threadlets (i.e. no 
> flipping needed even) - this behavior just naturally happens in the 
> current implementation. Hm?

I think that the "dirty" FPU context must, at least, follow the new head. 
That's what the userspace sees, and you don't want an async_exec to 
re-emerge with a different FPU context.
I think it should also follow the async thread (old, going-to-sleep, 
thread), since a threadlet might have that dirtied, and as a consequence 
it'll want to find it back when it's re-scheduled.
So, IMO, if the USEDFPU bit is set, we need to sync the dirty FPU context 
with an early unlazy_fpu(), *and* copy the sync'd FPU context to the new head.
This should really be a fork of the dirty FPU context IMO, and should only 
happen if the USEDFPU bit is set.



- Davide




Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Ingo Molnar

* Davide Libenzi  wrote:

> [...] We're still missing proper FPU context switch in the 
> move_user_context(). [...]

yeah - i'm starting to be of the opinion that the FPU context should 
stay with the threadlet, exclusively. I.e. when calling a threadlet, the 
'outer loop' (the event loop) should not leak FPU context into the 
threadlet and then expect it to be replicated from whatever random point 
the threadlet ended up sleeping at. It would be possible, but it just 
makes no sense. What makes most sense is to just keep the FPU context 
with the threadlet, and to let the 'new head' use an initial (unused) 
FPU context. And it's in fact the threadlet that will most likely have 
an active FPU context across a system call, not the outer loop. In other 
words: no special FPU support needed at all for threadlets (i.e. no 
flipping needed even) - this behavior just naturally happens in the 
current implementation. Hm?

Ingo


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Davide Libenzi
On Fri, 2 Mar 2007, Davide Libenzi wrote:

> And if you really feel raw about the single O(nready) loop that epoll 
> currently does, a new epoll_wait2 (or whatever) API could be used to 
> deliver the event directly into a userspace buffer [1], directly from the 
> poll callback, w/out extra delivery loops 
> (IRQ/event->epoll_callback->event_buffer).

And if you ever wondered where the "epoll" name came from, it came from the 
old /dev/epoll. The epoll predecessor, /dev/epoll, was adding plugs 
everywhere events were needed and was delivering those events in O(1), 
*directly* into a user visible (mmap'd) buffer, in a zero-copy fashion.
The old /dev/epoll was faster than the current epoll, but the latter was 
chosen because, despite being slightly slower, it had support for every 
pollable device, *without* adding more plugs into the existing code.
Performance and code maintenance are not to be taken disjointly whenever 
you evaluate a solution. That's the reason I got excited about this new 
generic AIO solution.



- Davide




Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Davide Libenzi
On Fri, 2 Mar 2007, Ingo Molnar wrote:

> > After your changes epoll increased to 5k.
> 
> Can we please stop this pointless episode of benchmarketing, where every 
> mail of yours shows different results and you even deny having said 
> something which you clearly said just a few days ago? At this point i 
> simply cannot trust the numbers you are posting, nor is the discussion 
> style you are following productive in any way in my opinion.

Agreed. Can we focus on the topic here? We're still missing proper FPU 
context switch in the move_user_context(). In v6?


- Davide




Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Davide Libenzi
On Fri, 2 Mar 2007, Evgeniy Polyakov wrote:

> do we really want to have per process signalfs, timerfs and so on - each 
> simple structure must be bound to a file, which becomes too costly.

I may be old school, but if you ask me, and if you *really* want those 
events, yes. Reason? Unix's everything-is-a-file rule, and being able to 
use them with *existing* POSIX poll/select. Remember, not every app 
requires huge scalability efforts, so working with simpler and familiar 
APIs is always welcome.
The *only* thing that was not practical to have as an fd was block requests. 
But maybe threadlets/syslets will handle those just fine, and close the gap.



- Davide




Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Davide Libenzi
On Fri, 2 Mar 2007, Evgeniy Polyakov wrote:

> On Thu, Mar 01, 2007 at 11:31:14AM -0800, Davide Libenzi 
> (davidel@xmailserver.org) wrote:
> > On Thu, 1 Mar 2007, Evgeniy Polyakov wrote:
> > 
> > > Ingo, do you really think I will send mails with faked benchmarks? :))
> > 
> > I don't think he ever implied that. He was only suggesting that when you 
> > post benchmarks, and even more when you make claims based on benchmarks, 
> > you need to be extra careful about what you measure. Otherwise the 
> > external view that you give to others does not look good.
> > Kevent can be really faster than epoll, but if you post broken benchmarks 
> > (that can be, unreliable HTTP loaders, broken server implementations, 
> > etc..) and make claims based on that, the only effect that you have is to 
> > lose your point.
>  
> So, I only said that kevent is superior to epoll because (and
> it is the _main_ issue) of its ability to handle essentially any kind of
> events with very small overhead (the same as epoll has in struct file -
> a list and a spinlock) and without the significant price of binding
> struct file to the event.

You'll have to excuse me if my memory is bad, but IIRC the whole discussion 
and long benchmark feast was born with you throwing a benchmark at Ingo 
(with kevent showing a 1.9x performance boost WRT epoll), not with you 
making any other point.
As far as epoll not being able to handle other events. Said who? Of 
course, with zero modifications, you can handle zero additional events. 
With modifications, you can handle other events. But let's talk about those 
other events. The *only* kind of event that ppl (and being the epoll 
maintainer I tend to receive those requests) missed in epoll, was AIO 
events. That's the *only* thing that was missed by real life application 
developers. And if something like threadlets/syslets will prove effective, 
the gap is closed WRT that requirement.
Epoll already handles the whole class of pollable devices inside the 
kernel, and if you exclude block AIO, that's a pretty wide class already. 
The *existing* f_op->poll subsystem can be used to deliver events at the 
poll-head wakeup time (by using the "key" member of the poll callback), so 
that you don't even need the extra f_op->poll call to fetch events.
And if you really feel raw about the single O(nready) loop that epoll 
currently does, a new epoll_wait2 (or whatever) API could be used to 
deliver the event directly into a userspace buffer [1], directly from the 
poll callback, w/out extra delivery loops 
(IRQ/event->epoll_callback->event_buffer).


[1] From the epoll callback, we cannot sleep, so it's gonna be either an 
mlocked userspace buffer, or some kernel pages mapped to userspace.
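
A sketch of what such a hypothetical epoll_wait2 might look like from
userspace - nothing like this exists, it only illustrates the "kernel
writes events straight into a shared ring" idea:

  #include <sys/epoll.h>

  /* hypothetical: the kernel fills a shared, mlocked/mapped ring */
  struct epoll_uring {
          struct epoll_event *events;     /* pinned at setup time */
          unsigned int size;
          volatile unsigned int head;     /* written by the kernel */
          unsigned int tail;              /* consumed by userspace */
  };

  extern int epoll_wait2(int epfd, struct epoll_uring *ring, int timeout);

  static void drain(int epfd, struct epoll_uring *ring)
  {
          epoll_wait2(epfd, ring, -1);    /* wake when head advances */
          while (ring->tail != ring->head) {
                  struct epoll_event *ev =
                          &ring->events[ring->tail++ % ring->size];
                  /* handle ev->events on ev->data.fd - no copy loop */
          }
  }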


- Davide




Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Evgeniy Polyakov
On Fri, Mar 02, 2007 at 11:57:13AM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote:
> 
> * Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> 
> > > > > [...] The numbers are still highly suspect - and we are already 
> > > > > down from the prior claim of kevent being almost twice as fast 
> > > > > to a 25% difference.
> > > >
> > > > Btw, there was never an almost twofold performance increase - epoll in 
> > > > my tests always showed 4-5 thousand requests per second, kevent - 
> > > > up to 7 thousand.
> > > 
> > > i'm referring to your claim in this mail of yours from 4 days ago 
> > > for example:
> > > 
> > >   http://lkml.org/lkml/2007/2/25/116
> > > 
> > >  "But note, that on my athlon64 3500 test machine kevent is about 7900
> > >   requests per second compared to 4000+ epoll, so expect a challenge."
> > > 
> > > no matter how i look at it, but 7900 is 1.9 times 4000 - which is 
> > > "almost twice".
> > 
> > After your changes epoll increased to 5k.
> 
> Can we please stop this pointless episode of benchmarketing, where every 
> mail of yours shows different results and you even deny having said 
> something which you clearly said just a few days ago? At this point i 
> simply cannot trust the numbers you are posting, nor is the discussion 
> style you are following productive in any way in my opinion.

I just show what I see in tests - I do not perform deep analysis of
that, since I do not see why it should be done - it is not fake, it is
not fantasy - it is real behaviour observed on my test machine, and if it
suddenly changes I will report it.
Btw, I showed cases where epoll behaved better than kevent and
performance was an unbeatable 9k requests per second - I do not know why
it happened - maybe some cache related issues, all other processes sleeping
at once, increased radiation or a strong wind that blew away my bad aura -
and it is not reproducible on demand either.

> (you are never ever wrong, and if you are proven wrong on topic A you 
> claim it is an irrelevant topic (without even admitting you were wrong 
> about it) and you point to topic B claiming it's the /real/ topic you 
> talked about all along. And along the way you are slandering other 
> projects like epoll and threadlets, distorting the discussion. This kind 
> of keep-the-ball-moving discussion style is effective in politics but 
> IMO it's a waste of time when developing a kernel.)

Heh - that is why I'm not subscribed to lkml@ - it too frequently ends
up in politics :)

What are we talking about - are we trying to insult each other over
something that was said on the basis of an assumption in a theoretical
mental exercise? I can only laugh at that :)

Ingo, I never ever tried to show that something is broken - that is a
fantasy based on words taken literally, not on the real intention.

I never said epoll is broken. Absolutely.

I never said threadlet is broken. Absolutely.

I just showed that it is not (in my opinion) the right decision to use
threadlets for an IO model instead of an event driven one - that is not based
on kevent performance (I _never_ stated it as a main factor - kevent was
only an example of an event driven model; you confused it with kevent
AIO, which is a different beast), but instead on experience with nptl
threads and linuxthreads, and the related rescheduling overhead compared to 
a userspace one.

I showed kevent as a possible usage scenario - since it does support its own
AIO. And you started to fight against it in every detail, since you
think kevent is not a good way to handle the AIO model - well, that can be
perfectly correct. I showed kevent AIO (please do not think that kevent
and kevent AIO are the same - the latter is just one of the possible
users I implemented, and it only uses kevent to deliver the completion event 
to userspace) as a possible AIO implementation, but not _kevent_ itself.

But somehow we ended up with words being attributed to me that I never said,
and ideas I never based my assumptions on... I do not really think you even
remotely wanted to make anything personal out of what we had
discussed.

We even concluded that a perfect IO model should use both approaches to
really scale - both threadlets with their on-demand-only rescheduling, and an
event driven ring.
You stated your opinion on kevents - well, I cannot agree with it, but
that is your right not to like something.

Let's not continue the bad practice of kicking each other just because there
were some problematic roots which no one even remembers correctly - let's
not make the mistake of turning trivial bits into something personal
- if you are in Russia or nearby any time soon I will happily buy you 
a beer or whatever you prefer :)

So, let's just draw a line:
kevent was shown to people, and its performance, although flaky, is a
bit better than epoll's. Threadlets bound to any event driven ring do not
show any performance degradation in a network driven setup with a small
number of reschedulings, with all the advantages of simpler programming.
So, repeating myself, both models (not kevent and threadlets specifically,
but event driven and thread based ones) should be used for maximum
performance.

Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Evgeniy Polyakov
On Fri, Mar 02, 2007 at 11:56:18AM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote:
> 
> * Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> 
> > Even if kevent has the same speed, it still allows handling _any_ 
> > kind of events without any major surgery - a very tiny structure of 
> > a lock and a list head and you can process your own kernel event in 
> > userspace with timers, signals, io events, private userspace events 
> > and others without races and without inventing different hacks for 
> > different types - _this_ is the main point.
> 
> did it ever occur to you to ... extend epoll? To speed it up? To add a 
> new wait syscall to it? Instead of introducing a whole new parallel 
> framework?

Yes, I thought about extending it more than a year ago, before I started 
kevent, but epoll() is entirely based on the file structure and its 
file_operations with the poll method, so it is quite impossible to use it 
with sockets to implement network AIO. Eventually it would have gathered a 
lot of other subsystems - do we really want per process signalfs, timerfs 
and so on - each simple structure must be bound to a file, which becomes 
too costly.

>   Ingo

-- 
Evgeniy Polyakov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Ingo Molnar

* Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

> > > > [...] The numbers are still highly suspect - and we are already 
> > > > down from the prior claim of kevent being almost twice as fast 
> > > > to a 25% difference.
> > >
> > > Btw, there was never an almost twofold performance increase - epoll in 
> > > my tests always showed 4-5 thousand requests per second, kevent - 
> > > up to 7 thousand.
> > 
> > i'm referring to your claim in this mail of yours from 4 days ago 
> > for example:
> > 
> >   http://lkml.org/lkml/2007/2/25/116
> > 
> >  "But note, that on my athlon64 3500 test machine kevent is about 7900
> >   requests per second compared to 4000+ epoll, so expect a challenge."
> > 
> > no matter how i look at it, but 7900 is 1.9 times 4000 - which is 
> > "almost twice".
> 
> After your changes epoll increased to 5k.

Can we please stop this pointless episode of benchmarketing, where every 
mail of yours shows different results and you even deny having said 
something which you clearly said just a few days ago? At this point i 
simply cannot trust the numbers you are posting, nor is the discussion 
style you are following productive in any way in my opinion.

(you are never ever wrong, and if you are proven wrong on topic A you 
claim it is an irrelevant topic (without even admitting you were wrong 
about it) and you point to topic B claiming it's the /real/ topic you 
talked about all along. And along the way you are slandering other 
projects like epoll and threadlets, distorting the discussion. This kind 
of keep-the-ball-moving discussion style is effective in politics but 
IMO it's a waste of time when developing a kernel.)

Ingo


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Ingo Molnar

* Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

> Even if kevent has the same speed, it still allows handling _any_ 
> kind of events without any major surgery - a very tiny structure of 
> a lock and a list head and you can process your own kernel event in 
> userspace with timers, signals, io events, private userspace events 
> and others without races and without inventing different hacks for 
> different types - _this_ is the main point.

did it ever occur to you to ... extend epoll? To speed it up? To add a 
new wait syscall to it? Instead of introducing a whole new parallel 
framework?

Ingo


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Evgeniy Polyakov
On Fri, Mar 02, 2007 at 11:27:14AM +0100, Pavel Machek ([EMAIL PROTECTED]) wrote:
> Maybe. It is not up to me to decide. But "it is faster" is _not_ the
> only merge criterium.

Of course not!
Even if kevent has the same speed, it still allows handling _any_ kind
of events without any major surgery - a very tiny structure of a lock and
a list head - and you can process your own kernel event in userspace with 
timers, signals, io events, private userspace events and others without 
races and without inventing different hacks for different types - 
_this_ is the main point.
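
The per-source footprint being described is roughly the following - a
sketch for illustration, not the actual kevent structures:

  #include <linux/list.h>
  #include <linux/spinlock.h>

  /* enough state to make any object an event source */
  struct ev_storage {
          spinlock_t lock;                /* protects the ready list */
          struct list_head ready;         /* events queued for delivery */
  };

  /* embed it, instead of binding the object to a whole struct file */
  struct my_timer {
          struct ev_storage st;
          /* ... timer fields ... */
  };

The contrast drawn in the thread is with binding each such object to a
whole struct file, a much heavier structure.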

>   Pavel
> -- 
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) 
> http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

-- 
Evgeniy Polyakov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Pavel Machek
Hi!

> > > > If you can replace them with something simpler, and no worse than 10%
> > > > slower in the worst case, then go ahead. (We actually tried to do that at
> > > > some point, only to realize that efence stresses the vm subsystem in a very
> > > > unexpected/unfriendly way).
> > > 
> > > Agh, only 10% in the worst case.
> > > I think you cannot even imagine what tricks the network stack uses to get at
> > > least an additional 1% out of the box.
> > 
> > Yep? Feel free to rewrite networking in assembly on Eugenix. That
> > should get you a 1% improvement. If you reserve a few registers to be only
> > used by the kernel (not allowed to userspace), you can speed up networking
> > 5%, too. Ouch, and you could turn off the MMU, that is a sure way to get a few
> > more percent improvement in your networking case.
> 
> It is not _my_ networking, but the one you use every day in every Linux
> box. Notice which tricks are used to remove a single byte from
> sk_buff.

Ok, so tricks were worth it in sk_buff case.

> It is called optimization, and if it gives us a single plus it must be
> implemented. Not all people have a magical fear of new things.

But that does not mean "every optimization must be
implemented". Only optimizations that are "worth it" are... 

> > > Using such logic you can just abandon any further development, since it
> > > works as is right now.
> > 
> > Stop trying to pervert my logic.
> 
> Ugh? :)
> I just say in simple words your 'we do not need something if it adds 10%,
> but is complex to understand'.

Yes... but that does not mean "stop development". You are still free
to clean up the code _while_ making it faster.

> > If your code is so complex that it is almost impossible to use from
> > userspace, that is good enough reason not to be merged. "But it is 3%
> > faster if..." is not a good-enough argument.
> 
> Is it enough for you?
> 
> epoll   4794.23 req/sec
> kevent  6468.95 req/sec

Maybe. It is not up to me to decide. But "it is faster" is _not_ the
only merge criterion.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-02 Thread Evgeniy Polyakov
On Thu, Mar 01, 2007 at 11:31:14AM -0800, Davide Libenzi 
(davidel@xmailserver.org) wrote:
> On Thu, 1 Mar 2007, Evgeniy Polyakov wrote:
> 
> > Ingo, do you really think I will send mails with faked benchmarks? :))
> 
> I don't think he ever implied that. He was only suggesting that when you 
> post benchmarks, and even more when you make claims based on benchmarks, 
> you need to be extra careful about what you measure. Otherwise the 
> external view that you give to others does not look good.
> Kevent can be really faster than epoll, but if you post broken benchmarks 
> (that can be, unreliable HTTP loaders, broken server implementations, 
> etc..) and make claims based on that, the only effect that you have is to 
> lose your point.
 
We seem to have moved far away from the original topic - I never built any
assumptions on top of kevent _performance_ - kevent is a logical
extrapolation of epoll. I only showed that an event driven model can be
fast and that it outperforms the threadlet one - after we changed topic we were
unable to actually test threadlets in a networking environment, since the
only test I ran showed that threadlets do not reschedule at all, and
Ingo's tests showed a small number of reschedulings.

So, I only said that kevent is superior to epoll because (and
this is the _main_ issue) of its ability to handle essentially any kind of
event with very small overhead (the same as epoll has in struct file -
a list and a spinlock) and without the significant price of binding
struct file to an event.

I did not want and do not want to hurt anyone (even Ingo, although he is 
against kevent :), but my opinion is that the thread moved from a nice 
discussion about threads and events with jokes and fun into quite angry 
word throwing, and that is too bad - let's make it fun again.
I'm not a native English speaker (and do not use a dictionary), so it is 
quite possible that some of my phrases were not exactly nice, but it was 
unintentional (at least not very) :)

Peace?

> - Davide
> 

-- 
Evgeniy Polyakov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Linus Torvalds


On Thu, 1 Mar 2007, Ingo Molnar wrote:
> 
> wrt. one-shot syscalls, the user-space stack footprint would still 
> probably be there, because even async contexts that only do single-shot 
> processing need to drop out of kernel mode to handle signals.

Why?

The easiest thing to do with signals is to just not pick them up. If the 
signal was to that *particular* threadlet (ie a "cancel"), then we just 
want to kill the threadlet. And if the signal was to the thread group, 
there is no reason why the threadlet should pick it up.

In neither case is there *any* reason to handle the signal in the 
threadlet, afaik.

And having to have a stack allocation for each threadlet certainly means 
that you complicate things a lot. Suddenly you have allocations that can't 
just go away. Again, I'm pointing to the problems I already pointed out 
with the allocations of the atom structures - quite often you do *not* 
want to keep track of anything specific for completion time, and that 
means that you MUST NOT have to de-allocate anything either.

Again, think aio_read(). With the *exact* current binary interface. 
PLEASE. If you cannot emulate that with threadlets, then threadlets are 
*pointless*. One of the major reasons for the whole exercise was to get 
rid of the special code in fs/aio.c.

So I repeat: if you cannot do that, and remain binary compatible, don't 
even bother.
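
To be concrete, this is the kind of sequence that has to keep working
unchanged - a sketch using the libaio wrapper, error handling omitted:

	#include <libaio.h>
	#include <fcntl.h>
	#include <stdio.h>

	int main(void)
	{
		io_context_t ctx = 0;
		struct iocb cb, *cbs[1] = { &cb };
		struct io_event ev;
		char buf[4096];
		int fd = open("/etc/hosts", O_RDONLY);

		io_queue_init(1, &ctx);			/* existing binary interface */
		io_prep_pread(&cb, fd, buf, sizeof(buf), 0);
		io_submit(ctx, 1, cbs);			/* submission ABI, as-is */
		io_getevents(ctx, 1, 1, &ev, NULL);	/* completion ABI, as-is */
		printf("read %ld bytes\n", (long)ev.res);
		return 0;
	}

Whatever threadlets do internally, this exact sequence must behave
identically afterwards.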

Linus


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Johann Borck

David Lang wrote:


On Thu, 1 Mar 2007, Johann Borck wrote:

I reported this a while ago and suggested having the number of 
pending accepts reported with the event to save that last syscall.
I  created an ab replacement based on kevent, and at least with my 
machines, which are comparable to each other, the load on client 
dropped from 100% to 2% or something. ab just doesn't give meaningful 
results  (if the client is not way more powerful). With that new 
client I get very similar results for epoll and kevent, from 1000 
through to 26000 concurrent requests, the results have been posted on 
kevent-homepage in October, I just checked it with the new version, but 
there's no significant difference.


this is the benchmark with kevent-based client:
http://tservice.net.ru/~s0mbre/blog/2006/10/11#2006_10_11
btw, each result is an average over 1,000,000 requests

and just for comparison, this is on the same machines using ab:
http://tservice.net.ru/~s0mbre/blog/2006/10/08#2006_10_08


Is this client available? And what patches need to be added to the 
kernel to use it?


It's based on an older version of kevent, so I'll have to adapt it a bit 
for use with the recent patch; no patches other than kevent are necessary. I'll post 
a link when it's cleaned up, if you want.


Johann

David Lang





Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread David Lang


On Thu, 1 Mar 2007, Johann Borck wrote:

I reported this a while ago and suggested having the number of pending 
accepts reported with the event to save that last syscall.
I  created an ab replacement based on kevent, and at least with my machines, 
which are comparable to each other, the load on client dropped from 100% to 
2% or something. ab just doesn't give meaningful results  (if the client is 
not way more powerful). With that new client I get very similar results for 
epoll and kevent, from 1000 through to 26000 concurrent requests, the results 
have been posted on kevent-homepage in October, I just checked it with the new 
version, but there's no significant difference.


this is the benchmark with kevent-based client:
http://tservice.net.ru/~s0mbre/blog/2006/10/11#2006_10_11
btw, each result is an average over 1,000,000 requests

and just for comparison, this is on the same machines using ab:
http://tservice.net.ru/~s0mbre/blog/2006/10/08#2006_10_08


Is this client available? And what patches need to be added to the kernel to use 
it?


David Lang


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Davide Libenzi


Oh boy, wasn't this thread supposed to focus on syslets/threadlets ... :)



On Thu, 1 Mar 2007, Eric Dumazet wrote:

> On Thursday 01 March 2007 16:23, Evgeniy Polyakov wrote:
> > They are there, since ab runs only 50k requests.
> > If I change it to something noticeably more than 50/80k, ab crashes:
> > # ab -c8000 -t 600 -n80000 http://192.168.0.48/
> > This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
> > Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
> > Copyright 2006 The Apache Software Foundation, http://www.apache.org/
> >
> > Benchmarking 192.168.0.48 (be patient)
> > Segmentation fault
> >
> > Are there any other tools suitable for such loads?
> > I only tested httperf (which is worse, since it uses poll/select) and
> > 'ab'.
> >
> > Btw, host machine runs 100% too, so it is possible that client side is
> > broken (too).
> 
> I have similar problems here, the ab test just doesn't complete...
> 
> I am still investigating with strace and tcpdump.
> 
> In the meantime you could just rewrite it (based on epoll please :) ), since 
> it should be quite easy to do this (reverse of evserver_epoll)

I have a simple one based on coroutines and epoll. You need libpcl and 
coronet. Debian has a package named libpcl1-dev for libpcl; otherwise:

http://www.xmailserver.org/libpcl.html

and 'configure --prefix=/usr && sudo make install'.
Coronet is here:

http://www.xmailserver.org/coronet-lib.html

here just 'configure && make'.
Inside the "test" directory there a simple loader named cnhttpload:

  cnhttpload -s HOST -n NCON [-p PORT (80)] [-r NREQS (1)] [-S STKSIZE (8192)]
 [-M MAXCONNS] [-t TMUPD (1000)] [-a NACTIVE] [-T TMSAMP (200)]
 [-h] URL ...

HOST  = Target host
PORT  = Target host port
NCON  = Number of connections to the server
NACTIVE   = Number of active (live) connections
STKSIZE   = Stack size for coroutines
NREQS = Number of requests done for each connection (better be 1 if 
your server does not support keep-alive)
MAXCONNS  = Maximum number of total connections done to the server. If not 
set, the test will continue forever (well, till a ^C)
TMUPD = Millisec time of stats update
TMSAMP= Millisec internal average-update time
URL   = Target doc (not http:// or host, just doc path)

So for the particular test my inbox was flooded with :), you'd use:

cnhttpload -s HOST -n 80000 -a 8000 -S 4096




- Davide




Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Davide Libenzi
On Thu, 1 Mar 2007, Evgeniy Polyakov wrote:

> Ingo, do you really think I will send mails with faked benchmarks? :))

I don't think he ever implied that. He was only suggesting that when you 
post benchmarks, and even more when you make claims based on benchmarks, 
you need to be extra careful about what you measure. Otherwise the 
external view that you give to others does not look good.
Kevent can be really faster than epoll, but if you post broken benchmarks 
(that can be, unreliable HTTP loaders, broken server implementations, 
etc..) and make claims based on that, the only effect that you have is to 
lose your point.



- Davide




Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Davide Libenzi
On Thu, 1 Mar 2007, Ingo Molnar wrote:

> 
> * Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> 
> > I posted kevent/epoll benchmarks and related design issues too many 
> > times both with handmade applications (which might be broken as hell) 
> > and popular open-source servers to repeat them again.
> 
> numbers are crucial here - and given the epoll bugs in the evserver code 
> that we found, do you have updated evserver benchmark results that 
> compare epoll to kevent? I'm wondering why epoll has half the speed of 
> kevent in those measurements - i suspect some possible benchmarking bug. 
> The queueing model of epoll and kevent is roughly comparable, both do 
> only a constant number of steps to serve one particular request, 
> regardless of how many pending connections/requests there are. What is 
> the CPU utilization of the server system during an epoll test, and what 
> is the CPU utilization during a kevent test? 100% utilized in both 
> cases?

With 8K concurrent (live) connections, we may also want to try with the v3 
version of the epoll-event-loops-diet patch ;)



- Davide




Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Johann Borck

On Thu, Mar 01, 2007 at 04:41:27PM +0100, Eric Dumazet wrote:

> I had to loop on accept() :
>
> for (i=0; i<nfds; i++) { ... }

On Thu, 1 Mar 2007, Evgeniy Polyakov wrote:

> The same here - I would just enable a debug to find it.




I reported this a while ago and suggested having the number of 
pending accepts reported with the event to save that last syscall.
I  created an ab replacement based on kevent, and at least with my 
machines, which are comparable to each other, the load on client 
dropped from 100% to 2% or something. ab just doesn't give meaningful 
results  (if the client is not way more powerful). With that new client 
I get very similar results for epoll and kevent, from 1000 through to 
26000 concurrent requests, the results have been posted on 
kevent-homepage in October, I just checked it with the new version, but 
there's no significant difference.


this is the benchmark with kevent-based client:
http://tservice.net.ru/~s0mbre/blog/2006/10/11#2006_10_11
btw, each result is an average over 1,000,000 requests

and just for comparison, this is on the same machines using ab:
http://tservice.net.ru/~s0mbre/blog/2006/10/08#2006_10_08



Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread David Lang

On Thu, 1 Mar 2007, Evgeniy Polyakov wrote:


On Thu, Mar 01, 2007 at 08:56:28AM -0800, David Lang ([EMAIL PROTECTED]) wrote:

the ab numbers below do not seem that impressive to me, especially for such
stripped down server processes.

...

client and server are dual opteron 252 with 8G of ram, running debian in 64
bit mode


Decrease your hardware setup by 2-4 times, leave only one apache process
and try to get the same - we are not talking about how to create a
perfect web server, instead we try to focus on possible problems in
epoll/kevent event driven logic.


For apache I agree that the target box was maxed out, so if you only had a 
single core on your AMD64 box that would be about half; however, thttpd is 
only using ~1 of the CPUs (OS overhead is using just a smidge of the second), 
but overall the box is 45-48% idle.


If the amount of RAM is an issue then you are swapping in your tests (or at 
least throwing out cache that you need) and so would not be testing what you 
think you are.



Vanilla (epoll) lighttpd shows 4000-5000 requests per second in my setup (no 
logs).
Default mpm-apache2 with a bunch of threads - about 8k req/s.
Default thttpd (disabled logging) - about 2k req/s

Btw, all your tests are network bound, try to decrease
html page size to get the actual event processing speed out of those machines.


Same test retrieving a ~128b file: the server never gets below 51% idle (so it's 
only using one CPU).


Server Software:thttpd/2.23beta1
Server Hostname:208.2.188.5
Server Port:81

Document Path:  /128b
Document Length:136 bytes

Concurrency Level:  8000
Time taken for tests:   9.372902 seconds
Complete requests:  80000
Failed requests:0
Write errors:   0
Total transferred:  30762842 bytes
HTML transferred:   10952216 bytes
Requests per second:8535.24 [#/sec] (mean)
Time per request:   937.290 [ms] (mean)
Time per request:   0.117 [ms] (mean, across all concurrent requests)
Transfer rate:  3205.09 [Kbytes/sec] received

Connection Times (ms)
  min  mean[+/-sd] median   max
Connect:       36  287 1125.6     73    9109
Processing:    49   89   19.8     87     339
Waiting:       17   62   16.4     62     292
Total:         92  376 1137.4    159    9262

Percentage of the requests served within a certain time (ms)
  50%159
  66%164
  75%165
  80%165
  90%203
  95%260
  98%   3233
  99%   9201
 100%   9262 (longest request)

Note that this is showing the slowdown from the large concurrency level; if I 
reduce the concurrency level to 500 I get


Document Path:  /128b
Document Length:136 bytes

Concurrency Level:  500
Time taken for tests:   4.215025 seconds
Complete requests:  80000
Failed requests:0
Write errors:   0
Total transferred:  30565348 bytes
HTML transferred:   10881904 bytes
Requests per second:18979.72 [#/sec] (mean)
Time per request:   26.344 [ms] (mean)
Time per request:   0.053 [ms] (mean, across all concurrent requests)
Transfer rate:  7081.33 [Kbytes/sec] received

Connection Times (ms)
  min  mean[+/-sd] median   max
Connect:        0   15  206.3      1    3006
Processing:     2    7    6.4      6     224
Waiting:        1    6    6.4      5     224
Total:          3   22  208.4      6    3229

Percentage of the requests served within a certain time (ms)
  50%  6
  66%  8
  75% 10
  80% 12
  90% 16
  95% 17
  98% 21
  99% 24
 100%   3229 (longest request)
loadtest2:/proc/sys#

again with >50% idle on the server box

also, ab appears to only use a single cpu so the fact that there are two on the 
client box should not make a difference.


I will reboot these boxes into a UP kernel if you think that this is still a 
significant difference. Based on what I'm seeing I don't think it will make much 
of a difference (except for the apache test)


David Lang


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Evgeniy Polyakov
On Thu, Mar 01, 2007 at 08:56:28AM -0800, David Lang ([EMAIL PROTECTED]) wrote:
> the ab numbers below do not seem that impressive to me, especially for such 
> stripped down server processes.
...
> client and server are dual opteron 252 with 8G of ram, running debian in 64 
> bit mode

Decrease your hardware setup by 2-4 times, leave only one apache process 
and try to get the same - we are not talking about how to create a
perfect web server, instead we try to focus on possible problems in
epoll/kevent event driven logic.

Vanilla (epoll) lighttpd shows 4000-5000 requests per second in my setup (no 
logs).
Default mpm-apache2 with a bunch of threads - about 8k req/s.
Default thttpd (disabled logging) - about 2k req/s

Btw, all your tests are network bound, try to decrease 
html page size to get the actual event processing speed out of those machines.

-- 
Evgeniy Polyakov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread David Lang
the ab numbers below do not seem that impressive to me, especially for such 
stripped down server processes.


here are some numbers from a set of test boxes I've got in my lab. I've been 
using them to test firewalls, and I've been getting throughput similar to what 
is listed below when going through a proxy that does a full fork for each 
connection, and then makes a new connection to the webserver on the other side! 
The first few sets of numbers are going directly from test client to test 
server, the final set is going through the proxy.


client and server are dual opteron 252 with 8G of ram, running debian in 64 bit 
mode


this is with apache2 MPM as the destination (relatively untuned except for 
tinkering with the child count settings). This should be about as bad as you can 
get for a server.


loadtest2:/proc/sys# ab -c 8000 -n 80000 http://208.2.188.5:80/4k
This is ApacheBench, Version 2.0.41-dev <$Revision: 1.141 $> apache-2.0
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-2002 The Apache Software Foundation, http://www.apache.org/

Benchmarking 208.2.188.5 (be patient)
Completed 8000 requests
Completed 16000 requests
Completed 24000 requests
Completed 32000 requests
Completed 40000 requests
Completed 48000 requests
Completed 56000 requests
Completed 64000 requests
Completed 72000 requests
Finished 80000 requests


Server Software:Apache/1.3.33
Server Hostname:208.2.188.5
Server Port:80

Document Path:  /4k
Document Length:4352 bytes

Concurrency Level:  8000
Time taken for tests:   10.992838 seconds
Complete requests:  80000
Failed requests:0
Write errors:   0
Total transferred:  386192835 bytes
HTML transferred:   362612992 bytes
Requests per second:7277.47 [#/sec] (mean)
Time per request:   1099.284 [ms] (mean)
Time per request:   0.137 [ms] (mean, across all concurrent requests)
Transfer rate:  34307.88 [Kbytes/sec] received

Connection Times (ms)
  min  mean[+/-sd] median   max
Connect:        8  497 1398.3     71    9072
Processing:    17  236  346.9    103    2995
Waiting:        8   91  131.6     65    1692
Total:         26  734 1435.5    187    9786

Percentage of the requests served within a certain time (ms)
  50%187
  66%288
  75%564
  80%754
  90%   3085
  95%   3163
  98%   4316
  99%   9186
 100%   9786 (longest request)
loadtest2:/proc/sys# ab -c 8000 -n 80000 http://208.2.188.5:80/8k
This is ApacheBench, Version 2.0.41-dev <$Revision: 1.141 $> apache-2.0
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-2002 The Apache Software Foundation, http://www.apache.org/

Benchmarking 208.2.188.5 (be patient)
Completed 8000 requests
Completed 16000 requests
Completed 24000 requests
Completed 32000 requests
Completed 40000 requests
Completed 48000 requests
Completed 56000 requests
Completed 64000 requests
Completed 72000 requests
Finished 80000 requests


Server Software:Apache/1.3.33
Server Hostname:208.2.188.5
Server Port:80

Document Path:  /8k
Document Length:8704 bytes

Concurrency Level:  8000
Time taken for tests:   11.355031 seconds
Complete requests:  80000
Failed requests:0
Write errors:   0
Total transferred:  736949141 bytes
HTML transferred:   713733802 bytes
Requests per second:7045.34 [#/sec] (mean)
Time per request:   1135.503 [ms] (mean)
Time per request:   0.142 [ms] (mean, across all concurrent requests)
Transfer rate:  63379.48 [Kbytes/sec] received

Connection Times (ms)
  min  mean[+/-sd] median   max
Connect:       36  495 1297.1     76    9056
Processing:    81  317  529.5    161    3448
Waiting:       25   89   75.1     76    1610
Total:        124  812 1401.5    250   11011

Percentage of the requests served within a certain time (ms)
  50%250
  66%304
  75%497
  80%705
  90%   3171
  95%   3251
  98%   3455
  99%   9160
 100%  11011 (longest request)

For both of these tests the server had its CPU maxed out (<5% idle).

switching to thttpd instead of apache and I get

loadtest2:/proc/sys# ab -c 8000 -n 80000 http://208.2.188.5:81/4k
This is ApacheBench, Version 2.0.41-dev <$Revision: 1.141 $> apache-2.0
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-2002 The Apache Software Foundation, http://www.apache.org/

Benchmarking 208.2.188.5 (be patient)
Completed 8000 requests
Completed 16000 requests
Completed 24000 requests
Completed 32000 requests
Completed 40000 requests
Completed 48000 requests
Completed 56000 requests
Completed 64000 requests
Completed 72000 requests
Finished 80000 requests


Server Software:thttpd/2.23beta1
Server Hostname:208.2.188.5
Server Port:81

Document Path:  /4k
Document Length:4352 bytes

Concurrency Level:   

Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Evgeniy Polyakov
On Thu, Mar 01, 2007 at 04:41:27PM +0100, Eric Dumazet ([EMAIL PROTECTED]) 
wrote:
> On Thursday 01 March 2007 16:32, Eric Dumazet wrote:
> > On Thursday 01 March 2007 16:23, Evgeniy Polyakov wrote:
> > > They are there, since ab runs only 50k requests.
> > > If I change it to something noticeably more than 50/80k, ab crashes:
> > > # ab -c8000 -t 600 -n80000 http://192.168.0.48/
> > > This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
> > > Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
> > > Copyright 2006 The Apache Software Foundation, http://www.apache.org/
> > >
> > > Benchmarking 192.168.0.48 (be patient)
> > > Segmentation fault
> > >
> > > Are there any other tools suitable for such loads?
> > > I only tested httperf (which is worse, since it uses poll/select) and
> > > 'ab'.
> > >
> > > Btw, host machine runs 100% too, so it is possible that client side is
> > > broken (too).
> >
> > I have similar problems here, the ab test just doesn't complete...
> >
> > I am still investigating with strace and tcpdump.
> 
> OK... I found it.
> 
> I had to loop on accept() :
> 
> for (i=0; i<nfds; i++) {
>         if (event[i].data.fd == main_server_s) {
>                 do {
>                         err = evtest_callback_main(event[i].data.fd);
>                 } while (err != -1);
>         }
>         else
>                 err = evtest_callback_client(event[i].data.fd);
> }
> 
> Or else we can miss an event forever...

The same here - I would just enable a debug to find it.

# ab -c8000 -n80000 http://192.168.0.48/
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.0.48 (be patient)
Completed 8000 requests
Completed 16000 requests
Completed 24000 requests
Completed 32000 requests
Completed 40000 requests
Completed 48000 requests
Completed 56000 requests
Completed 64000 requests
Completed 72000 requests
Finished 80000 requests


Server Software:Apache/1.3.27
Server Hostname:192.168.0.48
Server Port:80

Document Path:  /
Document Length:3521 bytes

Concurrency Level:  8000
Time taken for tests:   18.250921 seconds
Complete requests:  80000
Failed requests:0
Write errors:   0
Total transferred:  315691904 bytes
HTML transferred:   287074172 bytes
Requests per second:4383.34 [#/sec] (mean)
Time per request:   1825.092 [ms] (mean)
Time per request:   0.228 [ms] (mean, across all concurrent
requests)
Transfer rate:  16891.86 [Kbytes/sec] received

Connection Times (ms)
  min  mean[+/-sd] median   max
Connect:      137  884  481.1    920    3602
Processing:   567  888  163.6    985     997
Waiting:       47  455  238.2    439     921
Total:        765 1772  566.6   1911    4556

Percentage of the requests served within a certain time (ms)
50%   1911
66%   1911
75%   1912
80%   1913
90%   1913
95%   1914
98%   4438
99%   4497
100%   4556 (longest request)
kano:~#


-- 
Evgeniy Polyakov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Evgeniy Polyakov
On Thu, Mar 01, 2007 at 04:32:37PM +0100, Eric Dumazet ([EMAIL PROTECTED]) 
wrote:
> On Thursday 01 March 2007 16:23, Evgeniy Polyakov wrote:
> > They are there, since ab runs only 50k requests.
> > If I change it to something noticeably more than 50/80k, ab crashes:
> > # ab -c8000 -t 600 -n80000 http://192.168.0.48/
> > This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
> > Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
> > Copyright 2006 The Apache Software Foundation, http://www.apache.org/
> >
> > Benchmarking 192.168.0.48 (be patient)
> > Segmentation fault
> >
> > Are there any other tools suitable for such loads?
> > I only tested httperf (which is worse, since it uses poll/select) and
> > 'ab'.
> >
> > Btw, host machine runs 100% too, so it is possible that client side is
> > broken (too).
> 
> I have similar problems here, the ab test just doesn't complete...
> 
> I am still investigating with strace and tcpdump.
> 
> In the meantime you could just rewrite it (based on epoll please :) ), since 
> it should be quite easy to do this (reverse of evserver_epoll)

Rewriting 'ab' with pure epoll instead of the APR lib is like
a dandruff treatment on a guillotine.

I will try to cook up something of my own - a simple client (based on epoll) -
tomorrow or over the weekend; now I need to work for money :)

-- 
Evgeniy Polyakov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Eric Dumazet
On Thursday 01 March 2007 16:32, Eric Dumazet wrote:
> On Thursday 01 March 2007 16:23, Evgeniy Polyakov wrote:
> > They are there, since ab runs only 50k requests.
> > If I change it to something noticeably more than 50/80k, ab crashes:
> > # ab -c8000 -t 600 -n80000 http://192.168.0.48/
> > This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
> > Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
> > Copyright 2006 The Apache Software Foundation, http://www.apache.org/
> >
> > Benchmarking 192.168.0.48 (be patient)
> > Segmentation fault
> >
> > Are there any other tools suitable for such loads?
> > I only tested httperf (which is worse, since it uses poll/select) and
> > 'ab'.
> >
> > Btw, host machine runs 100% too, so it is possible that client side is
> > broken (too).
>
> I have similar problems here, the ab test just doesn't complete...
>
> I am still investigating with strace and tcpdump.

OK... I found it.

I had to loop on accept() :

for (i=0; i<nfds; i++) {
        if (event[i].data.fd == main_server_s) {
                do {
                        err = evtest_callback_main(event[i].data.fd);
                } while (err != -1);
        }
        else
                err = evtest_callback_client(event[i].data.fd);
}

Or else we can miss an event forever...
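
The underlying rule: one ready notification may cover many queued
connections, so accept() has to be drained until it would block. A
minimal sketch of that pattern - assuming a non-blocking listening
socket and a hypothetical handle_new_client() helper:

	#include <errno.h>
	#include <stdio.h>
	#include <sys/socket.h>

	extern void handle_new_client(int fd);	/* hypothetical */

	static void drain_accepts(int listen_fd)
	{
		for (;;) {
			int cfd = accept(listen_fd, NULL, NULL);
			if (cfd < 0) {
				if (errno != EAGAIN && errno != EWOULDBLOCK)
					perror("accept");	/* real error */
				break;	/* backlog fully drained */
			}
			handle_new_client(cfd);
		}
	}

Otherwise an edge-triggered notification can be consumed while
connections are still pending, and no further event will arrive for
them.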


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Evgeniy Polyakov
On Thu, Mar 01, 2007 at 04:09:42PM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote:
> 
> * Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> 
> > > > I can tell you that the problem (at least on my machine) comes from :
> > > > 
> > > > gettimeofday(&tm, NULL);
> > > > 
> > > > in evserver_epoll.c
> > > 
> > > yeah, that's another difference - especially if it's something like 
> > > an Athlon64 and gettimeofday falls back to pm-timer, that could 
> > > explain the performance difference. That's why i repeatedly asked 
> > > Evgeniy to use the /very same/ client function for both the epoll 
> > > and the kevent test and redo the measurements. The numbers are still 
> > > highly suspect - and we are already down from the prior claim of 
> > > kevent being almost twice as fast to a 25% difference.
> > 
> > There is no gettimeofday() in the running code anymore, and it was 
> > not placed in the common server processing code, btw.
> > 
> > Ingo, do you really think I will send mails with faked benchmarks? :))
> 
> no, i'd not be in this discussion anymore if i thought that. But i do 
> think that your benchmark results are extremely sloppy, which makes your 
> conclusions based on them essentially useless.
>
> you were hurling quite colorful and strong assertions into this 
> discussion, backed up by these numbers, so you should expect at least 
> some minimal amount of scrutiny of those numbers.

This discussion was about event driven vs. thread driven IO models, and
threadlets only behave like event driven ones because in my tests there was
exactly one threadlet rescheduling per several thousand clients.

Kevent is just a logical extrapolation of the performance of the event
driven model.

My assumptions were based not on kevent performance, but on the fact
that event delivery is much faster and simpler than thread handling.

Ugh, I'm starting that stupid talk again - let's just jump to the end -
I agree that in real-life high-performance systems both models must be
used.

Peace? :)

> > > [...] The numbers are still highly suspect - and we are already down 
> > > from the prior claim of kevent being almost twice as fast to a 25% 
> > > difference.
> >
> > Btw, there was never an almost-twice performance increase - epoll in my 
> > tests always showed 4-5 thousand requests per second, kevent - up to 
> > 7 thousand.
> 
> i'm referring to your claim in this mail of yours from 4 days ago for 
> example:
> 
>   http://lkml.org/lkml/2007/2/25/116
> 
>  "But note, that on my athlon64 3500 test machine kevent is about 7900
>   requests per second compared to 4000+ epoll, so expect a challenge."
> 
> no matter how i look at it, but 7900 is 1.9 times 4000 - which is 
> "almost twice".

After your changes epoll increased to 5k.
I can easily reproduce 6300/4300 split, but can not get more than 7k for
kevent (with oprofile/idle=poll at least).

I've completed an 800k run:
kevent 4800
epoll 4450

with tons of overflows in 'ab':

Write errors:   0
Total transferred:  -1197367296 bytes
HTML transferred:   -1478167296 bytes
Requests per second:4440.67 [#/sec] (mean)
Time per request:   1801.529 [ms] (mean)
Time per request:   0.225 [ms] (mean, across all concurrent
requests)
Transfer rate:  -6490.62 [Kbytes/sec] received

Any other bench?

>   Ingo

-- 
Evgeniy Polyakov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Eric Dumazet
On Thursday 01 March 2007 16:23, Evgeniy Polyakov wrote:
> They are there, since ab runs only 50k requests.
> If I change it to something noticeably more than 50/80k, ab crashes:
> # ab -c8000 -t 600 -n80000 http://192.168.0.48/
> This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
> Copyright 2006 The Apache Software Foundation, http://www.apache.org/
>
> Benchmarking 192.168.0.48 (be patient)
> Segmentation fault
>
> Are there any other tools suitable for such loads?
> I only tested httperf (which is worse, since it uses poll/select) and
> 'ab'.
>
> Btw, host machine runs 100% too, so it is possible that client side is
> broken (too).

I have similar problems here, the ab test just doesn't complete...

I am still investigating with strace and tcpdump.

In the meantime you could just rewrite it (based on epoll please :) ), since 
it should be quite easy to do this (reverse of evserver_epoll)



Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Evgeniy Polyakov
On Thu, Mar 01, 2007 at 03:47:17PM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote:
> 
> * Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> 
> > CPU: AMD64 processors, speed 2210.08 MHz (estimated)
> > Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit 
> mask of 0x00 (No unit mask) count 100000
> > samples  %symbol name
> > 195750   67.3097  cpu_idle
> > 14111 4.8521  enter_idle
> > 4979  1.7121  IRQ0x51_interrupt
> > 4765  1.6385  tcp_v4_rcv
> 
> the pretty much only meaningful way to measure this is to:
> 
> - start a really long 'ab' testrun. Something like "ab -c 8000 -t 600".
> - let the system get into 'steady state': i.e. CPU load at 100%
> - reset the oprofile counters, then start an oprofile run for 60 
>   seconds.
> - stop the oprofile run.
> - stop the test.
> 
> this way there won't be that many 'cpu_idle' entries in your profiles, 
> and the profiles between the two event delivery mechanisms will be 
> directly comparable.

They are there, since ab runs only 50k requests.
If I change it to something noticeably more than 50/80k, ab crashes:
# ab -c8000 -t 600 -n80000 http://192.168.0.48/
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.0.48 (be patient)
Segmentation fault

Are there any other tools suitable for such loads?
I only tested httperf (which is worse, since it uses poll/select) and
'ab'.

Btw, host machine runs 100% too, so it is possible that client side is
broken (too).

> > In those tests I got epoll perf of about 4400 req/s, kevent was about 
> > 5300.
> 
> So we are now up to epoll being 83% of kevent's performance - while the 
> noise of numbers seen today alone is around 100% ... Could you update 
> the files at the two URLs that you posted before, with the code that you used 
> for the above numbers:

And a couple of moments later I resent the profile with 6100 r/s, and now
attach one with 6300.

>http://tservice.net.ru/~s0mbre/archive/kevent/evserver_epoll.c
>http://tservice.net.ru/~s0mbre/archive/kevent/evserver_kevent.c

Plus http://tservice.net.ru/~s0mbre/archive/kevent/evserver_common.c
which contains the common request handling function

> thanks,
> 
>   Ingo

-- 
Evgeniy Polyakov
CPU: AMD64 processors, speed 2210.08 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask 
of 0x00 (No unit mask) count 100000
samples  %symbol name
168753   65.1189  cpu_idle
12451 4.8046  enter_idle
4814  1.8576  tcp_v4_rcv
3980  1.5358  IRQ0x51_interrupt
3142  1.2124  tcp_ack
2738  1.0565  kmem_cache_free
2346  0.9053  kfree
2341  0.9034  memset_c
1927  0.7436  csum_partial_copy_generic
1723  0.6649  ip_route_input
1650  0.6367  dev_queue_xmit
1452  0.5603  ip_output
1416  0.5464  handle_IRQ_event
1335  0.5152  ip_rcv
1326  0.5117  tcp_rcv_state_process
1069  0.4125  schedule
960   0.3704  __do_softirq
943   0.3639  tcp_sendmsg
915   0.3531  ip_queue_xmit
907   0.3500  tcp_v4_do_rcv
897   0.3461  fget
894   0.3450  system_call
890   0.3434  csum_partial
877   0.3384  tcp_transmit_skb
845   0.3261  netif_receive_skb
822   0.3172  ip_local_deliver
812   0.3133  kmem_cache_alloc
788   0.3041  local_bh_enable
773   0.2983  __alloc_skb
771   0.2975  kfree_skbmem
764   0.2948  __d_lookup
757   0.2921  __tcp_push_pending_frames
734   0.2832  pfifo_fast_enqueue
720   0.2778  copy_user_generic_string
627   0.2419  net_rx_action
603   0.2327  pfifo_fast_dequeue
586   0.2261  ret_from_intr
562   0.2169  __link_path_walk
561   0.2165  sock_wfree
549   0.2118  __fput
547   0.2111  __kfree_skb
543   0.2095  get_unused_fd
534   0.2061  number
527   0.2034  sysret_check
516   0.1991  preempt_schedule
508   0.1960  skb_clone
496   0.1914  tcp_parse_options
487   0.1879  _atomic_dec_and_lock
470   0.1814  tcp_poll
469   0.1810  __ip_route_output_key
466   0.1798  rt_hash_code
464   0.1790  tcp_recvmsg
421   0.1625  dput
420   0.1621  tcp_rcv_established
412   0.1590  __tcp_select_window
407   0.1571  exit_idle
394   0.1520  rb_erase
381   0.1470  sys_close
375   0.1447  __mod_timer
365   0.1408  d_alloc
363   0.1401  mask_and_ack_8259A
335   0.1293  lock_timer_base
315   0.1216  cache_alloc_refill
307   0.1185  ret_from_sys_call
300   0.1158  do_path_lookup
299   0.1154  eth_type_trans
298   0.1150  find_next_zero_bit
294   0.1134  tcp_data_queue
286   0.1104  dentry_iput
285   0.1100  ip_append_data
263   0.1015  thread_return
257   0.0992  __dentry_open
255   0.0984  sock_recvmsg
255   0.0984  tcp_rtt_estimator
252   0.0972  sys_fcntl
250  

Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Ingo Molnar

* Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

> > > I can tell you that the problem (at least on my machine) comes from :
> > > 
> > > gettimeofday(&tm, NULL);
> > > 
> > > in evserver_epoll.c
> > 
> > yeah, that's another difference - especially if it's something like 
> > an Athlon64 and gettimeofday falls back to pm-timer, that could 
> > explain the performance difference. That's why i repeatedly asked 
> > Evgeniy to use the /very same/ client function for both the epoll 
> > and the kevent test and redo the measurements. The numbers are still 
> > highly suspect - and we are already down from the prior claim of 
> > kevent being almost twice as fast to a 25% difference.
> 
> There is no gettimeofday() in the running code anymore, and it was 
> not placed in the common server processing code, btw.
> 
> Ingo, do you really think I will send mails with faked benchmarks? :))

no, i'd not be in this discussion anymore if i thought that. But i do 
think that your benchmark results are extremely sloppy, which makes your 
conclusions based on them essentially useless.

you were hurling quite colorful and strong assertions into this 
discussion, backed up by these numbers, so you should expect at least 
some minimal amount of scrutiny of those numbers.

> > [...] The numbers are still highly suspect - and we are already down 
> > from the prior claim of kevent being almost twice as fast to a 25% 
> > difference.
>
> Btw, there was never an almost-twice performance increase - epoll in my 
> tests always showed 4-5 thousand requests per second, kevent - up to 
> 7 thousand.

i'm referring to your claim in this mail of yours from 4 days ago for 
example:

  http://lkml.org/lkml/2007/2/25/116

 "But note, that on my athlon64 3500 test machine kevent is about 7900
  requests per second compared to 4000+ epoll, so expect a challenge."

no matter how i look at it, but 7900 is 1.9 times 4000 - which is 
"almost twice".

Ingo


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Evgeniy Polyakov
On Thu, Mar 01, 2007 at 05:43:50PM +0300, Evgeniy Polyakov ([EMAIL PROTECTED]) 
wrote:
> On Thu, Mar 01, 2007 at 02:12:50PM +0100, Eric Dumazet ([EMAIL PROTECTED]) 
> wrote:
> > On Thursday 01 March 2007 12:47, Evgeniy Polyakov wrote:
> > >
> > > Could you provide at least remote way to find it?
> > >
> > 
> > Sure :)
> > 
> > > I only found the same problem at
> > > http://lkml.org/lkml/2006/10/27/3
> > >
> > > but without any hits to solve the problem.
> > >
> > > I will try CVS oprofile, if it works I will provide details of course.
> > >
> > 
> > # cat CVS/Root
> > CVS/Root::pserver:[EMAIL PROTECTED]:/cvsroot/oprofile
> > 
> > # cvs diff >/tmp/oprofile.diff
> > 
> > Hope it helps
> 
> One of the hunks failed, since it was in CVS already.
> After setting up some mirrors, I've installed all bits needed for
> oprofile.
> Attached kevent and epoll profiles.
> 
> In those tests I got epoll perf of about 4400 req/s, kevent was about 5300.

Attached kevent profile with 6100 req/sec.
They all look exactly the same to me - there are no kevent or epoll
functions in the profiles at all.

-- 
Evgeniy Polyakov
CPU: AMD64 processors, speed 2210.08 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask 
of 0x00 (No unit mask) count 100000
samples  %symbol name
103425   55.0868  cpu_idle
8214  4.3750  enter_idle
4712  2.5097  tcp_v4_rcv
3805  2.0266  IRQ0x51_interrupt
3154  1.6799  tcp_ack
2777  1.4791  kmem_cache_free
2286  1.2176  kfree
2155  1.1478  memset_c
1747  0.9305  csum_partial_copy_generic
1710  0.9108  ip_output
1620  0.8629  dev_queue_xmit
1551  0.8261  handle_IRQ_event
1391  0.7409  schedule
1373  0.7313  tcp_rcv_state_process
1337  0.7121  ip_rcv
1100  0.5859  ip_queue_xmit
965   0.5140  ip_route_input
939   0.5001  tcp_sendmsg
935   0.4980  __do_softirq
923   0.4916  ip_local_deliver
916   0.4879  csum_partial
905   0.4820  system_call
889   0.4735  tcp_transmit_skb
884   0.4708  tcp_v4_do_rcv
812   0.4325  netif_receive_skb
778   0.4144  __d_lookup
760   0.4048  __alloc_skb
747   0.3979  local_bh_enable
737   0.3925  __tcp_push_pending_frames
702   0.3739  kfree_skbmem
698   0.3718  pfifo_fast_enqueue
678   0.3611  kmem_cache_alloc
651   0.3467  fget
640   0.3409  pfifo_fast_dequeue
637   0.3393  net_rx_action
629   0.3350  __link_path_walk
602   0.3206  preempt_schedule
599   0.3190  __fput
594   0.3164  sock_wfree
589   0.3137  copy_user_generic_string
579   0.3084  ret_from_intr
559   0.2977  _atomic_dec_and_lock
552   0.2940  __kfree_skb
549   0.2924  skb_clone
514   0.2738  number
494   0.2631  rt_hash_code
473   0.2519  dput
466   0.2482  tcp_parse_options
446   0.2376  tcp_rcv_established
433   0.2306  tcp_recvmsg
431   0.2296  tcp_poll
417   0.2221  get_unused_fd
417   0.2221  sysret_check
377   0.2008  rb_erase
364   0.1939  __tcp_select_window
363   0.1933  lock_timer_base
347   0.1848  __mod_timer
329   0.1752  ip_append_data
326   0.1736  exit_idle
325   0.1731  ret_from_sys_call
317   0.1688  d_alloc
302   0.1609  do_path_lookup
295   0.1571  __ip_route_output_key
290   0.1545  eth_type_trans
285   0.1518  sys_close
283   0.1507  cache_alloc_refill
282   0.1502  mask_and_ack_8259A
275   0.1465  thread_return
267   0.1422  call_softirq
265   0.1411  tcp_rtt_estimator
260   0.1385  tcp_data_queue
258   0.1374  __dentry_open
258   0.1374  vsnprintf
255   0.1358  dentry_iput
255   0.1358  tcp_current_mss
250   0.1332  sk_stream_mem_schedule
239   0.1273  find_next_zero_bit
233   0.1241  cache_grow
233   0.1241  tcp_send_fin
222   0.1182  try_to_wake_up
219   0.1166  sock_recvmsg
216   0.1150  do_generic_mapping_read
211   0.1124  sys_fcntl
209   0.1113  get_empty_filp
207   0.1103  call_rcu
206   0.1097  strncpy_from_user
195   0.1039  sock_def_readable
190   0.1012  generic_drop_inode
190   0.1012  restore_args
184   0.0980  get_page_from_freelist
182   0.0969  sys_recvfrom
176   0.0937  do_lookup
174   0.0927  common_interrupt
171   0.0911  fget_light
167   0.0889  new_inode
167   0.0889  percpu_counter_mod
166   0.0884  link_path_walk
166   0.0884  skb_checksum
160   0.0852  fput
160   0.0852  release_sock
159   0.0847  memcpy_c
158   0.0842  memcmp
157   0.0836  __skb_checksum_complete
157   0.0836  tcp_init_tso_segs
148   0.0788  half_md4_transform
144   0.0767  tcp_v4_send_check
142   0.0756  del_timer
139   0.0740  current_fs_time
135   0.0719  update_send_head
129   0.0687  do_sys_open
126   0.0671  rb_insert_color
125   0.0666  bictcp_cong_avoid
124   0.0660  __put_unused_fd
123   0.0655  schedu

Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Evgeniy Polyakov
On Thu, Mar 01, 2007 at 03:16:37PM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote:
> 
> * Eric Dumazet <[EMAIL PROTECTED]> wrote:
> 
> > I can tell you that the problem (at least on my machine) comes from :
> > 
> > gettimeofday(&tm, NULL);
> > 
> > in evserver_epoll.c
> 
> yeah, that's another difference - especially if it's something like an 
> Athlon64 and gettimeofday falls back to pm-timer, that could explain the 
> performance difference. That's why i repeatedly asked Evgeniy to use the 
> /very same/ client function for both the epoll and the kevent test and 
> redo the measurements. The numbers are still highly suspect - and we are 
> already down from the prior claim of kevent being almost twice as fast 
> to a 25% difference.

There is no gettimeofday() in the running code anymore, and it was
not placed in the common server processing code, btw.

Ingo, do you really think I will send mails with faked benchmarks? :))

Btw, there was never an almost-twice performance increase - epoll in my
tests always showed 4-5 thousand requests per second, kevent - up to 7
thousand.

That is starting to look like ghost hunting, Ingo - you already said that
you do not see any need for kevent; have you changed your opinion on
that?

>   Ingo

-- 
Evgeniy Polyakov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Ingo Molnar

* Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

> CPU: AMD64 processors, speed 2210.08 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit 
> mask of 0x00 (No unit mask) count 100000
> samples  %symbol name
> 195750   67.3097  cpu_idle
> 14111 4.8521  enter_idle
> 4979  1.7121  IRQ0x51_interrupt
> 4765  1.6385  tcp_v4_rcv

the pretty much only meaningful way to measure this is to:

- start a really long 'ab' testrun. Something like "ab -c 8000 -t 600".
- let the system get into 'steady state': i.e. CPU load at 100%
- reset the oprofile counters, then start an oprofile run for 60 
  seconds.
- stop the oprofile run.
- stop the test.

this way there won't be that many 'cpu_idle' entries in your profiles, 
and the profiles between the two event delivery mechanisms will be 
directly comparable.
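
in command terms, something like this (a sketch, assuming the oprofile
daemon is already configured):

	ab -c 8000 -t 600 http://server/ &	# long testrun
	sleep 30				# let CPU load reach 100%
	opcontrol --reset			# clear counters at steady state
	opcontrol --start
	sleep 60				# profile 60 seconds
	opcontrol --stop
	opreport -l				# per-symbol report
	kill %1					# stop the test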

> In those tests I got epoll perf of about 4400 req/s, kevent was about 
> 5300.

So we are now up to epoll being 83% of kevent's performance - while the 
noise of numbers seen today alone is around 100% ... Could you update 
the files at the two URLs that you posted before, with the code that you used 
for the above numbers:

   http://tservice.net.ru/~s0mbre/archive/kevent/evserver_epoll.c
   http://tservice.net.ru/~s0mbre/archive/kevent/evserver_kevent.c

thanks,

Ingo


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Evgeniy Polyakov
On Thu, Mar 01, 2007 at 02:12:50PM +0100, Eric Dumazet ([EMAIL PROTECTED]) 
wrote:
> On Thursday 01 March 2007 12:47, Evgeniy Polyakov wrote:
> >
> > Could you provide at least remote way to find it?
> >
> 
> Sure :)
> 
> > I only found the same problem at
> > http://lkml.org/lkml/2006/10/27/3
> >
> > but without any hits to solve the problem.
> >
> > I will try CVS oprofile, if it works I will provide details of course.
> >
> 
> # cat CVS/Root
> CVS/Root::pserver:[EMAIL PROTECTED]:/cvsroot/oprofile
> 
> # cvs diff >/tmp/oprofile.diff
> 
> Hope it helps

One of the hunks failed, since it was in CVS already.
After setting up some mirrors, I've installed all bits needed for
oprofile.
Attached kevent and epoll profiles.

In those tests I got epoll perf of about 4400 req/s, kevent was about 5300.

The epoll version does not contain the gettimeofday() call.

-- 
Evgeniy Polyakov
CPU: AMD64 processors, speed 2210.08 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask 
of 0x00 (No unit mask) count 100000
samples  %symbol name
195750   67.3097  cpu_idle
14111 4.8521  enter_idle
4979  1.7121  IRQ0x51_interrupt
4765  1.6385  tcp_v4_rcv
3316  1.1402  tcp_ack
3138  1.0790  kmem_cache_free
2619  0.9006  kfree
2323  0.7988  memset_c
1747  0.6007  schedule
1682  0.5784  csum_partial_copy_generic
1646  0.5660  ip_output
1563  0.5374  tcp_rcv_state_process
1506  0.5178  dev_queue_xmit
1412  0.4855  handle_IRQ_event
1266  0.4353  ip_rcv
1026  0.3528  ip_queue_xmit
1004  0.3452  __do_softirq
1001  0.3442  ip_local_deliver
906   0.3115  csum_partial
902   0.3102  ip_route_input
889   0.3057  __d_lookup
880   0.3026  kmem_cache_alloc
847   0.2912  tcp_v4_do_rcv
841   0.2892  netif_receive_skb
830   0.2854  tcp_sendmsg
819   0.2816  system_call
788   0.2710  kfree_skbmem
780   0.2682  tcp_transmit_skb
742   0.2551  preempt_schedule
731   0.2514  __tcp_push_pending_frames
699   0.2404  __link_path_walk
687   0.2362  pfifo_fast_dequeue
672   0.2311  local_bh_enable
657   0.2259  __alloc_skb
650   0.2235  net_rx_action
627   0.2156  pfifo_fast_enqueue
583   0.2005  sock_wfree
571   0.1963  get_unused_fd
547   0.1881  tcp_parse_options
546   0.1877  copy_user_generic_string
533   0.1833  _atomic_dec_and_lock
529   0.1819  number
524   0.1802  ret_from_intr
509   0.1750  skb_clone
507   0.1743  fget
507   0.1743  tcp_rcv_established
498   0.1712  __kfree_skb
492   0.1692  tcp_poll
481   0.1654  rt_hash_code
471   0.1620  dput
454   0.1561  sock_def_readable
422   0.1451  mask_and_ack_8259A
421   0.1448  sysret_check
419   0.1441  __fput
413   0.1420  exit_idle
396   0.1362  ip_append_data
374   0.1286  sock_poll
371   0.1276  tcp_data_queue
359   0.1234  __tcp_select_window
356   0.1224  tcp_recvmsg
348   0.1197  lock_timer_base
340   0.1169  cache_alloc_refill
338   0.1162  thread_return
319   0.1097  sys_close
318   0.1093  do_path_lookup
318   0.1093  ret_from_sys_call
311   0.1069  vsnprintf
307   0.1056  eth_type_trans
303   0.1042  find_next_zero_bit
302   0.1038  __mod_timer
298   0.1025  d_alloc
296   0.1018  rb_erase
293   0.1007  call_softirq
290   0.0997  __dentry_open
276   0.0949  cache_grow
274   0.0942  __ip_route_output_key
273   0.0939  try_to_wake_up
258   0.0887  dentry_iput
258   0.0887  sk_stream_mem_schedule
257   0.0884  do_lookup
244   0.0839  strncpy_from_user
234   0.0805  do_generic_mapping_read
231   0.0794  memcmp
229   0.0787  tcp_current_mss
228   0.0784  tcp_rtt_estimator
214   0.0736  restore_args
205   0.0705  sys_recvfrom
204   0.0701  fput
203   0.0698  tcp_send_fin
200   0.0688  release_sock
193   0.0664  memcpy_c
191   0.0657  common_interrupt
189   0.0650  fget_light
185   0.0636  skb_checksum
182   0.0626  generic_drop_inode
180   0.0619  do_sys_open
174   0.0598  get_page_from_freelist
168   0.0578  link_path_walk
165   0.0567  schedule_timeout
163   0.0560  del_timer
162   0.0557  rb_insert_color
160   0.0550  percpu_counter_mod
159   0.0547  __up_read
155   0.0533  expand_files
154   0.0530  sys_fcntl
150   0.0516  tcp_v4_send_check
146   0.0502  fd_install
145   0.0499  bictcp_cong_avoid
143   0.0492  call_rcu
141   0.0485  __down_read
141   0.0485  sock_close
140   0.0481  copy_page_c
138   0.0475  __skb_checksum_complete
138   0.0475  lookup_mnt
137   0.0471  getname
132   0.0454  generic_permission
131   0.0450  find_get_page
130   0.0447  __do_page_cache_readahead
130   0.0447  update_send_head
127   0.0437  get_empty_filp
126   0.0433  __path_lookup_intent_open
124   0.0

Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Ingo Molnar

* Eric Dumazet <[EMAIL PROTECTED]> wrote:

> On my machines (again ...), ab is the slow thing... not the 'server'

Evgeniy said that both in the epoll and the kevent case the server side 
CPU was 98%-100% busy - so inefficiencies on the client side do not 
matter that much - the server is saturated. It's that "kevent is 25% 
faster than epoll" claim that i'm probing mainly.

Ingo


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Eric Dumazet
On Thursday 01 March 2007 15:16, Ingo Molnar wrote:
> * Eric Dumazet <[EMAIL PROTECTED]> wrote:
> > I can tell you that the problem (at least on my machine) comes from :
> >
> > gettimeofday(&tm, NULL);
> >
> > in evserver_epoll.c
>
> yeah, that's another difference - especially if it's something like an
> Athlon64 and gettimeofday falls back to pm-timer, that could explain the
> performance difference. That's why i repeatedly asked Evgeniy to use the
> /very same/ client function for both the epoll and the kevent test and
> redo the measurements. The numbers are still highly suspect - and we are
> already down from the prior claim of kevent being almost twice as fast
> to a 25% difference.

Also, ab is quite lame... Maybe we could use an epoll based 'stresser'.

On my machines (again ...), ab is the slow thing... not the 'server'

Some small differences in behavior could have a big impact on ab (and you 
could think there is a problem on the remote side)
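
For the record, such a stresser need not be big - a rough sketch of an
epoll-based one (hypothetical code, not from the thread: no error
handling, hard-coded port 80, one canned request per connection):

/* stress.c - minimal epoll-based connect/request stresser sketch.
 * Build: gcc -O2 -o stress stress.c
 */
#include <arpa/inet.h>
#include <fcntl.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#define NCONN 1000

static const char req[] = "GET / HTTP/1.0\r\n\r\n";

int main(int argc, char **argv)
{
	struct sockaddr_in sa;
	struct epoll_event ev, events[64];
	int efd = epoll_create(NCONN);
	int i, n, done = 0;

	memset(&sa, 0, sizeof(sa));
	sa.sin_family = AF_INET;
	sa.sin_port = htons(80);
	inet_pton(AF_INET, argc > 1 ? argv[1] : "127.0.0.1", &sa.sin_addr);

	for (i = 0; i < NCONN; i++) {
		int s = socket(AF_INET, SOCK_STREAM, 0);

		fcntl(s, F_SETFL, O_NONBLOCK);
		connect(s, (struct sockaddr *)&sa, sizeof(sa));
		memset(&ev, 0, sizeof(ev));
		ev.events = EPOLLOUT;	/* writable == connect completed */
		ev.data.fd = s;
		epoll_ctl(efd, EPOLL_CTL_ADD, s, &ev);
	}

	while (done < NCONN) {
		n = epoll_wait(efd, events, 64, -1);
		for (i = 0; i < n; i++) {
			int s = events[i].data.fd;
			char buf[4096];

			if (events[i].events & EPOLLOUT) {
				/* connected: send the request, then wait for the reply */
				send(s, req, sizeof(req) - 1, 0);
				ev.events = EPOLLIN;
				ev.data.fd = s;
				epoll_ctl(efd, EPOLL_CTL_MOD, s, &ev);
			} else if (recv(s, buf, sizeof(buf), 0) <= 0) {
				/* EOF or error: server closed, connection done */
				epoll_ctl(efd, EPOLL_CTL_DEL, s, NULL);
				close(s);
				done++;
			}
		}
	}
	return 0;
}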



Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Evgeniy Polyakov
On Thu, Mar 01, 2007 at 02:32:42PM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote:
> 
> * Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> 
> > [...] that is why the number for kevent is higher - it uses an edge-triggered 
> > handler (which you asked to remove from epoll), [...]
> 
> no - i did not 'ask' it to be removed from epoll, i only pointed out 
> that with edge-triggered the results were highly unreliable here and 
> that with level-triggered it worked better. Just to make sure: if you 
> put back edge-triggered into evserver_epoll.c, do you get the same 
> numbers, and is CPU utilization still the same 98-100%?

No.
_Now_ it is about 1500-2000 req/sec with 10-20% CPU utilization. 
The 'Total transferred' and 'HTML transferred' numbers do not
equal 80000 multiplied by the size of the page.

Those are strange tests actually - I managed to get 9000 requests per
second from epoll server (only once!) and 8900 from kevent (two times
only), sometimes they both drop down to 2300-2700 req/s.

>   Ingo

-- 
Evgeniy Polyakov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Ingo Molnar

* Eric Dumazet <[EMAIL PROTECTED]> wrote:

> I can tell you that the problem (at least on my machine) comes from :
> 
> gettimeofday(&tm, NULL);
> 
> in evserver_epoll.c

yeah, that's another difference - especially if it's something like an 
Athlon64 and gettimeofday falls back to pm-timer, that could explain the 
performance difference. That's why i repeatedly asked Evgeniy to use the 
/very same/ client function for both the epoll and the kevent test and 
redo the measurements. The numbers are still highly suspect - and we are 
already down from the prior claim of kevent being almost twice as fast 
to a 25% difference.

Ingo
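
One way to check how expensive gettimeofday() is on a given box is to
time it in a tight loop - a minimal sketch (the iteration count is
arbitrary):

/* gtod_cost.c - rough per-call cost of gettimeofday().
 * Build: gcc -O2 -o gtod_cost gtod_cost.c
 */
#include <stdio.h>
#include <sys/time.h>

int main(void)
{
	struct timeval start, end, tm;
	long i, n = 10 * 1000 * 1000;
	long us;

	gettimeofday(&start, NULL);
	for (i = 0; i < n; i++)
		gettimeofday(&tm, NULL);	/* the call under test */
	gettimeofday(&end, NULL);

	us = (end.tv_sec - start.tv_sec) * 1000000L +
	     (end.tv_usec - start.tv_usec);
	printf("%.1f ns per gettimeofday()\n", us * 1000.0 / n);
	return 0;
}

With a TSC-backed clocksource this tends to be tens of nanoseconds per
call; a pm-timer fallback can be an order of magnitude slower, which is
plenty to matter in a per-request code path.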


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Eric Dumazet
On Thursday 01 March 2007 14:30, Evgeniy Polyakov wrote:
> On Thu, Mar 01, 2007 at 02:11:18PM +0100, Ingo Molnar ([EMAIL PROTECTED]) 
> wrote:
> > ok?
>
> I understood you a couple of mails ago.
> No problem, I can put processing into the same function called from
> different servers :)
>
> > Btw., am i correct that in this particular 'ab' test, the 'immediately'
> > flag is always zero, i.e. kweb_kevent_remove() is always called?
>
> Yes.
>
> > Ingo

I can tell you that the problem (at least on my machine) comes from :

gettimeofday(&tm, NULL);

in evserver_epoll.c




Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Ingo Molnar

* Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

> [...] that is why the number for kevent is higher - it uses an edge-triggered 
> handler (which you asked to remove from epoll), [...]

no - i did not 'ask' it to be removed from epoll, i only pointed out 
that with edge-triggered the results were highly unreliable here and 
that with level-triggered it worked better. Just to make sure: if you 
put back edge-triggered into evserver_epoll.c, do you get the same 
numbers, and is CPU utilization still the same 98-100%?

Ingo
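
The difference under discussion is a single flag at registration time -
a minimal sketch (register_client() is a hypothetical helper, not from
evserver_epoll.c):

#include <string.h>
#include <sys/epoll.h>

/* Register socket 's' on epoll instance 'efd'.
 * Level-triggered: epoll_wait() keeps reporting the fd while data
 * remains.  Edge-triggered (EPOLLET): only state *changes* are
 * reported, so the handler must drain the socket until EAGAIN or
 * readiness can be missed - a likely source of unreliable numbers. */
static int register_client(int efd, int s, int edge_triggered)
{
	struct epoll_event ev;

	memset(&ev, 0, sizeof(ev));
	ev.events = EPOLLIN | (edge_triggered ? EPOLLET : 0);
	ev.data.fd = s;
	return epoll_ctl(efd, EPOLL_CTL_ADD, s, &ev);
}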


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Evgeniy Polyakov
On Thu, Mar 01, 2007 at 02:11:18PM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote:
> ok?

I understood you a couple of mails ago.
No problem, I can put processing into the same function called from
different servers :)

> Btw., am i correct that in this particular 'ab' test, the 'immediately' 
> flag is always zero, i.e. kweb_kevent_remove() is always called?

Yes.

>   Ingo

-- 
Evgeniy Polyakov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Evgeniy Polyakov
On Thu, Mar 01, 2007 at 01:34:23PM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote:
> 
> * Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> 
> > Document Length:        3521 bytes
> 
> > Concurrency Level:      8000
> > Time taken for tests:   16.686737 seconds
> > Complete requests:      80000
> > Failed requests:        0
> > Write errors:           0
> > Total transferred:      309760000 bytes
> > HTML transferred:       281680000 bytes
> > Requests per second:    4794.23 [#/sec] (mean)
> 
> > Concurrency Level:      8000
> > Time taken for tests:   12.366775 seconds
> > Complete requests:      80000
> > Failed requests:        0
> > Write errors:           0
> > Total transferred:      317047104 bytes
> > HTML transferred:       288306522 bytes
> > Requests per second:    6468.95 [#/sec] (mean)
> 
> i'm wondering - how can the 'Total transferred' and 'HTML transferred' 
> numbers be different?
>
> Since document length is 3521, and the number of requests is 80000, the 
> correct 'HTML transferred' is 281680000 (3521 * 80000) - which is the epoll result. The 
> kevent result shows more bytes transferred, which suggests that the 
> kevent loop is probably incorrect somewhere.
> 
> this might be some benign thing, but the /first/ thing you /have to/ do 
> before claiming that 'kevent is 25% faster than epoll' is to make sure 
> the results are totally reliable.

Kevent sent an additional 525 pages ((311792800-309760000)/3872, where 
3872 bytes is the total sent per request: headers plus the 3521-byte 
page) - that is why the number for kevent is higher - it uses an 
edge-triggered handler (which you asked to remove from epoll), which can 
produce false positives; for an absolute result in that case ret_data 
must be checked, where the poll flags were stored (before). 'ab' does not 
count the additional data as new requests and does not count it in 
'requests per second'.
Even if it could do so, an additional 500 requests can not provide a 35%
higher rate.

For example, lighttpd results are the same for kevent and epoll, and the 
'Total transferred' and 'HTML transferred' numbers change between runs 
both for epoll and kevent.

>   Ingo

-- 
Evgeniy Polyakov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Ingo Molnar

* Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

> > i dont care whether they are separate or not - but you have not 
> > replied to the request that there be a handle_web_request() function 
> > in /both/ files, which is precisely the same function. I didnt ask 
> > you to merge the two files - i only asked for the two web handling 
> > functions to be one and the same function.
> 
> They are not the same in general - if a kevent is ready immediately, it 
> will not be removed from the kevent tree, but the current kevent server 
> always takes the not-immediately path for the lighttpd tests - so the functions are the same:
> open()
> sendfile()
> cork_off
> close(fd)
> close(s)
> remove_event_from_the_kernel
> 
> with the same parameters.

you /STILL/ dont understand. I'm only talking about evserver_epoll.c and 
evserver_kevent.c. Not about lighttpd. Not about historic reasons. I 
simply suggested a common-sense change:

| | Would it be so hard to introduce a single handle_web_request() 
| | function that is exactly the same in the two tests? All the queueing 
| | details (which are of course different in the epoll and the kevent 
| | case) should be in the client function, which calls 
| | handle_web_request().

i.e. put remove_event_from_the_kernel() (kweb_kevent_remove() and 
evtest_remove()) into a SEPARATE client function, which calls the 
/common/ handle_web_request(sock) function. You can do the 
immediate-removal in that separate, kevent-specific client function - 
but the socket function, handle_web_request(sock) should be /perfectly 
identical/ in the two files.

I.e.:

static inline int handle_web_request(int s)
{
int err, fd, on = 0;
off_t offset = 0;
int count = 40960;
char path[] = "/tmp/index.html";
char buf[4096];

err = recv(s, buf, sizeof(buf), 0);
if (err <= 0)
return err;

fd = open(path, O_RDONLY);
if (fd == -1)
return fd;

err = sendfile(s, fd, &offset, count);
if (err < 0) {
ulog_err("Failed send %d bytes: fd=%d.\n", count, s);
close(fd);
return err;
}

setsockopt(s, SOL_TCP, TCP_CORK, &on, sizeof(on));
close(fd);
close(s); /* No keepalive */

return 0;
}

And in evserver_epoll.c do this:

static int evtest_callback_client(int s)
{
int err = handle_web_request(s);
if (err)
evtest_remove(s);
return err;
}

and in evserver_kevent.c do this:

static int kweb_callback_client(struct ukevent *e, int im)
{
int err = handle_web_request(e->id.raw[0]);
if (err || !im)
kweb_kevent_remove(e);
return err;
}

ok?

Btw., am i correct that in this particular 'ab' test, the 'immediately' 
flag is always zero, i.e. kweb_kevent_remove() is always called?

Ingo
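
A side note on the TCP_CORK line in handle_web_request() above: as
posted it only ever clears the option (on = 0). The usual pattern
brackets the writes with a set and a clear so header and body leave the
box as one coalesced burst - a sketch under that assumption, with
send_corked() as a hypothetical helper:

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/sendfile.h>
#include <sys/socket.h>

/* Cork the socket, queue header + file body, then uncork once so the
 * kernel can emit maximally-sized segments instead of a small header
 * packet followed by the body. */
static int send_corked(int s, const char *hdr, size_t hdr_len,
		       int fd, off_t *offset, size_t count)
{
	int on = 1, off = 0;

	setsockopt(s, SOL_TCP, TCP_CORK, &on, sizeof(on));
	if (send(s, hdr, hdr_len, 0) < 0)
		return -1;
	if (sendfile(s, fd, offset, count) < 0)
		return -1;
	setsockopt(s, SOL_TCP, TCP_CORK, &off, sizeof(off)); /* flush */
	return 0;
}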


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Eric Dumazet
On Thursday 01 March 2007 12:47, Evgeniy Polyakov wrote:
>
> Could you provide at least a remote way to find it?
>

Sure :)

> I only found the same problem at
> http://lkml.org/lkml/2006/10/27/3
>
> but without any hints to solve the problem.
>
> I will try CVS oprofile, if it works I will provide details of course.
>

# cat CVS/Root
CVS/Root::pserver:[EMAIL PROTECTED]:/cvsroot/oprofile

# cvs diff >/tmp/oprofile.diff

Hope it helps
Index: libop/op_alloc_counter.c
===================================================================
RCS file: /cvsroot/oprofile/oprofile/libop/op_alloc_counter.c,v
retrieving revision 1.8
diff -r1.8 op_alloc_counter.c
14a15,16
> #include <dirent.h>
> #include <ctype.h>
133c135
< 			return 0;
---
> 			continue;
145a148,183
> /* determine which directories are counter directories
>  */
> static int perfcounterdir(const struct dirent * entry)
> {
> 	return (isdigit(entry->d_name[0]));
> }
> 
> 
> /**
>  * @param mask pointer where to place bit mask of unavailable counters
>  *
>  * return >= 0 number of counters that are available
>  *        < 0  could not determine number of counters
>  *
>  */
> static int op_get_counter_mask(u32 * mask)
> {
> 	struct dirent **counterlist;
> 	int count, i;
> 	/* assume nothing is available */
> 	u32 available=0;
> 
> 	count = scandir("/dev/oprofile", &counterlist, perfcounterdir,
> 			alphasort);
> 	if (count < 0)
> 		/* unable to determine bit mask */
> 		return -1;
> 	/* convert to bit map (0 where counter exists) */
> 	for (i=0; i < count; i++) {
> 		available |= 1 << atoi(counterlist[i]->d_name);
> 		free(counterlist[i]);
> 	}
> 	*mask=~available;
> 	free(counterlist);
> 	return count;
> }
152a191
> 	u32 unavailable_counters = 0;
154c193,195
< 	nr_counters = op_get_nr_counters(cpu_type);
---
> 	nr_counters = op_get_counter_mask(&unavailable_counters);
> 	if (nr_counters < 0) 
> 		nr_counters = op_get_nr_counters(cpu_type);
162c203,204
< 	if (!allocate_counter(ctr_arc, nr_events, 0, 0, counter_map)) {
---
> 	if (!allocate_counter(ctr_arc, nr_events, 0, unavailable_counters,
> 			  counter_map)) {


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Evgeniy Polyakov
On Thu, Mar 01, 2007 at 01:43:36PM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote:
> 
> * Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> 
> > I separated epoll and kevent servers, since originally the kevent server 
> > included additional kevent features, but then new ones were added and 
> > I slowly moved to something similar to the epoll case.
> 
> i dont care whether they are separate or not - but you have not replied 
> to the request that there be a handle_web_request() function in /both/ 
> files, which is precisely the same function. I didnt ask you to merge 
> the two files - i only asked for the two web handling functions to be 
> one and the same function.

They are not the same in general - if a kevent is ready immediately, it 
will not be removed from the kevent tree, but the current kevent server 
always takes the not-immediately path for the lighttpd tests - so the functions are the same:
open()
sendfile()
cork_off
close(fd)
close(s)
remove_event_from_the_kernel

with the same parameters.

>   Ingo

-- 
Evgeniy Polyakov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Ingo Molnar

* Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

> I separated epoll and kevent servers, since originally the kevent server 
> included additional kevent features, but then new ones were added and 
> I slowly moved to something similar to the epoll case.

i dont care whether they are separate or not - but you have not replied 
to the request that there be a handle_web_request() function in /both/ 
files, which is precisely the same function. I didnt ask you to merge 
the two files - i only asked for the two web handling functions to be 
one and the same function.

Ingo


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Ingo Molnar

* Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

> Document Length:        3521 bytes

> Concurrency Level:      8000
> Time taken for tests:   16.686737 seconds
> Complete requests:      80000
> Failed requests:        0
> Write errors:           0
> Total transferred:      309760000 bytes
> HTML transferred:       281680000 bytes
> Requests per second:    4794.23 [#/sec] (mean)

> Concurrency Level:      8000
> Time taken for tests:   12.366775 seconds
> Complete requests:      80000
> Failed requests:        0
> Write errors:           0
> Total transferred:      317047104 bytes
> HTML transferred:       288306522 bytes
> Requests per second:    6468.95 [#/sec] (mean)

i'm wondering - how can the 'Total transferred' and 'HTML transferred' 
numbers be different?

Since document length is 3521, and the number of requests is 80000, the 
correct 'HTML transferred' is 281680000 (3521 * 80000) - which is the epoll result. The 
kevent result shows more bytes transferred, which suggests that the 
kevent loop is probably incorrect somewhere.

this might be some benign thing, but the /first/ thing you /have to/ do 
before claiming that 'kevent is 25% faster than epoll' is to make sure 
the results are totally reliable.

Ingo


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Evgeniy Polyakov
On Thu, Mar 01, 2007 at 12:28:00PM +0100, Eric Dumazet ([EMAIL PROTECTED]) 
wrote:
> I used the CVS version of oprofile plus a patch you can find in the mailing 
> list archives. Dont remember exactly, since I hit this some months ago

Ugh, I started - but CVS compilation requires about 40MB of additional
libs (according to Debian testing dependencies on my very light
installation), so with my miserable 1-1.6 KB/sec do not expect it today :)

-- 
Evgeniy Polyakov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Evgeniy Polyakov
On Thu, Mar 01, 2007 at 12:47:35PM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote:
> 
> * Ingo Molnar <[EMAIL PROTECTED]> wrote:
> 
> > I also changed the client socket to nonblocking mode with the same result 
> > in the epoll server. If you find it broken, please send me a corrected 
> > one to test too.
> > 
> > this line in evserver_kevent.c looks a bit fishy:
> 
> this one in evserver_kevent.c looks problematic too:
> 
> shutdown(s, SHUT_RDWR);
> close(s);
> 
> as evserver_epoll.c only does:
> 
> close(s);
> 
> again, there might be TCP control flow differences due to this. [ Or the 
> removal of this shutdown() call might be a small speedup for the kevent 
> case ;) ]

:)

> Also, the order of fd and socket close() is different in the two cases. 
> It shouldnt make any difference - but that too just makes the results 
> harder to trust. Would it be so hard to introduce a single 
> handle_web_request() function that is exactly the same in the two tests? 
> All the queueing details (which are of course different in the epoll and 
> the kevent case) should be in the client function, which calls 
> handle_web_request().

I've removed shutdown - things are the same.

Sometimes kevent performance drops to lower numbers, and its graph of
times needed to handle events has high plateaus (with and without
shutdown - it was always there), like this:

Percentage of the requests served within a certain time (ms)
50%    128
66%    486
75%    505
80%    507
90%    732
95%   3087  // something is wrong at this point
98%   9058
99%   9072
100%  15562 (longest request)

it is possible that there are some other bugs in the server though,
which prevent sockets from being quickly closed and thus their processing
time increases - I do not know the root cause of that for sure.

I separated epoll and kevent servers, since originally the kevent server 
included additional kevent features, but then new ones were added 
and I slowly moved to something similar to the epoll case.

The current version of the server was a pre-test one for the lighttpd patches,
so essentially it should be like epoll except for minor details.

>   Ingo

-- 
Evgeniy Polyakov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Evgeniy Polyakov
On Thu, Mar 01, 2007 at 12:41:37PM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote:
> 
> * Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> 
> > I also changed the client socket to nonblocking mode with the same result 
> > in the epoll server. If you find it broken, please send me a corrected 
> > one to test too.
> 
> this line in evserver_kevent.c looks a bit fishy:
> 
> err = recv(s, buf, 100, 0);
> 
> because on the evserver_epoll.c side the following is done:
> 
> err = recv(s, buf, 4096, 0);
> 
> now, for 'ab', the request size is 76 bytes, so it should fit fine 
> functionality-wise. But, the TCP stack might decide differently about 
> whether to return with a partial packet depending on how much data is 
> requested. I dont know whether it actually makes a difference in the TCP 
> flow decisions, and whether it makes a performance difference in your 
> test, but safest would be to use 4096 in both cases.

Well, that would be quite strange - as far as I know the Linux network
stack (for which kevent was originally created to support network AIO),
there should not be any difference.

Anyway, I've rerun the test with the same values:

# ab -c8000 -n80000 http://192.168.0.48/
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.0.48 (be patient)
Completed 8000 requests
Completed 16000 requests
Completed 24000 requests
Completed 32000 requests
Completed 40000 requests
Completed 48000 requests
Completed 56000 requests
Completed 64000 requests
Completed 72000 requests
Finished 80000 requests


Server Software:        Apache/1.3.27
Server Hostname:        192.168.0.48
Server Port:            80

Document Path:          /
Document Length:        3521 bytes

Concurrency Level:      8000
Time taken for tests:   18.398381 seconds
Complete requests:      80000
Failed requests:        0
Write errors:           0
Total transferred:      338738048 bytes
HTML transferred:       308031164 bytes
Requests per second:    4348.21 [#/sec] (mean)
Time per request:       1839.838 [ms] (mean)
Time per request:       0.230 [ms] (mean, across all concurrent
requests)
Transfer rate:          17979.73 [Kbytes/sec] received

Connection Times (ms)
  min  mean[+/-sd] median   max
Connect:  148  795 196.9    808   3599
Processing:   824  882  39.7    878    986
Waiting:   59  426 212.6    423    914
Total:   1073 1678 200.8   1673   4579

Percentage of the requests served within a certain time (ms)
50%   1673
66%   1674
75%   1678
80%   1686
90%   1852
95%   1861
98%   1864
99%   1865
100%   4579 (longest request)

Essentially the same result (within the limits of some inaccuracy).

> in general, please make sure the exact same system calls are done in the 
> client function. (except of course for the event queueing syscalls 
> themselves)

Yes, that should be done of course.
I even have a plan to create the same binary for both, but also plan to
turn on some kevent optimizations (mainly readiness-on-submit, when the
requested event (recv/send/anything) is ready immediately - kevent
supports returning that event in the submission syscall, without the
additional overhead of reading it from the ring or queue).

>   Ingo

-- 
Evgeniy Polyakov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> > I also changed the client socket to nonblocking mode with the same result 
> > in the epoll server. If you find it broken, please send me a corrected 
> > one to test too.
> 
> this line in evserver_kevent.c looks a bit fishy:

this one in evserver_kevent.c looks problematic too:

shutdown(s, SHUT_RDWR);
close(s);

as evserver_epoll.c only does:

close(s);

again, there might be TCP control flow differences due to this. [ Or the 
removal of this shutdown() call might be a small speedup for the kevent 
case ;) ]

Also, the order of fd and socket close() is different in the two cases. 
It shouldnt make any difference - but that too just makes the results 
harder to trust. Would it be so hard to introduce a single 
handle_web_request() function that is exactly the same in the two tests? 
All the queueing details (which are of course different in the epoll and 
the kevent case) should be in the client function, which calls 
handle_web_request().

Ingo
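
For reference, the two teardown sequences side by side - a sketch with
hypothetical helper names; shutdown(SHUT_RDWR) ends both directions
immediately even if other descriptors still reference the socket, while
a bare close() starts the FIN handshake only once the last reference is
dropped:

#include <sys/socket.h>
#include <unistd.h>

/* evserver_kevent.c-style teardown: explicit half-close of both
 * directions, then release of the descriptor. */
static void teardown_with_shutdown(int s)
{
	shutdown(s, SHUT_RDWR);
	close(s);
}

/* evserver_epoll.c-style teardown: the kernel sends the FIN when the
 * last reference to the socket goes away. */
static void teardown_plain_close(int s)
{
	close(s);
}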


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Evgeniy Polyakov
On Thu, Mar 01, 2007 at 12:28:00PM +0100, Eric Dumazet ([EMAIL PROTECTED]) 
wrote:
> On Thursday 01 March 2007 12:20, Evgeniy Polyakov wrote:
> > On Thu, Mar 01, 2007 at 12:14:44PM +0100, Eric Dumazet ([EMAIL PROTECTED]) 
> wrote:
> > > On Thursday 01 March 2007 11:59, Evgeniy Polyakov wrote:
> > > > Yes, it is about 98-100% in both cases.
> > > > I've just re-run tests on my amd64 test machine without debug options:
> > > >
> > > > epoll   4794.23
> > > > kevent  6468.95
> > >
> > > It would be valuable if you could post oprofile results
> > > (CPU_CLK_UNHALTED) for both tests.
> >
> > I can't - oprofile does not work on this x86_64 machine:
> >
> 
> Yes, this is a known problem, but you can make it work, as I did.
> 
> Please :)

I can not resist :)

> I used the CVS version of oprofile plus a patch you can find in the mailing 
> list archives. Don't remember exactly, since I hit this some months ago.

Could you provide at least a remote way to find it?

I only found the same problem at 
http://lkml.org/lkml/2006/10/27/3

but without any hints to solve the problem.

I will try CVS oprofile, if it works I will provide details of course.

My tree is based on rc1 and has this latest commit:
commit b5bf28cde894b3bb3bd25c13a7647020562f9ea0
Author: Linus Torvalds <[EMAIL PROTECTED]>
Date:   Wed Feb 21 11:21:44 2007 -0800

There are no commits after that date with the word 'oprofile' in
git-whatchanged at least.

-- 
Evgeniy Polyakov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Ingo Molnar

* Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

> I also changed the client socket to nonblocking mode with the same result 
> in the epoll server. If you find it broken, please send me a corrected 
> one to test too.

this line in evserver_kevent.c looks a bit fishy:

err = recv(s, buf, 100, 0);

because on the evserver_epoll.c side the following is done:

err = recv(s, buf, 4096, 0);

now, for 'ab', the request size is 76 bytes, so it should fit fine 
functionality-wise. But, the TCP stack might decide differently about 
whether to return with a partial packet depending on how much data is 
requested. I dont know whether it actually makes a difference in the TCP 
flow decisions, and whether it makes a performance difference in your 
test, but safest would be to use 4096 in both cases.

in general, please make sure the exact same system calls are done in the 
client function. (except of course for the event queueing syscalls 
themselves)

Ingo
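
A minimal sketch of the suggested fix - one read helper with the same
4096-byte buffer for both servers, restarted on EINTR so they issue
identical syscalls (read_request() is a hypothetical name, not from the
posted sources):

#include <errno.h>
#include <sys/socket.h>

/* Read whatever part of the request has arrived, with the same
 * 4096-byte buffer on the epoll and kevent sides so the TCP stack
 * sees identical reads.  Returns bytes read, 0 on EOF, -1 on error. */
static int read_request(int s)
{
	char buf[4096];
	int err;

	do {
		err = recv(s, buf, sizeof(buf), 0);
	} while (err < 0 && errno == EINTR);	/* restart if interrupted */

	return err;
}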


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Evgeniy Polyakov
On Thu, Mar 01, 2007 at 12:27:00PM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote:
> 
> * Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> 
> > I've uploaded them to:
> > 
> > http://tservice.net.ru/~s0mbre/archive/kevent/evserver_epoll.c
> > http://tservice.net.ru/~s0mbre/archive/kevent/evserver_kevent.c
> 
> thanks.
> 
> > I also changed the client socket to nonblocking mode with the same result 
> > in the epoll server. [...]
> 
> what does this mean exactly? Did you change this line in 
> evserver_epoll.c:
> 
> //fcntl(cs, F_SETFL, O_NONBLOCK);
> 
> to:
> 
> fcntl(cs, F_SETFL, O_NONBLOCK);
> 
> and the result was the same?

Yep.

>   Ingo

-- 
Evgeniy Polyakov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Ingo Molnar

* Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

> I've uploaded them to:
> 
> http://tservice.net.ru/~s0mbre/archive/kevent/evserver_epoll.c
> http://tservice.net.ru/~s0mbre/archive/kevent/evserver_kevent.c

thanks.

> I also changed the client socket to nonblocking mode with the same result 
> in the epoll server. [...]

what does this mean exactly? Did you change this line in 
evserver_epoll.c:

//fcntl(cs, F_SETFL, O_NONBLOCK);

to:

fcntl(cs, F_SETFL, O_NONBLOCK);

and the result was the same?

Ingo
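
As an aside, that commented-out line sets the file status flags
absolutely; the conventional form reads the current flags first so
nothing else is clobbered - a small sketch:

#include <fcntl.h>

/* Switch fd to nonblocking mode while preserving the other file
 * status flags, instead of overwriting them all with O_NONBLOCK. */
static int set_nonblock(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}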


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Eric Dumazet
On Thursday 01 March 2007 12:20, Evgeniy Polyakov wrote:
> On Thu, Mar 01, 2007 at 12:14:44PM +0100, Eric Dumazet ([EMAIL PROTECTED]) 
wrote:
> > On Thursday 01 March 2007 11:59, Evgeniy Polyakov wrote:
> > > Yes, it is about 98-100% in both cases.
> > > I've just re-run tests on my amd64 test machine without debug options:
> > >
> > > epoll 4794.23
> > > kevent 6468.95
> >
> > It would be valuable if you could post oprofile results
> > (CPU_CLK_UNHALTED) for both tests.
>
> I can't - oprofile does not work on this x86_64 machine:
>

Yes, this is a known problem, but you can make it work, as I did.

Please :)

I used the CVS version of oprofile plus a patch you can find in the mailing 
list archives. Don't remember exactly, since I hit this some months ago.


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Evgeniy Polyakov
On Thu, Mar 01, 2007 at 12:14:44PM +0100, Eric Dumazet ([EMAIL PROTECTED]) 
wrote:
> On Thursday 01 March 2007 11:59, Evgeniy Polyakov wrote:
> 
> > Yes, it is about 98-100% in both cases.
> > I've just re-run tests on my amd64 test machine without debug options:
> >
> > epoll   4794.23
> > kevent  6468.95
> >
> 
> It would be valuable if you could post oprofile results (CPU_CLK_UNHALTED) 
> for 
> both tests.

I can't - oprofile does not work on this x86_64 machine:

#opcontrol --setup --vmlinux=/home/s0mbre/aWork/git/linux-2.6.kevent/vmlinux
# opcontrol --start
Using default event: CPU_CLK_UNHALTED:100000:0:1:1
/usr/bin/opcontrol: line 994: /dev/oprofile/0/enabled: No such file or
directory
/usr/bin/opcontrol: line 994: /dev/oprofile/0/event: No such file or
directory
/usr/bin/opcontrol: line 994: /dev/oprofile/0/count: No such file or
directory
/usr/bin/opcontrol: line 994: /dev/oprofile/0/kernel: No such file or
directory
/usr/bin/opcontrol: line 994: /dev/oprofile/0/user: No such file or
directory
/usr/bin/opcontrol: line 994: /dev/oprofile/0/unit_mask: No such file or
directory

# ls -l /dev/oprofile/
total 0
drwxr-xr-x 1 root root 0 2007-03-01 09:41 1
drwxr-xr-x 1 root root 0 2007-03-01 09:41 2
drwxr-xr-x 1 root root 0 2007-03-01 09:41 3
-rw-r--r-- 1 root root 0 2007-03-01 09:41 backtrace_depth
-rw-r--r-- 1 root root 0 2007-03-01 09:41 buffer
-rw-r--r-- 1 root root 0 2007-03-01 09:41 buffer_size
-rw-r--r-- 1 root root 0 2007-03-01 09:41 buffer_watershed
-rw-r--r-- 1 root root 0 2007-03-01 09:41 cpu_buffer_size
-rw-r--r-- 1 root root 0 2007-03-01 09:41 cpu_type
-rw-rw-rw- 1 root root 0 2007-03-01 09:41 dump
-rw-r--r-- 1 root root 0 2007-03-01 09:41 enable
-rw-r--r-- 1 root root 0 2007-03-01 09:41 pointer_size
drwxr-xr-x 1 root root 0 2007-03-01 09:41 stats

> Thank you

-- 
Evgeniy Polyakov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Evgeniy Polyakov
On Thu, Mar 01, 2007 at 11:11:02AM +0100, Pavel Machek ([EMAIL PROTECTED]) 
wrote:
> > > > > 10% gain in speed is NOT worth major complexity increase.
> > > > 
> > > > Should I create a patch to remove rb-tree implementation?
> > > 
> > > If you can replace them with something simpler, and no worse than 10%
> > > slower in worst case, then go ahead. (We actually tried to do that at
> > > some point, only to realize that efence stresses vm subsystem in very
> > > unexpected/unfriendly way).
> > 
> > Agh, only 10% in the worst case.
> > I think you can not even imagine what tricks the network stack uses to
> > get at least an additional 1% out of the box.
> 
> Yep? Feel free to rewrite networking in assembly on Eugenix. That
> should get you a 1% improvement. If you reserve a few registers to be
> used only by the kernel (not allowed for userspace), you can speed up
> networking 5%, too. Ouch, and you could turn off the MMU, that is a
> sure way to get a few more percent improvement in your networking case.

It is not _my_ networking, but the one you use every day in every Linux
box. Notice which tricks are used to remove a single byte from sk_buff.
It is called optimization, and if it gains us even a single plus it must be
implemented. Not all people have a magical fear of new things.

> > Using such logic you can just abandon any further development, since it
> > works as is right now.
> 
> Stop trying to pervert my logic.

Ugh? :)
I just said in simple words your 'we do not need something if it adds 10%
but is complex to understand'.

> > > > That practice is stupid IMO.
> > > 
> > > Too bad. Now you can start Linux fork called Eugenix.
> > > 
> > > (But really, Linux is not "maximum performance at any cost". Linux is
> > > "how fast can we get that while keeping it maintainable?").
> > 
> > Should I read it like: we do not understand what it is and thus we do
> > not want it?
> 
> Actually, yes, that's a concern. If your code is so crappy that we
> can't understand it, guess what, it is not going to be merged. Notice
> that someone will have to maintain your code if you get hit by a bus.
> 
> If your code is so complex that it is almost impossible to use from
> userspace, that is a good enough reason not to be merged. "But it is 3%
> faster if..." is not a good-enough argument.

Is it enough for you?

epoll   4794.23 req/sec
kevent  6468.95 req/sec

And we are not even talking about other kevent features, like the ability
to deliver essentially any event through its queue or shared ring (and
some of its ideas are slowly being implemented in syslet/threadlet code,
btw).

Even if kevent is only as fast as epoll, it allows working with any kind of
event (signals, timers, AIO completion, IO events and any other you
like) with one queue/ring, which removes races and does _simplify_
development, since there is no need to create different models to handle
different events.

> > > That is why, while arguing syslets vs. kevents, you need to argue
> > > not "kevents are faster because they avoid context switch overhead",
> > > but "kevents are _so much_ faster that it is worth the added
> > > complexity". And Ingo seems to showing you they are not _so much_
> > > faster.
> > 
> > Threadlets behave much worse without an event driven model, events can
> > behave worse without backing threads, they are mutually compensating.
> 
> I think Ingo demonstrated unoptimized threadlets to be within 5% of
> the speed of kevent. Demonstrate that kevents are twice as fast as
> syslets on a reasonable test case, and I guess we'll listen...

That was compared to epoll, not kevent.

But I repeat again - kevent is not only epoll, it can do a lot of other
things which do improve performance and simplify development - did you
see the terrible hacks in libevent to handle signals without a race in
the polling loop? They are not needed at all anymore - one event loop,
one event structure, a completely unified interface for all operations.
Some kevent features are slowly being implemented in the syslet/threadlet
async code too, and it looks like I see where things will end up :), but
likely I will not care about a new 'kevent' - I just wanted this said
half a year ago, when I started resending it again, but Ingo already
said his definitive word :)

>   Pavel
> -- 
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) 
> http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

-- 
Evgeniy Polyakov
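
(For readers who have not seen those hacks: the classic workaround is
the self-pipe trick - a signal handler cannot safely touch the event
loop, so it writes one byte into a pipe whose read end sits in the poll
set, turning the signal into an ordinary readable event. A minimal
sketch, with hypothetical names:)

#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

static int sigpipe_fds[2];	/* [0] = read end, registered for EPOLLIN */

/* Runs in signal context: write() is async-signal-safe, almost nothing
 * else is, so just push one byte into the pipe and return. */
static void sig_handler(int signo)
{
	char c = (char)signo;

	write(sigpipe_fds[1], &c, 1);
}

/* Route 'signo' into the event loop via the pipe. */
static int setup_signal_pipe(int signo)
{
	struct sigaction sa;

	if (pipe(sigpipe_fds) < 0)
		return -1;
	/* never let a full pipe block inside the signal handler */
	fcntl(sigpipe_fds[1], F_SETFL, O_NONBLOCK);

	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	sa.sa_handler = sig_handler;
	return sigaction(signo, &sa, NULL);
}

The loop then treats a readable sigpipe_fds[0] like any other
descriptor; kevent's argument is that the kernel can queue the signal
into the same ring directly, with no pipe and no handler in between.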


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Evgeniy Polyakov
On Thu, Mar 01, 2007 at 12:00:22PM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote:
> 
> * Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> 
> > I've just re-run tests on my amd64 test machine without debug options:
> > 
> > epoll    4794.23
> > kevent   6468.95
> 
> could you please post the two URLs for the exact evserver code used for 
> these measurements? (even if you did so already in the past - best to 
> have them always together with the numbers) Thanks!

I've uploaded them to:

http://tservice.net.ru/~s0mbre/archive/kevent/evserver_epoll.c
http://tservice.net.ru/~s0mbre/archive/kevent/evserver_kevent.c

I also changed the client socket to nonblocking mode with the same result in
the epoll server. If you find it broken, please send me a corrected one to
test too.

>   Ingo

-- 
Evgeniy Polyakov


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Eric Dumazet
On Thursday 01 March 2007 11:59, Evgeniy Polyakov wrote:

> Yes, it is about 98-100% in both cases.
> I've just re-run tests on my amd64 test machine without debug options:
>
> epoll 4794.23
> kevent 6468.95
>

It would be valuable if you could post oprofile results (CPU_CLK_UNHALTED) for 
both tests.

Thank you


Re: [patch 00/13] Syslets, "Threadlets", generic AIO support, v3

2007-03-01 Thread Ingo Molnar

* Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

> I've just re-run tests on my amd64 test machine without debug options:
> 
> epoll    4794.23
> kevent   6468.95

could you please post the two URLs for the exact evserver code used for 
these measurements? (even if you did so already in the past - best to 
have them always together with the numbers) Thanks!

Ingo

