Re: [take22 0/4] kevent: Generic event handling mechanism.

2006-11-07 Thread Jeff Garzik

Nate Diller wrote:

Indecisiveness has certainly been an issue here, but I remember akpm
and Ulrich both giving concrete suggestions.  I was particularly
interested in Andrew's request to explain and justify the differences
between kevent and BSD's kqueue interface.  Was there a discussion
that I missed?  I am very interested to see your work on this
mechanism merged, because you've clearly emphasized performance and
shown impressive results.  But it seems like we lose out on a lot by
throwing out all the applications that already use kqueue.



kqueue looks pretty nice, the filter/note models in particular.  I don't 
see anything about ring buffers though.


I also wonder about the asynchronous event side (send), not just the 
event reception side.


Jeff


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [take22 0/4] kevent: Generic event handling mechanism.

2006-11-07 Thread Jeff Garzik

David Miller wrote:

From: Pavel Machek [EMAIL PROTECTED]
Date: Fri, 3 Nov 2006 09:57:12 +0100


Not sure what you are smoking, but there's unsigned long in *bsd
version, lets rewrite it from scratch sounds like very bad idea. What
about fixing that one bit you don't like?


I disagree, it's more like since we have to be structure incompatible
anyways, let's design something superior if we can.


Definitely agreed.

Jeff





Re: [take22 0/4] kevent: Generic event handling mechanism.

2006-11-05 Thread Pavel Machek
Hi!

On Fri 2006-11-03 12:13:02, Evgeniy Polyakov wrote:
 On Fri, Nov 03, 2006 at 09:57:12AM +0100, Pavel Machek ([EMAIL PROTECTED]) 
 wrote:
   So, kqueue API and structures can not be used in Linux.
  
  Not sure what you are smoking, but there's unsigned long in *bsd
  version, lets rewrite it from scratch sounds like very bad idea. What
  about fixing that one bit you don't like?
 
 It is not about what I dislike, but about what is broken or not.
 Putting u64 instead of a long or some kind of that _is_ incompatible
 already, so why should we even use it?

Well.. u64 vs unsigned long *is* binary incompatible, but it is
similar enough that it is going to be compatible at source level, or
maybe userland app will need *minor* ifdefs... That's better than two
completely different versions...

 And, btw, what we are talking about? Is it about the whole kevent
 compared to kqueue in kernelspace, or just about what structure is being
 transferred between kernelspace and userspace?
 I'm sure, it was some kind of a joke to 'not rewrite *bsd from scratch
 and use kqueue in Linux kernel as is'.

No, it is probably not possible to take code from BSD kernel and just
port it. But keeping same/similar userland interface would be nice.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [take22 0/4] kevent: Generic event handling mechanism.

2006-11-05 Thread Evgeniy Polyakov
On Sun, Nov 05, 2006 at 12:19:33PM +0100, Pavel Machek ([EMAIL PROTECTED]) 
wrote:
 Hi!
 
 On Fri 2006-11-03 12:13:02, Evgeniy Polyakov wrote:
  On Fri, Nov 03, 2006 at 09:57:12AM +0100, Pavel Machek ([EMAIL PROTECTED]) 
  wrote:
So, kqueue API and structures can not be used in Linux.
   
   Not sure what you are smoking, but there's unsigned long in *bsd
   version, lets rewrite it from scratch sounds like very bad idea. What
   about fixing that one bit you don't like?
  
  It is not about what I dislike, but about what is broken or not.
  Putting u64 instead of a long or some kind of that _is_ incompatible
  already, so why should we even use it?
 
 Well.. u64 vs unsigned long *is* binary incompatible, but it is
 similar enough that it is going to be compatible at source level, or
 maybe userland app will need *minor* ifdefs... That's better than two
 completely different versions...
 
  And, btw, what we are talking about? Is it about the whole kevent
  compared to kqueue in kernelspace, or just about what structure is being
  transferred between kernelspace and userspace?
  I'm sure, it was some kind of a joke to 'not rewrite *bsd from scratch
  and use kqueue in Linux kernel as is'.
 
 No, it is probably not possible to take code from BSD kernel and just
 port it. But keeping same/similar userland interface would be nice.

It is not merely probable; it is impossible to take the FreeBSD kqueue
code and port it - that port would be a completely different system.
It is impossible to have the same event structure, one should create
#if defined kqueue
fill all members of the structure
#else if defined kevent
fill different members name, since Linux does not even have some types
#endif

*BSD kevent (structure transferred between userspace and kernelspace):
struct kevent {
	uintptr_t ident;   /* identifier for this event */
	short     filter;  /* filter for event */
	u_short   flags;   /* action flags for kqueue */
	u_int     fflags;  /* filter flag value */
	intptr_t  data;    /* filter data value */
	void      *udata;  /* opaque user data identifier */
};

You must fill all fields differently due to above.
Just an example: Linux kevent has an extended ID field which is grouped
into type.event, while kqueue has a pointer-sized ident and a short filter.

Linux kevent does not have filters, but instead it has generic storages
of events which can be processed in any way origin of the storage wants
(this for example allows to create aio_sendfile() (which is dropped from
patchset currently) which no other system in the wild has).

There are too many differences. They are just different systems.
If both can be described by the sentence 'a system which handles events',
it does not mean that they are the same and can use the same structures
or even have a similar design.

Kevent is not kqueue in any way (although there are certain
similarities), so they can not share anything.

   
 Pavel
 -- 
 (english) http://www.livejournal.com/~pavelmachek
 (cesky, pictures) 
 http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

-- 
Evgeniy Polyakov


Re: [take22 0/4] kevent: Generic event handling mechanism.

2006-11-04 Thread Evgeniy Polyakov
On Fri, Nov 03, 2006 at 07:49:16PM +0100, Oleg Verych ([EMAIL PROTECTED]) wrote:
  applications can be found on project's homepage.
  There is a link to archive there, where you can find plenty of sources.
 
 But no single makefile. Or do CC and options not really matter?
 You can easily find in your server's apache logs, my visit of that
 archive in the day of my message (today i just confirmed my assertions):
 browser lynx, host flower.upol.cz.

If you can not compile those sources, then you should not use kevent for
a while. Definitely.
Options are pretty simple: -W -Wall -I$(path_to_kernel_tree)/include

  You likely do not know, but it is a bit risky business to patch all
  existing applications to show that approach is correct, if
  implementation is not completed.
 
 Fortunately for me, `lighttpd' is real-life *and* in the benchmark
 area also. Just see on that site how much was measured: different OSes,
 special tuning. *That* is what I'm talking about. The epoll _wrapper_ there
 is 3461 bytes long, your answer to _me_ 2580. People are bringing you a
 test bed, with everything set up and ready to use; need less code, go on,
 comment the needless out!

So what?
People bring me tons of various stuff, and I prefer to use my own for
tests. If _you_ need it, _you_ can always patch any sources you like.

  You likely do not know, but after I first time announced kevents in
  February I changed interfaces 4 times - and it is just interfaces, not
  including numerous features added/removed by developer's requests.
 
 I think that called open source, linux kernel case.

You missed the point - I'm not going to patch tons of existing
applications when I'm asked to change an interface once per month.

When all requested features are implemented I will definitely patch some
popular web-server to show how kevent is used.

   There were some comments about the lack of such programs; the answers
   were in prev. e-mail, need to update them, something like that.
   Trivial web server sources url, mentioned in benchmark isn't pointed
   in patch advertisement. If it was, should i actually try that new
   *trivial* wheel?
  
  Answer is trivial - there is archive where one can find a source code
  (filenames are posted regularly). Should I create a rpm? For what glibc
  version?
 
 Hmm. Let me answer on that dup with stuff from LKML archive. That
 will reveal, that my guesses were told by The Big Jury to you already:
 
 [^0] Message-ID: [EMAIL PROTECTED]
 [^1] Message-ID: [EMAIL PROTECTED],
  Message-ID: [EMAIL PROTECTED]
 
 more than 10 takes ago.

And? Please provide a link to archive.

   Saying that, i want to give you some short examples, i know.
   *Linux kernel - userspace*:
   o Alexey Kuznetsov  networking - (excellent) iproute set of 
   utilities;
  
  iproute documentation was way too bad when Alexey presented it first 
  time :)
 
 As example, after have read some books on TCP/IP and Ethernet, internal
 help of `ip' was all i needed to know.

:)) i.e. it is ok for you to 'read some books on TCP/IP and Ethernet' to
understand how utility works, and it is not ok to determine how to
compile my sources? Do not compile my sources.

  Btw, show me splice() 'shiny' application? Does lighttpd use it?
  Or move_pages().
 
 You know who proposed that, and you know how many (few) releases ago.

And why does lighttpd still not use it?
You should start to blame authors of the splice() for that.
You will not? Then I can not consider your words in my direction as
serious.

   To make a little hint to you, Evgeniy, why don't you find a little
   animal in the open source zoo to implement little interface to
   proposed kernel subsystem and then show it to The Big Jury (not me),
   we have here? And i can not see, how you've managed to implement
   something like that having almost nothing on the test basket.
   Very *suspicious* ch.
  
  There are always people who do not like something, what can I do with
 
 I didn't think, that my message was offensive. Also i didn't even say,
 that you have not bothered feed your code to scripts/Lindent.

You do not use kevent, why do you care about indent of the userspace
tools?

 []
  I created trivial web servers, which send single static page and use
  various event handling schemes, and I test new subsystem with new tools,
  when tests are completed and all requested features are implemented it
  is time to work on different more complex users.
 
 Please, see [^0],
 
  So let's at least complete what we have right now, so no developer's
  efforts could be wasted writing empty chars in various places.
 
 and [^1].
 
 [ Please do not answer just to answer, cc list is big, no one from   ]
 [ The Big Jury seems to care. (well, Jonathan does, but he wasn't in cc) ]

This thread is just to answer for the sake of answers - there is
completely no sense in it.
You blame me that I did not create some benchmarks you like, but I do not
care about it. I created a useful patch and test is in the

Re: [take22 0/4] kevent: Generic event handling mechanism.

2006-11-04 Thread Evgeniy Polyakov
On Fri, Nov 03, 2006 at 07:49:16PM +0100, Oleg Verych ([EMAIL PROTECTED]) wrote:
 [ Please do not answer just to answer, cc list is big, no one from   ]
 [ The Big Jury seems to care. (well, Jonathan does, but he wasn't in cc) ]
 
 Friendly, Oleg.

Just in case some misunderstanding happened: I do not want to insult
anyone who is against kevent, I just do not understand cases, when
people require me to do something to convince them in rude manner.

-- 
Evgeniy Polyakov


Re: [take22 0/4] kevent: Generic event handling mechanism.

2006-11-03 Thread Pavel Machek
Hi!

  returns, which thread are you referring to?  Nicholas Miell, in The
  Proposed Linux kevent API thread, seems to think that there are no
  advantages over kqueue to justify the incompatibility, an argument you
  made no effort to refute.  I've also read the Kevent wiki at
  linux-net.osdl.org, but it too is lacking in any direct comparisons
  (even theoretical, let alone benchmarks) of the flexibility,
  performance, etc. between the two.
  
  I'm not arguing that you've done a bad design, I'm asking you to brag
  about the things you improved on vs. kqueue.  Your emphasis on
  unifying all the different event types into one interface is really
  cool, fill me in on why that can't be effectively done with the kqueue
  compatibility and I also will advocate for kevent inclusion.
 
 kqueue just can not be used as is in Linux (_maybe_ *bsd has different
 types, not those which I found in /usr/include in my FC5 and Debian
 distro). It will not work on x86_64 for example. Some kind of a pointer
 or unsigned long in structures which are transferred between kernelspace
 and userspace is so questionable that it is much better not even to
 look there... (if I were not so politically correct, I would
 describe it in much different words actually).
 So, kqueue API and structures can not be used in Linux.

Not sure what you are smoking, but there's unsigned long in *bsd
version, lets rewrite it from scratch sounds like very bad idea. What
about fixing that one bit you don't like?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [take22 0/4] kevent: Generic event handling mechanism.

2006-11-03 Thread David Miller
From: Pavel Machek [EMAIL PROTECTED]
Date: Fri, 3 Nov 2006 09:57:12 +0100

 Not sure what you are smoking, but there's unsigned long in *bsd
 version, lets rewrite it from scratch sounds like very bad idea. What
 about fixing that one bit you don't like?

I disagree, it's more like since we have to be structure incompatible
anyways, let's design something superior if we can.



Re: [take22 0/4] kevent: Generic event handling mechanism.

2006-11-03 Thread Evgeniy Polyakov
On Fri, Nov 03, 2006 at 09:57:12AM +0100, Pavel Machek ([EMAIL PROTECTED]) 
wrote:
  So, kqueue API and structures can not be used in Linux.
 
 Not sure what you are smoking, but there's unsigned long in *bsd
 version, lets rewrite it from scratch sounds like very bad idea. What
 about fixing that one bit you don't like?

It is not about what I dislike, but about what is broken or not.
Putting u64 instead of a long or some kind of that _is_ incompatible
already, so why should we even use it?
And, btw, what we are talking about? Is it about the whole kevent
compared to kqueue in kernelspace, or just about what structure is being
transferred between kernelspace and userspace?
I'm sure, it was some kind of a joke to 'not rewrite *bsd from scratch
and use kqueue in Linux kernel as is'.

   Pavel
 -- 
 (english) http://www.livejournal.com/~pavelmachek
 (cesky, pictures) 
 http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

-- 
Evgeniy Polyakov


Re: [take22 0/4] kevent: Generic event handling mechanism.

2006-11-03 Thread Evgeniy Polyakov
On Fri, Nov 03, 2006 at 10:42:04AM +0800, zhou drangon ([EMAIL PROTECTED]) 
wrote:
 As for the VFS system, when we introduced the AIO mechanism, we added
 aio_read, aio_write, etc. to file ops, and then we made the read and write
 ops call aio_read and aio_write, so that only one implementation remains
 in the kernel.
 Can we do the event mechanism the same way?
 When kevent is robust enough, can we implement epoll/select/io_submit etc.
 based on kevent?
 In this way, we can simplify the kernel, and epoll can gain
 improvement from kevent.

There is an AIO implementation on top of kevent; although it was confirmed
that it has a good design, apart from minor API layering changes, it was
postponed for a while.

-- 
Evgeniy Polyakov


Re: [take22 0/4] kevent: Generic event handling mechanism.

2006-11-03 Thread Oleg Verych
On Wed, Nov 01, 2006 at 09:57:46PM +0300, Evgeniy Polyakov wrote:
 On Wed, Nov 01, 2006 at 06:20:43PM +, Oleg Verych ([EMAIL PROTECTED]) 
 wrote:
[] 
  Where's real-life application to do configure && make && make install?
 
 Your real life or mine as developer?
 I fortunately do not know anything about your real life, but my real life

So as not to shift the conversation further in a non-technical direction,
think of my sentence as a question *and* as a definition.

 applications can be found on project's homepage.
 There is a link to archive there, where you can find plenty of sources.

But no single makefile. Or do CC and options not really matter?
You can easily find in your server's apache logs, my visit of that
archive in the day of my message (today i just confirmed my assertions):
browser lynx, host flower.upol.cz.

 You likely do not know, but it is a bit risky business to patch all
 existing applications to show that approach is correct, if
 implementation is not completed.

Fortunately for me, `lighttpd' is real-life *and* in the benchmark
area also. Just see on that site how much was measured: different OSes,
special tuning. *That* is what I'm talking about. The epoll _wrapper_ there
is 3461 bytes long, your answer to _me_ 2580. People are bringing you a
test bed, with everything set up and ready to use; need less code, go on,
comment the needless out!

 You likely do not know, but after I first time announced kevents in
 February I changed interfaces 4 times - and it is just interfaces, not
 including numerous features added/removed by developer's requests.

I think that called open source, linux kernel case.

  There were some comments about the lack of such programs; the answers
  were in prev. e-mail, need to update them, something like that.
  Trivial web server sources url, mentioned in benchmark isn't pointed
  in patch advertisement. If it was, should i actually try that new
  *trivial* wheel?
 
 Answer is trivial - there is archive where one can find a source code
 (filenames are posted regularly). Should I create a rpm? For what glibc
 version?

Hmm. Let me answer on that dup with stuff from LKML archive. That
will reveal, that my guesses were told by The Big Jury to you already:

[^0] Message-ID: [EMAIL PROTECTED]
[^1] Message-ID: [EMAIL PROTECTED],
 Message-ID: [EMAIL PROTECTED]

more than 10 takes ago.

  Saying that, i want to give you some short examples, i know.
  *Linux kernel - userspace*:
  o Alexey Kuznetsov  networking - (excellent) iproute set of utilities;
 
 iproute documentation was way too bad when Alexey presented it first 
 time :)

As example, after have read some books on TCP/IP and Ethernet, internal
help of `ip' was all i needed to know.

 Btw, show me splice() 'shiny' application? Does lighttpd use it?
 Or move_pages().

You know who proposed that, and you know how many (few) releases ago.
 
  To make a little hint to you, Evgeniy, why don't you find a little
  animal in the open source zoo to implement little interface to
  proposed kernel subsystem and then show it to The Big Jury (not me),
  we have here? And i can not see, how you've managed to implement
  something like that having almost nothing on the test basket.
  Very *suspicious* ch.
 
 There are always people who do not like something, what can I do with

I didn't think, that my message was offensive. Also i didn't even say,
that you have not bothered feed your code to scripts/Lindent.

[]
 I created trivial web servers, which send single static page and use
 various event handling schemes, and I test new subsystem with new tools,
 when tests are completed and all requested features are implemented it
 is time to work on different more complex users.

Please, see [^0],

 So let's at least complete what we have right now, so no developer's
 efforts could be wasted writing empty chars in various places.

and [^1].

[ Please do not answer just to answer, cc list is big, no one from   ]
[ The Big Jury seems to care. (well, Jonathan does, but he wasn't in cc) ]

Friendly, Oleg.



Re: [take22 0/4] kevent: Generic event handling mechanism.

2006-11-02 Thread Nate Diller

On 11/1/06, Evgeniy Polyakov [EMAIL PROTECTED] wrote:

On Wed, Nov 01, 2006 at 06:12:41PM -0800, Nate Diller ([EMAIL PROTECTED]) wrote:
 Indecisiveness has certainly been an issue here, but I remember akpm
 and Ulrich both giving concrete suggestions.  I was particularly
 interested in Andrew's request to explain and justify the differences
 between kevent and BSD's kqueue interface.  Was there a discussion
 that I missed?  I am very interested to see your work on this
 mechanism merged, because you've clearly emphasized performance and
 shown impressive results.  But it seems like we lose out on a lot by
 throwing out all the applications that already use kqueue.

It looks like you missed that discussion - freebsd kqueue has fields in the
kevent structure which have different sizes in 32 and 64 bit environments.


Are you saying that the *only* reason we choose not to be
source-compatible with BSD is the 32 bit userland on 64 bit arch
problem?  I've followed every thread that gmail 'kqueue' search
returns, which thread are you referring to?  Nicholas Miell, in The
Proposed Linux kevent API thread, seems to think that there are no
advantages over kqueue to justify the incompatibility, an argument you
made no effort to refute.  I've also read the Kevent wiki at
linux-net.osdl.org, but it too is lacking in any direct comparisons
(even theoretical, let alone benchmarks) of the flexibility,
performance, etc. between the two.

I'm not arguing that you've done a bad design, I'm asking you to brag
about the things you improved on vs. kqueue.  Your emphasis on
unifying all the different event types into one interface is really
cool, fill me in on why that can't be effectively done with the kqueue
compatibility and I also will advocate for kevent inclusion.

NATE


Re: [take22 0/4] kevent: Generic event handling mechanism.

2006-11-02 Thread zhou drangon

2006/11/2, Eric Dumazet [EMAIL PROTECTED]:

zhou drangon a écrit :
 performance is great, and we are excited about the result.

 I want to know why there can be so much improvement, can we improve
 epoll too ?

Why did you remove most of CC addresses but lkml ?
Dont do that please...

I seldom reply to the mailing list, Sorry for this.


Good question :)

Hum, I think I can look into epoll and see how it can be improved (if necessary)


I have an other question.
As for the VFS system, when we introduced the AIO mechanism, we added
aio_read, aio_write, etc. to file ops, and then we made the read and write
ops call aio_read and aio_write, so that only one implementation remains
in the kernel.
Can we do the event mechanism the same way?
When kevent is robust enough, can we implement epoll/select/io_submit etc.
based on kevent?
In this way, we can simplify the kernel, and epoll can gain
improvement from kevent.


This is not to say we dont need kevent ! Please Evgeniy continue your work !

Yes! We are looking forward to your great work.

I created a userland event-driven framework for my application,
but I have to use multiple threads to receive events: epoll to wait for
most events, and io_getevents to wait for disk AIO events. I hope we can
get a universal event mechanism to make the code elegant.


Just to remind you that according to
http://www.xmailserver.org/linux-patches/nio-improve.html Davide Libenzi had to
wait 18 months before epoll was officially added to the kernel.

At that time, many applications were using epoll, and we were patching our
kernels for that.


I cooked a very simple program (attached in this mail), using pipes and epoll,
and got 250.000 events received per second on an otherwise lightly loaded
machine (dual opteron 246 , 2GHz, 1MB cache per cpu) with 10.000 pipes (20.000
handles)

It would be nice to add support for other event providers in this program
(AF_INET & AF_UNIX sockets for example), and also add support for kevent, so
that we really can compare epoll/kevent without a complex setup.
I should extend the program to also add/remove sources during lifetime, not
only insert at setup time.

# gcc -O2 -o epoll_pipe_bench epoll_pipe_bench.c -lpthread
# ulimit -n 100
# epoll_pipe_bench -n 1
^C after a while...

oprofile results say that ep_poll_callback() and sys_epoll_wait() use 20% of
cpu time.
Even if we gain a factor of two in cpu time or cache usage, we won't eliminate
other costs...

oprofile results gave :

Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
mask of 0x00 (No unit mask) count 5
samples  %        symbol name
2015420  11.1309  ep_poll_callback
1867431  10.3136  pipe_writev
1791872   9.8963  sys_epoll_wait
1357297   7.4962  fget_light
1277515   7.0556  pipe_readv
998447    5.5143  current_fs_time
801597    4.4271  __mark_inode_dirty
755268    4.1713  __wake_up
587065    3.2423  __write_lock_failed
582931    3.2195  system_call
297132    1.6410  iov_fault_in_pages_read
296136    1.6355  sys_write
290106    1.6022  __wake_up_common
270692    1.4950  bad_pipe_w
261516    1.4443  do_pipe
257208    1.4205  tg3_start_xmit_dma_bug
254917    1.4079  pipe_poll
252925    1.3969  copy_user_generic_c
234212    1.2935  generic_pipe_buf_map
228659    1.2629  ret_from_sys_call
212541    1.1738  sysret_check
166529    0.9197  sys_read
160038    0.8839  vfs_write
151091    0.8345  pipe_ioctl
136301    0.7528  file_update_time
107173    0.5919  tg3_poll
77846     0.4299  ipt_do_table
75081     0.4147  schedule
73059     0.4035  vfs_read
69787     0.3854  get_task_comm
63923     0.3530  memcpy
60019     0.3315  touch_atime
57490     0.3175  eventpoll_release_file
56152     0.3101  tg3_write_flush_reg32
54468     0.3008  rw_verify_area
47833     0.2642  generic_pipe_buf_unmap
4         0.2639  __switch_to
44106     0.2436  bad_pipe_r
41824     0.2310  proc_nr_files
41319     0.2282  pipe_iov_copy_from_user


Eric



/*
 * How to stress epoll
 *
 * This program uses many pipes and two threads.
 * First we open as many pipes we can. (see ulimit -n)
 * Then we create a worker thread.
 * The worker thread will send bytes to random pipes.
 * The main thread uses epoll to collect ready pipes and read them.
 * Each second, a number of collected bytes is printed on stderr
 *
 * Usage : epoll_bench [-n X]
 */
#include <pthread.h>
#include <stdlib.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>
#include <signal.h>
#include <unistd.h>
#include <sys/time.h>

int nbpipes = 1024;

struct pipefd {
	int fd[2];
} *tab;

int epoll_fd;

static int alloc_pipes()
{
	int i;

	epoll_fd = epoll_create(nbpipes);
	if (epoll_fd == -1) {
		perror("epoll_create");
		return -1;
	}
	tab = malloc(sizeof(struct pipefd) * nbpipes);
	if (tab == NULL) {
		perror("malloc");
		return -1;
	}
	for (i = 0 ; i < nbpipes ; i++) {
		struct epoll_event ev;

[take22 0/4] kevent: Generic event handling mechanism.

2006-11-01 Thread Evgeniy Polyakov

Generic event handling mechanism.

Consider for inclusion.

Changes from 'take21' patchset:
 * minor cleanups (different return values, removed unneeded variables,
   whitespaces and so on)
 * fixed bug in kevent removal in case when kevent being removed
   is the same as overflow_kevent (spotted by Eric Dumazet)

Changes from 'take20' patchset:
 * new ring buffer implementation
 * removed artificial limit on possible number of kevents
With this release and fixed userspace web server it was possible to
achieve 3960+ req/s with client connection rate of 4000 con/s
over 100 Mbit lan, data IO over network was about 10582.7 KB/s, which
is very close to wire speed if we take into account headers and the like.

Changes from 'take19' patchset:
 * use __init instead of __devinit
 * removed 'default N' from config for user statistic
 * removed kevent_user_fini() since kevent can not be unloaded
 * use KERN_INFO for statistic output

Changes from 'take18' patchset:
 * use __init instead of __devinit
 * removed 'default N' from config for user statistic
 * removed kevent_user_fini() since kevent can not be unloaded
 * use KERN_INFO for statistic output

Changes from 'take17' patchset:
 * Use RB tree instead of hash table.
   At least for a web server, the frequency of addition/deletion of new
   kevents is comparable with the number of search accesses, i.e. most of
   the time events are added, accessed only a couple of times and then
   removed, so it justifies RB tree usage over AVL tree, since the latter
   has much slower deletion time (max O(log(N)) compared to 3 ops),
   although faster search time (1.44*O(log(N)) vs. 2*O(log(N))).
   So for kevents I use RB tree for now and later, when my AVL tree
   implementation is ready, it will be possible to compare them.
 * Changed readiness check for socket notifications.

With both above changes it is possible to achieve more than 3380 req/second
compared to 2200, sometimes 2500 req/second for epoll() for trivial
web-server and httperf client on the same hardware.
It is possible that above kevent limit is due to maximum allowed kevents
in a time limit, which is 4096 events.

Changes from 'take16' patchset:
 * misc cleanups (__read_mostly, const ...)
 * created special macro which is used for mmap size (number of pages)
   calculation
 * export kevent_socket_notify(), since it is used in network protocols
   which can be built as modules (IPv6 for example)

Changes from 'take15' patchset:
 * converted kevent_timer to high-resolution timers, this forces timer API
   update at http://linux-net.osdl.org/index.php/Kevent
 * use struct ukevent* instead of void * in syscalls (documentation has been
   updated)
 * added warning in kevent_add_ukevent() if ring has broken index (for testing)

Changes from 'take14' patchset:
 * added kevent_wait()
This syscall waits until either timeout expires or at least one event
becomes ready. It also commits that @num events from @start are processed
by userspace and thus can be removed or rearmed (depending on their
flags).
It can be used for commit events read by userspace through mmap interface.
Example userspace code (evtest.c) can be found on project's homepage.
 * added socket notifications (send/recv/accept)

Changes from 'take13' patchset:
 * do not get lock around user data check in __kevent_search()
 * fail early if there were no registered callbacks for given type of kevent
 * trailing whitespace cleanup

Changes from 'take12' patchset:
 * remove non-chardev interface for initialization
 * use pointer to kevent_mring instead of unsigned longs
 * use aligned 64bit type in raw user data (can be used by high-res timer if
   needed)
 * simplified enqueue/dequeue callbacks and kevent initialization
 * use nanoseconds for timeout
 * put number of milliseconds into timer's return data
 * move some definitions into user-visible header
 * removed filenames from comments

Changes from 'take11' patchset:
 * include missing headers into patchset
 * some trivial code cleanups (use goto instead of if/else games and so on)
 * some whitespace cleanups
 * check for ready_callback() callback before main loop which should save us
   some ticks

Changes from 'take10' patchset:
 * removed non-existent prototypes
 * added helper function for kevent_registered_callbacks
 * fixed comments exceeding 80-character lines
 * added a header shared between userspace and kernelspace instead of
   embedding them in one
 * core restructuring to remove forward declarations
 * some whitespace/coding style cleanup
 * use vm_insert_page() instead of remap_pfn_range()

Changes from 'take9' patchset:
 * fixed -nopage method

Changes from 'take8' patchset:
 * fixed mmap release bug
 * use module_init() instead of late_initcall()
 * use better structures for timer notifications

Changes from 'take7' patchset:
 * new mmap interface (not tested, waiting for other changes to be acked)

Re: [take22 0/4] kevent: Generic event handling mechanism.

2006-11-01 Thread Pavel Machek
Hi!

 Generic event handling mechanism.
 
 Consider for inclusion.
 
 Changes from 'take21' patchset:

We are not interested in how many times you spammed us, nor do we want
to know what was wrong in previous versions. It would be nice to have
a short summary of what this is good for, instead.

Pavel
-- 
Thanks, Sharp!


Re: [take22 0/4] kevent: Generic event handling mechanism.

2006-11-01 Thread Evgeniy Polyakov
On Wed, Nov 01, 2006 at 02:06:14PM +0100, Pavel Machek ([EMAIL PROTECTED]) 
wrote:
 Hi!
 
  Generic event handling mechanism.
  
  Consider for inclusion.
  
  Changes from 'take21' patchset:
 
 We are not interested in how many times you spammed us, nor do we want
 to know what was wrong in previous versions. It would be nice to have
 a short summary of what this is good for, instead.

Let me guess, a short explanation in subsequent emails is not enough...
If the changelog is removed, then how will people detect what happened
after the previous release?

Kevent is a generic subsystem which allows handling of event notifications.
It supports both level- and edge-triggered events. It is similar to
poll/epoll in some cases, but it is more scalable, it is faster and
allows working with essentially any kind of events.

Events are provided to the kernel through a control syscall and can be read
back through an mmapped ring or a syscall.
Kevent update (i.e. readiness switching) happens directly from the internals
of the appropriate state machine of the underlying subsystem (like
network, filesystem, timer or any other).

I will put that text into introduction message.

   Pavel
 -- 
 Thanks, Sharp!

-- 
Evgeniy Polyakov


Re: [take22 0/4] kevent: Generic event handling mechanism.

2006-11-01 Thread James Morris
On Wed, 1 Nov 2006, Pavel Machek wrote:

 Hi!
 
  Generic event handling mechanism.
  
  Consider for inclusion.
  
  Changes from 'take21' patchset:
 
 We are not interested in how many times you spammed us, nor do we want
 to know what was wrong in previous versions. It would be nice to have
 a short summary of what this is good for, instead.

I'm interested in knowing which version the patches belong to and what has 
changed (geez, it's rare enough that someone actually bothers to do this 
with an updated patchset, and to complain about it?)



- James
-- 
James Morris
[EMAIL PROTECTED]


Re: [take22 0/4] kevent: Generic event handling mechanism.

2006-11-01 Thread Pavel Machek
Hi!

   Generic event handling mechanism.
   
   Consider for inclusion.
   
   Changes from 'take21' patchset:
  
  We are not interested in how many times you spammed us, nor do we want
  to know what was wrong in previous versions. It would be nice to have
  a short summary of what this is good for, instead.
 
 Let me guess, a short explanation in subsequent emails is not
 enough...

Yes.

 Kevent is a generic subsystem which allows handling of event notifications.
 It supports both level- and edge-triggered events. It is similar to
 poll/epoll in some cases, but it is more scalable, it is faster and
 allows working with essentially any kind of events.

Quantifying how much more scalable would be nice, as would be some
example where it is useful. (It makes my webserver twice as fast on
monster 64-cpu box).

 Events are provided to the kernel through a control syscall and can be read
 back through an mmapped ring or a syscall.
 Kevent update (i.e. readiness switching) happens directly from the internals
 of the appropriate state machine of the underlying subsystem (like
 network, filesystem, timer or any other).
 
 I will put that text into introduction message.

Thanks.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [take22 0/4] kevent: Generic event handling mechanism.

2006-11-01 Thread Evgeniy Polyakov
On Wed, Nov 01, 2006 at 05:05:51PM +0100, Pavel Machek ([EMAIL PROTECTED]) 
wrote:
 Hi!

Hi Pavel.

  Kevent is a generic subsystem which allows handling of event notifications.
  It supports both level- and edge-triggered events. It is similar to
  poll/epoll in some cases, but it is more scalable, it is faster and
  allows working with essentially any kind of events.
 
 Quantifying how much more scalable would be nice, as would be some
 example where it is useful. (It makes my webserver twice as fast on
 monster 64-cpu box).

Trivial kevent web-server can handle 3960+ req/sec on Xeon 2.4Ghz with
1Gb RAM, epoll based - 2200-2500 req/sec.
The 100 Mbit wire is filled almost 100% (10582.7 KB/s of data, not counting
TCP and lower-layer headers).
More benchmarks created by me and Johann Borck can be found on the project's
homepage, as well as all my sources used in tests.

-- 
Evgeniy Polyakov


Re: [take22 0/4] kevent: Generic event handling mechanism.

2006-11-01 Thread Oleg Verych

Hallo, Evgeniy Polyakov.

On 2006-11-01, you wrote:
[]
 Quantifying how much more scalable would be nice, as would be some
 example where it is useful. (It makes my webserver twice as fast on
 monster 64-cpu box).

 Trivial kevent web-server can handle 3960+ req/sec on Xeon 2.4Ghz with
[...]

Seriously. I'm seeing those patches too. New, shiny, always ready for
inclusion. But considering that the kernel (linux in this case) is not a
thing in itself, i want to ask the following question.

Where's the real-life application to do configure && make && make install?

There were some comments about the lack of such programs; the answers were
"was in prev. e-mail", "need to update them", something like that.
The URL of the trivial web server sources mentioned in the benchmark isn't
given in the patch advertisement. If it was, should i actually try that new
*trivial* wheel?

Saying that, i want to give you some short examples, i know.
*Linux kernel - userspace*:
o Alexey Kuznetsov  networking - (excellent) iproute set of utilities;
o Maxim Krasnyansky tun net driver - vtun daemon application;

*Glibc with mister Drepper* has huge set of tests, please search for
`tst*' files in the sources.

To make a little hint to you, Evgeniy, why don't you find a little
animal in the open source zoo to implement little interface to
proposed kernel subsystem and then show it to The Big Jury (not me),
we have here? And i can not see, how you've managed to implement
something like that having almost nothing on the test basket.
Very *suspicious* ch.

One that comes to mind is lighttpd http://www.lighttpd.net/.
It had sub-interface for event systems like select,poll,epoll, when i
checked its sources last time. And it is mature, btw.

Cheers.

[ -*- OT -*-   ]
[ I wouldn't write all this, unless saw your opinion about the ]
[ reportbug (part of the Debian Bug Tracking System) this week.]
[ While i'm nobody here, imho, the first thing about good programmer   ]
[ must be, that he is excellent user.  ]



Re: [take22 0/4] kevent: Generic event handling mechanism.

2006-11-01 Thread Evgeniy Polyakov
On Wed, Nov 01, 2006 at 06:20:43PM +, Oleg Verych ([EMAIL PROTECTED]) wrote:
 
 Hallo, Evgeniy Polyakov.

Hello, Oleg.

 On 2006-11-01, you wrote:
 []
  Quantifying how much more scalable would be nice, as would be some
  example where it is useful. (It makes my webserver twice as fast on
  monster 64-cpu box).
 
  Trivial kevent web-server can handle 3960+ req/sec on Xeon 2.4Ghz with
 [...]
 
 Seriously. I'm seeing those patches too. New, shiny, always ready for
 inclusion. But considering that the kernel (linux in this case) is not a
 thing in itself, i want to ask the following question.
 
 Where's the real-life application to do configure && make && make install?

Your real life or mine as developer?
I fortunately do not know anything about your real life, but my real life
applications can be found on project's homepage.
There is a link to archive there, where you can find plenty of sources.
You likely do not know, but it is a risky business to patch all
existing applications to show that the approach is correct when the
implementation is not complete.
You likely do not know, but since I first announced kevents in
February I have changed the interfaces 4 times - and that is just the
interfaces, not including numerous features added/removed at developers'
requests.

 There were some comments about the lack of such programs; the answers were
 "was in prev. e-mail", "need to update them", something like that.
 The URL of the trivial web server sources mentioned in the benchmark isn't
 given in the patch advertisement. If it was, should i actually try that new
 *trivial* wheel?

The answer is trivial - there is an archive where one can find the source
code (filenames are posted regularly). Should I create an rpm? For what glibc
version?

 Saying that, i want to give you some short examples, i know.
 *Linux kernel - userspace*:
 o Alexey Kuznetsov  networking - (excellent) iproute set of utilities;

iproute documentation was pretty bad when Alexey first presented it :)

 o Maxim Krasnyansky tun net driver - vtun daemon application;

 *Glibc with mister Drepper* has huge set of tests, please search for
 `tst*' files in the sources.

Btw, show me splice() 'shiny' application? Does lighttpd use it?
Or move_pages().

 To make a little hint to you, Evgeniy, why don't you find a little
 animal in the open source zoo to implement little interface to
 proposed kernel subsystem and then show it to The Big Jury (not me),
 we have here? And i can not see, how you've managed to implement
 something like that having almost nothing on the test basket.
 Very *suspicious* ch.

There are always people who do not like something; what can I do about
it? I present the code, we discuss it, I ask for inclusion (since it is
the only way to get feedback), something requires changes, it is changed
and so on - that is the development process.
I created a 'little animal in the open source zoo' by myself to show how
simple kevents are.

 One that comes to mind is lighttpd http://www.lighttpd.net/.
 It had sub-interface for event systems like select,poll,epoll, when i
 checked its sources last time. And it is mature, btw.

As I have already said several times, I have changed the interfaces alone 4
times already, since no one seems to know what we really want and how the
interface should look. You suggest patching lighttpd? Well, it is
doable, but then I will be asked to change apache and nginx. And then
someone will suggest changing the order of parameters. Will you help me
rewrite userspace? No, you will not. You ask for something without
providing anything back (not counting code, but discussion,
ideas, testing time, nothing), and you do it in the manner of an ultimatum.
Btw, kevent also supports AIO notifications - do you suggest patching
reactor/proactor for tests?
It supports network AIO - do you suggest writing support for that into
apache?
What about timers? It is possible to rewrite all POSIX timers users to
use them instead.
There is a feature request for userspace events and signal delivery - what
to do with that?

I created trivial web servers, which send a single static page and use
various event handling schemes, and I test the new subsystem with new tools;
when the tests are completed and all requested features are implemented, it
will be time to work on different, more complex users.

So let's at least complete what we have right now, so that no developer's
efforts are wasted writing empty chars in various places.

 Cheers.
 
 [ -*- OT -*-   ]
 [ I wouldn't write all this, unless saw your opinion about the ]
 [ reportbug (part of the Debian Bug Tracking System) this week.]
 [ While i'm nobody here, imho, the first thing about good programmer   ]
 [ must be, that he is excellent user.  ]
 

-- 
Evgeniy Polyakov


Re: [take22 0/4] kevent: Generic event handling mechanism.

2006-11-01 Thread Nate Diller

On 11/1/06, Evgeniy Polyakov [EMAIL PROTECTED] wrote:

On Wed, Nov 01, 2006 at 06:20:43PM +, Oleg Verych ([EMAIL PROTECTED]) wrote:

 Hallo, Evgeniy Polyakov.

Hello, Oleg.

 On 2006-11-01, you wrote:
 []
  Quantifying how much more scalable would be nice, as would be some
  example where it is useful. (It makes my webserver twice as fast on
  monster 64-cpu box).
 
  Trivial kevent web-server can handle 3960+ req/sec on Xeon 2.4Ghz with
 [...]

 Seriously. I'm seeing those patches too. New, shiny, always ready for
 inclusion. But considering that the kernel (linux in this case) is not a
 thing in itself, i want to ask the following question.

 Where's the real-life application to do configure && make && make install?

Your real life or mine as developer?
I fortunately do not know anything about your real life, but my real life
applications can be found on project's homepage.
There is a link to archive there, where you can find plenty of sources.
You likely do not know, but it is a risky business to patch all
existing applications to show that the approach is correct when the
implementation is not complete.
You likely do not know, but since I first announced kevents in
February I have changed the interfaces 4 times - and that is just the
interfaces, not including numerous features added/removed at developers'
requests.

 There were some comments about the lack of such programs; the answers were
 "was in prev. e-mail", "need to update them", something like that.
 The URL of the trivial web server sources mentioned in the benchmark isn't
 given in the patch advertisement. If it was, should i actually try that new
 *trivial* wheel?

The answer is trivial - there is an archive where one can find the source
code (filenames are posted regularly). Should I create an rpm? For what glibc
version?

 Saying that, i want to give you some short examples, i know.
 *Linux kernel - userspace*:
 o Alexey Kuznetsov  networking - (excellent) iproute set of utilities;

iproute documentation was pretty bad when Alexey first presented it :)

 o Maxim Krasnyansky tun net driver - vtun daemon application;

 *Glibc with mister Drepper* has huge set of tests, please search for
 `tst*' files in the sources.

Btw, show me splice() 'shiny' application? Does lighttpd use it?
Or move_pages().

 To make a little hint to you, Evgeniy, why don't you find a little
 animal in the open source zoo to implement little interface to
 proposed kernel subsystem and then show it to The Big Jury (not me),
 we have here? And i can not see, how you've managed to implement
 something like that having almost nothing on the test basket.
 Very *suspicious* ch.

There are always people who do not like something; what can I do about
it? I present the code, we discuss it, I ask for inclusion (since it is
the only way to get feedback), something requires changes, it is changed
and so on - that is the development process.
I created a 'little animal in the open source zoo' by myself to show how
simple kevents are.

 One that comes to mind is lighttpd http://www.lighttpd.net/.
 It had sub-interface for event systems like select,poll,epoll, when i
 checked its sources last time. And it is mature, btw.

As I have already said several times, I have changed the interfaces alone 4
times already, since no one seems to know what we really want and how the
interface should look.


Indecisiveness has certainly been an issue here, but I remember akpm
and Ulrich both giving concrete suggestions.  I was particularly
interested in Andrew's request to explain and justify the differences
between kevent and BSD's kqueue interface.  Was there a discussion
that I missed?  I am very interested to see your work on this
mechanism merged, because you've clearly emphasized performance and
shown impressive results.  But it seems like we lose out on a lot by
throwing out all the applications that already use kqueue.

NATE


Re: [take22 0/4] kevent: Generic event handling mechanism.

2006-11-01 Thread Evgeniy Polyakov
On Wed, Nov 01, 2006 at 06:12:41PM -0800, Nate Diller ([EMAIL PROTECTED]) wrote:
 Indecisiveness has certainly been an issue here, but I remember akpm
 and Ulrich both giving concrete suggestions.  I was particularly
 interested in Andrew's request to explain and justify the differences
 between kevent and BSD's kqueue interface.  Was there a discussion
 that I missed?  I am very interested to see your work on this
 mechanism merged, because you've clearly emphasized performance and
 shown impressive results.  But it seems like we lose out on a lot by
 throwing out all the applications that already use kqueue.

It looks like you missed that discussion - FreeBSD's kqueue has fields in the
kevent structure which have different sizes in 32-bit and 64-bit environments.

 NATE

-- 
Evgeniy Polyakov