Re: Kernel patches enabling better POSIX AIO (Was Re: [3/4] kevent: AIO, aio_sendfile)

2006-09-04 Thread Sébastien Dugué
  Hi,

  just came back from vacation, sorry for the delay.

On Sat, 2006-08-12 at 23:59 +0530, Suparna Bhattacharya wrote:
 BTW, if anyone would like to be dropped off this growing cc list, please
 let us know.
 
 On Fri, Aug 11, 2006 at 12:45:55PM -0700, Ulrich Drepper wrote:
  Sébastien Dugué wrote:
  aio completion notification
  
  I looked over this now but I don't think I understand everything.  Or I
  don't see how it all is integrated.  And no, I'm not looking at the
  proposed glibc code since would mean being tainted.
 
 Oh, I didn't realise that. 
 I'll make an attempt to clarify parts that I understand based on what I
 have gleaned from my reading of the code and intent, but hopefully Sebastien,
 Ben, Zach et al will be able to pitch in for a more accurate and complete
 picture.
 
  
  
   Details:
   ---
   
 A struct sigevent *aio_sigeventp is added to struct iocb in
   include/linux/aio_abi.h
   
 An enum {IO_NOTIFY_SIGNAL = 0, IO_NOTIFY_THREAD_ID = 1} is added in
   include/linux/aio.h:
   
 - IO_NOTIFY_SIGNAL means that the signal is to be sent to the
   requesting thread 
   
 - IO_NOTIFY_THREAD_ID means that the signal is to be sent to a
   specifi thread.
  
  This has been proved to be sufficient in the timer code which basically
  has the same problem.  But why do you need separate constants?  We have
  the various SIGEV_* constants, among them SIGEV_THREAD_ID.  Just use
  these constants for the values of ki_notify.
  
 
 I am wondering about that too. IIRC, the IO_NOTIFY_* constants are not
 part of the ABI, but only internal to the kernel implementation. I think
 Zach had suggested inferring THREAD_ID notification if the pid specified
 is not zero. But, I don't see why -sigev_notify couldn't used directly
 (just like the POSIX timers code does) thus doing away with the 
 new constants altogether. Sebestian/Laurent, do you recall?

  As I see it, those IO_NOTIFY_* constants are uneeded and we could use
-sigev_notify directly. I will change this so that we use the same
mechanism as the POSIX timers code.

 
  
 The following fields are added to struct kiocb in include/linux/aio.h:
   
 - pid_t ki_pid: target of the signal
   
 - __u16 ki_signo: signal number
   
 - __u16 ki_notify: kind of notification, IO_NOTIFY_SIGNAL or
IO_NOTIFY_THREAD_ID
   
 - uid_t ki_uid, ki_euid: filled with the submitter credentials
  
  These two fields aren't needed for the POSIX interfaces.  Where does the
  requirement come from?  I don't say they should be removed, they might
  be useful, but if the costs are non-negligible then they could go away.
 
 I'm guessing they are being used for validation of permissions at the time
 of sending the signal, but maybe saving the task pointer in the iocb instead
 of the pid would suffice ?

  IIRC, Ben added these for that exact reason. Is this really needed?
Ben?

 
  
  
 - check whether the submitting thread wants to be notified directly
   (sigevent-sigev_notify_thread_id is 0) or wants the signal to be sent
   to another thread.
   In the latter case a check is made to assert that the target thread
   is in the same thread group
  
  Is this really how it's implemented?  This is not how it should be.
  Either a signal is sent to a specific thread in the same process (this
  is what SIGEV_THREAD_ID is for) or the signal is sent to a calling
  process.  Sending a signal to the process means that from the kernel's
  POV any thread which doesn't have the signal blocked can receive it.
  The final decision is made by the kernel.  There is no mechanism to send
  the signal to another process.
 
 The code seems to be set up to call specific_send_sig_info() in the case
 of *_THREAD_ID , and __group_send_sig_info() otherwise. So I think the
 intended behaviour is as you describe it should be (__group_send_sig_info
 does the equivalent of sending a signal to the process and so any thread
 which doesn't have signals blocked can receive it, while 
 specific_send_sig_info
 sends it to a particular thread). 
 
 But, I should really leave it to Sebestian to confirm that.

  That's right, but I think that part needs to be reworked to follow
the same logic as the POSIX timers.


 listio support
   
  
  I really don't understand the kernel interface for this feature.
 
 I'm sorry this is confusing. This probably means that we need to
 separate the external interface description more clearly and completely
 from the internals.
 
  
  
   Details:
   ---
   
 An IOCB_CMD_GROUP is added to the IOCB_CMD enum in 
   include/linux/aio_abi.h
   
 A struct lio_event is added in include/linux/aio.h
   
 A struct lio_event *ki_lio is added to struct iocb in 
   include/linux/aio.h
  
  So you have a pointer in the structure for the individual requests.  I
  assume you use the atomic counter to trigger the final delivery.  I
  further

Re: Kernel patches enabling better POSIX AIO (Was Re: [3/4] kevent: AIO, aio_sendfile)

2006-09-04 Thread Sébastien Dugué
On Sat, 2006-08-12 at 15:28 -0400, Jakub Jelinek wrote:
 On Sat, Aug 12, 2006 at 12:10:35PM -0700, Ulrich Drepper wrote:
   I am wondering about that too. IIRC, the IO_NOTIFY_* constants are not
   part of the ABI, but only internal to the kernel implementation. I think
   Zach had suggested inferring THREAD_ID notification if the pid specified
   is not zero. But, I don't see why -sigev_notify couldn't used directly
   (just like the POSIX timers code does) thus doing away with the 
   new constants altogether. Sebestian/Laurent, do you recall?
  
  I suggest to model the implementation after the timer code which does
  exactly what we need.
 
 Yeah, and if at all possible we want to use just one helper thread for
 SIGEV_THREAD notification of timers/aio/etc., so it really should behave the
 same as timer thread notification.
 

  That's exactly what is done in libposix-aio.

  Sébastien.

-- 
-

  Sébastien DuguéBULL/FREC:B1-247
  phone: (+33) 476 29 77 70  Bullcom: 229-7770

  mailto:[EMAIL PROTECTED]

  Linux POSIX AIO: http://www.bullopensource.org/posix
   http://sourceforge.net/projects/paiol

-

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel patches enabling better POSIX AIO (Was Re: [3/4] kevent: AIO, aio_sendfile)

2006-09-04 Thread Sébastien Dugué
On Sat, 2006-08-12 at 12:10 -0700, Ulrich Drepper wrote:
 Suparna Bhattacharya wrote:
  I am wondering about that too. IIRC, the IO_NOTIFY_* constants are not
  part of the ABI, but only internal to the kernel implementation. I think
  Zach had suggested inferring THREAD_ID notification if the pid specified
  is not zero. But, I don't see why -sigev_notify couldn't used directly
  (just like the POSIX timers code does) thus doing away with the 
  new constants altogether. Sebestian/Laurent, do you recall?
 
 I suggest to model the implementation after the timer code which does
 exactly what we need.
 

  Will do.

 
  I'm guessing they are being used for validation of permissions at the time
  of sending the signal, but maybe saving the task pointer in the iocb instead
  of the pid would suffice ?
 
 Why should any verification be necessary?  The requests are generated in
 the same process which will receive the notification.  Even if the POSIX
 process (aka, kernel process group) changes the IDs the notifications
 should be set.  The key is that notifications cannot be sent to another
 POSIX process.
 
 Adding this as a feature just makes things so much more complicated.
 

  Agreed.

  Sébastien.


-- 
-

  Sébastien DuguéBULL/FREC:B1-247
  phone: (+33) 476 29 77 70  Bullcom: 229-7770

  mailto:[EMAIL PROTECTED]

  Linux POSIX AIO: http://www.bullopensource.org/posix
   http://sourceforge.net/projects/paiol

-

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [3/4] kevent: AIO, aio_sendfile() implementation.

2006-07-28 Thread Sébastien Dugué
On Thu, 2006-07-27 at 08:28 -0700, Badari Pulavarty wrote:
 Sébastien Dugué wrote:
  On Wed, 2006-07-26 at 09:22 -0700, Badari Pulavarty wrote:

  Ulrich Drepper wrote:
  
  Christoph Hellwig wrote:


  My personal opinion on existing AIO is that it is not the right design.
  Benjamin LaHaise agree with me (if I understood him right),


  I completely agree with that aswell.
  
  
  I agree, too, but the current code is not the last of the line.  Suparna
  has a st of patches which make the current kernel aio code work much
  better and especially make it really usable to implement POSIX AIO.
 
  In Ottawa we were talking about submitting it and Suparna will.  We just
  thought about a little longer timeframe.  I guess it could be
  accelerated since he mostly has the patch done.  But I don't know her
  schedule.
 
  Important here is, don't base any decision on the current aio
  implementation.


  Ulrich,
 
  Suparna mentioned your interest in making POSIX glibc aio work with 
  kernel-aio at OLS.
  We thought taking a re-look at the (kernel side) work BULL did, would be 
  a nice starting
  point. I re-based those patches to 2.6.18-rc2 and sent it to Zach Brown 
  for review before
  sending them out to list.
 
  These patches does NOT make AIO any cleaner. All they do is add 
  functionality to support
  POSIX AIO easier. These are
 
  [ PATCH 1/3 ]  Adding signal notification for event completion
 
  [ PATCH 2/3 ]  lio (listio) completion semantics
 
  [ PATCH 3/3 ] cancel_fd support
  
 
Badari,
 
Thanks for refreshing those patches, they have been sitting here
  for quite some time now and collected dust.
 
I also think Suparna's patchset for doing buffered AIO would be
  a real plus here.
 

  Suparna explained these in the following article:
 
  http://lwn.net/Articles/148755/
 
  If you think, this is a reasonable direction/approach for the kernel and 
  you would take care
  of glibc side of things - I can spend time on these patches, getting 
  them to reasonable shape
  and push for inclusion.
  
 
Ulrich, I you want to have a look at how those patches are put to
  use in libposix-aio, have a look at http://sourceforge.net/projects/paiol.
 
It could be a starting point for glibc.
 
Thanks,
 
Sébastien.
 

 Sebastien,
 
 Suparna mentioned at Ulrich wants us to concentrate on kernel-side 
 support, so that he
 can look at glibc side of things (along with other work he is already 
 doing). So, if we
 can get an agreement on what kind of kernel support is needed - we can 
 focus our
 efforts on kernel side first and leave glibc enablement to capable hands 
 of Uli :)
 

  That's fine with me. 

  Sébastien.

-- 
-

  Sébastien DuguéBULL/FREC:B1-247
  phone: (+33) 476 29 77 70  Bullcom: 229-7770

  mailto:[EMAIL PROTECTED]

  Linux POSIX AIO: http://www.bullopensource.org/posix
   http://sourceforge.net/projects/paiol

-

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [3/4] kevent: AIO, aio_sendfile() implementation.

2006-07-28 Thread Sébastien Dugué
On Thu, 2006-07-27 at 11:44 -0700, Ulrich Drepper wrote:
 Badari Pulavarty wrote:
  Before we spend too much time cleaning up and merging into mainline -
  I would like an agreement that what we add is good enough for glibc
  POSIX AIO.
 
 I haven't seen a description of the interface so far.  Would be good if
 it existed.  But I briefly mentioned one quirk in the interface about
 which Suparna wasn't sure whether it's implemented/implementable in the
 current interface.
 
 If a lio_listio call is made the individual requests are handle just as
 if they'd be issue separately.  I.e., the notification specified in the
 individual aiocb is performed when the specific request is done.  Then,
 once all requests are done, another notification is made, this time
 controlled by the sigevent parameter if lio_listio.
 
 
 Another feature which I always wanted: the current lio_listio call
 returns in blocking mode only if all requests are done.  In non-blocking
 mode it returns immediately and the program needs to poll the aiocbs.
 What is needed is something in the middle.  For instance, if multiple
 read requests are issued the program might be able to start working as
 soon as one request is satisfied.  I.e., a call similar to lio_listio
 would be nice which also takes another parameter specifying how many of
 the NENT aiocbs have to finish before the call returns.

  You're right here, that definitely would be a plus.


-- 
-

  Sébastien DuguéBULL/FREC:B1-247
  phone: (+33) 476 29 77 70  Bullcom: 229-7770

  mailto:[EMAIL PROTECTED]

  Linux POSIX AIO: http://www.bullopensource.org/posix
   http://sourceforge.net/projects/paiol

-

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [3/4] kevent: AIO, aio_sendfile() implementation.

2006-07-28 Thread Sébastien Dugué
On Thu, 2006-07-27 at 14:02 -0700, Badari Pulavarty wrote:
 On Thu, 2006-07-27 at 11:44 -0700, Ulrich Drepper wrote:
  Badari Pulavarty wrote:
   Before we spend too much time cleaning up and merging into mainline -
   I would like an agreement that what we add is good enough for glibc
   POSIX AIO.
  
  I haven't seen a description of the interface so far.  Would be good if
  it existed.  But I briefly mentioned one quirk in the interface about
  which Suparna wasn't sure whether it's implemented/implementable in the
  current interface.
 
 Sebastien, could you provide a description of interfaces you are
 adding ? Since you did all the work, it would be appropriate for
 you to do it :)
 

  I will clean up what description I have and send it soon.

  Sébastien.


-- 
-

  Sébastien DuguéBULL/FREC:B1-247
  phone: (+33) 476 29 77 70  Bullcom: 229-7770

  mailto:[EMAIL PROTECTED]

  Linux POSIX AIO: http://www.bullopensource.org/posix
   http://sourceforge.net/projects/paiol

-

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [3/4] kevent: AIO, aio_sendfile() implementation.

2006-07-28 Thread Sébastien Dugué
On Thu, 2006-07-27 at 14:02 -0700, Badari Pulavarty wrote:
 On Thu, 2006-07-27 at 11:44 -0700, Ulrich Drepper wrote:
  Badari Pulavarty wrote:
   Before we spend too much time cleaning up and merging into mainline -
   I would like an agreement that what we add is good enough for glibc
   POSIX AIO.
  
  I haven't seen a description of the interface so far.  Would be good if
  it existed.  But I briefly mentioned one quirk in the interface about
  which Suparna wasn't sure whether it's implemented/implementable in the
  current interface.
 
 Sebastien, could you provide a description of interfaces you are
 adding ? Since you did all the work, it would be appropriate for
 you to do it :)
 

  Here are the descriptions for the AIO completion notification and
listio patches. Hope I did not leave out too much.

  Sébastien.

-- 
-

  Sébastien DuguéBULL/FREC:B1-247
  phone: (+33) 476 29 77 70  Bullcom: 229-7770

  mailto:[EMAIL PROTECTED]

  Linux POSIX AIO: http://www.bullopensource.org/posix
   http://sourceforge.net/projects/paiol

-


 aio completion notification

Summary:
---

  The current 2.6 kernel does not support notification of user space via
an RT signal upon an asynchronous IO completion. The POSIX specification
states that when an AIO request completes, a signal can be delivered to
the application as notification.

  The aioevent patch adds a struct sigevent *aio_sigeventp to the iocb.
The relevant fields (pid, signal number and value) are stored in the kiocb
for use when the request completes.

  That sigevent structure is filled by the application as part of the AIO
request preparation. Upon request completion, the kernel notifies the
application using those sigevent parameters. If SIGEV_NONE has been specified,
then the old behaviour is retained and the application must rely on polling
the completion queue using io_getevents().

Details:
---

  A struct sigevent *aio_sigeventp is added to struct iocb in
include/linux/aio_abi.h

  An enum {IO_NOTIFY_SIGNAL = 0, IO_NOTIFY_THREAD_ID = 1} is added in
include/linux/aio.h:

- IO_NOTIFY_SIGNAL means that the signal is to be sent to the
  requesting thread 

- IO_NOTIFY_THREAD_ID means that the signal is to be sent to a
  specifi thread.

  The following fields are added to struct kiocb in include/linux/aio.h:

- pid_t ki_pid: target of the signal

- __u16 ki_signo: signal number

- __u16 ki_notify: kind of notification, IO_NOTIFY_SIGNAL or
   IO_NOTIFY_THREAD_ID

- uid_t ki_uid, ki_euid: filled with the submitter credentials

- sigval_t ki_sigev_value: value stuffed in siginfo

  these fields are only valid if ki_signo != 0.



  In io_submit_one(), if the application provided a sigevent then
iocb_setup_sigevent() is called which does the following:

- save current-uid and current-euid in the kiocb fields ki_uid and
  ki_euid for use in the completion path to check permissions

- check access to the user sigevent

- extract the needed fields from the sigevent (pid, signo, and value).
  If the signal number passed from userspace is 0 then no notification
  is to occur and ki_signo is set to 0

- check whether the submitting thread wants to be notified directly
  (sigevent-sigev_notify_thread_id is 0) or wants the signal to be sent
  to another thread.
  In the latter case a check is made to assert that the target thread
  is in the same thread group

- fill in the kiocb fields (ki_pid, ki_signo, ki_notify and 
ki_sigev_value)
  for that request.

  Upon request completion, in aio_complete(), if ki_signo is not 0, then
__aio_send_signal() is called which sends the signal as follows:

- fill in the siginfo struct to be sent to the application

- check whether we have permission to signal the given thread

- send the signal

listio support


Summary:
---
  
  The lio patch adds POSIX listio completion notification support. It builds
on support provided by the aio event patch and adds an IOCB_CMD_GROUP
command to sys_io_submit().

  The purpose of IOCB_CMD_GROUP is to group together the following requests in
the list up to the end of the list.

  As part of listio submission, the user process prepends to a list of requests
an empty special aiocb with an aio_lio_opcode of IOCB_CMD_GROUP, filling only
the aio_sigevent fields.



Details:
---

  An IOCB_CMD_GROUP is added to the IOCB_CMD enum in include/linux/aio_abi.h

  A struct lio_event is added in include/linux/aio.h

  A struct lio_event *ki_lio is added to struct iocb in include/linux/aio.h


 In sys_io_submit(), upon detecting such an IOCB_CMD_GROUP marker iocb