Re: Kernel patches enabling better POSIX AIO (Was Re: [3/4] kevent: AIO, aio_sendfile)

2006-09-04 Thread Sébastien Dugué
On Sat, 2006-08-12 at 15:28 -0400, Jakub Jelinek wrote:
> On Sat, Aug 12, 2006 at 12:10:35PM -0700, Ulrich Drepper wrote:
> > > I am wondering about that too. IIRC, the IO_NOTIFY_* constants are not
> > > part of the ABI, but only internal to the kernel implementation. I think
> > > Zach had suggested inferring THREAD_ID notification if the pid specified
> > > is not zero. But, I don't see why ->sigev_notify couldn't used directly
> > > (just like the POSIX timers code does) thus doing away with the 
> > > new constants altogether. Sebestian/Laurent, do you recall?
> > 
> > I suggest to model the implementation after the timer code which does
> > exactly what we need.
> 
> Yeah, and if at all possible we want to use just one helper thread for
> SIGEV_THREAD notification of timers/aio/etc., so it really should behave the
> same as timer thread notification.
> 

  That's exactly what is done in libposix-aio.

  Sébastien.

-- 
-

  Sébastien DuguéBULL/FREC:B1-247
  phone: (+33) 476 29 77 70  Bullcom: 229-7770

  mailto:[EMAIL PROTECTED]

  Linux POSIX AIO: http://www.bullopensource.org/posix
   http://sourceforge.net/projects/paiol

-

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel patches enabling better POSIX AIO (Was Re: [3/4] kevent: AIO, aio_sendfile)

2006-09-04 Thread Sébastien Dugué
On Sat, 2006-08-12 at 12:10 -0700, Ulrich Drepper wrote:
> Suparna Bhattacharya wrote:
> > I am wondering about that too. IIRC, the IO_NOTIFY_* constants are not
> > part of the ABI, but only internal to the kernel implementation. I think
> > Zach had suggested inferring THREAD_ID notification if the pid specified
> > is not zero. But, I don't see why ->sigev_notify couldn't used directly
> > (just like the POSIX timers code does) thus doing away with the 
> > new constants altogether. Sebestian/Laurent, do you recall?
> 
> I suggest to model the implementation after the timer code which does
> exactly what we need.
> 

  Will do.

> 
> > I'm guessing they are being used for validation of permissions at the time
> > of sending the signal, but maybe saving the task pointer in the iocb instead
> > of the pid would suffice ?
> 
> Why should any verification be necessary?  The requests are generated in
> the same process which will receive the notification.  Even if the POSIX
> process (aka, kernel process group) changes the IDs the notifications
> should be set.  The key is that notifications cannot be sent to another
> POSIX process.
> 
> Adding this as a feature just makes things so much more complicated.
> 

  Agreed.

  Sébastien.


-- 
-

  Sébastien DuguéBULL/FREC:B1-247
  phone: (+33) 476 29 77 70  Bullcom: 229-7770

  mailto:[EMAIL PROTECTED]

  Linux POSIX AIO: http://www.bullopensource.org/posix
   http://sourceforge.net/projects/paiol

-

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel patches enabling better POSIX AIO (Was Re: [3/4] kevent: AIO, aio_sendfile)

2006-09-04 Thread Sébastien Dugué
  Hi,

  just came back from vacation, sorry for the delay.

On Sat, 2006-08-12 at 23:59 +0530, Suparna Bhattacharya wrote:
> BTW, if anyone would like to be dropped off this growing cc list, please
> let us know.
> 
> On Fri, Aug 11, 2006 at 12:45:55PM -0700, Ulrich Drepper wrote:
> > Sébastien Dugué wrote:
> > >aio completion notification
> > 
> > I looked over this now but I don't think I understand everything.  Or I
> > don't see how it all is integrated.  And no, I'm not looking at the
> > proposed glibc code since would mean being tainted.
> 
> Oh, I didn't realise that. 
> I'll make an attempt to clarify parts that I understand based on what I
> have gleaned from my reading of the code and intent, but hopefully Sebastien,
> Ben, Zach et al will be able to pitch in for a more accurate and complete
> picture.
> 
> > 
> > 
> > > Details:
> > > ---
> > > 
> > >   A struct sigevent *aio_sigeventp is added to struct iocb in
> > > include/linux/aio_abi.h
> > > 
> > >   An enum {IO_NOTIFY_SIGNAL = 0, IO_NOTIFY_THREAD_ID = 1} is added in
> > > include/linux/aio.h:
> > > 
> > >   - IO_NOTIFY_SIGNAL means that the signal is to be sent to the
> > > requesting thread 
> > > 
> > >   - IO_NOTIFY_THREAD_ID means that the signal is to be sent to a
> > > specifi thread.
> > 
> > This has been proved to be sufficient in the timer code which basically
> > has the same problem.  But why do you need separate constants?  We have
> > the various SIGEV_* constants, among them SIGEV_THREAD_ID.  Just use
> > these constants for the values of ki_notify.
> > 
> 
> I am wondering about that too. IIRC, the IO_NOTIFY_* constants are not
> part of the ABI, but only internal to the kernel implementation. I think
> Zach had suggested inferring THREAD_ID notification if the pid specified
> is not zero. But, I don't see why ->sigev_notify couldn't used directly
> (just like the POSIX timers code does) thus doing away with the 
> new constants altogether. Sebestian/Laurent, do you recall?

  As I see it, those IO_NOTIFY_* constants are uneeded and we could use
->sigev_notify directly. I will change this so that we use the same
mechanism as the POSIX timers code.

> 
> > 
> > >   The following fields are added to struct kiocb in include/linux/aio.h:
> > > 
> > >   - pid_t ki_pid: target of the signal
> > > 
> > >   - __u16 ki_signo: signal number
> > > 
> > >   - __u16 ki_notify: kind of notification, IO_NOTIFY_SIGNAL or
> > >  IO_NOTIFY_THREAD_ID
> > > 
> > >   - uid_t ki_uid, ki_euid: filled with the submitter credentials
> > 
> > These two fields aren't needed for the POSIX interfaces.  Where does the
> > requirement come from?  I don't say they should be removed, they might
> > be useful, but if the costs are non-negligible then they could go away.
> 
> I'm guessing they are being used for validation of permissions at the time
> of sending the signal, but maybe saving the task pointer in the iocb instead
> of the pid would suffice ?

  IIRC, Ben added these for that exact reason. Is this really needed?
Ben?

> 
> > 
> > 
> > >   - check whether the submitting thread wants to be notified directly
> > > (sigevent->sigev_notify_thread_id is 0) or wants the signal to be sent
> > > to another thread.
> > > In the latter case a check is made to assert that the target thread
> > > is in the same thread group
> > 
> > Is this really how it's implemented?  This is not how it should be.
> > Either a signal is sent to a specific thread in the same process (this
> > is what SIGEV_THREAD_ID is for) or the signal is sent to a calling
> > process.  Sending a signal to the process means that from the kernel's
> > POV any thread which doesn't have the signal blocked can receive it.
> > The final decision is made by the kernel.  There is no mechanism to send
> > the signal to another process.
> 
> The code seems to be set up to call specific_send_sig_info() in the case
> of *_THREAD_ID , and __group_send_sig_info() otherwise. So I think the
> intended behaviour is as you describe it should be (__group_send_sig_info
> does the equivalent of sending a signal to the process and so any thread
> which doesn't have signals blocked can receive it, while 
> specific_send_sig_info
> sends it to a particular thread). 
> 
> But, I should really leave it to Sebestian to confirm that.

  That's right, but I think that part needs to be reworked to follow
the same logic as the POSIX timers.


> > >   listio support
> > > 
> > 
> > I really don't understand the kernel interface for this feature.
> 
> I'm sorry this is confusing. This probably means that we need to
> separate the external interface description more clearly and completely
> from the internals.
> 
> > 
> > 
> > > Details:
> > > ---
> > > 
> > >   An IOCB_CMD_GROUP is added to the IOCB_CMD enum in 
> > > include/linux/aio_abi.h
> > > 
> > >   A struct lio_event is added in include/linux/aio.h
> > > 
> > >   A struct 

Re: Kernel patches enabling better POSIX AIO (Was Re: [3/4] kevent: AIO, aio_sendfile)

2006-08-14 Thread Nicholas Miell
On Mon, 2006-08-14 at 09:38 -0700, Ulrich Drepper wrote:
> Suparna Bhattacharya wrote:
> > Is there a (remote) possibility that the thread could have died and its
> > pid got reused by a new thread in another process ? Or is there a mechanism
> > that prevents such a possibility from arising (not just in NPTL library,
> > but at the kernel level) ?
> 
> The UID/GID won't help you with dying processes.  What if the same user
> creates a process with the same PID?  That process will not expect the
> notification and mustn't receive it.  If you cannot detect whether the
> issuing process died you have problems which cannot be solved with a
> uid/gid pair.
> 
> 

Eric W. Biederman sent a series of patches that introduced a struct
task_ref specifically to solve this sort of problem on January 28 of
this year, but I don't think it went anywhere.


-- 
Nicholas Miell <[EMAIL PROTECTED]>

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel patches enabling better POSIX AIO (Was Re: [3/4] kevent: AIO, aio_sendfile)

2006-08-14 Thread Ulrich Drepper
Suparna Bhattacharya wrote:
> Is there a (remote) possibility that the thread could have died and its
> pid got reused by a new thread in another process ? Or is there a mechanism
> that prevents such a possibility from arising (not just in NPTL library,
> but at the kernel level) ?

The UID/GID won't help you with dying processes.  What if the same user
creates a process with the same PID?  That process will not expect the
notification and mustn't receive it.  If you cannot detect whether the
issuing process died you have problems which cannot be solved with a
uid/gid pair.


> AIO for pipes should not be a problem - Chris Mason had a patch, so we can
> just bring it up to the current levels, possibly with some additional
> improvements.

Good.


> I'm not sure what would be the right thing to do for the sockets case. While
> we could put together a patch for basic aio_read/write (based on the same
> model used for files), given the whole ongoing kevent effort, its not yet
> clear to me what would make the most sense ...  
> 
> Ben had a patch to do a fallback to kernel threads for AIO operations that
> are not yet supported natively. I had some concerns about the approach, but
> I guess he had intended it as an interim path for cases like this.

A fallback solution would be sufficient.  Nobody _should_ use POSIX AIO
for networking but people do and just giving them something that works
is good enough.  It cannot really be worse than the userlevel emulation
we have know.

The alternative, separately and sequentially handling network sockets at
userlevel is horrible.  We'd have to go over every file descriptor and
check whether it's a socket and then take if out of the request list for
the kernel.  Then they need to be handled separately before or after the
kernel AIO code.  This would punish unduly all the 99.9% of the programs
which don't use POSIX  AIO for network I/O.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖



signature.asc
Description: OpenPGP digital signature


Re: Kernel patches enabling better POSIX AIO (Was Re: [3/4] kevent: AIO, aio_sendfile)

2006-08-14 Thread Suparna Bhattacharya
On Sat, Aug 12, 2006 at 12:10:35PM -0700, Ulrich Drepper wrote:
> Suparna Bhattacharya wrote:
> > I am wondering about that too. IIRC, the IO_NOTIFY_* constants are not
> > part of the ABI, but only internal to the kernel implementation. I think
> > Zach had suggested inferring THREAD_ID notification if the pid specified
> > is not zero. But, I don't see why ->sigev_notify couldn't used directly
> > (just like the POSIX timers code does) thus doing away with the 
> > new constants altogether. Sebestian/Laurent, do you recall?
> 
> I suggest to model the implementation after the timer code which does
> exactly what we need.

Agreed.

> 
> 
> > I'm guessing they are being used for validation of permissions at the time
> > of sending the signal, but maybe saving the task pointer in the iocb instead
> > of the pid would suffice ?
> 
> Why should any verification be necessary?  The requests are generated in
> the same process which will receive the notification.  Even if the POSIX
> process (aka, kernel process group) changes the IDs the notifications
> should be set.  The key is that notifications cannot be sent to another
> POSIX process.

Is there a (remote) possibility that the thread could have died and its
pid got reused by a new thread in another process ? Or is there a mechanism
that prevents such a possibility from arising (not just in NPTL library,
but at the kernel level) ?

I think the timer code saves a reference to the task pointer instead of
the pid, which is what I was suggesting above (instead of the euid checks),
as way to avoid the above situation.

> 
> Adding this as a feature just makes things so much more complicated.
> 
> 
> > So I think the
> > intended behaviour is as you describe it should be
> 
> Then the documentation needs to be adjusted.

*Nod*

> 
> 
> > The way it works (and better ideas are welcome) is that, since the 
> > io_submit()
> > syscall already accepts an array of iocbs[], no new syscall was introduced.
> > To implement lio_listio, one has to set up such an array, with the first 
> > iocb
> > in the array having the special (new) grouping opcode of IOCB_CMD_GROUP 
> > which
> > specifies the sigev notification to be associated with group completion
> > (a NULL value of the sigev notification pointer would imply equivalent of
> > LIO_WAIT).
> 
> OK, this seems OK.  We have to construct the iocb arrays dynamically anyway.
> 
> 
> > My thought here was that it should be possible to include M as a parameter
> > to the IOCB_CMD_GROUP opcode iocb, and thus incorporated in the lio control
> > block ... then whatever semantics are agreed upon can be implemented.
> 
> If you have room for the parameter this is fine.  For the beginning we
> can enforce the number to be the same as the total number of requests.
> 

Sounds good.

> 
> > Let us know what you think about the listio interface ... hopefully the
> > other issues are mostly simple to resolve.
> 
> It should be fine and I would support adding all this assuming the
> normal file support (as opposed to direct I/O only) is added, too.

OK. I updated my patchset against 2618-rc3 just after OLS.

> 
> 
> But I have one last question: sockets, pipes and the like are already
> supported, right?  If this is not the case we have a problem with the
> currently proposed  lio_listio interface.

AIO for pipes should not be a problem - Chris Mason had a patch, so we can
just bring it up to the current levels, possibly with some additional
improvements.

I'm not sure what would be the right thing to do for the sockets case. While
we could put together a patch for basic aio_read/write (based on the same
model used for files), given the whole ongoing kevent effort, its not yet
clear to me what would make the most sense ...  

Ben had a patch to do a fallback to kernel threads for AIO operations that
are not yet supported natively. I had some concerns about the approach, but
I guess he had intended it as an interim path for cases like this.

Suggestions would be much appreciated ?  DaveM, Ingo, Andrew ?

Regards
Suparna

> 
> -- 
> ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
> 



-- 
Suparna Bhattacharya ([EMAIL PROTECTED])
Linux Technology Center
IBM Software Lab, India

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel patches enabling better POSIX AIO (Was Re: [3/4] kevent: AIO, aio_sendfile)

2006-08-12 Thread Jakub Jelinek
On Sat, Aug 12, 2006 at 12:10:35PM -0700, Ulrich Drepper wrote:
> > I am wondering about that too. IIRC, the IO_NOTIFY_* constants are not
> > part of the ABI, but only internal to the kernel implementation. I think
> > Zach had suggested inferring THREAD_ID notification if the pid specified
> > is not zero. But, I don't see why ->sigev_notify couldn't used directly
> > (just like the POSIX timers code does) thus doing away with the 
> > new constants altogether. Sebestian/Laurent, do you recall?
> 
> I suggest to model the implementation after the timer code which does
> exactly what we need.

Yeah, and if at all possible we want to use just one helper thread for
SIGEV_THREAD notification of timers/aio/etc., so it really should behave the
same as timer thread notification.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel patches enabling better POSIX AIO (Was Re: [3/4] kevent: AIO, aio_sendfile)

2006-08-12 Thread Ulrich Drepper
Suparna Bhattacharya wrote:
> I am wondering about that too. IIRC, the IO_NOTIFY_* constants are not
> part of the ABI, but only internal to the kernel implementation. I think
> Zach had suggested inferring THREAD_ID notification if the pid specified
> is not zero. But, I don't see why ->sigev_notify couldn't used directly
> (just like the POSIX timers code does) thus doing away with the 
> new constants altogether. Sebestian/Laurent, do you recall?

I suggest to model the implementation after the timer code which does
exactly what we need.


> I'm guessing they are being used for validation of permissions at the time
> of sending the signal, but maybe saving the task pointer in the iocb instead
> of the pid would suffice ?

Why should any verification be necessary?  The requests are generated in
the same process which will receive the notification.  Even if the POSIX
process (aka, kernel process group) changes the IDs the notifications
should be set.  The key is that notifications cannot be sent to another
POSIX process.

Adding this as a feature just makes things so much more complicated.


> So I think the
> intended behaviour is as you describe it should be

Then the documentation needs to be adjusted.


> The way it works (and better ideas are welcome) is that, since the io_submit()
> syscall already accepts an array of iocbs[], no new syscall was introduced.
> To implement lio_listio, one has to set up such an array, with the first iocb
> in the array having the special (new) grouping opcode of IOCB_CMD_GROUP which
> specifies the sigev notification to be associated with group completion
> (a NULL value of the sigev notification pointer would imply equivalent of
> LIO_WAIT).

OK, this seems OK.  We have to construct the iocb arrays dynamically anyway.


> My thought here was that it should be possible to include M as a parameter
> to the IOCB_CMD_GROUP opcode iocb, and thus incorporated in the lio control
> block ... then whatever semantics are agreed upon can be implemented.

If you have room for the parameter this is fine.  For the beginning we
can enforce the number to be the same as the total number of requests.


> Let us know what you think about the listio interface ... hopefully the
> other issues are mostly simple to resolve.

It should be fine and I would support adding all this assuming the
normal file support (as opposed to direct I/O only) is added, too.


But I have one last question: sockets, pipes and the like are already
supported, right?  If this is not the case we have a problem with the
currently proposed  lio_listio interface.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖



signature.asc
Description: OpenPGP digital signature