Re: Kernel patches enabling better POSIX AIO (Was Re: [3/4] kevent: AIO, aio_sendfile)
Hi, just came back from vacation, sorry for the delay. On Sat, 2006-08-12 at 23:59 +0530, Suparna Bhattacharya wrote: BTW, if anyone would like to be dropped off this growing cc list, please let us know. On Fri, Aug 11, 2006 at 12:45:55PM -0700, Ulrich Drepper wrote: Sébastien Dugué wrote: aio completion notification I looked over this now but I don't think I understand everything. Or I don't see how it all is integrated. And no, I'm not looking at the proposed glibc code since would mean being tainted. Oh, I didn't realise that. I'll make an attempt to clarify parts that I understand based on what I have gleaned from my reading of the code and intent, but hopefully Sebastien, Ben, Zach et al will be able to pitch in for a more accurate and complete picture. Details: --- A struct sigevent *aio_sigeventp is added to struct iocb in include/linux/aio_abi.h An enum {IO_NOTIFY_SIGNAL = 0, IO_NOTIFY_THREAD_ID = 1} is added in include/linux/aio.h: - IO_NOTIFY_SIGNAL means that the signal is to be sent to the requesting thread - IO_NOTIFY_THREAD_ID means that the signal is to be sent to a specifi thread. This has been proved to be sufficient in the timer code which basically has the same problem. But why do you need separate constants? We have the various SIGEV_* constants, among them SIGEV_THREAD_ID. Just use these constants for the values of ki_notify. I am wondering about that too. IIRC, the IO_NOTIFY_* constants are not part of the ABI, but only internal to the kernel implementation. I think Zach had suggested inferring THREAD_ID notification if the pid specified is not zero. But, I don't see why -sigev_notify couldn't used directly (just like the POSIX timers code does) thus doing away with the new constants altogether. Sebestian/Laurent, do you recall? As I see it, those IO_NOTIFY_* constants are uneeded and we could use -sigev_notify directly. I will change this so that we use the same mechanism as the POSIX timers code. The following fields are added to struct kiocb in include/linux/aio.h: - pid_t ki_pid: target of the signal - __u16 ki_signo: signal number - __u16 ki_notify: kind of notification, IO_NOTIFY_SIGNAL or IO_NOTIFY_THREAD_ID - uid_t ki_uid, ki_euid: filled with the submitter credentials These two fields aren't needed for the POSIX interfaces. Where does the requirement come from? I don't say they should be removed, they might be useful, but if the costs are non-negligible then they could go away. I'm guessing they are being used for validation of permissions at the time of sending the signal, but maybe saving the task pointer in the iocb instead of the pid would suffice ? IIRC, Ben added these for that exact reason. Is this really needed? Ben? - check whether the submitting thread wants to be notified directly (sigevent-sigev_notify_thread_id is 0) or wants the signal to be sent to another thread. In the latter case a check is made to assert that the target thread is in the same thread group Is this really how it's implemented? This is not how it should be. Either a signal is sent to a specific thread in the same process (this is what SIGEV_THREAD_ID is for) or the signal is sent to a calling process. Sending a signal to the process means that from the kernel's POV any thread which doesn't have the signal blocked can receive it. The final decision is made by the kernel. There is no mechanism to send the signal to another process. The code seems to be set up to call specific_send_sig_info() in the case of *_THREAD_ID , and __group_send_sig_info() otherwise. So I think the intended behaviour is as you describe it should be (__group_send_sig_info does the equivalent of sending a signal to the process and so any thread which doesn't have signals blocked can receive it, while specific_send_sig_info sends it to a particular thread). But, I should really leave it to Sebestian to confirm that. That's right, but I think that part needs to be reworked to follow the same logic as the POSIX timers. listio support I really don't understand the kernel interface for this feature. I'm sorry this is confusing. This probably means that we need to separate the external interface description more clearly and completely from the internals. Details: --- An IOCB_CMD_GROUP is added to the IOCB_CMD enum in include/linux/aio_abi.h A struct lio_event is added in include/linux/aio.h A struct lio_event *ki_lio is added to struct iocb in include/linux/aio.h So you have a pointer in the structure for the individual requests. I assume you use the atomic counter to trigger the final delivery. I further
Re: Kernel patches enabling better POSIX AIO (Was Re: [3/4] kevent: AIO, aio_sendfile)
On Sat, 2006-08-12 at 15:28 -0400, Jakub Jelinek wrote: On Sat, Aug 12, 2006 at 12:10:35PM -0700, Ulrich Drepper wrote: I am wondering about that too. IIRC, the IO_NOTIFY_* constants are not part of the ABI, but only internal to the kernel implementation. I think Zach had suggested inferring THREAD_ID notification if the pid specified is not zero. But, I don't see why -sigev_notify couldn't used directly (just like the POSIX timers code does) thus doing away with the new constants altogether. Sebestian/Laurent, do you recall? I suggest to model the implementation after the timer code which does exactly what we need. Yeah, and if at all possible we want to use just one helper thread for SIGEV_THREAD notification of timers/aio/etc., so it really should behave the same as timer thread notification. That's exactly what is done in libposix-aio. Sébastien. -- - Sébastien DuguéBULL/FREC:B1-247 phone: (+33) 476 29 77 70 Bullcom: 229-7770 mailto:[EMAIL PROTECTED] Linux POSIX AIO: http://www.bullopensource.org/posix http://sourceforge.net/projects/paiol - - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Kernel patches enabling better POSIX AIO (Was Re: [3/4] kevent: AIO, aio_sendfile)
On Sat, 2006-08-12 at 12:10 -0700, Ulrich Drepper wrote: Suparna Bhattacharya wrote: I am wondering about that too. IIRC, the IO_NOTIFY_* constants are not part of the ABI, but only internal to the kernel implementation. I think Zach had suggested inferring THREAD_ID notification if the pid specified is not zero. But, I don't see why -sigev_notify couldn't used directly (just like the POSIX timers code does) thus doing away with the new constants altogether. Sebestian/Laurent, do you recall? I suggest to model the implementation after the timer code which does exactly what we need. Will do. I'm guessing they are being used for validation of permissions at the time of sending the signal, but maybe saving the task pointer in the iocb instead of the pid would suffice ? Why should any verification be necessary? The requests are generated in the same process which will receive the notification. Even if the POSIX process (aka, kernel process group) changes the IDs the notifications should be set. The key is that notifications cannot be sent to another POSIX process. Adding this as a feature just makes things so much more complicated. Agreed. Sébastien. -- - Sébastien DuguéBULL/FREC:B1-247 phone: (+33) 476 29 77 70 Bullcom: 229-7770 mailto:[EMAIL PROTECTED] Linux POSIX AIO: http://www.bullopensource.org/posix http://sourceforge.net/projects/paiol - - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [3/4] kevent: AIO, aio_sendfile() implementation.
On Thu, 2006-07-27 at 08:28 -0700, Badari Pulavarty wrote: Sébastien Dugué wrote: On Wed, 2006-07-26 at 09:22 -0700, Badari Pulavarty wrote: Ulrich Drepper wrote: Christoph Hellwig wrote: My personal opinion on existing AIO is that it is not the right design. Benjamin LaHaise agree with me (if I understood him right), I completely agree with that aswell. I agree, too, but the current code is not the last of the line. Suparna has a st of patches which make the current kernel aio code work much better and especially make it really usable to implement POSIX AIO. In Ottawa we were talking about submitting it and Suparna will. We just thought about a little longer timeframe. I guess it could be accelerated since he mostly has the patch done. But I don't know her schedule. Important here is, don't base any decision on the current aio implementation. Ulrich, Suparna mentioned your interest in making POSIX glibc aio work with kernel-aio at OLS. We thought taking a re-look at the (kernel side) work BULL did, would be a nice starting point. I re-based those patches to 2.6.18-rc2 and sent it to Zach Brown for review before sending them out to list. These patches does NOT make AIO any cleaner. All they do is add functionality to support POSIX AIO easier. These are [ PATCH 1/3 ] Adding signal notification for event completion [ PATCH 2/3 ] lio (listio) completion semantics [ PATCH 3/3 ] cancel_fd support Badari, Thanks for refreshing those patches, they have been sitting here for quite some time now and collected dust. I also think Suparna's patchset for doing buffered AIO would be a real plus here. Suparna explained these in the following article: http://lwn.net/Articles/148755/ If you think, this is a reasonable direction/approach for the kernel and you would take care of glibc side of things - I can spend time on these patches, getting them to reasonable shape and push for inclusion. Ulrich, I you want to have a look at how those patches are put to use in libposix-aio, have a look at http://sourceforge.net/projects/paiol. It could be a starting point for glibc. Thanks, Sébastien. Sebastien, Suparna mentioned at Ulrich wants us to concentrate on kernel-side support, so that he can look at glibc side of things (along with other work he is already doing). So, if we can get an agreement on what kind of kernel support is needed - we can focus our efforts on kernel side first and leave glibc enablement to capable hands of Uli :) That's fine with me. Sébastien. -- - Sébastien DuguéBULL/FREC:B1-247 phone: (+33) 476 29 77 70 Bullcom: 229-7770 mailto:[EMAIL PROTECTED] Linux POSIX AIO: http://www.bullopensource.org/posix http://sourceforge.net/projects/paiol - - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [3/4] kevent: AIO, aio_sendfile() implementation.
On Thu, 2006-07-27 at 11:44 -0700, Ulrich Drepper wrote: Badari Pulavarty wrote: Before we spend too much time cleaning up and merging into mainline - I would like an agreement that what we add is good enough for glibc POSIX AIO. I haven't seen a description of the interface so far. Would be good if it existed. But I briefly mentioned one quirk in the interface about which Suparna wasn't sure whether it's implemented/implementable in the current interface. If a lio_listio call is made the individual requests are handle just as if they'd be issue separately. I.e., the notification specified in the individual aiocb is performed when the specific request is done. Then, once all requests are done, another notification is made, this time controlled by the sigevent parameter if lio_listio. Another feature which I always wanted: the current lio_listio call returns in blocking mode only if all requests are done. In non-blocking mode it returns immediately and the program needs to poll the aiocbs. What is needed is something in the middle. For instance, if multiple read requests are issued the program might be able to start working as soon as one request is satisfied. I.e., a call similar to lio_listio would be nice which also takes another parameter specifying how many of the NENT aiocbs have to finish before the call returns. You're right here, that definitely would be a plus. -- - Sébastien DuguéBULL/FREC:B1-247 phone: (+33) 476 29 77 70 Bullcom: 229-7770 mailto:[EMAIL PROTECTED] Linux POSIX AIO: http://www.bullopensource.org/posix http://sourceforge.net/projects/paiol - - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [3/4] kevent: AIO, aio_sendfile() implementation.
On Thu, 2006-07-27 at 14:02 -0700, Badari Pulavarty wrote: On Thu, 2006-07-27 at 11:44 -0700, Ulrich Drepper wrote: Badari Pulavarty wrote: Before we spend too much time cleaning up and merging into mainline - I would like an agreement that what we add is good enough for glibc POSIX AIO. I haven't seen a description of the interface so far. Would be good if it existed. But I briefly mentioned one quirk in the interface about which Suparna wasn't sure whether it's implemented/implementable in the current interface. Sebastien, could you provide a description of interfaces you are adding ? Since you did all the work, it would be appropriate for you to do it :) I will clean up what description I have and send it soon. Sébastien. -- - Sébastien DuguéBULL/FREC:B1-247 phone: (+33) 476 29 77 70 Bullcom: 229-7770 mailto:[EMAIL PROTECTED] Linux POSIX AIO: http://www.bullopensource.org/posix http://sourceforge.net/projects/paiol - - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [3/4] kevent: AIO, aio_sendfile() implementation.
On Thu, 2006-07-27 at 14:02 -0700, Badari Pulavarty wrote: On Thu, 2006-07-27 at 11:44 -0700, Ulrich Drepper wrote: Badari Pulavarty wrote: Before we spend too much time cleaning up and merging into mainline - I would like an agreement that what we add is good enough for glibc POSIX AIO. I haven't seen a description of the interface so far. Would be good if it existed. But I briefly mentioned one quirk in the interface about which Suparna wasn't sure whether it's implemented/implementable in the current interface. Sebastien, could you provide a description of interfaces you are adding ? Since you did all the work, it would be appropriate for you to do it :) Here are the descriptions for the AIO completion notification and listio patches. Hope I did not leave out too much. Sébastien. -- - Sébastien DuguéBULL/FREC:B1-247 phone: (+33) 476 29 77 70 Bullcom: 229-7770 mailto:[EMAIL PROTECTED] Linux POSIX AIO: http://www.bullopensource.org/posix http://sourceforge.net/projects/paiol - aio completion notification Summary: --- The current 2.6 kernel does not support notification of user space via an RT signal upon an asynchronous IO completion. The POSIX specification states that when an AIO request completes, a signal can be delivered to the application as notification. The aioevent patch adds a struct sigevent *aio_sigeventp to the iocb. The relevant fields (pid, signal number and value) are stored in the kiocb for use when the request completes. That sigevent structure is filled by the application as part of the AIO request preparation. Upon request completion, the kernel notifies the application using those sigevent parameters. If SIGEV_NONE has been specified, then the old behaviour is retained and the application must rely on polling the completion queue using io_getevents(). Details: --- A struct sigevent *aio_sigeventp is added to struct iocb in include/linux/aio_abi.h An enum {IO_NOTIFY_SIGNAL = 0, IO_NOTIFY_THREAD_ID = 1} is added in include/linux/aio.h: - IO_NOTIFY_SIGNAL means that the signal is to be sent to the requesting thread - IO_NOTIFY_THREAD_ID means that the signal is to be sent to a specifi thread. The following fields are added to struct kiocb in include/linux/aio.h: - pid_t ki_pid: target of the signal - __u16 ki_signo: signal number - __u16 ki_notify: kind of notification, IO_NOTIFY_SIGNAL or IO_NOTIFY_THREAD_ID - uid_t ki_uid, ki_euid: filled with the submitter credentials - sigval_t ki_sigev_value: value stuffed in siginfo these fields are only valid if ki_signo != 0. In io_submit_one(), if the application provided a sigevent then iocb_setup_sigevent() is called which does the following: - save current-uid and current-euid in the kiocb fields ki_uid and ki_euid for use in the completion path to check permissions - check access to the user sigevent - extract the needed fields from the sigevent (pid, signo, and value). If the signal number passed from userspace is 0 then no notification is to occur and ki_signo is set to 0 - check whether the submitting thread wants to be notified directly (sigevent-sigev_notify_thread_id is 0) or wants the signal to be sent to another thread. In the latter case a check is made to assert that the target thread is in the same thread group - fill in the kiocb fields (ki_pid, ki_signo, ki_notify and ki_sigev_value) for that request. Upon request completion, in aio_complete(), if ki_signo is not 0, then __aio_send_signal() is called which sends the signal as follows: - fill in the siginfo struct to be sent to the application - check whether we have permission to signal the given thread - send the signal listio support Summary: --- The lio patch adds POSIX listio completion notification support. It builds on support provided by the aio event patch and adds an IOCB_CMD_GROUP command to sys_io_submit(). The purpose of IOCB_CMD_GROUP is to group together the following requests in the list up to the end of the list. As part of listio submission, the user process prepends to a list of requests an empty special aiocb with an aio_lio_opcode of IOCB_CMD_GROUP, filling only the aio_sigevent fields. Details: --- An IOCB_CMD_GROUP is added to the IOCB_CMD enum in include/linux/aio_abi.h A struct lio_event is added in include/linux/aio.h A struct lio_event *ki_lio is added to struct iocb in include/linux/aio.h In sys_io_submit(), upon detecting such an IOCB_CMD_GROUP marker iocb