Re: [RFC] Heads up on a series of AIO patchsets
> generic_write_checks() are done in the submission path, not repeated during retries, so such types of checks are not intended to run in the aio thread.

Ah, I see, I was missing the short-cut which avoids re-running parts of the write path if we're in a retry:

	if (!is_sync_kiocb(iocb) && kiocbIsRestarted(iocb)) {
		/* nothing to transfer, may just need to sync data */
		return ocount;
	}

It's pretty subtle that this has to be placed before the first significant current reference and that nothing in the path can return -EIOCBRETRY until after all of the significant current references.

In totally unrelated news, I noticed that current->io_wait is set to NULL instead of &current->__wait after having run the iocb. I wonder if it shouldn't be saved and restored instead. And maybe update the comment over is_sync_wait()? Just an observation.

> That is great and I look forward to it :) I am, however, assuming that whatever implementation you come up with will have a different interface from current linux aio -- i.e. a next generation aio model that will be easily integrable with kevents etc.

Yeah, that's the hope.

> Which takes me back to Ingo's point - let's have the new evolve in parallel with the old, if we can, and not hold up the patches for POSIX AIO to start using kernel AIO, or for epoll to integrate with AIO.

Sure, though there are only so many hours in a day :).

> OK, I just took a quick look at your blog and I see that you are basically implementing Linus' microthreads scheduling approach - one year since we had that discussion.

Yeah. I wanted to see what it would look like.

> Glad to see that you found a way to make it workable ...

Well, that remains to be seen. If nothing else we'll at least have code to point at when discussing it. If we all agree it's not the right way and dismiss the notion, fine, that's progress :).

> (I'm guessing that you are copying over the part of the stack in use at the time of every switch, is that correct ?

That was my first pass, yeah.
It turned the knob a little too far towards the "invasive but efficient" direction for my taste. I'm now giving it a try by having full stacks for each blocked op, we'll see how that goes.

> At what point do you do the allocation of the saved stacks ?

I was allocating at block-time to keep memory consumption down, but I think my fiddling around with it convinced me that isn't workable.

- z
- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] Heads up on a series of AIO patchsets
On Tue 2007-01-02 16:18:40, Kent Overstreet wrote:
> >> Any details?
> >
> > Well, one path I tried I couldn't help but post a blog entry about for my friends. I'm not sure it's the direction I'll take with linux-kernel, but the fundamentals are there: the api should be the syscall interface, and there should be no difference between sync and async behaviour.
> >
> > http://www.zabbo.net/?p=72
>
> Any code you're willing to let people play with? I could at least have real test cases, and a library to go along with it as it gets finished.
>
> Another pie in the sky idea: One thing that's been bugging me lately (working on a 9p server), is sendfile is hard to use in practice because you need packet headers and such, and they need to go out at the same time.

splice()?

Pavel
--
Thanks for all the (sleeping) penguins.
Re: [RFC] Heads up on a series of AIO patchsets
On Tue, Jan 02, 2007 at 02:38:13PM -0700, Dan Williams ([EMAIL PROTECTED]) wrote:
> Would you have time to comment on the approach I have taken to
> implement a standard asynchronous memcpy interface? It seems it would
> be a good complement to what you are proposing. The entity that
> describes the aio operation could take advantage of asynchronous
> engines to carry out copies or other transforms (maybe an acrypto tie
> in as well).
>
> Here is the posting for 2.6.19. There has since been updates for
> 2.6.20, but the overall approach remains the same.
> intro: http://marc.theaimsgroup.com/?l=linux-raid&m=116491661527161&w=2
> async_tx: http://marc.theaimsgroup.com/?l=linux-raid&m=116491753318175&w=2

My first impression is that it has too many lists :)

Looks good, but IMHO there are further steps to implement. I have not found any kind of scheduler there - what if the system has two async engines? What if the sync engine is faster than the async one in some cases (and it is indeed the case for small buffers), and should be selected at that time? What if you want to add additional transformations for some devices like crypto processing or checksumming?

I would just create a driver for the low-level engine and export its functionality - iop3_async_copy(), iop3_async_checksum(), iop3_async_crypto_1(), iop3_async_crypto_2() and so on. There will be a lot of potential users of exactly that functionality, but not strictly hardcoded higher-layer operations like raidX. A more generic solution must be used to select the appropriate device.
We had a very brief discussion about the asynchronous crypto layer (acrypto) and how its ideas could be used for async dma engines - the user should not even know how his data has been transferred - he calls async_copy(), which selects the appropriate device (and sync copy is just an additional usual device in that case) from the list of devices exporting that functionality. Selection can be done in millions of different ways, from getting the first one from the list (this is essentially how your approach is implemented right now) to using special (including run-time updated) heuristics (like it is done in acrypto).

Thinking further, async_copy() is just a usual case for the async class of operations. So the same logic must be applied on this layer too. But 'layers are the way to design protocols, not implement them' - David Miller on netchannels.

So the user should not even know about layers - he should just say 'copy data from pointer A to pointer B', or 'copy data from pointer A to socket B', or even 'copy it from file "/tmp/file" to "192.168.0.1:80:tcp"', without ever knowing that there are sockets and/or memcpy() calls, and if the user requests to perform it asynchronously, he must be notified later (one might expect that I will prefer to use kevent :). The same approach can thus be used by NFS/SAMBA/CIFS and other users.

That is how I start to implement AIO (it looks like it is becoming popular):
1. the system exports the set of operations it supports (send, receive, copy, crypto, ...)
2. each operation has a subsequent set of suboptions (different crypto types, for example)
3. each operation has a set of low-level drivers which support it (with optional performance or any other parameters)
4. each driver, when loaded, publishes its capabilities (async copy with speed A, XOR and so on)

From the user's point of view an aio_sendfile() or async_copy() will look like the following:
1. call aio_schedule_pointer(source='0xaabbccdd', dest='0x123456578')
2.
call aio_schedule_file_socket(source='/tmp/file', dest='socket')
3. call aio_schedule_file_addr(source='/tmp/file', dest='192.168.0.1:80:tcp')

or any other similar call, then wait for the received descriptor in kevent_get_events() or provide your own cookie in each call.

Each request is then converted into a FIFO of smaller requests like 'open file', 'open socket', 'get in user pages' and so on, each of which should be handled on an appropriate device (hardware or software); completion of each request starts processing of the next one.

Reading the microthreading design notes I recall a comparison of the NPTL and Erlang threading models on the Debian site - they are _completely_ different models. NPTL creates real threads, which is supposed (I hope NOT) to be the case in the microthreading design too. It is slow. (Or is it not, Zach? We are intrigued :) It's damn bloody slow to create a thread compared to the correct non-blocking state machine. The TUX state machine is similar to what I had in my first kevent based FS and network AIO patchset, and what I will use for the current async processing work.

A bit of empty words actually, but it can provide some food for thought.

> Regards,
>
> Dan

--
Evgeniy Polyakov
Re: [RFC] Heads up on a series of AIO patchsets
On Tue, Jan 02, 2007 at 03:56:09PM -0800, Zach Brown wrote:
> Sorry for the delay, I'm finally back from the holiday break :)

Welcome back !

> > (1) The filesystem AIO patchset, attempts to address one part of the problem
> > which is to make regular file IO, (without O_DIRECT) asynchronous (mainly
> > the case of reads of uncached or partially cached files, and O_SYNC writes).
>
> One of the properties of the currently implemented EIOCBRETRY aio
> path is that ->mm is the only field in current which matches the
> submitting task_struct while inside the retry path.

Yes, and that, as I guess you know, is to enable the aio worker thread to operate on the caller's address space for copy_from/to_user. The actual io setup and associated checks are expected to have been handled at submission time.

> It looks like a retry-based aio write path would be broken because of
> this. generic_write_checks() could run in the aio thread and get its
> task_struct instead of that of the submitter. The wrong rlimit will
> be tested and SIGXFSZ won't be raised. remove_suid() could check the
> capabilities of the aio thread instead of those of the submitter.

generic_write_checks() are done in the submission path, not repeated during retries, so such types of checks are not intended to run in the aio thread. Did I miss something here ?

> I don't think EIOCBRETRY is the way to go because of this increased
> (and subtle!) complexity. What are the chances that we would have
> ever found those bugs outside code review? How do we make sure that
> current references don't sneak back in after having initially audited
> the paths?

The EIOCBRETRY route is not something that is intended to be used blindly. It is just one alternative to implement an aio operation by splitting up responsibility between the submitter and aio threads, where aio threads can run in the caller's address space.

> Take the io_cmd_epoll_wait patch..
>
> > issues).
> > The IO_CMD_EPOLL_WAIT patch (originally from Zach Brown with
> > modifications from Jeff Moyer and me) addresses this problem for
> > native linux aio in a simple manner.
>
> It's simple looking, sure. This current flipping didn't even occur
> to me while throwing the patch together!
>
> But that patch ends up calling ->poll (and poll_table->qproc) and
> writing to userspace (so potentially calling ->nopage) from the aio

Yes of course, but why is that a problem ? The copy_from/to_user/put_user constructs are designed to handle soft failures, and we are already using the caller's ->mm. Do you see a need for any additional asserts() ? If there is something that is needed by ->nopage etc which is not abstracted out within the ->mm, then we would need to fix that instead, for correctness anyway, isn't that so ?

Now it is possible that there are minor blocking points in the code, and the effect of these would be to hold up / delay subsequent queued aio operations; which is an efficiency issue, but not a correctness concern.

> threads. Are we sure that none of them will behave surprisingly
> because current changed under them?

My take is that we should fix the problems that we see. It is likely that what manifests relatively more easily with AIO is also a subtle problem in other cases.

> It might be safe now, but that isn't really the point. I'd rather we
> didn't have yet one more subtle invariant to audit and maintain.
>
> At the risk of making myself vulnerable to the charge of mentioning
> vapourware, I will admit that I've been working on a (slightly mad)
> implementation of async syscalls. I've been quiet about it because I
> don't want to whip up complicated discussion without being able to
> show code that works, even if barely. I mention it now only to make
> it clear that I want to be constructive, not just critical :).
That is great and I look forward to it :) I am, however, assuming that whatever implementation you come up with will have a different interface from current linux aio -- i.e. a next generation aio model that will be easily integrable with kevents etc.

Which takes me back to Ingo's point - let's have the new evolve in parallel with the old, if we can, and not hold up the patches for POSIX AIO to start using kernel AIO, or for epoll to integrate with AIO.

OK, I just took a quick look at your blog and I see that you are basically implementing Linus' microthreads scheduling approach - one year since we had that discussion. Glad to see that you found a way to make it workable ... (I'm guessing that you are copying over the part of the stack in use at the time of every switch, is that correct ? At what point do you do the allocation of the saved stacks ? Sorry, I should hold off all these questions till your patch comes out)

Regards
Suparna

> - z

--
Suparna Bhattacharya ([EMAIL PROTECTED])
Linux Technology Center
IBM Software Lab, India
Re: [RFC] Heads up on a series of AIO patchsets
> > Any details?
>
> Well, one path I tried I couldn't help but post a blog entry about for my friends. I'm not sure it's the direction I'll take with linux-kernel, but the fundamentals are there: the api should be the syscall interface, and there should be no difference between sync and async behaviour.
>
> http://www.zabbo.net/?p=72

Any code you're willing to let people play with? I could at least have real test cases, and a library to go along with it as it gets finished.

Another pie in the sky idea: One thing that's been bugging me lately (working on a 9p server), is sendfile is hard to use in practice because you need packet headers and such, and they need to go out at the same time. Sendfile listio support would fix this, but it's not a general solution. What would be really useful is a way to say that a certain batch of async ops either all succeed or all fail, and happen atomically; i.e., transactions for syscalls. Probably even harder to do than general async syscalls, but it'd be the best thing since sliced bread... and hey, it seems the logical next step.
Re: [RFC] Heads up on a series of AIO patchsets
Sorry for the delay, I'm finally back from the holiday break :)

> (1) The filesystem AIO patchset, attempts to address one part of the problem
> which is to make regular file IO, (without O_DIRECT) asynchronous (mainly
> the case of reads of uncached or partially cached files, and O_SYNC writes).

One of the properties of the currently implemented EIOCBRETRY aio path is that ->mm is the only field in current which matches the submitting task_struct while inside the retry path.

It looks like a retry-based aio write path would be broken because of this. generic_write_checks() could run in the aio thread and get its task_struct instead of that of the submitter. The wrong rlimit will be tested and SIGXFSZ won't be raised. remove_suid() could check the capabilities of the aio thread instead of those of the submitter.

I don't think EIOCBRETRY is the way to go because of this increased (and subtle!) complexity. What are the chances that we would have ever found those bugs outside code review? How do we make sure that current references don't sneak back in after having initially audited the paths?

Take the io_cmd_epoll_wait patch..

> issues). The IO_CMD_EPOLL_WAIT patch (originally from Zach Brown with
> modifications from Jeff Moyer and me) addresses this problem for native
> linux aio in a simple manner.

It's simple looking, sure. This current flipping didn't even occur to me while throwing the patch together!

But that patch ends up calling ->poll (and poll_table->qproc) and writing to userspace (so potentially calling ->nopage) from the aio threads. Are we sure that none of them will behave surprisingly because current changed under them?

It might be safe now, but that isn't really the point. I'd rather we didn't have yet one more subtle invariant to audit and maintain.

At the risk of making myself vulnerable to the charge of mentioning vapourware, I will admit that I've been working on a (slightly mad) implementation of async syscalls.
I've been quiet about it because I don't want to whip up complicated discussion without being able to show code that works, even if barely. I mention it now only to make it clear that I want to be constructive, not just critical :).

- z
Re: [RFC] Heads up on a series of AIO patchsets
On 12/28/06, Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> [ I'm only subscribed to linux-fsdevel@ from above Cc list, please keep this list in Cc: for AIO related stuff. ]
>
> On Wed, Dec 27, 2006 at 04:25:30PM +, Christoph Hellwig ([EMAIL PROTECTED]) wrote:
> > (1) note that there is another problem with the current kevent interface,
> > and that is that it duplicates the event infrastructure for its
> > underlying subsystems instead of reusing existing code (e.g.
> > inotify, epoll, dio-aio). If we want kevent to be _the_ unified
> > event system for Linux we need people to help out with straightening
> > out these event providers as Evgeniy seems to be unwilling/unable to
> > do the work himself and the duplication is simply not acceptable.
>
> I would rewrite inotify/epoll to use kevent, but I would strongly prefer that it would be done by the people who created the original interfaces - it is a political decision, not a technical one - I do not want to be blamed on each corner that I killed other people's work :)
>
> FS and network AIO kevent based stuff was dropped from the kevent tree in favour of an upcoming project (description below).
>
> Regarding AIO - my personal opinion is that AIO should be designed asynchronously in all aspects. Here is a brief note on how I plan to implement it (I plan to start in about a week after the New Year vacations).
>
> ===
>
> All existing AIO - both mainline and kevent based - lacks a major feature: it is not fully asynchronous, i.e. it requires a synchronous set of steps, some of which can be asynchronous. For example aio_sendfile() [1] requires open of the file descriptor and only then the aio_sendfile() call. The same applies to mainline AIO and read/write calls.
>
> My idea is to create real asynchronous IO - i.e.
> some entity which will describe the set of tasks which should be performed asynchronously (from the user's point of view, although read and write obviously must be done after open and before close), for example a syscall which gets as parameters a destination socket and a local filename (with optional offset and length fields), which will - asynchronously from the user's point of view - open the file, transfer the requested part to the destination socket and then return the opened file descriptor (or it can be closed if requested). A similar mechanism can be done for read/write calls.
>
> This approach, as asynchronous IO in general, requires access to user memory from a kernel thread or even an interrupt handler (that is where kevent based AIO completes its requests) - it can be done in a way similar to the existing kevent ring buffer implementation, and can also use a dedicated kernel thread or workqueue to copy data into process memory.

Would you have time to comment on the approach I have taken to implement a standard asynchronous memcpy interface? It seems it would be a good complement to what you are proposing. The entity that describes the aio operation could take advantage of asynchronous engines to carry out copies or other transforms (maybe an acrypto tie in as well).

Here is the posting for 2.6.19. There have since been updates for 2.6.20, but the overall approach remains the same.
intro: http://marc.theaimsgroup.com/?l=linux-raid&m=116491661527161&w=2
async_tx: http://marc.theaimsgroup.com/?l=linux-raid&m=116491753318175&w=2

Regards,
Dan
Re: [RFC] Heads up on a series of AIO patchsets
[ I'm only subscribed to linux-fsdevel@ from above Cc list, please keep this list in Cc: for AIO related stuff. ]

On Wed, Dec 27, 2006 at 04:25:30PM +, Christoph Hellwig ([EMAIL PROTECTED]) wrote:
> (1) note that there is another problem with the current kevent interface,
> and that is that it duplicates the event infrastructure for its
> underlying subsystems instead of reusing existing code (e.g.
> inotify, epoll, dio-aio). If we want kevent to be _the_ unified
> event system for Linux we need people to help out with straightening
> out these event providers as Evgeniy seems to be unwilling/unable to
> do the work himself and the duplication is simply not acceptable.

I would rewrite inotify/epoll to use kevent, but I would strongly prefer that it would be done by the people who created the original interfaces - it is a political decision, not a technical one - I do not want to be blamed on each corner that I killed other people's work :)

FS and network AIO kevent based stuff was dropped from the kevent tree in favour of an upcoming project (description below).

Regarding AIO - my personal opinion is that AIO should be designed asynchronously in all aspects. Here is a brief note on how I plan to implement it (I plan to start in about a week after the New Year vacations).

===

All existing AIO - both mainline and kevent based - lacks a major feature: it is not fully asynchronous, i.e. it requires a synchronous set of steps, some of which can be asynchronous. For example aio_sendfile() [1] requires open of the file descriptor and only then the aio_sendfile() call. The same applies to mainline AIO and read/write calls.

My idea is to create real asynchronous IO - i.e.
some entity which will describe the set of tasks which should be performed asynchronously (from the user's point of view, although read and write obviously must be done after open and before close), for example a syscall which gets as parameters a destination socket and a local filename (with optional offset and length fields), which will - asynchronously from the user's point of view - open the file, transfer the requested part to the destination socket and then return the opened file descriptor (or it can be closed if requested). A similar mechanism can be done for read/write calls.

This approach, as asynchronous IO in general, requires access to user memory from a kernel thread or even an interrupt handler (that is where kevent based AIO completes its requests) - it can be done in a way similar to the existing kevent ring buffer implementation, and can also use a dedicated kernel thread or workqueue to copy data into process memory.

It is a very interesting task and should greatly speed up workloads of busy web/ftp and other servers, which can work with a huge number of files and a huge number of clients. I've put it into my TODO list. Someone, please stop the time for several days, so I could create some really good things for the universe.

1. Network AIO
http://tservice.net.ru/~s0mbre/old/?section=projects&item=naio

--
Evgeniy Polyakov
Re: [RFC] Heads up on a series of AIO patchsets
* Ingo Molnar <[EMAIL PROTECTED]> wrote:
> > unified event system for Linux we need people to help out with
> > straightening out these event providers as Evgeniy seems to be
> > unwilling/unable to do the work himself and the duplication is
> > simply not acceptable.
>
> yeah. The internal machinery should be as unified as possible - but
> different sets of APIs can be offered, to make it easy for people to
> extend their existing apps in the most straightforward way.

just to expand on this: i dont think this should be an impediment to the POSIX AIO patches. We should get some movement into this and should give the capability to glibc and applications. Kernel-internal unification is something we are pretty good at doing after the fact. (and if any of the APIs dies or gets very uncommon we know in which direction to unify)

	Ingo
Re: [RFC] Heads up on a series of AIO patchsets
* Christoph Hellwig <[EMAIL PROTECTED]> wrote:
> The real question here is which interface we want people to use for
> these "combined" applications. Evgeniy is heavily pushing kevent for
> this while others seem to prefer integrating epoll into the aio
> interface. (1)
>
> I must admit that kevent seems to be the cleaner way to support this,
> although I see some advantages for the aio variant. I do think
> however that we should not actively promote two different interfaces
> long term.

i see no fundamental disadvantage from doing both. That way the 'market' of applications will vote. (we have 2 other fundamental types available as well: sync IO and poll() based IO - so it's not like we have the choice between 2 or 1 variant, we have the choice between 4 or 3 variants)

> (1) note that there is another problem with the current kevent
> interface, and that is that it duplicates the event infrastructure
> for its underlying subsystems instead of reusing existing code
> (e.g. inotify, epoll, dio-aio). If we want kevent to be _the_
> unified event system for Linux we need people to help out with
> straightening out these event providers as Evgeniy seems to be
> unwilling/unable to do the work himself and the duplication is
> simply not acceptable.

yeah. The internal machinery should be as unified as possible - but different sets of APIs can be offered, to make it easy for people to extend their existing apps in the most straightforward way.

(In fact i'd like to see all the 'poll table' code to be unified into this as well, if possible - it does not really "poll" anything, it's an event infrastructure as well, used via the naive select() and poll() syscalls. We should fix that naming mistake.)

	Ingo
Re: [RFC] Heads up on a series of AIO patchsets
On Wed, Dec 27, 2006 at 09:08:56PM +0530, Suparna Bhattacharya wrote:
> (2) Most of these other applications need the ability to process both
> network events (epoll) and disk file AIO in the same loop. With POSIX AIO
> they could at least sort of do this using signals (yeah, and all associated
> issues). The IO_CMD_EPOLL_WAIT patch (originally from Zach Brown with
> modifications from Jeff Moyer and me) addresses this problem for native
> linux aio in a simple manner. Tridge has written a test harness to
> try out the Samba4 event library modifications to use this. Jeff Moyer
> has a modified version of pipetest for comparison.

The real question here is which interface we want people to use for these "combined" applications. Evgeniy is heavily pushing kevent for this while others seem to prefer integrating epoll into the aio interface. (1)

I must admit that kevent seems to be the cleaner way to support this, although I see some advantages for the aio variant. I do think however that we should not actively promote two different interfaces long term.

(1) note that there is another problem with the current kevent interface, and that is that it duplicates the event infrastructure for its underlying subsystems instead of reusing existing code (e.g. inotify, epoll, dio-aio). If we want kevent to be _the_ unified event system for Linux we need people to help out with straightening out these event providers, as Evgeniy seems to be unwilling/unable to do the work himself and the duplication is simply not acceptable.
[RFC] Heads up on a series of AIO patchsets
Here is a quick attempt to summarize where we are heading with a bunch of AIO patches that I'll be posting over the next few days. Because a few of these patches have been hanging around for a bit, and have gone through bursts of iterations from time to time, falling dormant for other phases, the intent of this note is to help pull things together into some coherent picture for folks to comment on the patches and arrive at a decision of some sort.

Native linux aio (i.e. using libaio) is properly supported (in the sense of being asynchronous) only for files opened with O_DIRECT, which actually suffices for a major (and most visible) user of AIO, i.e. databases. However, for other types of users, e.g. Samba and other applications which use POSIX AIO, there have been several issues outstanding for a while:

(1) The filesystem AIO patchset attempts to address one part of the problem, which is to make regular file IO (without O_DIRECT) asynchronous (mainly the case of reads of uncached or partially cached files, and O_SYNC writes).

(2) Most of these other applications need the ability to process both network events (epoll) and disk file AIO in the same loop. With POSIX AIO they could at least sort of do this using signals (yeah, and all associated issues). The IO_CMD_EPOLL_WAIT patch (originally from Zach Brown with modifications from Jeff Moyer and me) addresses this problem for native linux aio in a simple manner. Tridge has written a test harness to try out the Samba4 event library modifications to use this. Jeff Moyer has a modified version of pipetest for comparison.

(3) For glibc POSIX AIO to switch to using native AIO (instead of simulation with threads) kernel changes are needed to ensure aio sigevent notification and efficient listio support. Sebastien Dugue's patches for aio sigevent notifications have undergone several review iterations and seem to be in good shape now.
His patch for lio_listio is pending discussion on whether to implement it as a separate syscall rather than an additional iocb command. Bharata B Rao has posted a patch with the syscall variation for review.

(4) If glibc POSIX AIO switches completely to using native AIO then it would need basic AIO support for various file types - including sockets, pipes etc. Since it no longer will be simulating asynchronous behaviour with threads, it expects the underlying implementation to be asynchronous. Which is still an issue with native linux AIO, but I now think the problem to be tractable without a lot of additional work. While (1) helps the case for regular files, (2) now provides us an alternative infrastructure to simulate this in kernel using async epoll and O_NONBLOCK for all pollable fds, i.e. sockets, pipes etc. This should be good enough for working POSIX AIO.

(5) That leaves just one more todo - implementing aio_fsync() in kernel.

Please note that all of this work is not in conflict with kevent development. In fact it is my hope that progress made in getting these pieces of the puzzle in place would also help us along the long term goal of eventual convergence.

Regards
Suparna

--
Suparna Bhattacharya ([EMAIL PROTECTED])
Linux Technology Center
IBM Software Lab, India