On Fri, Dec 12, 2008 at 12:54:21PM +0100, Jens Axboe wrote:
> I agree completely. The buffered aio patches got pretty involved though,
> it wasn't real pretty in the end. So it never got merged. Looks like the
> most realistic way forward is some variant of syslet (or the acall stuff
> that Zach has been working on), which is largely a cop out and will
> never perform as well.

It'll at least perform better than a brand new userland pool of threads
for each task that needs aio functionality, and it can be optimized
later if we want ;).

But I'm surprised: the aio patches in 2.4 were very clean, we didn't
have to break filesystems, it was really nicely done work, enterprise
quality as demonstrated by the several databases running on it for
years. Ironically the O_DIRECT part didn't work at the time... because
the O_DIRECT part is effectively more difficult. So 2.6 has the hard
stuff done and misses the simpler stuff. I guess the simpler stuff is
harder to merge as it has more users.

Well, I hope it'll be fixed... for kvm/qemu we definitely require aio
for buffered reads too (buffered writes aren't a big deal but reads
are). For the parent images it makes sense to run them in buffered mode
even on servers using O_DIRECT, so basically we can't use linux-aio
until this is fixed somehow. In the meantime I think it'd be better to
return -EINVAL (so userland can fall back to a userland thread pool)
instead of just behaving synchronously, which can break GUI and
interactive behavior...

> I added CLONE_IO some time ago to avoid that, so it's perfectly possible
> to share cfq io contexts with threads or processes even in userspace!

It's available in recent kernels, I see! So the fix is easy. The only
problem is how to pass CLONE_IO to pthread_create... We'll have to make
a linux-only change and call clone by hand under some #ifdef CLONE_IO.
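
To make the "call clone by hand under some #ifdef CLONE_IO" idea concrete, here is a rough sketch, not the actual qemu change. The names run_io_worker, demo_worker and WORKER_STACK_SIZE are made up for illustration. It only shows the flag handling: a real pthread_create replacement would also need CLONE_THREAD, CLONE_SIGHAND and proper TLS setup, which is skipped here, so the worker must not call into libc.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define WORKER_STACK_SIZE (64 * 1024)   /* hypothetical size, ample for the demo */

/* Hypothetical worker body: a real one would issue the blocking read. */
static int demo_worker(void *arg)
{
    *(volatile int *)arg = 42;   /* visible to the parent because of CLONE_VM */
    return 0;
}

/*
 * Run fn(arg) in a clone()d task that shares memory and, where the
 * headers provide CLONE_IO, the I/O context of the caller, then wait
 * for it. Returns the worker's exit status, or -1 on failure.
 */
static int run_io_worker(int (*fn)(void *), void *arg)
{
    int flags = CLONE_VM | CLONE_FS | CLONE_FILES | SIGCHLD;
    char *stack;
    pid_t pid;
    int status;

    assert(fn != NULL);

#ifdef CLONE_IO
    flags |= CLONE_IO;           /* share the io context with the caller */
#endif

    stack = malloc(WORKER_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows down on the common architectures: pass its top. */
    pid = clone(fn, stack + WORKER_STACK_SIZE, flags, arg);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    /* The exit signal is SIGCHLD, so a plain waitpid() reaps the worker. */
    if (waitpid(pid, &status, 0) == -1)
        return -1;               /* worker may still run: leak the stack */

    free(stack);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_io_worker(demo_worker, &some_int) runs the worker and waits for it. On headers without CLONE_IO the same code still builds and just runs the worker without sharing the io context, which is the point of the #ifdef.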