On Wednesday 28 March 2007, Jeff Dike wrote:
> [ This patch needs to get into 2.6.21, as it fixes a serious bug
> introduced soon after 2.6.20 ]
>
> Commit 62f96cb01e8de7a5daee472e540f726db2801499 introduced per-devices
> queues and locks, which was fine as far as it went, but left in place
> a global which controlled access to submitting requests to the host.
> This should have been made per-device as well, since it causes I/O
> hangs when multiple block devices are in use.
>
> This patch fixes that by replacing the global with an activity flag in
> the device structure in order to tell whether the queue is currently
> being run.

Finally that variable has an understandable name.

However, in a mail from Jens Axboe titled "Re: [uml-devel] [PATCH 06/11] uml
ubd driver: ubd_io_lock usage fixup" (Date: Mon, 30 Oct 2006 09:26:48 +0100),
he suggested removing this flag altogether, so we may explore this for the
future:
> > Add some comments about requirements for ubd_io_lock and expand its use.
> >
> > When an irq signals that the "controller" (i.e. another thread on the
> > host, which does the actual requests and is the only one blocked on I/O
> > on the host) has done some work, we call again the request function
> > ourselves (do_ubd_request).
> >
> > We now do that with ubd_io_lock held - that's useful to protect against
> > concurrent calls to elv_next_request and so on.
>
> Not only useful, required, as I think I complained about a year or more
> ago :-)
>
> > XXX: Maybe we shouldn't call at all the request function. Input needed on
> > this. Are we supposed to plug and unplug the queue? That code
> > "indirectly" does that by setting a flag, called do_ubd, which makes the
> > request function return (it's a residual of 2.4 block layer interface).
>
> Sometimes you need to. I'd probably just remove the do_ubd check and
> always recall the request function when handling completions, it's
> easier and safe.

Anyway, the main speedups to do on the UBD driver are:

* implement write barriers (so far fewer fsync calls) - this is performance
  killer n.1
* possibly use the new 2.6 request layout with scatter/gather I/O, and
  vectorized I/O on the host
* while vectorizing I/O, use async I/O
* avoid passing requests over pipes (this is n.2) - with a fast disk, I/O
  becomes CPU-bound.
  To make a different but related example, with a SpeedScale laptop, it's
  interesting to double the CPU frequency and observe tuntap speed double
  too (at 1 GHz I get TCP numbers like 150 Mbit/s - 100 Mbit/s, depending
  on whether UML transmits or receives data; at 2 GHz, double those rates).
  Update: I now get 150 Mbit/s / 200 Mbit/s (UML receives / UML sends) at
  1 GHz, and still double that at 2 GHz. This is a different UML though.
* use futexes instead of pipes for synchronization (required for the
  previous one).

-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade
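
As a rough illustration of the per-device activity flag discussed above, and
of rerunning the request function when a completion is handled, here is a
minimal user-space sketch in C. It is not the ubd driver code: struct
model_dev, model_request() and model_complete() are hypothetical names, and
the model assumes at most one request in flight per device and no locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in the ubd driver. */
struct model_dev {
    int pending;    /* requests still waiting in this device's queue */
    int submitted;  /* requests already handed to the host I/O thread */
    bool running;   /* per-device activity flag: a request is in flight */
};

/*
 * Request function of the model: submit one request, unless this device
 * already has one in flight or its queue is empty.
 * Returns true if a request was submitted.
 */
static bool model_request(struct model_dev *dev)
{
    assert(dev != NULL);
    if (dev->running || dev->pending == 0)
        return false;
    dev->running = true;
    dev->pending--;
    dev->submitted++;
    return true;
}

/*
 * Completion handler of the model: clear the flag, then always call the
 * request function again so the next queued request (if any) is submitted.
 */
static void model_complete(struct model_dev *dev)
{
    assert(dev != NULL);
    dev->running = false;
    model_request(dev);
}
```

Because the flag lives in each struct model_dev, a busy device only holds back
its own queue; with a single global flag, a second device would be refused
while the first one still had a request in flight.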