On Wednesday 28 March 2007, Jeff Dike wrote:
> [ This patch needs to get into 2.6.21, as it fixes a serious bug
> introduced soon after 2.6.20 ]
>
> Commit 62f96cb01e8de7a5daee472e540f726db2801499 introduced per-devices
> queues and locks, which was fine as far as it went, but left in place
> a global which controlled access to submitting requests to the host.
> This should have been made per-device as well, since it causes I/O
> hangs when multiple block devices are in use.
>
> This patch fixes that by replacing the global with an activity flag in
> the device structure in order to tell whether the queue is currently
> being run.
Finally that variable has an understandable name. However, in a mail from Jens
Axboe titled
"Re: [uml-devel] [PATCH 06/11] uml ubd driver: ubd_io_lock usage fixup"
(Date: Mon, 30 Oct 2006 09:26:48 +0100), he suggested removing this flag
altogether, so we may explore that in the future:

> > Add some comments about requirements for ubd_io_lock and expand its use.
> >
> > When an irq signals that the "controller" (i.e. another thread on the
> > host, which does the actual requests and is the only one blocked on I/O
> > on the host) has done some work, we call again the request function
> > ourselves (do_ubd_request).
> >
> > We now do that with ubd_io_lock held - that's useful to protect against
> > concurrent calls to elv_next_request and so on.
>
> Not only useful, required, as I think I complained about a year or more
> ago :-)
>
> > XXX: Maybe we shouldn't call at all the request function. Input needed on
> >  this. Are we supposed to plug and unplug the queue? That code
> > "indirectly" does that by setting a flag, called do_ubd, which makes the
> > request function return (it's a residual of 2.4 block layer interface).
>
> Sometimes you need to. I'd probably just remove the do_ubd check and
> always recall the request function when handling completions, it's
> easier and safe.
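
To make the flag under discussion concrete, here is a minimal user-space
sketch of a per-device "queue is being run" flag. It is not the code from the
patch: struct ubd_dev, its fields and run_queue() are hypothetical stand-ins,
and locking is left out entirely. It only illustrates why the flag has to live
in the device: a busy device must not stop another device's queue from running.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's per-device state; the field
 * names are illustrative, not the ones used in the patch. */
struct ubd_dev {
    int  pending;    /* requests still queued for this device */
    int  submitted;  /* requests handed over to the host I/O thread */
    bool running;    /* per-device "queue is being run" flag */
};

/* Stand-in for the request function: drain the queue unless it is
 * already being drained. Returns the number of requests submitted. */
static int run_queue(struct ubd_dev *dev)
{
    int n = 0;

    if (dev->running)   /* someone else is already running this queue */
        return 0;

    dev->running = true;
    while (dev->pending > 0) {
        dev->pending--;
        dev->submitted++;
        n++;
    }
    dev->running = false;
    return n;
}
```

With a single global flag, the early return would also trigger for a device
whose queue is idle, which is the kind of hang the patch description mentions.
Jens' suggestion would amount to dropping the early return and always
re-running the request function on completion.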

Anyway, the main speedups still to be done on the UBD driver are:

* implement write barriers (so far fewer fsync calls) - this is performance
killer n.1

* possibly use the new 2.6 request layout with scatter/gather I/O, and
vectorized I/O on the host

* while vectorizing I/O, also use async I/O

* avoid passing requests over pipes (performance killer n.2) - with fast
disks, I/O becomes CPU-bound.
To give a different but related example: on a SpeedScale laptop, it's
interesting to double the CPU frequency and watch tuntap speed double too.
(At 1GHz I get TCP rates of about 150 Mbit/s - 100 Mbit/s, depending on
whether UML transmits or receives data; at 2GHz, double those rates.)
Update: I now get 150 Mbit/s / 200 Mbit/s (UML receives / UML sends) at 1GHz,
and still double that at 2GHz.
This is a different UML though.

* use futexes instead of pipes for synchronization (required for the previous
item).
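
As a rough illustration of the "vectorized I/O on the host" item above, here
is a minimal sketch that writes several scattered buffers with one writev()
call instead of one write() per segment. write_segments() and MAX_SEGS are
hypothetical names, not anything in the UBD driver, and a real version would
also have to handle short writes and EINTR.

```c
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_SEGS 16   /* arbitrary limit for this sketch */

/* Write n scattered buffers to fd, starting at offset, with a single
 * writev() call. Returns the number of bytes written, or -1 on error. */
static ssize_t write_segments(int fd, off_t offset,
                              void *const bufs[], const size_t lens[], int n)
{
    struct iovec iov[MAX_SEGS];
    int i;

    if (n <= 0 || n > MAX_SEGS)
        return -1;

    /* Build the gather list: one iovec per segment. */
    for (i = 0; i < n; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len  = lens[i];
    }

    if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
        return -1;

    return writev(fd, iov, n);
}
```

The segments of a scatter/gather request would map onto the iovec array, so
the host sees one system call per request rather than one per segment.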

-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade