Re: [kvm-devel] regression breaks lowmem reserved RAM
On Tue, Apr 01 2008, Andrea Arcangeli wrote:

Looking a bit closer into this regression, the reason this can't be right is that the common default for dma_addr is BLK_BOUNCE_HIGH, and most machines have less than 4G. So if you do:

	if (b_pfn <= (min_t(u64, 0xffffffff, BLK_BOUNCE_HIGH) >> PAGE_SHIFT))
		dma = 1;

that will translate to:

	if (BLK_BOUNCE_HIGH <= BLK_BOUNCE_HIGH)
		dma = 1;

So for 99% of hardware this will trigger unnecessary GFP_DMA allocations and isa pooling operations.

Also note how the 32bit code still does b_pfn < blk_max_low_pfn. I guess this is what you were looking after. I didn't verify, but as far as I can tell this will stop the regression with isa dma operations at boot for 99% of blkdev/memory combinations out there, and I guess this fixes the setups with 4G of ram and 32bit pci cards as well (this also retains symmetry with the 32bit code).

Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED]

Thanks Andrea, this looks much saner!

-- 
Jens Axboe

-
Check out the new SourceForge.net Marketplace. It's the best place to buy or
sell services for just about anything Open Source.
http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH 4/6] virtio block driver
On Fri, Sep 21 2007, Rusty Russell wrote:
On Thu, 2007-09-20 at 15:05 +0200, Jens Axboe wrote:
On Thu, Sep 20 2007, Rusty Russell wrote:

	+static void end_dequeued_request(struct request *req,
	+				 struct request_queue *q, int uptodate)
	+{
	+	/* And so the insanity of the block layer infects us here. */
	+	int nsectors = req->hard_nr_sectors;
	+
	+	if (blk_pc_request(req)) {
	+		nsectors = (req->data_len + 511) >> 9;
	+		if (!nsectors)
	+			nsectors = 1;
	+	}
	+	if (end_that_request_first(req, uptodate, nsectors))
	+		BUG();
	+	add_disk_randomness(req->rq_disk);
	+	end_that_request_last(req, uptodate);
	+}

We have end_queued_request(), lets add end_dequeued_request(). Below.

OK, thanks, I'll throw that in the mix and test...

I've queued it for 2.6.24 as well, so it should be in mainline in that time frame.

	+	vblk->sg[0].page = virt_to_page(&vbr->out_hdr);
	+	vblk->sg[0].offset = offset_in_page(&vbr->out_hdr);
	+	vblk->sg[0].length = sizeof(vbr->out_hdr);
	+	num = blk_rq_map_sg(q, vbr->req, vblk->sg + 1);

This won't work for chained sglists, but I gather (I'm so funny) that it won't be an issue for you. How large are your sglists?

Hmm, potentially extremely large. What do I need to do for chained sglists?

Not a lot, actually. You snipped the problematic part in your reply, though:

	vblk->sg[num + 1].page = virt_to_page(&vbr->in_hdr);

which assumes that sg is a contiguous piece of memory; for chained sg lists that isn't true. sg chaining will be in 2.6.24, so if you really do need large sglists, then that's the way to go. blk_rq_map_sg() maps correctly for you, no changes needed there. But you want to use sg_last() for adding to the end of the sglist, then sg_next() to retrieve the next sg segment instead of sg + 1, and for_each_sg() to loop over all segments. Just something to keep in mind, if you plan on merging this post 2.6.23.

	+	if (!do_req(q, vblk, req)) {
	+		/* Queue full? Wait. */
	+		blk_stop_queue(q);
	+		break;
	+	}

Personally I think this bool stuff is foul. You return false/true, but still use ! to test.
That is just more confusing than the canonical 0/1 good/bad return imho.

Except 0/1 is not canonical in the kernel. Arguably, -errno/0 is canonical. OTOH, bool is clear.

-errno/0, then. 1 is typically used for 'failure without specific error number' when -Exx doesn't apply. Like the above :-) But lets just agree to disagree on the bool.

if do_req() fails, we assume the queue is full. I shall change the comment to that effect.

Thanks!

-- 
Jens Axboe
Re: [kvm-devel] [PATCH 4/6] virtio block driver
On Fri, Sep 21 2007, Rusty Russell wrote:
On Thu, 2007-09-20 at 14:27 +0200, Jens Axboe wrote:
On Thu, Sep 20 2007, Rusty Russell wrote:

The block driver uses scatter-gather lists with sg[0] being the request information (struct virtio_blk_outhdr) with the type, sector and inbuf id. The next N sg entries are the bio itself, then the last sg is the status byte. Whether the N entries are in or out depends on whether it's a read or a write.

We accept the normal (SCSI) ioctls: they get handed through to the other side, which can then handle them or reply that they're unsupported. It's not clear that this actually works in general, since I don't know if blk_pc_request() requests have an accurate rq_data_dir().

They should, if they imply a data transfer.

OK, good. We rely on that to mark input vs output sg elements. I was wondering if there were some weird requests which did both input and output.

There very soon will be bidirectional requests with both in and out elements, but they will be clearly marked as such (and the driver needs to be marked capable). So nothing to worry about for now.

Although we try to reply -ENOTTY on unsupported commands, the block layer in its infinite wisdom suppresses the error, so ioctl(fd, CDROMEJECT) returns success to userspace.

How about ever submitting a patch for that, instead of just repeatedly complaining about it?

To be fair, I've left the complaint in that same patch, you're just reading it repeatedly 8)

That may be the case :-) I shall look through the code and see if I can figure out how to fix it. I'm assuming from your response that there's not some strange reason to preserve current behaviour.

It surely sounds like a bug: if you issue ioctl(fd, CDROMEJECT), the driver sees it and returns -ENOTTY, but userland sees a 0 retval. So if you have time, please do poke at it a bit.

-- 
Jens Axboe
Re: [kvm-devel] [PATCH 4/6] virtio block driver
On Thu, Sep 20 2007, Rusty Russell wrote:

	+static void end_dequeued_request(struct request *req,
	+				 struct request_queue *q, int uptodate)
	+{
	+	/* And so the insanity of the block layer infects us here. */
	+	int nsectors = req->hard_nr_sectors;
	+
	+	if (blk_pc_request(req)) {
	+		nsectors = (req->data_len + 511) >> 9;
	+		if (!nsectors)
	+			nsectors = 1;
	+	}
	+	if (end_that_request_first(req, uptodate, nsectors))
	+		BUG();
	+	add_disk_randomness(req->rq_disk);
	+	end_that_request_last(req, uptodate);
	+}

We have end_queued_request(), lets add end_dequeued_request(). Below.

	+	vblk->sg[0].page = virt_to_page(&vbr->out_hdr);
	+	vblk->sg[0].offset = offset_in_page(&vbr->out_hdr);
	+	vblk->sg[0].length = sizeof(vbr->out_hdr);
	+	num = blk_rq_map_sg(q, vbr->req, vblk->sg + 1);

This won't work for chained sglists, but I gather (I'm so funny) that it won't be an issue for you. How large are your sglists?

	+	if (!do_req(q, vblk, req)) {
	+		/* Queue full? Wait. */
	+		blk_stop_queue(q);
	+		break;
	+	}

Personally I think this bool stuff is foul. You return false/true, but still use ! to test. That is just more confusing than the canonical 0/1 good/bad return imho.
diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index f6d3994..c1dc23a 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -3719,15 +3719,24 @@ void end_that_request_last(struct request *req, int uptodate)
 EXPORT_SYMBOL(end_that_request_last);
 
 static inline void __end_request(struct request *rq, int uptodate,
-				 unsigned int nr_bytes)
+				 unsigned int nr_bytes, int dequeue)
 {
 	if (!end_that_request_chunk(rq, uptodate, nr_bytes)) {
-		blkdev_dequeue_request(rq);
+		if (dequeue)
+			blkdev_dequeue_request(rq);
 		add_disk_randomness(rq->rq_disk);
 		end_that_request_last(rq, uptodate);
 	}
 }
 
+static unsigned int rq_byte_size(struct request *rq)
+{
+	if (blk_fs_request(rq))
+		return rq->hard_nr_sectors << 9;
+
+	return rq->data_len;
+}
+
 /**
  * end_queued_request - end all I/O on a queued request
  * @rq:		the request being processed
@@ -3741,18 +3750,29 @@ static inline void __end_request(struct request *rq, int uptodate,
  **/
 void end_queued_request(struct request *rq, int uptodate)
 {
-	unsigned int nr_bytes;
-
-	if (blk_fs_request(rq))
-		nr_bytes = rq->hard_nr_sectors << 9;
-	else
-		nr_bytes = rq->data_len;
-
-	__end_request(rq, uptodate, nr_bytes);
+	__end_request(rq, uptodate, rq_byte_size(rq), 1);
 }
 EXPORT_SYMBOL(end_queued_request);
 
 /**
+ * end_dequeued_request - end all I/O on a dequeued request
+ * @rq:		the request being processed
+ * @uptodate:	error value or 0/1 uptodate flag
+ *
+ * Description:
+ *     Ends all I/O on a request. The request must already have been
+ *     dequeued using blkdev_dequeue_request(), as is normally the case
+ *     for most drivers.
+ *
+ **/
+void end_dequeued_request(struct request *rq, int uptodate)
+{
+	__end_request(rq, uptodate, rq_byte_size(rq), 0);
+}
+EXPORT_SYMBOL(end_dequeued_request);
+
+
+/**
  * end_request - end I/O on the current segment of the request
  * @rq:		the request being processed
  * @uptodate:	error value or 0/1 uptodate flag
@@ -3773,7 +3793,7 @@ EXPORT_SYMBOL(end_queued_request);
  **/
 void end_request(struct request *req, int uptodate)
 {
-	__end_request(req, uptodate, req->hard_cur_sectors << 9);
+	__end_request(req, uptodate, req->hard_cur_sectors << 9, 1);
 }
 EXPORT_SYMBOL(end_request);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 12ab4d4..4b24265 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -732,6 +732,7 @@ extern int end_that_request_chunk(struct request *, int, int);
 extern void end_that_request_last(struct request *, int);
 extern void end_request(struct request *, int);
 extern void end_queued_request(struct request *, int);
+extern void end_dequeued_request(struct request *, int);
 extern void blk_complete_request(struct request *);

-- 
Jens Axboe
Re: [kvm-devel] More virtio users
On Mon, Jun 11 2007, Rusty Russell wrote:
On Mon, 2007-06-11 at 08:41 +0200, Jens Axboe wrote:
On Sun, Jun 10 2007, Rusty Russell wrote:
On Sun, 2007-06-10 at 11:16 +0300, Avi Kivity wrote:
Rusty Russell wrote:

Lguest doesn't have a framebuffer, so maybe this is a good thing for me to hack on, but I promised myself I'd finish NAPI for the net device, and tag for block device first.

If you're touching the block device, passing a request's io priority to the host can be useful.

OK, here's the interdiff. I still don't handle non-fs requests, but I haven't seen any yet. I should probably BUG_ON() there and wait for Jens to scream...

Ehm no, that would certainly cause screaming :-) Checking for blk_fs_request() and terminating the request if you don't know how to handle it is the correct thing to do; a BUG() would definitely not be.

So the problem is that I'd like to handle all of them, but I'm not clear what requests my device can get. I can't see a source of any other type of request.

The other main request type is blk_pc_request(). In the data setup it's identical to blk_fs_request(): there's a bio chain off ->bio. It's a byte granularity entity though, so you should check ->data_len for the size of it. ->cmd[] holds a SCSI cdb, which is the command you are supposed to handle.

	+	blk_queue_prep_rq(vblk->disk->queue, blk_queue_start_tag);

is quite a novel way, I actually had to look that code up to check whether it was correct. I'd much prefer a little wrapper around that,

OK, but I got it from the blk_queue_start_tag documentation:

	 * Description:
	 *    This can either be used as a stand-alone helper, or possibly be
	 *    assigned as the queue prep_rq_fn (in which case struct request
	 *    automagically gets a tag assigned). Note that this function
	 *    assumes that any type of request can be queued!

OK, bad documentation, I'll make a note to fix that! Sorry about that.

I'm unsure on the whole ordered tag idea, though.
It seems like there can never be multiple requests in the queue with the same tag, so it effectively forces my (userspace) virtio server to serialize and fdatasync on every write request:

	if (outhdr->type && outhdr->tag != vblk->last_tag) {
		while (vblk->in_progress)
			handle_io_finish(fd, dev);
		fdatasync(fd);
		vblk->last_tag = outhdr->tag;
	}

In fact, AFAICT any implementation of ordered tags will be forced into serial order. Am I better off telling the block layer to drain and flush instead of ordered tags? I can then push that through to the virtio server.

Perhaps you are misunderstanding what the tag is? The tag is a unique identifier for a pending request, so you will by definition never have requests sharing a tag value. But the tag is not to be considered ordered, unless it has the barrier flag set as well. So you only need to serialize on the device side when blk_barrier_rq() is true.

-- 
Jens Axboe
Re: [kvm-devel] More virtio users
On Tue, Jun 12 2007, Rusty Russell wrote:
On Mon, 2007-06-11 at 09:33 +0200, Jens Axboe wrote:
On Mon, Jun 11 2007, Rusty Russell wrote:

So the problem is that I'd like to handle all of them, but I'm not clear what requests my device can get. I can't see a source of any other type of request.

The other main request type is blk_pc_request(). In the data setup it's identical to blk_fs_request(): there's a bio chain off ->bio. It's a byte granularity entity though, so you should check ->data_len for the size of it. ->cmd[] holds a SCSI cdb, which is the command you are supposed to handle.

SCSI? I'm even more lost now.

Q: So what *are* the commands?

They are SCSI commands!

Q: Who puts them in my queue?

If you want to support SG_IO for instance, you'd have to deal with SCSI commands. I have *never* seen anything but an fs request come through to my driver.

You probably only did basic IO to it.

Perhaps you are misunderstanding what the tag is? The tag is a unique identifier for a pending request, so you will by definition never have requests sharing a tag value.

Yes, I started to suspect that.

But the tag is not to be considered ordered, unless it has the barrier flag set as well. So you only need to serialize on the device side when blk_barrier_rq() is true.

blk_barrier_rq(req) is never true. I put a BUG_ON(blk_barrier_rq(req)) in my code and booted. This is using the device as a root ext3 filesystem. I reverted all the tag changes, still no barriers. I added

	blk_queue_ordered(vblk->disk->queue, QUEUE_ORDERED_DRAIN, NULL);

still no barriers.

-o barrier=1 for ext3, it doesn't use barriers by default. Or barrier=flush for reiserfs. XFS defaults to barriers on.

-- 
Jens Axboe
Re: [kvm-devel] [PATCH RFC 3/3] virtio infrastructure: example block driver
On Mon, Jun 04 2007, Carsten Otte wrote:
Jens Axboe wrote:
On Fri, Jun 01 2007, Carsten Otte wrote:

With regard to compute power needed, almost none. The penalty is latency, not overhead: a small request may sit on the request queue waiting for other work to arrive until the queue gets unplugged. This penalty is compensated by the benefit of a good chance that more requests will be merged during this time period. If we have this method both in host and guest, we have twice the penalty with no added benefit.

I don't buy that argument. We can easily expose the unplug delay, so you can kill it at whatever level you want. Or you could just do it in the driver right now, but that is a bit hackish.

That would be preferable if the device driver can choose the unplug delay, or even better it could be (guest) sysfs tuneable.

Right. We probably should make it sysfs configurable in any case; right now it's a (somewhat) policy decision in the kernel with the delay and unplug depth.

-- 
Jens Axboe
Re: [kvm-devel] [PATCH RFC 3/3] virtio infrastructure: example block driver
On Mon, Jun 04 2007, Rusty Russell wrote:
On Mon, 2007-06-04 at 15:43 +0200, Jens Axboe wrote:
On Mon, Jun 04 2007, Carsten Otte wrote:
Jens Axboe wrote:
On Fri, Jun 01 2007, Carsten Otte wrote:

With regard to compute power needed, almost none. The penalty is latency, not overhead: a small request may sit on the request queue waiting for other work to arrive until the queue gets unplugged. This penalty is compensated by the benefit of a good chance that more requests will be merged during this time period. If we have this method both in host and guest, we have twice the penalty with no added benefit.

I don't buy that argument. We can easily expose the unplug delay, so you can kill it at whatever level you want. Or you could just do it in the driver right now, but that is a bit hackish.

That would be preferable if the device driver can choose the unplug delay, or even better it could be (guest) sysfs tuneable.

Right. We probably should make it sysfs configurable in any case; right now it's a (somewhat) policy decision in the kernel with the delay and unplug depth.

The danger is that it's just another knob no one knows how to use. Perhaps simply setting it to 0 for the noop scheduler will cover all known cases?

Most people should not fiddle with it, the defaults are there for good reason. I can provide a blk_queue_unplug_thresholds(q, depth, delay) helper that you could use for the virtualized drivers; perhaps that would be better for that use?

-- 
Jens Axboe
Re: [kvm-devel] [PATCH RFC 3/3] virtio infrastructure: example block driver
On Mon, Jun 04 2007, Carsten Otte wrote:
Jens Axboe wrote:

Most people should not fiddle with it, the defaults are there for good reason. I can provide a blk_queue_unplug_thresholds(q, depth, delay) helper that you could use for the virtualized drivers, perhaps that would be better for that use?

Yea, we shouldn't change the defaults without a good reason. That would change things for all device drivers. This interface provides all functionality we need. I think we need a knob in /sys/block/mydevice/queue/ in addition to that.

Something like this, totally untested (but trivial, so it should work :-)

diff --git a/block/elevator.c b/block/elevator.c
index ce866eb..81e2a2d 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -638,7 +638,7 @@ void elv_insert(request_queue_t *q, struct request *rq, int where)
 			int nrq = q->rq.count[READ] + q->rq.count[WRITE]
 					- q->in_flight;
 
-			if (nrq >= q->unplug_thresh)
+			if (nrq >= q->unplug_thresh || !q->unplug_delay)
 				__generic_unplug_device(q);
 		}
 	}
diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index 6b5173a..aaefb32 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -785,6 +785,30 @@ void blk_queue_dma_alignment(request_queue_t *q, int mask)
 EXPORT_SYMBOL(blk_queue_dma_alignment);
 
 /**
+ * blk_queue_unplug_threshold - set automatic unplug thresholds for the queue
+ * @q:     the request queue for the device
+ * @depth: the queue depth at which to do unplug
+ * @delay: maximum unplug timer delay
+ *
+ * Description:
+ *    Set the desired unplug depth/threshold and delay for a given queue.
+ *    The block layer has a set of good defaults for this, so this function
+ *    should ONLY be used by drivers for virtualized environments, where
+ *    you could potentially have several layers of queues that each do their
+ *    own delay.
+ *
+ *    If in doubt, don't use this function! The settings can also be
+ *    tweaked from sysfs.
+ *
+ **/
+void blk_queue_unplug_threshold(request_queue_t *q, unsigned int depth,
+				unsigned long delay)
+{
+	q->unplug_thresh = depth;
+	q->unplug_delay = delay;
+}
+
+/**
  * blk_queue_find_tag - find a request by its tag and queue
  * @q: The request queue for the device
  * @tag: The tag of the request
@@ -1550,7 +1574,8 @@ void blk_plug_device(request_queue_t *q)
 		return;
 
 	if (!test_and_set_bit(QUEUE_FLAG_PLUGGED, &q->queue_flags)) {
-		mod_timer(&q->unplug_timer, jiffies + q->unplug_delay);
+		if (q->unplug_delay)
+			mod_timer(&q->unplug_timer, jiffies + q->unplug_delay);
 		blk_add_trace_generic(q, NULL, 0, BLK_TA_PLUG);
 	}
 }
@@ -3975,6 +4000,54 @@ static ssize_t queue_max_hw_sectors_show(struct request_queue *q, char *page)
 	return queue_var_show(max_hw_sectors_kb, (page));
 }
 
+static ssize_t queue_unplug_delay_show(struct request_queue *q, char *page)
+{
+	return queue_var_show(q->unplug_delay, page);
+}
+
+/*
+ * We don't bother rearming a running timer. It's just not worth it, the
+ * next unplug will get it right.
+ */
+static ssize_t queue_unplug_delay_store(struct request_queue *q,
+					const char *page, size_t count)
+{
+	unsigned long delay;
+	int ret;
+
+	ret = queue_var_store(&delay, page, count);
+
+	spin_lock_irq(q->queue_lock);
+	q->unplug_delay = msecs_to_jiffies(delay);
+	spin_unlock_irq(q->queue_lock);
+
+	return ret;
+}
+
+static ssize_t queue_unplug_depth_show(struct request_queue *q, char *page)
+{
+	return queue_var_show(q->unplug_thresh, page);
+}
+
+/*
+ * We don't bother unplugging if the depth was reduced and we just happened
+ * to have a current queue depth of somewhere in between the old and new
+ * value.
+ */
+static ssize_t queue_unplug_depth_store(struct request_queue *q,
+					const char *page, size_t count)
+{
+	unsigned long depth;
+	int ret;
+
+	ret = queue_var_store(&depth, page, count);
+
+	spin_lock_irq(q->queue_lock);
+	q->unplug_thresh = depth;
+	spin_unlock_irq(q->queue_lock);
+
+	return ret;
+}
 
 static struct queue_sysfs_entry queue_requests_entry = {
 	.attr = {.name = "nr_requests", .mode = S_IRUGO | S_IWUSR },
@@ -4005,12 +4078,26 @@ static struct queue_sysfs_entry queue_iosched_entry = {
 	.store = elv_iosched_store,
 };
 
+static struct queue_sysfs_entry queue_unplug_depth_entry = {
+	.attr = {.name = "unplug_depth", .mode = S_IRUGO | S_IWUSR },
+	.show = queue_unplug_depth_show,
+	.store = queue_unplug_depth_store,
+};
+
+static struct queue_sysfs_entry queue_unplug_delay_entry = {
+	.attr = {.name = "unplug_delay_ms", .mode = S_IRUGO | S_IWUSR },
+	.show = queue_unplug_delay_show,
+	.store = queue_unplug_delay_store,
+};
+
 static struct
Re: [kvm-devel] [PATCH RFC 3/3] virtio infrastructure: example block driver
On Fri, Jun 01 2007, Carsten Otte wrote:
Rusty Russell wrote:

Now my lack of block-layer knowledge is showing. I would have thought that if we want things like ionice(1) to work, we have to do some guest scheduling or pass that information down to the host.

Yea, that would only work on the host: one can use ionice to set the io niceness of the entire guest. Individual processes inside the guest are opaque to the host, and thus are opaque to its io scheduler.

It seems preferable to do that in the host, especially when requests of multiple guests end up on the same physical media (shared access, or partitioned). What's the overhead in doing both?

With regard to compute power needed, almost none. The penalty is latency, not overhead: a small request may sit on the request queue waiting for other work to arrive until the queue gets unplugged. This penalty is compensated by the benefit of a good chance that more requests will be merged during this time period. If we have this method both in host and guest, we have twice the penalty with no added benefit.

I don't buy that argument. We can easily expose the unplug delay, so you can kill it at whatever level you want. Or you could just do it in the driver right now, but that is a bit hackish.

On the other hand, if we choose to hook into q->make_request_fn, we end up doing far more hypercalls: bios do not get merged on the guest side. We must do a hypercall per bio in this scenario, or we'll end up adding latency again. In contrast, we can submit the entire content of the queue with a single hypercall when calling at do_request(). A third way out of that situation is to do queueing between guest and host: on the first bio, guest does a hypercall. When the next bio arrives, guest sees that the host has not finished processing the queue yet and pushes another buffer without doing a notification.
We've also implemented this, with the result that our host stack was quick enough to practically always process the bio before the guest had the chance to submit another one. Performance was a nightmare, so we discontinued pursuing that idea.

I'd greatly prefer maintaining a request_fn interface for this. The make_request_fn/request_fn call ratio is at least 1, and typically a lot larger (4kb blocks, 128kb requests not uncommon -> 32).

-- 
Jens Axboe