Avi Kivity wrote:
> Is this the host file descriptor? If so, we want to use something more
> abstract (if the host side is in kernel, there will be no fd, or if the
> device is implemented using >1 files (or <1 files)).

This is indeed the host file descriptor. Host userland uses sys_open to
retrieve it. I see the beauty of having the remote side in the kernel,
but I fail to see why we would want to reinvent the wheel: asynchronous
IO with O_DIRECT (to avoid host caching) does just what we want.
Avoiding system call latency is the one argument in favor of the
in-kernel approach here.
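For illustration, the host-side submission boils down to something like
this (a minimal sketch with libaio, not our actual code -- the image
path and request size are made up and error handling is omitted):

#define _GNU_SOURCE             /* for O_DIRECT */
#include <libaio.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define BLK 4096

int main(void)
{
        io_context_t ctx = 0;
        struct iocb cb, *cbs[1] = { &cb };
        struct io_event ev;
        void *buf;

        int fd = open("/tmp/disk.img", O_RDWR | O_DIRECT);
        posix_memalign(&buf, BLK, BLK);      /* O_DIRECT wants aligned buffers */

        io_setup(1, &ctx);                   /* one in-flight request for the demo */
        io_prep_pread(&cb, fd, buf, BLK, 0); /* read the first block */
        io_submit(ctx, 1, cbs);
        io_getevents(ctx, 1, 1, &ev, NULL);  /* wait for completion */

        io_destroy(ctx);
        close(fd);
        return 0;
}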
> We'll want scatter/gather here.

If you want scatter/gather, you have to do request merging in the guest
and use the do_request function of the block queue, because in
make_request you only have a single bio at hand. With do_request, you
would do that request merging twice and pay the block device plug
latency twice for nothing. The host is the better place to do IO
scheduling, because it can optimize over the IO of all guest machines.

>
>> +};
>> +
>> +struct vdisk_iocb_container {
>> +        struct iocb iocb;
>> +        struct bio *bio;
>> +        struct vdisk_device *dev;
>> +        int ctx_index;
>> +        unsigned long context;
>> +        struct list_head list;
>> +};
>> +
>> +// from aio_abi.h
>> +typedef enum io_iocb_cmd {
>> +        IO_CMD_PREAD = 0,
>> +        IO_CMD_PWRITE = 1,
>> +
>> +        IO_CMD_FSYNC = 2,
>> +        IO_CMD_FDSYNC = 3,
>> +
>> +        IO_CMD_POLL = 5,
>> +        IO_CMD_NOOP = 6,
>> +} io_iocb_cmd_t;
>>
>
> Our own commands, please. We need READV, WRITEV, and a barrier for
> journalling filesystems. FDSYNC should work as a barrier, but is
> wasteful. The FSYNC/FDSYNC distinction is meaningless. POLL/NOOP are
> irrelevant.

This matches the API of libaio. If userland translates this into struct
iocb, this makes sense.

The barrier, however, is a general problem with this approach: today,
the asynchronous IO userspace API does not allow submitting a barrier.
Therefore, our make_request function in the guest returns -ENOTSUPP,
which forces the file system to wait for IO completion. This does
sacrifice some performance. The right thing to do would be to add a way
to submit a barrier through the kernel AIO interface.

> We want to amortize the hypercall over multiple bios (but maybe you're
> doing that -- I'm not 100% up to speed on the block layer)

We don't. We do one hypercall per bio (see the P.S. below for a rough
sketch of the guest path), and I agree that this is a major disadvantage
of this approach. Since IO is slow (compared to vmenter/vmexit), the
cost is paid back by better IO scheduling. On our platform, this
approach outperforms the scatter/gather do_request one.

> Any reason not to perform the work directly?

I owe you an answer to this one; I have to revisit our CVS logs to find
out. We used to call directly from make_request without a workqueue,
and I cannot remember why we changed that.

so long,
Carsten
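P.S.: roughly what the guest-side make_request path looks like in this
approach -- a minimal sketch against a 2.6.2x-era block API, not the
actual patch code. vdisk_submit_bio() is made up here and stands in for
our one-iocb-per-bio hypercall submission.

#include <linux/blkdev.h>
#include <linux/bio.h>

struct vdisk_device;    /* as in the hunk quoted above */

/* stand-in for the real submission path: builds one iocb for the bio
 * and hands it to the host via a hypercall */
void vdisk_submit_bio(struct vdisk_device *dev, struct bio *bio);

static int vdisk_make_request(struct request_queue *q, struct bio *bio)
{
        struct vdisk_device *dev = q->queuedata;

        if (bio_barrier(bio)) {
                /* the userspace aio API cannot express a barrier, so
                 * refuse it; the file system then falls back to waiting
                 * for completion of outstanding IO */
                bio_endio(bio, bio->bi_size, -ENOTSUPP);
                return 0;
        }

        /* no merging in the guest: one hypercall per bio, IO scheduling
         * is left to the host */
        vdisk_submit_bio(dev, bio);
        return 0;
}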