On 09.12.2014 at 17:26, Peter Lieven wrote:
> this patch finally introduces multiread support to virtio-blk. While
> multiwrite support was there for a long time, read support was missing.
>
> To achieve this the patch does several things which might need further
> explanation:
>
> - the whole merge and multireq logic is moved from block.c into
>   virtio-blk. This move is a preparation for directly creating a
>   coroutine out of virtio-blk.
>
> - requests are only merged if they are strictly sequential, and no
>   longer sorted. This simplification decreases overhead and reduces
>   latency. It will also merge some requests which were unmergeable before.
>
> The old algorithm took up to 32 requests, sorted them and tried to merge
> them. The outcome was anything between 1 and 32 requests. In case of
> 32 requests there were 31 requests unnecessarily delayed.
>
> On the other hand let's imagine e.g. 16 unmergeable requests followed
> by 32 mergeable requests. The latter 32 requests would have been split
> into two 16 byte requests.
>
> Lastly, the simplified logic allows for a fast path if we have only a
> single request in the multirequest. In this case the request is sent as
> an ordinary request without multireq callbacks.
>
> As a first benchmark I installed Ubuntu 14.04.1 on a local SSD. The number of
> merged requests is in the same order while the write latency is obviously
> decreased by several percent.
>
> cmdline:
> qemu-system-x86_64 -m 1024 -smp 2 -enable-kvm -cdrom ubuntu-14.04.1-server-amd64.iso \
>   -drive if=virtio,file=/dev/ssd/ubuntu1404,aio=native,cache=none -monitor stdio
>
> Before:
> virtio0:
>  rd_bytes=151056896 wr_bytes=2683947008 rd_operations=18614 wr_operations=67979
>  flush_operations=15335 wr_total_time_ns=540428034217 rd_total_time_ns=11110520068
>  flush_total_time_ns=40673685006 rd_merged=0 wr_merged=15531
>
> After:
> virtio0:
>  rd_bytes=149487104 wr_bytes=2701344768 rd_operations=18148 wr_operations=68578
>  flush_operations=15368 wr_total_time_ns=437030089565 rd_total_time_ns=9836288815
>  flush_total_time_ns=40597981121 rd_merged=690 wr_merged=14615
>
> Some first numbers of improved read performance while booting:
>
> The Ubuntu 14.04.1 vServer from above:
> virtio0:
>  rd_bytes=97545216 wr_bytes=119808 rd_operations=5071 wr_operations=26
>  flush_operations=2 wr_total_time_ns=8847669 rd_total_time_ns=13952575478
>  flush_total_time_ns=3075496 rd_merged=742 wr_merged=0
>
> Windows 2012R2 (booted from iSCSI):
> virtio0:
>  rd_bytes=176559104 wr_bytes=61859840 rd_operations=7200 wr_operations=360
>  flush_operations=68 wr_total_time_ns=34344992718 rd_total_time_ns=134386844669
>  flush_total_time_ns=18115517 rd_merged=641 wr_merged=216
>
> Signed-off-by: Peter Lieven <p...@kamp.de>
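
For readers following along, the "strictly sequential, no sorting" rule from the commit message can be sketched roughly as below. This is only an illustration, not the patch's code: `SketchBatch`, `sketch_can_merge`, `sketch_queue` and `SKETCH_MAX_MERGE_REQS` are hypothetical names, the limit of 32 is an assumption borrowed from the old batch size mentioned above, and IOV limits, transfer-length limits and overflow checks are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit; assumed to match the old batch size of 32. */
#define SKETCH_MAX_MERGE_REQS 32

/* Hypothetical, simplified stand-in for the patch's MultiReqBuffer. */
typedef struct {
    int64_t sector_num;  /* first sector of the pending batch */
    int64_t nb_sectors;  /* total sectors queued so far */
    int num_reqs;        /* number of requests queued so far */
    bool is_write;       /* direction shared by all queued requests */
} SketchBatch;

/* A request can join the batch only if it starts exactly where the
 * batch ends, has the same direction, and the batch is not full. */
static bool sketch_can_merge(const SketchBatch *b, int64_t sector_num,
                             bool is_write)
{
    assert(b->num_reqs >= 0 && b->num_reqs <= SKETCH_MAX_MERGE_REQS);

    if (b->num_reqs == 0) {
        return true;   /* an empty batch accepts anything */
    }
    if (b->num_reqs == SKETCH_MAX_MERGE_REQS) {
        return false;
    }
    if (b->is_write != is_write) {
        return false;
    }
    return b->sector_num + b->nb_sectors == sector_num;
}

/* Queues one request. Returns true if the pending batch had to be
 * flushed first (real code would submit it to the backend here). */
static bool sketch_queue(SketchBatch *b, int64_t sector_num,
                         int64_t nb_sectors, bool is_write)
{
    bool flushed = false;

    if (!sketch_can_merge(b, sector_num, is_write)) {
        b->num_reqs = 0;
        b->nb_sectors = 0;
        flushed = true;
    }
    if (b->num_reqs == 0) {
        b->sector_num = sector_num;
        b->is_write = is_write;
    }
    b->nb_sectors += nb_sectors;
    b->num_reqs++;
    return flushed;
}
```

Because nothing is sorted, each request costs one comparison against the tail of the batch, and a lone request never waits for 31 others to arrive.
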

Looks pretty good. The only thing I'm still unsure about is possible
integer overflows in the merging logic. Maybe you can have another look
there (ideally not only the places I commented on below, but the whole
function).

> @@ -414,14 +402,81 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
>          iov_from_buf(in_iov, in_num, 0, serial, size);
>          virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
>          virtio_blk_free_request(req);
> -    } else if (type & VIRTIO_BLK_T_OUT) {
> -        qemu_iovec_init_external(&req->qiov, iov, out_num);
> -        virtio_blk_handle_write(req, mrb);
> -    } else if (type == VIRTIO_BLK_T_IN || type == VIRTIO_BLK_T_BARRIER) {
> -        /* VIRTIO_BLK_T_IN is 0, so we can't just & it. */
> -        qemu_iovec_init_external(&req->qiov, in_iov, in_num);
> -        virtio_blk_handle_read(req);
> -    } else {
> +        break;
> +    }
> +    case VIRTIO_BLK_T_IN:
> +    case VIRTIO_BLK_T_OUT:
> +    {
> +        bool is_write = type & VIRTIO_BLK_T_OUT;
> +        int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
> +                                          &req->out.sector);
> +        int max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
> +        int nb_sectors = 0;
> +        bool merge = true;
> +
> +        if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
> +            virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
> +            virtio_blk_free_request(req);
> +            return;
> +        }
> +
> +        if (is_write) {
> +            qemu_iovec_init_external(&req->qiov, iov, out_num);
> +            trace_virtio_blk_handle_write(req, sector_num,
> +                                          req->qiov.size / BDRV_SECTOR_SIZE);
> +        } else {
> +            qemu_iovec_init_external(&req->qiov, in_iov, in_num);
> +            trace_virtio_blk_handle_read(req, sector_num,
> +                                         req->qiov.size / BDRV_SECTOR_SIZE);
> +        }
> +
> +        nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;

qiov.size is controlled by the guest, and nb_sectors is only an int. Are
you sure that this can't overflow?

> +        block_acct_start(blk_get_stats(req->dev->blk),
> +                         &req->acct, req->qiov.size,
> +                         is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ);
> +
> +        /* merge would exceed maximum number of requests or IOVs */
> +        if (mrb->num_reqs == MAX_MERGE_REQS ||
> +            mrb->niov + req->qiov.niov + 1 > IOV_MAX) {
> +            merge = false;
> +        }
> +
> +        /* merge would exceed maximum transfer length of backend device */
> +        if (max_transfer_length &&
> +            mrb->nb_sectors + nb_sectors > max_transfer_length) {
> +            merge = false;
> +        }
> +
> +        /* requests are not sequential */
> +        if (mrb->num_reqs && mrb->sector_num + mrb->nb_sectors != sector_num) {
> +            merge = false;
> +        }
> +
> +        /* if we switch from read to write or vise versa we should submit
> +         * outstanding requests to avoid unnecessary and potential long delays.
> +         * Furthermore we share the same struct for read and write merging so
> +         * submission is a must here. */
> +        if (is_write != mrb->is_write) {
> +            merge = false;
> +        }
> +
> +        if (!merge) {
> +            virtio_submit_multireq(req->dev->blk, mrb);
> +        }
> +
> +        if (mrb->num_reqs == 0) {
> +            mrb->sector_num = sector_num;
> +            mrb->is_write = is_write;
> +        }
> +
> +        mrb->nb_sectors += req->qiov.size / BDRV_SECTOR_SIZE;

This one could also be problematic with respect to overflows.

> +        mrb->reqs[mrb->num_reqs] = req;
> +        mrb->niov += req->qiov.niov;
> +        mrb->num_reqs++;
> +        break;
> +    }
> +    default:
>          virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
>          virtio_blk_free_request(req);
>      }

Kevin
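
P.S. To make the two overflow comments a bit more concrete, here is a minimal sketch of the kind of checks I have in mind. It is not meant as the fix itself: `sketch_bytes_to_sectors`, `sketch_fits` and `SKETCH_SECTOR_SIZE` are made-up names, and I'm assuming the request size is a `size_t` byte count and that all limits are expressed in 512-byte sectors.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: same value as BDRV_SECTOR_SIZE. */
#define SKETCH_SECTOR_SIZE 512

/* Converts a guest-controlled byte count into a sector count.
 * Returns false (and leaves *nb_sectors untouched) if the result
 * would not fit into an int. */
static bool sketch_bytes_to_sectors(size_t bytes, int *nb_sectors)
{
    size_t sectors = bytes / SKETCH_SECTOR_SIZE;

    if (sectors > (size_t)INT_MAX) {
        return false;
    }
    *nb_sectors = (int)sectors;
    return true;
}

/* Returns true if pending + extra <= limit. The sum itself is never
 * computed, so there is no signed overflow even near INT_MAX. */
static bool sketch_fits(int pending, int extra, int limit)
{
    assert(pending >= 0 && extra >= 0 && limit >= 0);
    return pending <= limit && extra <= limit - pending;
}
```

The idea is to reject an oversized request before assigning to nb_sectors, and to phrase the "would exceed" tests as a subtraction from the limit rather than an addition of two guest-influenced values.
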