On Wed, Jun 13, 2012 at 11:13 AM, mengcong <m...@linux.vnet.ibm.com> wrote:
>                   seq-read      seq-write     rand-read     rand-write
>                   8k     256k   8k     256k   8k     256k   8k     256k
> ----------------------------------------------------------------------------
> bare-metal        67951  69802  67064  67075  1758   29284  1969   26360
> tcm-vhost-iblock  61501  66575  51775  67872  1011   22533  1851   28216
> tcm-vhost-pscsi   66479  68191  50873  67547  1008   22523  1818   28304
> virtio-blk        26284  66737  23373  65735  1724   28962  1805   27774
> scsi-disk         36013  60289  46222  62527  1663   12992  1804   27670
>
> unit: KB/s
> seq-read/write = sequential read/write
> rand-read/write = random read/write
> 8k,256k are blocksize of the IO
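As a quick cross-check of the 8k sequential numbers, here is a small Python sketch. It is not part of the original exchange, and the names COLUMNS, RESULTS and relative() are hypothetical. Using only the figures quoted above, it expresses each configuration's throughput as a fraction of bare metal:

```python
# Throughput figures (KB/s) copied verbatim from the quoted benchmark table.
# COLUMNS / RESULTS / relative() are illustrative names, not from the thread.
COLUMNS = [
    "seq-read 8k", "seq-read 256k", "seq-write 8k", "seq-write 256k",
    "rand-read 8k", "rand-read 256k", "rand-write 8k", "rand-write 256k",
]

RESULTS = {
    "bare-metal":       [67951, 69802, 67064, 67075, 1758, 29284, 1969, 26360],
    "tcm-vhost-iblock": [61501, 66575, 51775, 67872, 1011, 22533, 1851, 28216],
    "tcm-vhost-pscsi":  [66479, 68191, 50873, 67547, 1008, 22523, 1818, 28304],
    "virtio-blk":       [26284, 66737, 23373, 65735, 1724, 28962, 1805, 27774],
    "scsi-disk":        [36013, 60289, 46222, 62527, 1663, 12992, 1804, 27670],
}


def relative(config: str, column: str, baseline: str = "bare-metal") -> float:
    """Return config's throughput for one column as a fraction of the baseline's."""
    i = COLUMNS.index(column)
    return RESULTS[config][i] / RESULTS[baseline][i]


if __name__ == "__main__":
    # Print the 8k sequential read/write throughput of each virtualized setup
    # relative to bare metal.
    for config in RESULTS:
        if config == "bare-metal":
            continue
        read = relative(config, "seq-read 8k")
        write = relative(config, "seq-write 8k")
        print(f"{config:18s} seq-read 8k {read:6.1%}  seq-write 8k {write:6.1%}")
```

By this arithmetic, virtio-blk reaches about 39% (seq-read 8k) and 35% (seq-write 8k) of bare-metal throughput. For tcm-vhost-iblock the corresponding figures are about 90% and 77%.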
What strikes me is how much worse virtio-blk performs than bare metal and tcm_vhost for 8k seq-read/seq-write (about 39% and 35% of bare-metal throughput, respectively). The good tcm_vhost results suggest that the overhead is not the virtio interface itself, since tcm_vhost implements virtio-scsi.

To drill down on the tcm_vhost vs. userspace performance gap, we need virtio-scsi userspace results. QEMU needs to use the same block device as the tcm-vhost-iblock benchmark.

Cong: Is it possible to collect the virtio-scsi userspace results using the same block device as tcm-vhost-iblock and -drive format=raw,aio=native,cache=none?

Stefan