Re: Bhyve storage improvements (was: Several bhyve quirks)
Hi, Julian.

> I had some time to try it out today, but I'm still having issues:

I have just run an experiment similar to yours, making bhyve work on top of a GEOM device instead of the preferable dev mode of a ZVOL, and I did reproduce the problem. But the problem I see is not related to the block size. The block size is reported to the guest correctly as 4K, and as far as I can see it works as such, at least in a FreeBSD guest.

The problem is in the way bhyve interoperates with block/GEOM devices. bhyve sends requests to the kernel with preadv()/pwritev() calls, specifying scatter/gather lists of buffer addresses provided by the guest. But GEOM code cannot handle scatter/gather lists, only sequential buffers, and so a single request is split into several. The splitting happens according to the scatter/gather elements, and in the general case those elements may not be multiples of the block size, which is fatal for GEOM and any block device.

I am not yet sure how to fix this problem. The most straightforward way is to copy the data at some point, collecting the elements of the scatter/gather list into something sequential to pass to GEOM, but that requires additional memory allocation, and the copying is not free. Maybe some cases could be optimized to work without copying, using some clever page mapping, but that seems far from trivial.

--
Alexander Motin
___
freebsd-virtualization@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-virtualization
To unsubscribe, send any mail to freebsd-virtualization-unsubscr...@freebsd.org
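
[Editor's illustration] The failure mode and the copy-based workaround described in the message above can be modelled in a few lines of Python. This is only a sketch under stated assumptions: it is not bhyve or GEOM code, the names `SECTOR_SIZE`, `misaligned_elements`, `coalesce` and `scatter` are hypothetical, and the 4096-byte sector size simply mirrors the 4K block size mentioned in the thread. It shows a request whose total length is block-aligned while its individual scatter/gather elements are not, and how copying them into one contiguous "bounce" buffer restores an aligned transfer.

```python
# Illustrative model only; not bhyve or GEOM code.
SECTOR_SIZE = 4096  # assumed logical block size, as reported to the guest


def misaligned_elements(iov, sector_size=SECTOR_SIZE):
    """Return indexes of scatter/gather elements whose length is not a
    multiple of the sector size."""
    return [i for i, buf in enumerate(iov) if len(buf) % sector_size != 0]


def coalesce(iov):
    """Write path: copy all elements into one contiguous bounce buffer."""
    bounce = bytearray(sum(len(buf) for buf in iov))
    offset = 0
    for buf in iov:
        bounce[offset:offset + len(buf)] = buf
        offset += len(buf)
    return bounce


def scatter(bounce, iov):
    """Read path: copy a contiguous buffer back into the elements."""
    offset = 0
    for buf in iov:
        buf[:] = bounce[offset:offset + len(buf)]
        offset += len(buf)


# A guest request of 8192 bytes (two 4K blocks) described by three elements.
iov = [bytearray(b"a" * 1024), bytearray(b"b" * 4096), bytearray(b"c" * 3072)]
```

Here the first and third elements are not multiples of 4096 bytes, so splitting the request per element would produce transfers a block device cannot accept, while the coalesced buffer is exactly two blocks long. The cost is the extra allocation and copy that the message points out.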
Re: Bhyve storage improvements (was: Several bhyve quirks)
On 2015-03-27 09:46:50 +, Alexander Motin said:

> [snip]
> Also, both virtio-blk and ahci-hd drivers now report to the guest the logical and physical block sizes of the underlying storage, which allows guests to properly align partitions and I/Os for best compatibility and performance.

Hi Alexander,

In a previous reply, Peter Grehan said that ahci-hd should already report the correct block size in 10.1. I had some time to try it out today, but I'm still having issues:

$ zfs create \
    -o compression=off \
    -o primarycache=metadata \
    -o secondarycache=metadata \
    -o volblocksize=4096 \
    -o refreservation=none \
    -V 10G \
    zroot/usr/bhyve/test/img
$ geli init -B none -e AES-XTS -K test.key -l 128 -P -s 4096 \
    zvol/zroot/usr/bhyve/test/img
$ geli attach -p -k test.key zvol/zroot/usr/bhyve/test/img

[set up device map, grub-bhyve, etc.]

$ bhyve -A -c 1 -H -P -m 256 \
    -s 0:0,hostbridge \
    -s 1:0,ahci-hd,img.eli \
    -s 2:0,ahci-cd,ubuntu-14.10-server-amd64.iso \
    -s 31,lpc -l com1,stdio \
    test

[boot guest to recovery console]

$ fdisk -l /dev/sda
fdisk: cannot open /dev/sda: Input/output error

And syslog shows a lot of errors accessing sda.

Note that the actual HDD has 512-byte sectors, so perhaps bhyve is getting the sector size from the hardware and not from geli / ZFS?

Julian Hsiao
Re: Bhyve storage improvements
On 27.03.2015 18:47, John Nielsen wrote:

> Does anyone have plans (or know about any) to implement virtio-scsi support in bhyve? That API does support TRIM and should retain most or all of the low-overhead virtio goodness.

I was thinking about that (not really plans yet, just some thoughts), but I haven't found good motivation or an understanding of the whole possible infrastructure.

I am not sure it is worth emulating the SCSI protocol, in addition to the ATA already done in ahci-hd and the simple block protocol in virtio-blk, just to get another block storage with TRIM/UNMAP that is possibly faster than AHCI. The really good SCSI disk emulation in CTL in the kernel takes about 20K lines of code. It is pointless to duplicate it, and it may be complicated for administration to just interface to it. Indeed, I've seen virtio-blk being faster than ahci-hd in some tests, but those tests were highly synthetic. I haven't tested it on real workloads, but I have a feeling that the real difference may not be that large. If somebody wants to check, more benchmarks are highly welcome! From the theoretical side, I'd like to note that both ATA and SCSI protocols on guests go through additional ATA/SCSI infrastructure (CAM in FreeBSD), which is absent in the case of pure block virtio-blk, so they have some more overhead by definition.

The main potential benefit I see from using virtio-scsi is the possibility to pass through to the client not a block device but some real SCSI device. It could be a local DVD writer or remote iSCSI storage. The latter would be especially interesting for large production installations. But the main problem I see here is booting. To make the user-level loader boot the kernel from DVD or iSCSI, bhyve has to implement its own SCSI initiator, like a small second copy of CAM at user level. Booting the kernel from some other local block storage and then attaching to remote iSCSI storage for data can be much easier, but it is not convenient.
It is possible not to connect to iSCSI directly from user level, but to make the kernel CAM do it, and then make CAM provide both the block layer for booting and the SCSI layer for virtio-scsi. But I am not sure that it is very good from a security point of view to make the host system see the virtual disks. Though maybe it could work if CAM could block kernel/GEOM access to them, as is done for ZVOLs now, which support geom and dev modes. That, however, complicates CAM and the whole infrastructure.

--
Alexander Motin
Re: Bhyve storage improvements (was: Several bhyve quirks)
On Mar 27, 2015, at 10:47 AM, John Nielsen li...@jnielsen.net wrote:

> On Mar 27, 2015, at 3:46 AM, Alexander Motin m...@freebsd.org wrote:
>
> > > I've always assumed virtio driver > emulated driver so it didn't occur to me to try ahci-hd.
> >
> > I've just merged to the FreeBSD stable/10 branch a set of bhyve changes that should significantly improve the situation in the storage area.
> >
> > The virtio-blk driver was fixed to work asynchronously and not block the virtual CPU, which should fix many problems with performance and interactivity. Both virtio-blk and ahci-hd drivers got the ability to execute multiple (up to 8) requests at the same time, which should proportionally improve parallel random I/O performance on wide storage. At this point virtio-blk is indeed faster than ahci-hd at high IOPS, and both are faster than before.
> >
> > On the other hand, the ahci-hd driver now has TRIM support to allow freeing unused space on the backing ZVOL. Unfortunately, there is no TRIM/UNMAP support in the virtio-blk API to allow the same.
> >
> > Also, both virtio-blk and ahci-hd drivers now report to the guest the logical and physical block sizes of the underlying storage, which allows guests to properly align partitions and I/Os for best compatibility and performance.
>
> Mav, thank you very much for all this great work and for the concise summary. TRIM on AHCI makes it compelling for a lot of use cases despite the probable performance hit.
>
> Does anyone have plans (or know about any) to implement virtio-scsi support in bhyve? That API does support TRIM and should retain most or all of the low-overhead virtio goodness.

Okay, some belated googling reminded me that this has been listed as an open task in the last couple of FreeBSD quarterly status reports and discussed at one or more devsummits. I'd still be interested to know if anyone's actually contemplated or started doing the work though.
:)

JN
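
[Editor's illustration] The quoted summary mentions that both drivers can now execute up to 8 requests at the same time. As a rough picture of what a bounded queue depth means, here is a small Python sketch. It is an assumption-laden model, not how bhyve implements its block backend: the names `MAX_IN_FLIGHT`, `BLOCK_SIZE` and `read_blocks` are hypothetical, the backing store is just a temporary file, and `os.pread` requires a Unix-like host.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT = 8    # queue depth mentioned in the quoted summary
BLOCK_SIZE = 4096    # assumed block size for this sketch


def read_blocks(fd, block_numbers, workers=MAX_IN_FLIGHT):
    """Read the given blocks with at most `workers` requests in flight."""
    def read_one(n):
        # pread takes an explicit offset, so threads do not share a file position.
        return os.pread(fd, BLOCK_SIZE, n * BLOCK_SIZE)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the order the blocks were requested.
        return list(pool.map(read_one, block_numbers))


# Build a small backing file: block n is filled with byte value n.
with tempfile.TemporaryFile() as backing:
    for n in range(32):
        backing.write(bytes([n]) * BLOCK_SIZE)
    backing.flush()
    blocks = read_blocks(backing.fileno(), [5, 17, 2, 31, 0, 9, 24, 11, 3])
```

With nine random blocks requested and eight workers, at most eight reads are outstanding at once and the ninth waits for a free slot. Whether that parallelism translates into a proportional throughput gain depends on how many independent devices sit behind the storage.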