Re: Bhyve storage improvements (was: Several bhyve quirks)

2015-04-06 Thread Alexander Motin
Hi, Julian.

 I had some time to try it out today, but I'm still having issues:

I've just run an experiment similar to yours, making bhyve work on top
of a GEOM device instead of the preferable dev mode of ZVOL, and I did
indeed reproduce the problem. But the problem that I see is not related
to the block size. The block size is reported to the guest correctly as
4K, and as far as I can see it works as such, at least in a FreeBSD guest.

The problem is in the way bhyve interoperates with block/GEOM
devices. bhyve sends requests to the kernel with preadv()/pwritev()
calls, specifying scatter/gather lists of buffer addresses provided by
the guest. But GEOM code cannot handle scatter/gather lists, only a
sequential buffer, and so a single request is split into several. The
problem is that the splitting happens according to scatter/gather
elements, and in the general case those elements may not be multiples of
the block size, which is fatal for GEOM and any block device.
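
To illustrate the failure mode described above, here is a minimal
sketch in C. It is not bhyve or GEOM code; the helper name
count_misaligned() and the example lengths are assumptions made up for
illustration. It counts the scatter/gather elements whose length is not
a multiple of the block size, i.e. the pieces that would be rejected if
a request were split per element.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: count the iovec elements whose length is not a
 * multiple of blksz. If a request is split per element, each of these
 * becomes an I/O that a block device cannot accept, even when the total
 * length of the request is a valid multiple of the block size.
 */
static int
count_misaligned(const struct iovec *iov, int iovcnt, size_t blksz)
{
	int bad = 0;

	assert(blksz > 0);
	for (int i = 0; i < iovcnt; i++) {
		if (iov[i].iov_len % blksz != 0)
			bad++;
	}
	return (bad);
}
```

For example, an 8192-byte request described as elements of 512, 4096
and 3584 bytes is fine as a whole for a 4K device, but two of its three
elements are not multiples of 4096.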

I am not yet sure how to fix this problem. The most straightforward way
is to copy the data at some point to collect the elements of the
scatter/gather list into something sequential to pass to GEOM, but that
requires additional memory allocation, and the copying is not free.
Maybe some cases could be optimized to work without copying but with
some clever page mapping, but that seems far from trivial.
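
The copying approach could look roughly like the sketch below. This is
only an illustration of the idea, not a proposed patch: gather_iov() is
a hypothetical name, and a real implementation would also need to handle
the read direction (scattering the data back after the I/O) and might
want to reuse buffers instead of allocating one per request.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: copy every element of a scatter/gather list into
 * one newly allocated sequential buffer. On success the total length is
 * stored in *total and the buffer is returned; the caller must free()
 * it. Returns NULL if the allocation fails.
 */
static void *
gather_iov(const struct iovec *iov, int iovcnt, size_t *total)
{
	size_t len = 0, off = 0;
	char *buf;

	assert(iov != NULL && total != NULL);
	for (int i = 0; i < iovcnt; i++)
		len += iov[i].iov_len;

	/* Avoid malloc(0), whose result is implementation-defined. */
	buf = malloc(len > 0 ? len : 1);
	if (buf == NULL)
		return (NULL);

	for (int i = 0; i < iovcnt; i++) {
		memcpy(buf + off, iov[i].iov_base, iov[i].iov_len);
		off += iov[i].iov_len;
	}
	*total = len;
	return (buf);
}
```

The resulting buffer can then be written with a single pwrite() call, so
the kernel sees one sequential request instead of several pieces split
at element boundaries. The cost is exactly the extra allocation and
memcpy() mentioned above.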

-- 
Alexander Motin
___
freebsd-virtualization@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-virtualization
To unsubscribe, send any mail to 
freebsd-virtualization-unsubscr...@freebsd.org


Re: Bhyve storage improvements (was: Several bhyve quirks)

2015-03-31 Thread Julian Hsiao

On 2015-03-27 09:46:50 +, Alexander Motin said:


[snip]

Also both virtio-blk and ahci-hd drivers now report to the guest the
logical and physical block sizes of the underlying storage, which allows
guests to properly align partitions and I/Os for best compatibility and
performance.


Hi Alexander,

In a previous reply from Peter Grehan, he said that ahci-hd should 
already report the correct block size in 10.1.  I had some time to try 
it out today, but I'm still having issues:


$ zfs create \
   -o compression=off \
   -o primarycache=metadata \
   -o secondarycache=metadata \
   -o volblocksize=4096 \
   -o refreservation=none \
   -V 10G \
   zroot/usr/bhyve/test/img
$ geli init -B none -e AES-XTS -K test.key -l 128 -P -s 4096 \
   zvol/zroot/usr/bhyve/test/img
$ geli attach -p -k test.key zvol/zroot/usr/bhyve/test/img
[set up device map, grub-bhyve, etc.]
$ bhyve -A -c 1 -H -P -m 256 \
   -s 0:0,hostbridge \
   -s 1:0,ahci-hd,img.eli \
   -s 2:0,ahci-cd,ubuntu-14.10-server-amd64.iso \
   -s 31,lpc -l com1,stdio \
   test
[boot guest to recovery console]
$ fdisk -l /dev/sda
fdisk: cannot open /dev/sda: Input/output error

And syslog shows a lot of errors accessing sda.

Note that the actual HDD has 512-byte sectors, so perhaps bhyve is 
getting the sector size from the hardware and not from geli / ZFS?


Julian Hsiao




Re: Bhyve storage improvements

2015-03-27 Thread Alexander Motin
On 27.03.2015 18:47, John Nielsen wrote:
 Does anyone have plans (or know about any) to implement virtio-scsi support 
 in bhyve? That API does support TRIM and should retain most or all of the 
 low-overhead virtio goodness.

I was thinking about that (not really plans yet, just some thoughts),
but haven't found good motivation or an understanding of the whole
possible infrastructure.

I am not sure it is worth emulating the SCSI protocol in addition to the
already done ATA in ahci-hd and simple block in virtio-blk just to get
another, possibly faster than AHCI, block storage with TRIM/UNMAP.
Really good SCSI disk emulation in CTL in the kernel takes about 20K
lines of code. It is pointless to duplicate it, and it may be
complicated for administration to just interface to it.  Indeed I've
seen virtio-blk being faster than ahci-hd in some tests, but those tests
were highly synthetic.  I haven't tested it on real workloads, but I
have a feeling that the real difference may not be that large.  If
somebody wants to check -- more benchmarks are highly welcome!  From the
theoretical side I'd like to note that both ATA and SCSI protocols on
guests go through additional ATA/SCSI infrastructure (CAM in FreeBSD),
absent in the case of pure block virtio-blk, so they have some more
overhead by definition.

The main potential benefit I see from using virtio-scsi is the
possibility to pass through to the client not a block device, but some
real SCSI device. It could be some local DVD writer, or remote iSCSI
storage. The latter would be especially interesting for large production
installations. But the main problem I see here is booting. To make the
user-level loader boot the kernel from DVD or iSCSI, bhyve has to
implement its own SCSI initiator, like a small second copy of CAM at
user level. Booting the kernel from some other local block storage and
then attaching to remote iSCSI storage for data can be much easier, but
it is not convenient. It is possible not to connect to iSCSI directly
from user level, but to make kernel CAM do it, and then make CAM provide
both a block layer for booting and a SCSI layer for virtio-scsi, but I
am not sure that it is very good from a security point of view to make
the host system see virtual disks. Though maybe it could work if CAM
could block kernel/GEOM access to them, as is done for ZVOLs now,
supporting geom and dev modes. Though that complicates CAM and the whole
infrastructure.

-- 
Alexander Motin


Re: Bhyve storage improvements (was: Several bhyve quirks)

2015-03-27 Thread John Nielsen
On Mar 27, 2015, at 10:47 AM, John Nielsen li...@jnielsen.net wrote:

 On Mar 27, 2015, at 3:46 AM, Alexander Motin m...@freebsd.org wrote:
 
 I've always assumed virtio driver > emulated driver so it didn't occur
 to me to try ahci-hd.
 
 I've just merged to the FreeBSD stable/10 branch a set of bhyve changes
 that should significantly improve the situation in the storage area.
 
 The virtio-blk driver was fixed to work asynchronously and not block the
 virtual CPU, which should fix many problems with performance and
 interactivity. Both virtio-blk and ahci-hd drivers got the ability to
 execute multiple (up to 8) requests at the same time, which should
 proportionally improve parallel random I/O performance on wide storage.
 At this point virtio-blk is indeed faster than ahci-hd at high IOPS, and
 both are faster than before.
 
 On the other side, the ahci-hd driver now has TRIM support to allow
 freeing unused space on the backing ZVOL. Unfortunately there is no
 TRIM/UNMAP support in the virtio-blk API to allow the same.
 
 Also both virtio-blk and ahci-hd drivers now report to the guest the
 logical and physical block sizes of the underlying storage, which allows
 guests to properly align partitions and I/Os for best compatibility and
 performance.
 
 Mav, thank you very much for all this great work and for the concise summary. 
 TRIM on AHCI makes it compelling for a lot of use cases despite the probable 
 performance hit.
 
 Does anyone have plans (or know about any) to implement virtio-scsi support 
 in bhyve? That API does support TRIM and should retain most or all of the 
 low-overhead virtio goodness.

Okay, some belated googling reminded me that this has been listed as an open 
task in the last couple of FreeBSD quarterly status reports and discussed at 
one or more devsummits. I'd still be interested to know if anyone's actually 
contemplated or started doing the work though. :)

JN
