Re: Several bhyve quirks

2015-03-27 Thread Jason Tubnor
On 28 March 2015 at 10:49, Neel Natu  wrote:
>
> This is fixed in HEAD where the RTC device model defaults to 24-hour time.
>
>> 
>> suggests that I'm on the right track, but it doesn't explain the off-by-one
>> or the (one-time) multi-day offset.
>>
>
> The one-hour offset is a bug in my interpretation of the 12-hour format.
>
> I am going to fix this in HEAD shortly but here is a patch for 10.1 and 
> earlier:
> https://people.freebsd.org/~neel/patches/bhyve_openbsd_rtc.patch
>

Thanks for this, Neel.  I was trying to back-port your original HEAD
patch to 10.1, but there were too many quirks to deal with in other
dependent libs.  I didn't have the skills to do this myself, so it's
appreciated that you did it :-)  Thanks!


Re: Several bhyve quirks

2015-03-27 Thread Neel Natu
Hi Julian,

On Wed, Mar 25, 2015 at 2:24 AM, Julian Hsiao  wrote:
> Hi,
>
> I'm running bhyve on 10.1, mostly with OpenBSD (5.7) guests, and I ran into
> a few strange issues:
>
> 1. The guest RTC is several hours off every time I start bhyve.  The host
> RTC is set to UTC, and /etc/localtime on both the host and the guests is set
> to US/Pacific (currently PDT).  I thought maybe bhyve was setting the RTC to
> the local time, and indeed changing the TZ environment variable affects the
> guest's RTC.  However, with TZ=UTC the guest is still off by an hour, and to
> get the correct offset I set TZ='UTC+1'; perhaps something's not handling DST
> correctly?
>
> Also, one time the offset was mysteriously tens of hours (i.e. the guest
> RTC was a day or two ahead), and the condition persisted across multiple
> host and guest reboots.  Unfortunately, the problem went away a few hours
> later and I have been unable to reproduce it since.
>

The problem is that in 10.1 (and earlier) bhyve defaulted to a 12-hour
RTC format, but some guests, such as OpenBSD and Linux, assume that it is
configured in the 24-hour format.

The 12-hour format indicates PM time by setting the most significant
bit in the 'hour' byte. Since the guest is not prepared to mask this
bit, it thinks that the time is 68 hours ahead of the actual time (but
only for PM times; everything goes back to normal during AM times).
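
To illustrate where the 68 comes from (assuming the RTC is in BCD mode), here
is a tiny stand-alone sketch, not bhyve or guest code, of a guest decoding a
12-hour PM hour byte as if it were 24-hour BCD: the PM flag (0x80) read as BCD
adds 80, and treating a 12-hour value as a 24-hour one loses another 12, for a
net offset of 68 hours.

    #include <stdio.h>

    /* Naive BCD decode, as a guest expecting 24-hour BCD time would do. */
    static int
    bcd2bin(unsigned char v)
    {
        return ((v >> 4) * 10 + (v & 0x0f));
    }

    int
    main(void)
    {
        /* 2 PM encoded in 12-hour BCD: hour 02 with the PM bit (0x80) set. */
        unsigned char rtc_hour = 0x80 | 0x02;   /* 0x82 */
        int guest_hour = bcd2bin(rtc_hour);     /* guest decodes 82 */
        int actual_hour = 14;                   /* 2 PM in 24-hour time */

        printf("guest: %d, actual: %d, offset: %d hours\n",
            guest_hour, actual_hour, guest_hour - actual_hour);
        return (0);
    }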

This is fixed in HEAD where the RTC device model defaults to 24-hour time.

> 
> suggests that I'm on the right track, but it doesn't explain the off-by-one
> or the (one-time) multi-day offset.
>

The one-hour offset is a bug in my interpretation of the 12-hour format.

I am going to fix this in HEAD shortly but here is a patch for 10.1 and earlier:
https://people.freebsd.org/~neel/patches/bhyve_openbsd_rtc.patch

> As an aside, the commit message implies that this only affects OpenBSD
> guests, when in fact it probably affects all guests (at least Linux as
> well).  Perhaps he meant you cannot configure OpenBSD to assume that the
> RTC is set to local time instead of UTC.
>
> 2. What's the preferred solution for minimizing guest clock drift in bhyve?
> Based on some Google searches, I run ntpd in the guests and set
> kern.timecounter.hardware=acpitimer0 instead of the default acpihpet0.
> acpitimer0 drifts by ~600 ppm while acpihpet0 drifts by ~1500 ppm; why?
>

I don't know but I am running experiments that I hope will provide some insight.

best
Neel

> 3. Even moderate guest disk I/O completely kills guest network performance.
> For example, whenever security(8) (security(7) in FreeBSD) runs, guest
> network throughput drops from 150+ Mbps to ~20 Mbps, and ping jitter
> jumps from <0.01 ms to 100+ ms.  If I try to build something in the guest,
> then the network becomes almost unusable.
>
> The network performance degradation only affects the guest that's generating
> the I/O; high I/O on guest B doesn't affect guest A, nor would high I/O on
> the host.
>
> I'm using both the virtio-blk and virtio-net drivers, and the guests' disk
> images are backed by zvol+geli.  Removing geli has no effect.
>
> There are some commits in CURRENT that suggest improved virtio performance,
> but I'm not comfortable running CURRENT.  Is there a workaround I could use
> for 10.1?
>
> 4. virtio-blk always reports the virtual disk as having 512-byte sectors,
> and so I get I/O errors on OpenBSD guests when the disk image is backed by
> zvol+geli with 4K sector size.  Curiously, this only seems to affect
> zvol+geli; with just zvol it seems to work.  Also, it works either way on
> Linux guests.
>
> ATM I changed the zvol / geli sector size to 512 bytes, which probably made
> #2 worse.  I think this bug / feature is addressed by:
> ,
> but again is there a workaround to force a specific sector size for 10.1?
>
> 5. This may be better directed at OpenBSD, but I'll ask here anyway: if I
> enable virtio-rnd then OpenBSD will not boot, failing with a "couldn't map
> interrupt" error.  The kernel in bsd.rd will boot, but not the installed
> kernel (or the one built from STABLE; I forgot).  Again, Linux seems
> unaffected, but I couldn't tell if it's actually working.
>
> Julian Hsiao
>
>


Re: Bhyve storage improvements

2015-03-27 Thread John Nielsen
On Mar 27, 2015, at 11:43 AM, Alexander Motin  wrote:

> On 27.03.2015 18:47, John Nielsen wrote:
>> Does anyone have plans (or know about any) to implement virtio-scsi support 
>> in bhyve? That API does support TRIM and should retain most or all of the 
>> low-overhead virtio goodness.
> 
> I was thinking about that (no real plans yet, just some thoughts),
> but I haven't found a good motivation for it or an understanding of the
> whole possible infrastructure.
> 
> I am not sure it is worth emulating the SCSI protocol, in addition to the
> ATA already done in ahci-hd and the simple block protocol in virtio-blk,
> just to get another block storage device, possibly faster than AHCI, with
> TRIM/UNMAP.  A really good SCSI disk emulation already exists in CTL in the
> kernel and takes about 20K lines of code.  It is pointless to duplicate it,
> and just interfacing to it may be complicated to administer.  Indeed, I've
> seen virtio-blk be faster than ahci-hd in some tests, but those tests were
> highly synthetic.  I haven't tested it on real workloads, but I have a
> feeling the real difference may not be that large.  If somebody wants to
> check -- more benchmarks are highly welcome!  On the theoretical side I'd
> like to note that both the ATA and SCSI protocols in guests go through
> additional ATA/SCSI infrastructure (CAM in FreeBSD), which is absent in the
> case of the pure block virtio-blk, so they have some more overhead by
> definition.

Agreed, more testing is needed to see how big an effect keeping TRIM 
dependent on AHCI emulation would have on performance.

> The main potential benefit I see from using virtio-scsi is the possibility
> of passing through to the guest not a block device but some real SCSI
> device.  It could be a local DVD writer or remote iSCSI storage.  The latter
> would be especially interesting for large production installations.  But the
> main problem I see here is booting.  To make the user-level loader boot the
> kernel from DVD or iSCSI, bhyve has to implement its own SCSI initiator,
> like a small second copy of CAM in userland.  Booting the kernel from some
> other local block storage and then attaching to remote iSCSI storage for
> data would be easier, but it is not convenient.  It is possible not to
> connect to iSCSI directly from userland, but instead to make the kernel CAM
> do it, and then have CAM provide both a block layer for booting and a SCSI
> layer for virtio-scsi; however, I am not sure it is very good from a
> security point of view for the host system to see the virtual disks.  Maybe
> it could work if CAM could block kernel/GEOM access to them, like it is done
> for ZVOLs now with their "geom" and "dev" modes, though that complicates CAM
> and the whole infrastructure.

Yes, pass-through of disk devices opens up a number of possibilities. Would it 
be feasible to just have bhyve broker between a pass(4) device on the host and 
virtio_scsi(4) in the guest? That would require that the guest devices (be they 
local disks, iSCSI LUNs, etc.) be connected to the host, but I'm not sure that's 
a huge concern. The host will always have a high level of access to the guest's 
data. (Plus, there's nothing preventing a guest from doing its own iSCSI, etc. 
after it boots.) Using the existing kernel infrastructure (CAM, iSCSI 
initiator, etc.) would also remove the need to duplicate any of that in 
userland, wouldn't it?
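
To make the idea concrete, something along these lines is what I imagine on the
host side. This is a rough, untested sketch using the cam(3) library to send a
single INQUIRY to a pass(4) device (the /dev/pass0 path is just an example); a
virtio-scsi backend would presumably relay guest CDBs through the same
interface:

    #include <sys/types.h>
    #include <cam/cam.h>
    #include <cam/cam_ccb.h>
    #include <cam/scsi/scsi_all.h>
    #include <cam/scsi/scsi_message.h>
    #include <camlib.h>
    #include <err.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>

    /* Send one INQUIRY through pass(4).  Build with: cc -o inq inq.c -lcam */
    int
    main(void)
    {
        struct cam_device *dev;
        union ccb *ccb;
        struct scsi_inquiry_data inq;

        if ((dev = cam_open_device("/dev/pass0", O_RDWR)) == NULL)
            errx(1, "%s", cam_errbuf);
        if ((ccb = cam_getccb(dev)) == NULL)
            errx(1, "cam_getccb failed");

        memset(&inq, 0, sizeof(inq));
        scsi_inquiry(&ccb->csio,
            /*retries*/ 1,
            /*cbfcnp*/ NULL,
            /*tag_action*/ MSG_SIMPLE_Q_TAG,
            /*inq_buf*/ (u_int8_t *)&inq,
            /*inq_len*/ sizeof(inq),
            /*evpd*/ 0,
            /*page_code*/ 0,
            /*sense_len*/ SSD_FULL_SIZE,
            /*timeout*/ 5000);
        ccb->ccb_h.flags |= CAM_DEV_QFRZDIS;    /* don't leave the queue frozen */

        if (cam_send_ccb(dev, ccb) < 0)
            err(1, "cam_send_ccb");
        if ((ccb->ccb_h.status & CAM_STATUS_MASK) == CAM_REQ_CMP)
            printf("INQUIRY: %.8s %.16s\n",
                (const char *)inq.vendor, (const char *)inq.product);

        cam_freeccb(ccb);
        cam_close_device(dev);
        return (0);
    }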

The user-level loader is necessary for now but once UEFI support exists in 
bhyve the external loader can go away. Any workarounds like you've described 
above would similarly be temporary.

Using Qemu+KVM on Linux as a comparison point, there are examples of both 
kernel-level and user-level access by the host to guest disks. Local disk 
images (be they raw or qcow2) are obviously manipulated by the Qemu process 
from userland. RBD (Ceph/RADOS network block device) is in userland. SRP (SCSI 
RDMA Protocol) is in kernel. There are a few ways to do host- and/or 
kernel-based iSCSI. There is also a userland option if you link Qemu against 
libiscsi when you build it. If we do ever want userland iSCSI support, libiscsi 
does claim to be "pure POSIX" and to have been tested on FreeBSD, among others.

JN



Re: Bhyve storage improvements

2015-03-27 Thread Alexander Motin
On 27.03.2015 18:47, John Nielsen wrote:
> Does anyone have plans (or know about any) to implement virtio-scsi support 
> in bhyve? That API does support TRIM and should retain most or all of the 
> low-overhead virtio goodness.

I was thinking about that (no real plans yet, just some thoughts),
but I haven't found a good motivation for it or an understanding of the
whole possible infrastructure.

I am not sure it is worth emulating the SCSI protocol, in addition to the
ATA already done in ahci-hd and the simple block protocol in virtio-blk,
just to get another block storage device, possibly faster than AHCI, with
TRIM/UNMAP.  A really good SCSI disk emulation already exists in CTL in the
kernel and takes about 20K lines of code.  It is pointless to duplicate it,
and just interfacing to it may be complicated to administer.  Indeed, I've
seen virtio-blk be faster than ahci-hd in some tests, but those tests were
highly synthetic.  I haven't tested it on real workloads, but I have a
feeling the real difference may not be that large.  If somebody wants to
check -- more benchmarks are highly welcome!  On the theoretical side I'd
like to note that both the ATA and SCSI protocols in guests go through
additional ATA/SCSI infrastructure (CAM in FreeBSD), which is absent in the
case of the pure block virtio-blk, so they have some more overhead by
definition.

The main potential benefit I see from using virtio-scsi is the possibility
of passing through to the guest not a block device but some real SCSI
device.  It could be a local DVD writer or remote iSCSI storage.  The latter
would be especially interesting for large production installations.  But the
main problem I see here is booting.  To make the user-level loader boot the
kernel from DVD or iSCSI, bhyve has to implement its own SCSI initiator,
like a small second copy of CAM in userland.  Booting the kernel from some
other local block storage and then attaching to remote iSCSI storage for
data would be easier, but it is not convenient.  It is possible not to
connect to iSCSI directly from userland, but instead to make the kernel CAM
do it, and then have CAM provide both a block layer for booting and a SCSI
layer for virtio-scsi; however, I am not sure it is very good from a
security point of view for the host system to see the virtual disks.  Maybe
it could work if CAM could block kernel/GEOM access to them, like it is done
for ZVOLs now with their "geom" and "dev" modes, though that complicates CAM
and the whole infrastructure.

-- 
Alexander Motin


Re: Bhyve storage improvements (was: Several bhyve quirks)

2015-03-27 Thread John Nielsen
On Mar 27, 2015, at 10:47 AM, John Nielsen  wrote:

> On Mar 27, 2015, at 3:46 AM, Alexander Motin  wrote:
> 
>>> I've always assumed virtio driver > emulated driver so it didn't occur
>>> to me to try ahci-hd.
>> 
>> I've just merged to the FreeBSD stable/10 branch a set of bhyve changes that
>> should significantly improve the situation in the storage area.
>> 
>> The virtio-blk driver was fixed to work asynchronously and not block the
>> virtual CPU, which should fix many problems with performance and
>> interactivity.  Both the virtio-blk and ahci-hd drivers got the ability to
>> execute multiple (up to 8) requests at the same time, which should
>> proportionally improve parallel random I/O performance on wide storage.  At
>> this point virtio-blk is indeed faster than ahci-hd at high IOPS, and they
>> are both faster than before.
>> 
>> On the other side, the ahci-hd driver now has TRIM support to allow freeing
>> unused space on the backing ZVOL.  Unfortunately there is no TRIM/UNMAP
>> support in the virtio-blk API to allow the same.
>> 
>> Also, both the virtio-blk and ahci-hd drivers now report to the guest the
>> logical and physical block sizes of the underlying storage, which allows
>> guests to properly align partitions and I/Os for best compatibility and
>> performance.
> 
> Mav, thank you very much for all this great work and for the concise summary. 
> TRIM on AHCI makes it compelling for a lot of use cases despite the probable 
> performance hit.
> 
> Does anyone have plans (or know about any) to implement virtio-scsi support 
> in bhyve? That API does support TRIM and should retain most or all of the 
> low-overhead virtio goodness.

Okay, some belated googling reminded me that this has been listed as an "open 
task" in the last couple of FreeBSD quarterly status reports and discussed at 
one or more devsummits. I'd still be interested to know if anyone's actually 
contemplated or started doing the work though. :)

JN



Re: Bhyve storage improvements (was: Several bhyve quirks)

2015-03-27 Thread John Nielsen
On Mar 27, 2015, at 3:46 AM, Alexander Motin  wrote:

>> I've always assumed virtio driver > emulated driver so it didn't occur
>> to me to try ahci-hd.
> 
> I've just merged to the FreeBSD stable/10 branch a set of bhyve changes that
> should significantly improve the situation in the storage area.
> 
> The virtio-blk driver was fixed to work asynchronously and not block the
> virtual CPU, which should fix many problems with performance and
> interactivity.  Both the virtio-blk and ahci-hd drivers got the ability to
> execute multiple (up to 8) requests at the same time, which should
> proportionally improve parallel random I/O performance on wide storage.  At
> this point virtio-blk is indeed faster than ahci-hd at high IOPS, and they
> are both faster than before.
> 
> On the other side, the ahci-hd driver now has TRIM support to allow freeing
> unused space on the backing ZVOL.  Unfortunately there is no TRIM/UNMAP
> support in the virtio-blk API to allow the same.
> 
> Also, both the virtio-blk and ahci-hd drivers now report to the guest the
> logical and physical block sizes of the underlying storage, which allows
> guests to properly align partitions and I/Os for best compatibility and
> performance.

Mav, thank you very much for all this great work and for the concise summary. 
TRIM on AHCI makes it compelling for a lot of use cases despite the probable 
performance hit.

Does anyone have plans (or know about any) to implement virtio-scsi support in 
bhyve? That API does support TRIM and should retain most or all of the 
low-overhead virtio goodness.

JN



Bhyve storage improvements (was: Several bhyve quirks)

2015-03-27 Thread Alexander Motin
> I've always assumed virtio driver > emulated driver so it didn't occur
> to me to try ahci-hd.

I've just merged to the FreeBSD stable/10 branch a set of bhyve changes that
should significantly improve the situation in the storage area.

The virtio-blk driver was fixed to work asynchronously and not block the
virtual CPU, which should fix many problems with performance and
interactivity.  Both the virtio-blk and ahci-hd drivers got the ability to
execute multiple (up to 8) requests at the same time, which should
proportionally improve parallel random I/O performance on wide storage.  At
this point virtio-blk is indeed faster than ahci-hd at high IOPS, and they
are both faster than before.

On the other side, the ahci-hd driver now has TRIM support to allow freeing
unused space on the backing ZVOL.  Unfortunately there is no TRIM/UNMAP
support in the virtio-blk API to allow the same.

Also, both the virtio-blk and ahci-hd drivers now report to the guest the
logical and physical block sizes of the underlying storage, which allows
guests to properly align partitions and I/Os for best compatibility and
performance.
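
For reference, the virtio spec exposes these sizes to the guest through the
device configuration space, behind the VIRTIO_BLK_F_BLK_SIZE and
VIRTIO_BLK_F_TOPOLOGY features.  A rough sketch of how a guest derives the
physical block size, paraphrased from the spec rather than taken from the
bhyve sources:

    #include <stdint.h>
    #include <stdio.h>

    /*
     * Sketch of the virtio-blk configuration fields that carry block sizes
     * (paraphrased from the virtio spec, not copied from bhyve).
     */
    struct virtio_blk_topology {
        uint8_t  physical_block_exp;  /* log2(physical blocks per logical) */
        uint8_t  alignment_offset;    /* offset of first aligned logical block */
        uint16_t min_io_size;         /* minimum I/O size, in logical blocks */
        uint32_t opt_io_size;         /* optimal I/O size, in logical blocks */
    };

    int
    main(void)
    {
        /* Example: 512-byte logical sectors on a 4K-sector backing zvol. */
        uint32_t blk_size = 512;                          /* VIRTIO_BLK_F_BLK_SIZE */
        struct virtio_blk_topology topo = { 3, 0, 8, 8 }; /* VIRTIO_BLK_F_TOPOLOGY */

        printf("logical %u bytes, physical %u bytes\n",
            blk_size, blk_size << topo.physical_block_exp);
        return (0);
    }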

-- 
Alexander Motin