Re: [Qemu-block] [PATCH v6 26/39] blockdev: Allow more options for BB-less BDS tree

2015-10-15 Thread Kevin Wolf
Am 12.10.2015 um 22:00 hat Max Reitz geschrieben:
> Most of the options which blockdev_init() parses for both the
> BlockBackend and the root BDS are valid for just the root BDS as well
> (e.g. read-only). This patch allows specifying these options even if not
> creating a BlockBackend.
> 
> Signed-off-by: Max Reitz 
> ---
>  blockdev.c | 160 
> ++---
>  1 file changed, 154 insertions(+), 6 deletions(-)

> +
> +/* disk I/O throttling */
> +if (throttle_enabled()) {
> +if (!throttling_group) {
> +throttling_group = bdrv_get_node_name(bs);
> +}
> +bdrv_io_limits_enable(bs, throttling_group);
> +bdrv_set_io_limits(bs, );
> +}

A while ago we discussed that we want to move throttling to the BB level
eventually. In that case, it might be wiser not to expose it for non-BB
nodes now.

Otherwise the patch looks good to me.

Kevin



Re: [Qemu-block] [PATCH 1/3] block: Use bdrv_lookup_bs() instead of bdrv_find_node()

2015-10-15 Thread Alberto Garcia
On Wed 14 Oct 2015 03:16:00 PM CEST, Jeff Cody  wrote:
> This is a precursor to making bdrv_find_node() static, and internal
> to block.c
>
> To find a BlockDriverState interface, it can be done via blk_by_name(),
> bdrv_find_node(), and bdrv_lookup_bs().  The latter can take the place
> of the other two, in the instances where we are only concerned with
> the BlockDriverState.
>
> There is no benefit in calling bdrv_find_node() directly.  This patch
> replaces all calls to bdrv_find_node() outside of block.c with
> bdrv_lookup_bs().
>
> Signed-off-by: Jeff Cody 

Reviewed-by: Alberto Garcia 



[Qemu-block] [PATCH] block: fix memory leak in early exit

2015-10-15 Thread Stefan Hajnoczi
The stream block job has two early exit code paths.  They do not free
s->backing_file_str.

Also, the early exits rely on the fact that the coroutine hasn't yielded
yet and was launched from the main thread.  Therefore the coroutine is
guaranteed to be running in the main thread where block_job_completed()
may be called safely.  This is very subtle so it's nice to eliminate the
assumption by unifying the early exit with the normal exit code path.

Cc: Fam Zheng 
Cc: Jeff Cody 
Signed-off-by: Stefan Hajnoczi 
---
 block/stream.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index ab0bd05..1986e9a 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -120,16 +120,16 @@ static void coroutine_fn stream_run(void *opaque)
 int ret = 0;
 int n = 0;
 void *buf;
+bool reached_end = false;
 
 if (!bs->backing_hd) {
-block_job_completed(>common, 0);
-return;
+goto out;
 }
 
 s->common.len = bdrv_getlength(bs);
 if (s->common.len < 0) {
-block_job_completed(>common, s->common.len);
-return;
+ret = s->common.len;
+goto out;
 }
 
 end = s->common.len >> BDRV_SECTOR_BITS;
@@ -207,6 +207,10 @@ wait:
 s->common.offset += n * BDRV_SECTOR_SIZE;
 }
 
+if (sector_num == end) {
+reached_end = true;
+}
+
 if (!base) {
 bdrv_disable_copy_on_read(bs);
 }
@@ -216,10 +220,11 @@ wait:
 
 qemu_vfree(buf);
 
+out:
 /* Modify backing chain and close BDSes in main loop */
 data = g_malloc(sizeof(*data));
 data->ret = ret;
-data->reached_end = sector_num == end;
+data->reached_end = reached_end;
 block_job_defer_to_main_loop(>common, stream_complete, data);
 }
 
-- 
2.4.3




Re: [Qemu-block] [PATCH v10 08/10] Implement new driver for block replication

2015-10-15 Thread Stefan Hajnoczi
On Thu, Oct 15, 2015 at 10:19:17AM +0800, Wen Congyang wrote:
> On 10/14/2015 10:27 PM, Stefan Hajnoczi wrote:
> > On Tue, Oct 13, 2015 at 05:08:17PM +0800, Wen Congyang wrote:
> >> On 10/13/2015 12:27 AM, Stefan Hajnoczi wrote:
> >>> On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote:
>  +/* start backup job now */
>  +bdrv_op_unblock(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
>  +s->active_disk->backing_blocker);
>  +bdrv_op_unblock(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE,
>  +s->hidden_disk->backing_blocker);
> >>>
> >>> Why is it safe to unblock these operations?
> >>>
> >>> Why do they have to be blocked for non-replication users?
> >>
> >> hidden_disk and secondary disk are opened as backing file, so it is 
> >> blocked for
> >> non-replication users.
> >> What can I do if I don't unblock it and want to do backup?
> > 
> > CCing Jeff Cody, block jobs maintainer
> > 
> > You need to explain why it is safe remove this protection.  We can't
> > merge code that may be unsafe.
> > 
> > I think we can investigate further by asking: when does QEMU code assume
> > the backing file is read-only?
> 
> The backing file is opened in read-only mode. I want to reopen it in 
> read-write
> mode here in the next version(So the patch 1 will be dropped)
> 
> > 
> > I haven't checked but these cases come to mind:
> > 
> > Operations that move data between BDS in the backing chain (e.g. commit
> > and stream block jobs) will lose or overwrite data if the backing file
> > is being written to by another coroutine.
> > 
> > We need to prevent users from running these operations at the same time.
> 
> Yes, but qemu doesn't provide such API.

This series can't be merged unless it is safe.

Have you looked at op blockers and thought about how to prevent unsafe
operations?

Stefan



Re: [Qemu-block] [Qemu-devel] [PATCH] Revert "blockdev: add note that block_job_cb() must be thread-safe"

2015-10-15 Thread Stefan Hajnoczi
On Wed, Oct 14, 2015 at 10:51:27PM -0400, Fam Zheng wrote:
> 
> 
> - Original Message -
> > On Tue, Oct 13, 2015 at 06:16:15PM +0800, Fam Zheng wrote:
> > > This reverts commit 723c5d93c51bdb3adbc238ce90195c0864aa6cd5.
> > > 
> > > block_job_cb is called by block_job_completed, which is always called in
> > > a main loop bottom half in existing block jobs. So we don't need to
> > > worry about thread-safety here.
> > 
> > This is not correct.  Search for block_job_completed() callers.
> > 
> > For example, block/stream.c has early exit cases that call
> > block_job_completed() from the coroutine (i.e. dispatched from a
> > coroutine in another AioContext+IOThread).
> > 
> > I think you are assuming that all block_job_completed() callers are
> > called from a function scheduled using block_job_defer_to_main_loop().
> 
> No, I'm assuming all block_job_completed() callers are (and they should
> be) from main thread. Even the early exit cases in stream are so, because
> they are in the same thread as stream_start, which is main thread.

You are right, the early block_job_completed() calls in stream.c happen
in the main thread.

I'm concerned that we have no comments or assertions to check that the
assumption holds.  Luckily only stream.c relies on the trick of calling
block_job_completed() directly but knowing it runs from the main thread.

I'll send a patch to fix stream.c.  It needs to be done anyway since
there is a memory leak in the early block_job_completed() stream.c code.
Will CC you.

Stefan



Re: [Qemu-block] [Qemu-devel] [Xen-devel] Question about xen disk unplug support for ahci missed in qemu

2015-10-15 Thread Laszlo Ersek
CC'ing Kevin O'Connor

On 10/14/15 13:27, Ian Campbell wrote:
> On Wed, 2015-10-14 at 12:06 +0100, Stefano Stabellini wrote:
>>> Can't you just teach SeaBIOS how to deal with your PV disks and then
>>> only add that to your VM and forget about IDE/AHCI? I mean, that's how
>>> it's done for virtio-blk, and it doesn't involve any insanities like
>>> ripping out non-hotpluggable devices.
>>
>> Teaching SeaBIOS to deal with PV disks can be done, in fact we already
>> support PV disks in OVMF. It is possible to boot Windows with OVMF
>> without any IDE disks (patch pending for libxl to create a VM without
>> emulated IDE disks).
> 
> One stumbling block in the past has been how to know when the PV drivers in
> the BIOS are no longer required, such that the ring can be torn down and/or
> the connection etc handed over to the OS driver.
> 
> I think we deal with this in OVMF using ExitBootServices? (TBH I'm not sure 
> how).

Search "XenBusDxe/XenBusDxe.c" in edk2 for "EVT_SIGNAL_EXIT_BOOT_SERVICES".

TBH, the code in NotifyExitBoot() doesn't seem valid. If you check the
UEFI spec (for example, v2.5, but the requirement I'm about to quote is
very old), in the specification of EFI_BOOT_SERVICES.CreateEvent() you find:

EVT_SIGNAL_EXIT_BOOT_SERVICES

This event is to be notified by the system when ExitBootServices()
is invoked. This event is of type EVT_NOTIFY_SIGNAL and should not
be combined with any other event types. The notification function
for this event is not allowed to use the Memory Allocation
Services, or call any functions that use the Memory Allocation
Services and must only call functions that are known not to use
Memory Allocation Services, because these services modify the
current memory map.The notification function must not depend on
timer events since timer services will be deactivated before any
notification functions are called.

NotifyExitBoot() in "XenBusDxe/XenBusDxe.c" calls the
DisconnectController() boot service. That in turn leads to calls to
EFI_DRIVER_BINDING_PROTOCOL.Stop() functions (speaking generally), which
inevitably free memory as part of unbinding the device, thereby breaking
the above requirement.

The right solution is the following:
- when a driver binds a device (a "handle"), a piece of the resources
  allocated for that binding should be a new event, to be signaled at
  ExitBootServices() time. The handler function can be shared by all
  such devices. The context passed to the handler should be the (driver-
  specific) structure that represents the binding and the state of the
  device in general.

- When the driver unbinds the device, the event should be closed. This
  will automatically unregister the callback as well.

- Now, when the callback is entered at all, you can be sure that the
  binding still exists. In this case, you should probe into the various
  fields of the context (the device state, practically), to figure out
  if this device "lives" or is dormant. For simpler devices, the answer
  is always "alive", but some devices could have configuration states
  where they are bound yet not configured (using no hw resources etc).

- In case the device is alive, the action to take is to make it abort
  any in-flight transfers or other operations, and re-set / deconfigure
  it *without* touching any memory allocations.

You can see this in the virtio-net driver in OVMF. In the
"OvmfPkg/VirtioNetDxe" directory, see the "TechNotes.txt",
"DriverBinding.c" and "Events.c" files. The callback function is
VirtioNetExitBoot(), and the event registration / deregistration happens in:

VirtioNetDriverBindingStart()
  VirtioNetSnpPopulate()
gBS->CreateEvent(EVT_SIGNAL_EXIT_BOOT_SERVICES)

vs.

VirtioNetDriverBindingStop()
  VirtioNetSnpEvacuate()
gBS->CloseEvent()

What VirtioNetExitBoot() does is very simple: resetting the virtio
device is a small action, and it covers the responsibilities.


I'll admit that the virtio-scsi and virtio-block drivers play a bit
dirty here. (I've known this for a long time, but been silent about it.)
They should have similar callbacks, but don't (In theory, all devices
that are bound & alive at ExitBootServices() should be re-set, without
touching the memory services.)

There are two mitigating factors here:
- unlike with virtio-net, the scsi and block drivers in OVMF support
  only synchronous operations. When you are not calling their
  functions, there are no transfers in flight. And when something calls
  ExitBootServices(), that thing is not calling virtio-block /
  virtio-scsi functions.

- The first thing the OS-level virtio drivers do (certainly in Linux,
  hopefully in Windows) is a virtio-reset on each virtio device found.
  (This is actually required by the virtio specification, both old and
  new.)

Now, there's one small window for issues here. If something in the guest
scribbled over the memory that, according to QEMU, still hosts a live
virtio ring, between ExitBootServices() and 

Re: [Qemu-block] [PATCH v10 08/10] Implement new driver for block replication

2015-10-15 Thread Wen Congyang
On 10/15/2015 10:55 PM, Stefan Hajnoczi wrote:
> On Thu, Oct 15, 2015 at 10:19:17AM +0800, Wen Congyang wrote:
>> On 10/14/2015 10:27 PM, Stefan Hajnoczi wrote:
>>> On Tue, Oct 13, 2015 at 05:08:17PM +0800, Wen Congyang wrote:
 On 10/13/2015 12:27 AM, Stefan Hajnoczi wrote:
> On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote:
>> +/* start backup job now */
>> +bdrv_op_unblock(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
>> +s->active_disk->backing_blocker);
>> +bdrv_op_unblock(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE,
>> +s->hidden_disk->backing_blocker);
>
> Why is it safe to unblock these operations?
>
> Why do they have to be blocked for non-replication users?

 hidden_disk and secondary disk are opened as backing file, so it is 
 blocked for
 non-replication users.
 What can I do if I don't unblock it and want to do backup?
>>>
>>> CCing Jeff Cody, block jobs maintainer
>>>
>>> You need to explain why it is safe remove this protection.  We can't
>>> merge code that may be unsafe.
>>>
>>> I think we can investigate further by asking: when does QEMU code assume
>>> the backing file is read-only?
>>
>> The backing file is opened in read-only mode. I want to reopen it in 
>> read-write
>> mode here in the next version(So the patch 1 will be dropped)
>>
>>>
>>> I haven't checked but these cases come to mind:
>>>
>>> Operations that move data between BDS in the backing chain (e.g. commit
>>> and stream block jobs) will lose or overwrite data if the backing file
>>> is being written to by another coroutine.
>>>
>>> We need to prevent users from running these operations at the same time.
>>
>> Yes, but qemu doesn't provide such API.
> 
> This series can't be merged unless it is safe.
> 
> Have you looked at op blockers and thought about how to prevent unsafe
> operations?

Hmm, only block jobs will write to the backing file? If so, op blockers can
prevent unsafe operations.

Thanks
Wen Congyang

> 
> Stefan
> .
> 




Re: [Qemu-block] [Qemu-devel] Question about xen disk unplug support for ahci missed in qemu

2015-10-15 Thread Laszlo Ersek
On 10/14/15 14:48, Paul Durrant wrote:
>> -Original Message-
>> From: Fabio Fantoni [mailto:fabio.fant...@m2r.biz]
>> Sent: 14 October 2015 12:12
>> To: Kevin Wolf; Stefano Stabellini
>> Cc: John Snow; Anthony Perard; qemu-de...@nongnu.org; xen-
>> de...@lists.xen.org; qemu-block@nongnu.org; Paul Durrant
>> Subject: Re: [Qemu-devel] Question about xen disk unplug support for ahci
>> missed in qemu
>>
>>
>>
>> Il 14/10/2015 11:47, Kevin Wolf ha scritto:
>>> [ CC qemu-block ]
>>>
>>> Am 13.10.2015 um 19:10 hat Stefano Stabellini geschrieben:
 On Tue, 13 Oct 2015, John Snow wrote:
> On 10/13/2015 11:55 AM, Fabio Fantoni wrote:
>> I added ahci disk support in libxl and using it for week seems that was
>> ok, after a reply of Stefano Stabellini seems that xen disk unplug
>> support only ide disks:
>>
>> http://git.qemu.org/?p=qemu.git;a=commitdiff;h=679f4f8b178e7c66fbc2f39
>> c905374ee8663d5d8
>>
>> Today Paul Durrant told me that even if pv disk is ok also with ahci and
>> the emulated one is offline can be a risk:
>> http://lists.xenproject.org/archives/html/win-pv-devel/2015-
>> 10/msg00021.html
>>
>>
>> I tried to take a fast look in qemu code but I not understand the
>> needed
>> thing for add the xen disk unplug support also for ahci, can someone do
>> it or tell me useful information for do it please?
>>
>> Thanks for any reply and sorry for my bad english.
>>
> I'm not entirely sure what features you need AHCI to support in order
> for Xen to be happy.
>
> I'd guess hotplugging, but where I get confused is that IDE disks don't
> support hotplugging either, so I guess I'm not sure sure what you need.
>
> Stefano, can you help bridge my Xen knowledge gap?

 Hi John,

 we need something like hw/i386/xen/xen_platform.c:unplug_disks but
>> that
 can unplug AHCI disk. And by unplug, I mean "make disappear" like
 pci_piix3_xen_ide_unplug does for ide.
>>> Maybe this would be the right time to stop the craziness with your
>>> hybrid IDE/xendisk setup. It's a horrible thing that would never happen
>>> on real hardware.
> 
> Unfortunately, it's going to be difficult to remove such 'craziness' when you 
> don't know a priori whether the VM has PV drivers or not. 
> 
>>>
>>> Can't you just teach SeaBIOS how to deal with your PV disks and then
>>> only add that to your VM and forget about IDE/AHCI? I mean, that's how
>>> it's done for virtio-blk, and it doesn't involve any insanities like
>>> ripping out non-hotpluggable devices.
>>>
> 
> Does Windows have in-box virtio-blk drivers? If not, how does it boot?

The firmwares have virtio drivers (SeaBIOS and OVMF). The Windows
installer can be booted far enough, on top of the firmware services,
that Windows explicitly asks for a driver disk -- not only for the
installation *target* disk, but even for the CD-ROM that it is being
installed *from*.

In practice, you can install modern Windows (Windows 8 and later
definitely, I *think* maybe even Windows 7), from a virtio-scsi CD-ROM,
to, say, a virtio-block disk, and provide *only* the virtio drivers on a
separate IDE CD-ROM (which you can later remove completely, on the
device level).

The Windows installer's boot loader loads an initial ramdisk into
memory, using firmware services. Then its kernel starts and has access
to a number of basic drivers, like IDE CD-ROM drivers. That is
sufficient to load the virtio-scsi driver from the virtio-win ISO, and
then the installation can continue from the original virtio-scsi CD-ROM.

I always install windows guests like this; it is quite flexible. (In
brief: 1 virtio-block target disk, 1 virtio-scsi CD-ROM holding the
Windows ISO, 1 IDE CD-ROM (to be removed permanently from the guest
config, later) holding the virtio-win driver ISO.)

> 
>>> Hm... How does a reboot of a machine that had its IDE already removed
>>> actually work? Do you restart the qemu process on reboot?
>>>
> 
> The IDE disks are always present during boot, but before the OS
> enumerates them they are 'unplugged' and then PV drivers are used
> instead.

Supporting this in RHEL-5 and RHEL-6 guests was *incredible* pain.

Also, when you consider kexec / kdump in the guest, that was an unending
source of bug reports. If I recall correctly, the PV devices (net and
block) could not be used by the kdump kernel, because the main (crashed)
kernel may have left them in a bad state (and I'm not sure if it was
possible to reinit them.) And the emulated devices were slow... assuming
the kdump kernel didn't try to unplug them first!

(Maybe Vitaly (CC'd) knows more about the current status in this area;
I'm admittedly fuzzy, sorry. My intent is not to spread FUD.)

(I'll also admit that running kdump in the guest (that's already
crashed) is a bad idea in general, given that the host is there right
under it, capable of dumping the guests memory without issues. Maybe not
so flexible for some 

Re: [Qemu-block] [PATCH] block: fix memory leak in early exit

2015-10-15 Thread Fam Zheng
On Thu, 10/15 17:54, Stefan Hajnoczi wrote:
> The stream block job has two early exit code paths.  They do not free
> s->backing_file_str.
> 
> Also, the early exits rely on the fact that the coroutine hasn't yielded
> yet and was launched from the main thread.  Therefore the coroutine is
> guaranteed to be running in the main thread where block_job_completed()
> may be called safely.  This is very subtle so it's nice to eliminate the
> assumption by unifying the early exit with the normal exit code path.
> 
> Cc: Fam Zheng 
> Cc: Jeff Cody 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  block/stream.c | 15 ++-
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/block/stream.c b/block/stream.c
> index ab0bd05..1986e9a 100644
> --- a/block/stream.c
> +++ b/block/stream.c
> @@ -120,16 +120,16 @@ static void coroutine_fn stream_run(void *opaque)
>  int ret = 0;
>  int n = 0;
>  void *buf;
> +bool reached_end = false;
>  
>  if (!bs->backing_hd) {
> -block_job_completed(>common, 0);
> -return;
> +goto out;
>  }
>  
>  s->common.len = bdrv_getlength(bs);
>  if (s->common.len < 0) {
> -block_job_completed(>common, s->common.len);
> -return;
> +ret = s->common.len;
> +goto out;
>  }
>  
>  end = s->common.len >> BDRV_SECTOR_BITS;
> @@ -207,6 +207,10 @@ wait:
>  s->common.offset += n * BDRV_SECTOR_SIZE;
>  }
>  
> +if (sector_num == end) {
> +reached_end = true;
> +}
> +
>  if (!base) {
>  bdrv_disable_copy_on_read(bs);
>  }
> @@ -216,10 +220,11 @@ wait:
>  
>  qemu_vfree(buf);
>  
> +out:
>  /* Modify backing chain and close BDSes in main loop */
>  data = g_malloc(sizeof(*data));
>  data->ret = ret;
> -data->reached_end = sector_num == end;
> +data->reached_end = reached_end;
>  block_job_defer_to_main_loop(>common, stream_complete, data);
>  }
>  
> -- 
> 2.4.3
> 

Reviewed-by: Fam Zheng 



Re: [Qemu-block] [PATCH] block: fix memory leak in early exit

2015-10-15 Thread Jeff Cody
On Thu, Oct 15, 2015 at 05:54:27PM +0200, Stefan Hajnoczi wrote:
> The stream block job has two early exit code paths.  They do not free
> s->backing_file_str.
> 
> Also, the early exits rely on the fact that the coroutine hasn't yielded
> yet and was launched from the main thread.  Therefore the coroutine is
> guaranteed to be running in the main thread where block_job_completed()
> may be called safely.  This is very subtle so it's nice to eliminate the
> assumption by unifying the early exit with the normal exit code path.
> 
> Cc: Fam Zheng 
> Cc: Jeff Cody 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  block/stream.c | 15 ++-
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/block/stream.c b/block/stream.c
> index ab0bd05..1986e9a 100644
> --- a/block/stream.c
> +++ b/block/stream.c
> @@ -120,16 +120,16 @@ static void coroutine_fn stream_run(void *opaque)
>  int ret = 0;
>  int n = 0;
>  void *buf;
> +bool reached_end = false;
>  
>  if (!bs->backing_hd) {
> -block_job_completed(>common, 0);
> -return;
> +goto out;
>  }
>  
>  s->common.len = bdrv_getlength(bs);
>  if (s->common.len < 0) {
> -block_job_completed(>common, s->common.len);
> -return;
> +ret = s->common.len;
> +goto out;
>  }
>  
>  end = s->common.len >> BDRV_SECTOR_BITS;
> @@ -207,6 +207,10 @@ wait:
>  s->common.offset += n * BDRV_SECTOR_SIZE;
>  }
>  
> +if (sector_num == end) {
> +reached_end = true;
> +}
> +
>  if (!base) {
>  bdrv_disable_copy_on_read(bs);
>  }
> @@ -216,10 +220,11 @@ wait:
>  
>  qemu_vfree(buf);
>  
> +out:
>  /* Modify backing chain and close BDSes in main loop */
>  data = g_malloc(sizeof(*data));
>  data->ret = ret;
> -data->reached_end = sector_num == end;
> +data->reached_end = reached_end;
>  block_job_defer_to_main_loop(>common, stream_complete, data);
>  }
>  
> -- 
> 2.4.3
>

Reviewed-by: Jeff Cody 



Re: [Qemu-block] [Qemu-devel] [Xen-devel] Question about xen disk unplug support for ahci missed in qemu

2015-10-15 Thread Kevin O'Connor
On Fri, Oct 16, 2015 at 01:10:54AM +0200, Laszlo Ersek wrote:
> On 10/14/15 13:27, Ian Campbell wrote:
> > On Wed, 2015-10-14 at 12:06 +0100, Stefano Stabellini wrote:
> >>> Can't you just teach SeaBIOS how to deal with your PV disks and then
> >>> only add that to your VM and forget about IDE/AHCI? I mean, that's how
> >>> it's done for virtio-blk, and it doesn't involve any insanities like
> >>> ripping out non-hotpluggable devices.
> >>
> >> Teaching SeaBIOS to deal with PV disks can be done, in fact we already
> >> support PV disks in OVMF. It is possible to boot Windows with OVMF
> >> without any IDE disks (patch pending for libxl to create a VM without
> >> emulated IDE disks).
> > 
> > One stumbling block in the past has been how to know when the PV drivers in
> > the BIOS are no longer required, such that the ring can be torn down and/or
> > the connection etc handed over to the OS driver.
[...]
> > AFAIK the BIOS interfaces do not have anything as reliable as that.
> > 
> > How does virtio deal with this in the BIOS case?
> 
> It doesn't, as far as I can tell.
> 
> I don't think it has to, though! On a BIOS box, you can always boot DOS,
> or another operating system that continues to use the BIOS interfaces
> forever. (Same as if you never call ExitBootServices() in UEFI.)
> 
> Given that no starter pistol gets fired between the firmware and the OS
> on such a platform, they must always respect each other. I guess this
> could occur through the E820 map, or some such.

One can use the "ACPI enable" SMI event to detect this if they really
wanted to.  In SeaBIOS one could do this from
src/fw/smm.c:handle_smi() - however, no other drivers need this
notification today and it would be a bit ugly to have to handle it
from an SMI.  (Assuming Xen were to support SMIs.)

> No clue in what kind of E820 memory SeaBIOS allocates the virtio rings,
> but I guess the Linux kernel stays away from those areas until it's past
> device probing and binding.

In SeaBIOS, the virtio memory is allocated from reserved memory.  (See
the memalign_high() call in src/hw/virtio-pci.c - the "high" memory
zone is taken from reserved memory:
http://seabios.org/Memory_Model#Memory_available_during_initialization
)

What's the reason for the "stumbling block" that requires the BIOS to
tear down the Xen ring prior to the OS being able to replace it?  The
BIOS disk calls are all synchronous, so the ring wont be active when
the OS brings up its own ring.  Is there some low-level interaction
that prevents the OS from just resetting the ring prior to enabling
it?

-Kevin



Re: [Qemu-block] [PATCH v10 08/10] Implement new driver for block replication

2015-10-15 Thread Wen Congyang
On 10/15/2015 10:55 PM, Stefan Hajnoczi wrote:
> On Thu, Oct 15, 2015 at 10:19:17AM +0800, Wen Congyang wrote:
>> On 10/14/2015 10:27 PM, Stefan Hajnoczi wrote:
>>> On Tue, Oct 13, 2015 at 05:08:17PM +0800, Wen Congyang wrote:
 On 10/13/2015 12:27 AM, Stefan Hajnoczi wrote:
> On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote:
>> +/* start backup job now */
>> +bdrv_op_unblock(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
>> +s->active_disk->backing_blocker);
>> +bdrv_op_unblock(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE,
>> +s->hidden_disk->backing_blocker);
>
> Why is it safe to unblock these operations?
>
> Why do they have to be blocked for non-replication users?

 hidden_disk and secondary disk are opened as backing file, so it is 
 blocked for
 non-replication users.
 What can I do if I don't unblock it and want to do backup?
>>>
>>> CCing Jeff Cody, block jobs maintainer
>>>
>>> You need to explain why it is safe remove this protection.  We can't
>>> merge code that may be unsafe.
>>>
>>> I think we can investigate further by asking: when does QEMU code assume
>>> the backing file is read-only?
>>
>> The backing file is opened in read-only mode. I want to reopen it in 
>> read-write
>> mode here in the next version(So the patch 1 will be dropped)
>>
>>>
>>> I haven't checked but these cases come to mind:
>>>
>>> Operations that move data between BDS in the backing chain (e.g. commit
>>> and stream block jobs) will lose or overwrite data if the backing file
>>> is being written to by another coroutine.
>>>
>>> We need to prevent users from running these operations at the same time.
>>
>> Yes, but qemu doesn't provide such API.
> 
> This series can't be merged unless it is safe.
> 
> Have you looked at op blockers and thought about how to prevent unsafe
> operations?

What about this solution:
1. unblock it in bdrv_set_backing_hd()
2. block it in qmp_block_commit(), qmp_block_stream(), qmp_block_backup()..., to
   prevent unsafe operations

Thanks
Wen Congyang

> 
> Stefan
> .
> 




Re: [Qemu-block] [Qemu-devel] Question about xen disk unplug support for ahci missed in qemu

2015-10-15 Thread Fabio Fantoni

Il 14/10/2015 13:06, Stefano Stabellini ha scritto:

On Wed, 14 Oct 2015, Kevin Wolf wrote:

[ CC qemu-block ]

Am 13.10.2015 um 19:10 hat Stefano Stabellini geschrieben:

On Tue, 13 Oct 2015, John Snow wrote:

On 10/13/2015 11:55 AM, Fabio Fantoni wrote:

I added ahci disk support in libxl and using it for week seems that was
ok, after a reply of Stefano Stabellini seems that xen disk unplug
support only ide disks:
http://git.qemu.org/?p=qemu.git;a=commitdiff;h=679f4f8b178e7c66fbc2f39c905374ee8663d5d8

Today Paul Durrant told me that even if pv disk is ok also with ahci and
the emulated one is offline can be a risk:
http://lists.xenproject.org/archives/html/win-pv-devel/2015-10/msg00021.html


I tried to take a fast look in qemu code but I not understand the needed
thing for add the xen disk unplug support also for ahci, can someone do
it or tell me useful information for do it please?

Thanks for any reply and sorry for my bad english.


I'm not entirely sure what features you need AHCI to support in order
for Xen to be happy.

I'd guess hotplugging, but where I get confused is that IDE disks don't
support hotplugging either, so I guess I'm not sure sure what you need.

Stefano, can you help bridge my Xen knowledge gap?
  
Hi John,


we need something like hw/i386/xen/xen_platform.c:unplug_disks but that
can unplug AHCI disk. And by unplug, I mean "make disappear" like
pci_piix3_xen_ide_unplug does for ide.

Maybe this would be the right time to stop the craziness with your
hybrid IDE/xendisk setup. It's a horrible thing that would never happen
on real hardware.

I would be quite happy to stop (or even get rid of) the craziness.



Can't you just teach SeaBIOS how to deal with your PV disks and then
only add that to your VM and forget about IDE/AHCI? I mean, that's how
it's done for virtio-blk, and it doesn't involve any insanities like
ripping out non-hotpluggable devices.

Teaching SeaBIOS to deal with PV disks can be done, in fact we already
support PV disks in OVMF. It is possible to boot Windows with OVMF
without any IDE disks (patch pending for libxl to create a VM without
emulated IDE disks).

However we have to be honest that implementing PV disk support in
SeaBIOS is a different magnitude of effort compared to implementing AHCI
"unplug".

I would suggest Fabio to avoid AHCI disks altogether and just use OVMF
with PV disks only and Anthony's patch to libxl to avoid creating any
IDE disks: http://marc.info/?l=xen-devel=144482080812353.

Would that work for you?


Thanks for the advice, I tried it:
https://github.com/Fantu/Xen/commits/rebase/m2r-testing-4.6

I installed W10 pro 64 bit with ide disk, installed the win pv drivers 
and after changed to xvdX instead hdX, is the only change needed, right?
Initial boot is ok (ovmf part about pv disks seems ok) but windows boot 
fails with problem with pv drivers.

In attachment full qemu log with xen_platform trace and domU's xl cfg.

Someone have windows domUs with ovmf and pv disks only working? If yes 
can tell me the difference to understand what can be the problem please?






Hm... How does a reboot of a machine that had its IDE already removed
actually work? Do you restart the qemu process on reboot?

Restart QEMU, yes.


main_channel_link: add main channel client
main_channel_handle_parsed: net test: latency 8.258000 ms, bitrate 727789623 
bps (694.074271 Mbps)
inputs_connect: inputs channel client create
red_dispatcher_set_cursor_peer: 
xen_platform_log xen platform: XEN|DllInitialize: 8.2.0 (80) (17.09.2015)
xen_platform_log xen platform: XEN|AcpiFindRsdp: 0x000EA020
xen_platform_log xen platform: XEN|SystemGetStartOptions:  TESTSIGNING  
NOEXECUTE=OPTIN  NOVGA
xen_platform_log xen platform: XEN|SystemGetVersionInformation: KERNEL: 10.0 
(BUILD 10240) PLATFORM WIN32_NT (x64)
xen_platform_log xen platform: XEN|SystemGetVersionInformation: SUITES:
xen_platform_log xen platform: XEN|SystemGetVersionInformation: - TERMINAL
xen_platform_log xen platform: XEN|SystemGetVersionInformation: - SINGLEUSERTS
xen_platform_log xen platform: XEN|SystemGetVersionInformation: TYPE: 
WORKSTATION
xen_platform_log xen platform: XEN|SystemGetMemoryInformation: RANGE[0] 
.1000 - .0009efff
xen_platform_log xen platform: XEN|SystemGetMemoryInformation: RANGE[1] 
.0010 - .eed94fff
xen_platform_log xen platform: XEN|SystemGetMemoryInformation: RANGE[2] 
.eee12000 - .efe91fff
xen_platform_log xen platform: XEN|SystemGetMemoryInformation: RANGE[3] 
.efef6000 - .effc
xen_platform_log xen platform: XEN|SystemGetMemoryInformation: RANGE[4] 
.efff - .efff
xen_platform_log xen platform: XEN|SystemGetMemoryInformation: RANGE[5] 
0001. - 0001.07eaafff
xen_platform_log xen platform: XEN|AcpiGetXsdt: 0xFC012BA0
xen_platform_log xen platform: XEN|SystemProcessorInformation: > (0:0)
xen_platform_log xen platform: XEN|SystemProcessorInformation: 

Re: [Qemu-block] [Qemu-devel] Question about xen disk unplug support for ahci missed in qemu

2015-10-15 Thread Anthony PERARD
On Thu, Oct 15, 2015 at 06:27:17PM +0200, Fabio Fantoni wrote:
> Il 14/10/2015 13:06, Stefano Stabellini ha scritto:
> >I would suggest Fabio to avoid AHCI disks altogether and just use OVMF
> >with PV disks only and Anthony's patch to libxl to avoid creating any
> >IDE disks: http://marc.info/?l=xen-devel=144482080812353.
> >
> >Would that work for you?
> 
> Thanks for the advice, I tried it:
> https://github.com/Fantu/Xen/commits/rebase/m2r-testing-4.6
> 
> I installed W10 pro 64 bit with ide disk, installed the win pv drivers and
> after changed to xvdX instead hdX, is the only change needed, right?
> Initial boot is ok (ovmf part about pv disks seems ok) but windows boot
> fails with problem with pv drivers.
> In attachment full qemu log with xen_platform trace and domU's xl cfg.
> 
> Someone have windows domUs with ovmf and pv disks only working? If yes can
> tell me the difference to understand what can be the problem please?

When I worked on the PV disk implementation in OVMF, I was able to boot
a Windows 8 with pv disk only.

I don't have access to the guest configuration I was using, but I think one
difference would be the viridian setting, I'm pretty sure I did not set it.

-- 
Anthony PERARD