Re: [ovirt-devel] [ARM64] Possibility to support oVirt on ARM64

2020-07-19 Thread Nir Soffer
On Sun, Jul 19, 2020 at 5:04 PM Zhenyu Zheng  wrote:
>
> Hi oVirt,
>
> We are currently trying to make oVirt work on the ARM64 platform. Since I'm quite
> new to the oVirt community, I'm wondering what the current status of ARM64
> support in oVirt upstream is. I saw that the oVirt Wikipedia page mentions
> there is an ongoing effort to support the ARM platform. We have a small team
> here and we are willing to help make this work.

Hi Zhenyu,

I think this is a great idea, both supporting more hardware and
enlarging the oVirt community.

Regarding hardware support we depend mostly on libvirt and qemu, and I
don't know what the status is there. Adding the relevant lists and people.

I don't know about any effort on the oVirt side, but last week I added
ARM builds for ovirt-imageio and it works:
https://copr.fedorainfracloud.org/coprs/nsoffer/ovirt-imageio-preview/build/1555705/

We have many dependencies, but oVirt itself is mostly Python and Java, with
tiny bits in C or using ctypes, so it should not be too hard.

I think the first thing is getting some hardware for testing. Do you
have such hardware, or contacts who can help to get a hardware
contribution for this?

Nir




Re: [PATCH] Qemu: migration: Not bind RAM info with active migration status

2020-07-19 Thread zhukeqian
Hi Daniel,

On 2020/7/17 22:33, Daniel Henrique Barboza wrote:
> 
> 
> On 7/15/20 3:18 AM, Keqian Zhu wrote:
>> Since QEMU supports returning incoming migration info as of its commit
>> 65ace0604551 (migration: add postcopy total blocktime into query-migrate),
> 
> 
> It is worth noting that this QEMU commit, which decoupled the RAM
> status from the active migration status, has been live since early 2018:
> 
> $ git show 65ace0604551
> commit 65ace060455122a461cdc9302238b914084bcd42
> Author: Alexey Perevalov 
> Date:   Thu Mar 22 21:17:27 2018 +0300
> 
> migration: add postcopy total blocktime into query-migrate
> 
> $ git describe 65ace0604551
> v2.12.0-6-g65ace06045
> 
> 
> I am not sure if we care about removing a migration failure check for
> QEMU 2.12 when we're waiting for 5.1 to come out. My guess is that we
> do care, but not enough to demand an "if (QEMU <= 2.12)" in this logic.
> I'll also assume that the existing failure check is doing more harm than
> good nowadays, so:
> 
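A toy sketch of the "if (QEMU <= 2.12)" gate mentioned above, purely for illustration: QEMU_VERSION() and ram_info_required() are made-up names rather than libvirt code, and the cutoff simply follows the git describe output quoted earlier (the commit landed just after the v2.12.0 tag, so only older QEMUs are guaranteed to always pair an active status with RAM info).

    /* Illustration only: made-up names, not libvirt's API. */
    #include <stdbool.h>
    #include <stdio.h>

    #define QEMU_VERSION(maj, min, mic) ((maj) * 1000000 + (min) * 1000 + (mic))

    /* Releases up to v2.12.0 predate commit 65ace0604551, so an "active"
     * migration status from them always carries RAM info; newer ones may not. */
    static bool
    ram_info_required(unsigned int qemu_version)
    {
        return qemu_version <= QEMU_VERSION(2, 12, 0);
    }

    int main(void)
    {
        printf("2.11.0 -> %d\n", ram_info_required(QEMU_VERSION(2, 11, 0))); /* 1 */
        printf("5.0.0  -> %d\n", ram_info_required(QEMU_VERSION(5, 0, 0)));  /* 0 */
        return 0;
    }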
Thanks for your review.

Thanks,
Keqian
> 
> Reviewed-by: Daniel Henrique Barboza 
> 
> 
>> which may contain an active status but no RAM info, drop this binding
>> relationship check in libvirt.
>>
>> Signed-off-by: Keqian Zhu 
>> ---
> 
> 
> 
> 
> 
>>   src/qemu/qemu_monitor_json.c | 88 +---
>>   1 file changed, 42 insertions(+), 46 deletions(-)
>>
>> diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c
>> index d808c4b55b..ba8e340742 100644
>> --- a/src/qemu/qemu_monitor_json.c
>> +++ b/src/qemu/qemu_monitor_json.c
>> @@ -3547,56 +3547,52 @@ qemuMonitorJSONGetMigrationStatsReply(virJSONValuePtr reply,
>>     case QEMU_MONITOR_MIGRATION_STATUS_PRE_SWITCHOVER:
>>     case QEMU_MONITOR_MIGRATION_STATUS_DEVICE:
>>         ram = virJSONValueObjectGetObject(ret, "ram");
>> -        if (!ram) {
>> -            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
>> -                           _("migration was active, but no RAM info was set"));
>> -            return -1;
>> -        }
>> +        if (ram) {
>> +            if (virJSONValueObjectGetNumberUlong(ram, "transferred",
>> +                                                 &stats->ram_transferred) < 0) {
>> +                virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
>> +                               _("migration was active, but RAM 'transferred' "
>> +                                 "data was missing"));
>> +                return -1;
>> +            }
>> +            if (virJSONValueObjectGetNumberUlong(ram, "remaining",
>> +                                                 &stats->ram_remaining) < 0) {
>> +                virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
>> +                               _("migration was active, but RAM 'remaining' "
>> +                                 "data was missing"));
>> +                return -1;
>> +            }
>> +            if (virJSONValueObjectGetNumberUlong(ram, "total",
>> +                                                 &stats->ram_total) < 0) {
>> +                virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
>> +                               _("migration was active, but RAM 'total' "
>> +                                 "data was missing"));
>> +                return -1;
>> +            }
>>
>> -        if (virJSONValueObjectGetNumberUlong(ram, "transferred",
>> -                                             &stats->ram_transferred) < 0) {
>> -            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
>> -                           _("migration was active, but RAM 'transferred' "
>> -                             "data was missing"));
>> -            return -1;
>> -        }
>> -        if (virJSONValueObjectGetNumberUlong(ram, "remaining",
>> -                                             &stats->ram_remaining) < 0) {
>> -            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
>> -                           _("migration was active, but RAM 'remaining' "
>> -                             "data was missing"));
>> -            return -1;
>> -        }
>> -        if (virJSONValueObjectGetNumberUlong(ram, "total",
>> -                                             &stats->ram_total) < 0) {
>> -            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
>> -                           _("migration was active, but RAM 'total' "
>> -                             "data was missing"));
>> -            return -1;
>> -        }
>> +            if (virJSONValueObjectGetNumberDouble(ram, "mbps", &mbps) == 0 &&
>> +                mbps > 0) {
>> +                /* mpbs from QEMU reports Mbits/s (M as in 10^6 not Mi as 2^20) */
>> +                stats->ram_bps = mbps * (1000 * 1000 / 8);
>> +            }
>>
>> -        if (virJSONValueObjectGetNumberDouble(ram, "mbps", &mbps) == 0 &&
>> -            mbps > 0) {
>> -            /* mpbs from QEMU reports Mbits/s (M as in 10^6 not Mi as 2^20) */
>> -            stats->ram_bps = mbps * (1000 * 1000 / 8)

Re: [ovirt-devel] [ARM64] Possibility to support oVirt on ARM64

2020-07-19 Thread Zhenyu Zheng
Hi Nir,

Thanks a lot for the quick response and information. I have actually also worked
a little on libvirt, and others on our team have experience with qemu; those
projects work OK on the ARM platform, it is just that the capabilities might be
fewer compared to other platforms. For example, I added the getHost()
capability for ARM64 in libvirt and am now planning to work on the compareCPU()
API; after that we will have a better migration workflow on the ARM platform.
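For context, here is a minimal sketch of how a management layer can drive the two pieces mentioned above through libvirt's public API: virConnectGetCapabilities() exposes the host CPU that the getHost() work reports, and virConnectCompareCPU() is the entry point behind compareCPU(). The CPU XML and the qemu:///system URI are illustrative assumptions, not a tested ARM64 configuration.

    /* Sketch: dump host capabilities and compare a guest CPU definition.
     * Build e.g. with: gcc demo.c $(pkg-config --cflags --libs libvirt) */
    #include <stdio.h>
    #include <stdlib.h>
    #include <libvirt/libvirt.h>

    int main(void)
    {
        virConnectPtr conn = virConnectOpen("qemu:///system");
        if (!conn) {
            fprintf(stderr, "failed to connect to libvirtd\n");
            return EXIT_FAILURE;
        }

        /* The host CPU model/features appear in the capabilities XML. */
        char *caps = virConnectGetCapabilities(conn);
        if (caps) {
            printf("%s\n", caps);
            free(caps);
        }

        /* Hypothetical guest CPU definition, just to show the call shape. */
        const char *cpu_xml =
            "<cpu><arch>aarch64</arch><model>cortex-a72</model></cpu>";
        int res = virConnectCompareCPU(conn, cpu_xml, 0);
        printf("compare result: %d (0=incompatible, 1=identical, 2=superset)\n", res);

        virConnectClose(conn);
        return EXIT_SUCCESS;
    }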

For resources: yes, I think we can provide those for the oVirt community. Could
you tell me more about what kind of resources would be most suitable? Our
current resources are mostly VMs from a public cloud; will that be OK?

BR,

Zhenyu

On Mon, Jul 20, 2020 at 2:07 AM Nir Soffer  wrote:

> On Sun, Jul 19, 2020 at 5:04 PM Zhenyu Zheng 
> wrote:
> >
> > Hi oVirt,
> >
> > We are currently trying to make oVirt work on the ARM64 platform. Since I'm
> > quite new to the oVirt community, I'm wondering what the current status of
> > ARM64 support in oVirt upstream is. I saw that the oVirt Wikipedia page
> > mentions there is an ongoing effort to support the ARM platform. We have a
> > small team here and we are willing to help make this work.
>
> Hi Zhenyu,
>
> I think this is a great idea, both supporting more hardware and
> enlarging the oVirt community.
>
> Regarding hardware support we depend mostly on libvirt and qemu, and I
> don't know what the status is there. Adding the relevant lists and people.
>
> I don't know about any effort on the oVirt side, but last week I added
> ARM builds for ovirt-imageio and it works:
>
> https://copr.fedorainfracloud.org/coprs/nsoffer/ovirt-imageio-preview/build/1555705/
>
> We have many dependencies, but oVirt itself is mostly Python and Java,
> with tiny bits in C or using ctypes, so it should not be too hard.
>
> I think the first thing is getting some hardware for testing. Do you
> have such hardware, or contacts who can help to get a hardware
> contribution for this?
>
> Nir
>
>


Re: [PATCH] qemu: clear residual QMP caps processes during QEMU driver initialization

2020-07-19 Thread Bihong Yu



On 2020/7/18 5:14, Daniel Henrique Barboza wrote:
> 
> 
> On 7/17/20 8:10 AM, Bihong Yu wrote:
>>> From c328ff62b11d58553fd2032a85fd3295e009b3d3 Mon Sep 17 00:00:00 2001
>> From: Bihong Yu 
>> Date: Fri, 17 Jul 2020 16:55:12 +0800
>> Subject: [PATCH] qemu: clear residual QMP caps processes during QEMU driver
>>   initialization
>>
>> In some cases, QMP capabilities probing processes may be left behind as residue.
>> So we need to clear these residual QMP caps processes when libvirtd starts.
> 
> Which cases are you referring to? Do you have a reproducer for this behavior?
> I tried it by starting and stopping libvirtd, fetching capabilities, starting
> VMs and so on, and wasn't able to see any 'qmp.pid' file hanging around.

Yes, I have a reproducer for this behavior. The case I am referring to is
sending a kill signal to libvirtd while it is fetching capabilities, before it
calls qemuProcessQMPFree() and qemuProcessQMPStop().

When libvirtd restarts, it cannot find the temporary qmp directory and has no
way to clear the residual QMP caps processes.
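For what it's worth, here is a standalone sketch of the cleanup idea in plain POSIX, assuming the layout discussed here (libDir containing qmp-XXXXXX directories with a qmp.pid inside, with /var/lib/libvirt/qemu as the usual privileged default); the actual patch below uses libvirt's own helpers (virDirOpen(), virPidFileForceCleanupPath()) instead.

    /* Standalone sketch only: reap leftover QMP capability-probing helpers.
     * Directory layout and the default path are assumptions for illustration. */
    #include <dirent.h>
    #include <limits.h>
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <unistd.h>

    static void
    clear_residual_qmp(const char *libdir)
    {
        DIR *dirp = opendir(libdir);
        struct dirent *dp;
        char path[PATH_MAX];

        if (!dirp)
            return;

        while ((dp = readdir(dirp)) != NULL) {
            if (strncmp(dp->d_name, "qmp-", 4) != 0)
                continue;

            /* Kill the process recorded in the stale pidfile, then drop it. */
            snprintf(path, sizeof(path), "%s/%s/qmp.pid", libdir, dp->d_name);
            FILE *f = fopen(path, "r");
            if (f) {
                long pid = 0;
                if (fscanf(f, "%ld", &pid) == 1 && pid > 0)
                    kill((pid_t)pid, SIGKILL);
                fclose(f);
                unlink(path);
            }

            /* Remove the now-empty temporary directory. */
            snprintf(path, sizeof(path), "%s/%s", libdir, dp->d_name);
            rmdir(path);
        }
        closedir(dirp);
    }

    int main(int argc, char **argv)
    {
        clear_residual_qmp(argc > 1 ? argv[1] : "/var/lib/libvirt/qemu");
        return 0;
    }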

Thanks,

Bihong Yu

> 
> About the code, I looked it up and all calls to qemuProcessQMPNew() are being
> cleaned up accordingly by a qemuProcessQMPFree() function, which calls
> qemuProcessQMPStop(), and the Stop function cleans up both the pid file and
> the uniqDir:
> 
> - qemuProcessQMPStop(qemuProcessQMPPtr proc) -
> 
>     if (proc->pidfile)
>     unlink(proc->pidfile);
> 
>     if (proc->uniqDir)
>     rmdir(proc->uniqDir);
> 
> --
> 
> The proper fix here would be to understand under which conditions the 'qmp.pid'
> file is being left behind and make the code go through the existing cleanup
> path, fixing qemuProcessQMPFree and/or qemuProcessQMPStop if needed. What
> you're doing here works, but it's fixing the symptom of a bug instead of the
> bug itself.
> 
> 
> Thanks,
> 
> 
> DHB
> 
> 
>>
>> Signed-off-by: Bihong Yu 
>> ---
>>   src/qemu/qemu_driver.c  |  2 ++
>>   src/qemu/qemu_process.c | 40 
>>   src/qemu/qemu_process.h |  2 ++
>>   3 files changed, 44 insertions(+)
>>
>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>> index d185666..d627fd7 100644
>> --- a/src/qemu/qemu_driver.c
>> +++ b/src/qemu/qemu_driver.c
>> @@ -889,6 +889,8 @@ qemuStateInitialize(bool privileged,
>>   run_gid = cfg->group;
>>   }
>>
>> +    qemuProcessQMPClear(cfg->libDir);
>> +
>>   qemu_driver->qemuCapsCache = virQEMUCapsCacheNew(cfg->libDir,
>>    cfg->cacheDir,
>>    run_uid,
>> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>> index 70fc24b..d545e3e 100644
>> --- a/src/qemu/qemu_process.c
>> +++ b/src/qemu/qemu_process.c
>> @@ -8312,6 +8312,46 @@ static qemuMonitorCallbacks callbacks = {
>>   };
>>
>>
>> +/**
>> + * qemuProcessQMPClear
>> + *
>> + * Try to kill residual QMP caps processes
>> + */
>> +void
>> +qemuProcessQMPClear(const char *libDir)
>> +{
>> +    virErrorPtr orig_err;
>> +    DIR *dirp = NULL;
>> +    struct dirent *dp;
>> +
>> +    if (!virFileExists(libDir))
>> +    return;
>> +
>> +    if (virDirOpen(&dirp, libDir) < 0)
>> +    return;
>> +
>> +    virErrorPreserveLast(&orig_err);
>> +    while (virDirRead(dirp, &dp, libDir) > 0) {
>> +    g_autofree char *qmp_uniqDir = NULL;
>> +    g_autofree char *qmp_pidfile = NULL;
>> +
>> +    if (!STRPREFIX(dp->d_name, "qmp-"))
>> +    continue;
>> +
>> +    qmp_uniqDir = g_strdup_printf("%s/%s", libDir, dp->d_name);
>> +    qmp_pidfile = g_strdup_printf("%s/%s", qmp_uniqDir, "qmp.pid");
>> +
>> +    ignore_value(virPidFileForceCleanupPath(qmp_pidfile));
>> +
>> +    if (qmp_uniqDir)
>> +    rmdir(qmp_uniqDir);
>> +    }
>> +    virErrorRestore(&orig_err);
>> +
>> +    VIR_DIR_CLOSE(dirp);
>> +}
>> +
>> +
>>   static void
>>   qemuProcessQMPStop(qemuProcessQMPPtr proc)
>>   {
>> diff --git a/src/qemu/qemu_process.h b/src/qemu/qemu_process.h
>> index 15e67b9..b039e6c 100644
>> --- a/src/qemu/qemu_process.h
>> +++ b/src/qemu/qemu_process.h
>> @@ -233,6 +233,8 @@ qemuProcessQMPPtr qemuProcessQMPNew(const char *binary,
>>   gid_t runGid,
>>   bool forceTCG);
>>
>> +void qemuProcessQMPClear(const char *libDir);
>> +
>>   void qemuProcessQMPFree(qemuProcessQMPPtr proc);
>>
>>   int qemuProcessQMPStart(qemuProcessQMPPtr proc);
>>
> .




Re: [PATCH 1/3] qemu: implementation of transient option for qcow2 file

2020-07-19 Thread Masayoshi Mizuma
On Sat, Jul 18, 2020 at 08:06:00AM +0200, Peter Krempa wrote:
> On Thu, Jul 16, 2020 at 20:55:29 -0400, Masayoshi Mizuma wrote:
> > Thank you for your review.
> > 
> > On Tue, Jul 07, 2020 at 06:36:23AM -0500, Eric Blake wrote:
> > > On 7/7/20 2:12 AM, Peter Krempa wrote:
> > > 
> > > > You can install a qcow2 overlay on top of a raw file too. IMO the
> > > > implications of using  allow that.
> > > > 
> > > > As said above I'd strongly prefer if the overlay is created in qemu
> > > > using the blockdev-create blockjob (there is already infrastructure in
> > > > libvirt to achieve that).
> > > 
> > > Agreed.  At this point, any time we call out to qemu-img as a separate
> > > process, we are probably doing it wrong.
> > 
> > Got it. I'm thinking about a procedure like the following.
> > Does that make sense?
> > 
> >   1) Open the monitor with qemuProcessQMPNew()/qemuProcessQMPStart(), 
> >  and connect it.
> 
> Starting a new qemu process just to format an image is extreme overkill
> and definitely not what we want to do.

I see, thanks.

> 
> >   2) Setup the transient disk with qemuDomainPrepareStorageSourceBlockdev(),
> >  qemuBlockStorageSourceAttachApplyStorage(), 
> > qemuBlockStorageSourceCreateGetFormatProps()
> >  and something...
> > 
> >   3) Run blockdev-create command with qemuMonitorBlockdevCreate(), then
> >  close the monitor.
> 
> These two steps should be executed in the qemu process which will already
> be running the VM, prior to starting the guest CPUs.
> 
> >   4) Switch the original disk to the transient disk.
> > 
> >   5) Build the blockdev argument for qemu.
> 
> And instead of this step, you use the external snapshot infrastructure
> to install the overlays via 'blockdev-snapshot' QMP command

OK. I suppose qemuDomainSnapshotDiskPrepare() and
qemuDomainSnapshotDiskUpdateSource() may be helpful for implementing the
setup steps of the transient disk.

> 
> > 
> >   6) Run qemu
> 
> And instead of this the VM cpus will be started.

Got it. I think the appropriate place is after virCommandRun() in qemuProcessLaunch(),
and before qemuProcessFinishStartup().

> 
> 
> The above steps require factoring out snapshot code a bit. I have a few
> patches in that direction so I'll try posting them next week hopefully.

Great, thanks!

- Masa



Re: device compatibility interface for live migration with assigned devices

2020-07-19 Thread Jason Wang



On 2020/7/18 12:12 AM, Alex Williamson wrote:

On Thu, 16 Jul 2020 16:32:30 +0800
Yan Zhao  wrote:


On Thu, Jul 16, 2020 at 12:16:26PM +0800, Jason Wang wrote:

On 2020/7/14 7:29 AM, Yan Zhao wrote:

hi folks,
we are defining a device migration compatibility interface that helps upper
layer stacks like openstack/ovirt/libvirt check whether two devices are
live-migration compatible.
The "devices" here could be MDEVs, physical devices, or a hybrid of the two.
e.g. we could use it to check whether
- a src MDEV can migrate to a target MDEV,
- a src VF in SRIOV can migrate to a target VF in SRIOV,
- a src MDEV can migrate to a target VF in SRIOV.
(e.g. SIOV/SRIOV backward compatibility case)

The upper layer stack could use this interface as the last step to check
whether one device is able to migrate to another device before triggering a
real live migration procedure.
We are not sure if this interface is of value or help to you, so please don't
hesitate to drop your valuable comments.


(1) interface definition
The interface is defined in the following way:

                     userspace
                     /       \
               read /         \ write
                   /           \
  +---------------------+   +---------------------+
  |  migration_version  |   |  migration_version  |  --> check migration
  +---------------------+   +---------------------+      compatibility
        device A                   device B


a device attribute named migration_version is defined under each device's
sysfs node. e.g. 
(/sys/bus/pci/devices/\:00\:02.0/$mdev_UUID/migration_version).
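To make the read/write arrows in the diagram concrete, a small sketch of the intended userspace flow follows; the paths are hypothetical, error handling is minimal, and treating a failed write as "incompatible" is my reading of the description above rather than a spelled-out part of the proposal.

    /* Sketch: check live-migration compatibility via the proposed
     * migration_version sysfs attribute (read from source, write to target). */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <unistd.h>

    static int
    check_migration_compat(const char *src_attr, const char *dst_attr)
    {
        char version[256];
        ssize_t len;
        int fd;

        /* Read migration_version from the source device. */
        if ((fd = open(src_attr, O_RDONLY)) < 0)
            return -1;
        len = read(fd, version, sizeof(version) - 1);
        close(fd);
        if (len <= 0)
            return -1;
        version[len] = '\0';

        /* Write it to the target device; the driver rejects incompatible versions. */
        if ((fd = open(dst_attr, O_WRONLY)) < 0)
            return -1;
        len = write(fd, version, strlen(version));
        close(fd);

        return len < 0 ? 0 : 1;   /* 0 = incompatible, 1 = compatible */
    }

    int main(void)
    {
        /* Hypothetical attribute paths, following the example above. */
        const char *src = "/sys/bus/pci/devices/0000:00:02.0/mdev_A/migration_version";
        const char *dst = "/sys/bus/pci/devices/0000:00:02.0/mdev_B/migration_version";

        printf("compatible: %d\n", check_migration_compat(src, dst));
        return 0;
    }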


Are you aware of the devlink based device management interface that is
proposed upstream? I think it has many advantages over sysfs; would you
consider switching to that?


Advantages, such as?



My understanding of the advantages of devlink (netlink) over sysfs (some were
mentioned at the time of the vDPA sysfs mgmt API discussion) is:


- existing users (NIC, crypto, SCSI, ib), mature and stable
- much better error reporting (ext_ack other than string or errno)
- namespace aware
- do not couple with kobject

Thanks



Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier

2020-07-19 Thread Jason Wang



On 2020/7/17 10:18 PM, Peter Xu wrote:

On Thu, Jul 16, 2020 at 10:54:31AM +0800, Jason Wang wrote:

On 2020/7/16 9:00 AM, Peter Xu wrote:

On Mon, Jul 13, 2020 at 12:04:16PM +0800, Jason Wang wrote:

On 2020/7/10 9:30 PM, Peter Xu wrote:

On Fri, Jul 10, 2020 at 02:34:11PM +0800, Jason Wang wrote:

On 2020/7/9 10:10 PM, Peter Xu wrote:

On Thu, Jul 09, 2020 at 01:58:33PM +0800, Jason Wang wrote:

- If we care about performance, it's better to implement the MAP event for
vhost; otherwise there could be a lot of IOTLB misses

I feel like these are two things.

So far what we are talking about is whether vt-d should have knowledge about
what kind of events one iommu notifier is interested in.  I still think we
should keep this as answered in question 1.

The other question is whether we want to switch vhost from UNMAP to MAP/UNMAP
events even without vDMA, so that vhost can establish the mapping even before
IO starts.  IMHO it's doable, but only if the guest runs DPDK workloads.  When
the guest is using dynamic iommu page mappings, I feel like that can be even
slower, because then the worst case is for each IO we'll need to vmexit twice:

   - The first vmexit is caused by an invalidation to MAP the page tables, so
     vhost will set up the page table before IO starts

   - IO/DMA triggers and completes

   - The second vmexit is caused by another invalidation to UNMAP the page
     tables

So it seems to be worse than when vhost only uses UNMAP like right now.  At
least we only have one vmexit (when UNMAP).  We'll have a vhost translate()
request from kernel to userspace, but IMHO that's cheaper than the vmexit.

Right, but then I would still prefer to have another notifier, since
vtd_page_walk has nothing to do with the device IOTLB; the IOMMU has a
dedicated command for flushing the device IOTLB. But the check for
vtd_as_has_map_notifier is used to skip devices which can do demand
paging via ATS or a device-specific way. If we have two different notifiers,
vhost will be on the device-iotlb notifier, so we don't need it at all?
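As a toy model of the check being debated here (made-up names; the real logic is around vtd_as_has_map_notifier() in hw/i386/intel_iommu.c): the expensive page walk only pays off when some registered listener actually consumes MAP events, e.g. vfio, while an UNMAP-only listener such as today's vhost does not need it.

    /* Toy model only: whether a replay/PSI should bother doing a page walk
     * depends on whether any registered notifier asked for MAP events. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    typedef enum {
        NOTIFY_MAP   = 1 << 0,   /* wants new mappings, e.g. vfio */
        NOTIFY_UNMAP = 1 << 1,   /* only wants invalidations, e.g. vhost */
    } NotifyFlags;

    typedef struct {
        const char *name;
        unsigned flags;
    } Notifier;

    /* The (expensive) page walk is only worthwhile if someone consumes MAPs. */
    static bool
    need_page_walk(const Notifier *n, size_t count)
    {
        for (size_t i = 0; i < count; i++) {
            if (n[i].flags & NOTIFY_MAP)
                return true;
        }
        return false;
    }

    int main(void)
    {
        Notifier vhost_only[] = { { "vhost", NOTIFY_UNMAP } };
        Notifier with_vfio[]  = { { "vhost", NOTIFY_UNMAP },
                                  { "vfio",  NOTIFY_MAP | NOTIFY_UNMAP } };

        printf("vhost only   -> page walk: %d\n", need_page_walk(vhost_only, 1));
        printf("vhost + vfio -> page walk: %d\n", need_page_walk(with_vfio, 2));
        return 0;
    }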

But we can still have an iommu notifier that only registers for UNMAP even after we
introduce the dev-iotlb notifier?  We don't want to do a page walk for those either.
TCG should be the only one so far, but I don't know... maybe there can still be
new ones?

I think you're right. But looking at the code, it looks like the check of
vtd_as_has_map_notifier() is only used in:

1) vtd_iommu_replay()
2) vtd_iotlb_page_invalidate_notify() (PSI)

For the replay, it's expensive anyhow. For PSI, I think it's just about one
or a few mappings; I'm not sure it will have an obvious performance impact.

And I had two questions:

1) The code doesn't check map for DSI or GI; does this match what the spec
says? (It looks to me like the spec is unclear in this part)

Both DSI/GI should cover maps too?  E.g. vtd_sync_shadow_page_table() in
vtd_iotlb_domain_invalidate().

I meant the code doesn't check whether there's a MAP notifier :)

It's actually checked, because it loops over vtd_as_with_notifiers, and only
MAP notifiers register to that. :)


I may be missing something, but I don't find the code that blocks UNMAP notifiers?

vhost_iommu_region_add()
     memory_region_register_iommu_notifier()
         memory_region_update_iommu_notify_flags()
             imrc->notify_flag_changed()
                 vtd_iommu_notify_flag_changed()

?

Yeah, I think you're right. I might have confused it with some previous
implementations. Maybe we should also do a similar thing for DSI/GI, just like
what we do for PSI.



Ok.





2) For replay() I don't see other implementations (either spapr or the
generic one) that do unmap (actually they skip unmap explicitly); any
reason for doing this in the Intel IOMMU?

I could be wrong, but I'd guess it's because vt-d implemented the caching mode
by leveraging the same invalidation structure, so it's harder to make all
things right (IOW, we can't clearly distinguish MAP from UNMAP when we receive an
invalidation request, because MAP/UNMAP requests look the same).

I didn't check others, but I believe spapr is doing it differently by using
some hypercalls to deliver IOMMU map/unmap requests, which seems a bit close to
what virtio-iommu is doing.  Anyway, the point is if we have explicit MAP/UNMAP
from the guest, logically the replay indeed does not need to do any unmap
because we don't need to call replay() on an already existing device but only
for e.g. hot plug.

But this looks like it conflicts with what memory_region_iommu_replay() does; for an
IOMMU that doesn't have a replay method, it skips UNMAP requests:

    for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
        iotlb = imrc->translate(iommu_mr, addr, IOMMU_NONE, n->iommu_idx);
        if (iotlb.perm != IOMMU_NONE) {
            n->notify(n, &iotlb);
        }

I guess this generic code has no knowledge of whether the guest has explicit
MAP/UNMAP. Or does replay imply that the guest doesn't have explicit MAP/UNMAP?

I think it matches exactly with

Re: [libvirt PATCH] spec: Don't require mdevctl on RHEL 7

2020-07-19 Thread Erik Skultety
On Sun, Jul 19, 2020 at 12:46:05AM +0200, Andrea Bolognani wrote:
> mdevctl is a relatively new tool that's packaged for Fedora and
> RHEL 8, but not for RHEL 7. Make the dependency conditional to
> avoid the libvirt-daemon-driver-nodedev package becoming
> uninstallable on that platform.
>
> Fixes: 9691440ecbc7d9383a1410f1067a4f9221f2de2c
> Signed-off-by: Andrea Bolognani 
> ---

Oops...
Reviewed-by: Erik Skultety