RE: Question about delaying dirty log stop to next vm_start

2021-08-03 Thread Zhoujian (jay)
Hi Yi,

> -Original Message-
> From: Yi Sun [mailto:yi.y@linux.intel.com]
> Sent: Monday, August 2, 2021 3:54 PM
> To: Zhoujian (jay) 
> Cc: qemu-devel@nongnu.org; pbonz...@redhat.com; sanjay.k.ku...@intel.com;
> yi.l@intel.com; yi.y@linux.intel.com
> Subject: Question about delaying dirty log stop to next vm_start
> 
> Hi, Jay,
> 
> We are testing some live migration cases recently. We found your below patch
> delays the dirty log stop to the next vm_start.
> 
> commit 1931076077254a2886daa7c830c7838ebd1f81ef
> Author: Jay Zhou 
> Date:   Fri Jul 28 18:28:53 2017 +0800
> 
> migration: optimize the downtime
> 
> We understand this commit can optimize the downtime. But we observed that
> the dirty log stop cannot be executed if the user inputs "quit" in the qemu
> monitor after migration completes. Have you considered such a case before?

If the migration finishes successfully, the src VM should be destroyed.

If the user inputs "quit" in the qemu monitor after migration completes, the VM
is still paused (vCPUs are stopped), so the dirty log has no effect; if the
migration failed, the dirty log stop can be executed after the src VM is
started again.

Thanks,
Jay Zhou


RE: [PATCH] migration: Count new_dirty instead of real_dirty

2020-06-15 Thread Zhoujian (jay)
Hi Keqian,

> -Original Message-
> From: zhukeqian
> Sent: Monday, June 15, 2020 11:19 AM
> To: qemu-devel@nongnu.org; qemu-...@nongnu.org; Paolo Bonzini
> ; Zhoujian (jay) 
> Cc: Juan Quintela ; Chao Fan ;
> Wanghaibin (D) 
> Subject: Re: [PATCH] migration: Count new_dirty instead of real_dirty
> 
> Hi Paolo and Jian Zhou,
> 
> Do you have any suggestion on this patch?
> 
> Thanks,
> Keqian
> 
> On 2020/6/1 12:02, Keqian Zhu wrote:
> > DIRTY_LOG_INITIALLY_ALL_SET feature is on the queue. This fixs the

s/fixs/fixes

> > dirty rate calculation for this feature. After introducing this
> > feature, real_dirty_pages is equal to total memory size at begining.
> > This causing wrong dirty rate and false positive throttling.

I think whether DIRTY_LOG_INITIALLY_ALL_SET is enabled should be tested
(in ram_init_bitmaps, maybe?) in order to stay compatible with the old path.

Thanks,
Jay Zhou

> >
> > BTW, real dirty rate is not suitable and not very accurate.
> >
> > 1. For not suitable: We mainly concern on the relationship between
> >dirty rate and network bandwidth. Net increasement of dirty pages
> >makes more sense.
> > 2. For not very accurate: With manual dirty log clear, some dirty pages
> >will be cleared during each peroid, our "real dirty rate" is less
> >than real "real dirty rate".



> >
> > Signed-off-by: Keqian Zhu 
> > ---
> >  include/exec/ram_addr.h | 5 ++---
> >  1 file changed, 2 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
> > index 5e59a3d8d7..af9677e291 100644
> > --- a/include/exec/ram_addr.h
> > +++ b/include/exec/ram_addr.h
> > @@ -443,7 +443,7 @@ static inline
> >  uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
> >                                                 ram_addr_t start,
> >                                                 ram_addr_t length,
> > -                                               uint64_t *real_dirty_pages)
> > +                                               uint64_t *accu_dirty_pages)
> >  {
> >      ram_addr_t addr;
> >      unsigned long word = BIT_WORD((start + rb->offset) >> TARGET_PAGE_BITS);
> > @@ -469,7 +469,6 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
> >          if (src[idx][offset]) {
> >              unsigned long bits = atomic_xchg(&src[idx][offset], 0);
> >              unsigned long new_dirty;
> > -            *real_dirty_pages += ctpopl(bits);
> >              new_dirty = ~dest[k];
> >              dest[k] |= bits;
> >              new_dirty &= bits;
> > @@ -502,7 +501,6 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
> >                          start + addr + offset,
> >                          TARGET_PAGE_SIZE,
> >                          DIRTY_MEMORY_MIGRATION)) {
> > -                *real_dirty_pages += 1;
> >                  long k = (start + addr) >> TARGET_PAGE_BITS;
> >                  if (!test_and_set_bit(k, dest)) {
> >                      num_dirty++;
> > @@ -511,6 +509,7 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
> >          }
> >      }
> >
> > +    *accu_dirty_pages += num_dirty;
> >      return num_dirty;
> >  }
> >  #endif
> >


RE: [PATCH] kvm: support to get/set dirty log initial-all-set capability

2020-06-12 Thread Zhoujian (jay)


> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Friday, June 12, 2020 5:28 PM
> To: Zhoujian (jay) ; qemu-devel@nongnu.org;
> k...@vger.kernel.org
> Cc: m...@redhat.com; coh...@redhat.com; pet...@redhat.com; Wangxin
> (Alexander, Cloud Infrastructure Service Product Dept.)
> ; Huangweidong (C)
> ; Liujinsong (Paul) 
> Subject: Re: [PATCH] kvm: support to get/set dirty log initial-all-set 
> capability
> 
> On 12/06/20 05:01, Zhoujian (jay) wrote:
> >
> >
> >> -Original Message-
> >> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> >> Sent: Wednesday, March 18, 2020 6:48 PM
> >> To: Zhoujian (jay) ; qemu-devel@nongnu.org;
> >> k...@vger.kernel.org
> >> Cc: m...@redhat.com; coh...@redhat.com; pet...@redhat.com; wangxin (U)
> >> ; Huangweidong (C)
> >> ; Liujinsong (Paul)
> >> 
> >> Subject: Re: [PATCH] kvm: support to get/set dirty log
> >> initial-all-set capability
> >>
> >> On 04/03/20 03:55, Jay Zhou wrote:
> >>> Since the new capability KVM_DIRTY_LOG_INITIALLY_SET of
> >>> KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 has been introduced in the
> kernel,
> >>> tweak the userspace side to detect and enable this capability.
> >>>
> >>> Signed-off-by: Jay Zhou 
> >>> ---
> >>>  accel/kvm/kvm-all.c       | 21 +++++++++++++-------
> >>>  linux-headers/linux/kvm.h |  3 +++
> >>>  2 files changed, 17 insertions(+), 7 deletions(-)
> >>>
> >>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> >>> index 439a4efe52..45ab25be63 100644
> >>> --- a/accel/kvm/kvm-all.c
> >>> +++ b/accel/kvm/kvm-all.c
> >>> @@ -100,7 +100,7 @@ struct KVMState
> >>>      bool kernel_irqchip_required;
> >>>      OnOffAuto kernel_irqchip_split;
> >>>      bool sync_mmu;
> >>> -    bool manual_dirty_log_protect;
> >>> +    uint64_t manual_dirty_log_protect;
> >>>      /* The man page (and posix) say ioctl numbers are signed int, but
> >>>       * they're not.  Linux, glibc and *BSD all treat ioctl numbers as
> >>>       * unsigned, and treating them as signed here can break things */
> >>> @@ -1882,6 +1882,7 @@ static int kvm_init(MachineState *ms)
> >>>      int ret;
> >>>      int type = 0;
> >>>      const char *kvm_type;
> >>> +    uint64_t dirty_log_manual_caps;
> >>>
> >>>      s = KVM_STATE(ms->accelerator);
> >>>
> >>> @@ -2007,14 +2008,20 @@ static int kvm_init(MachineState *ms)
> >>>      s->coalesced_pio = s->coalesced_mmio &&
> >>>                         kvm_check_extension(s, KVM_CAP_COALESCED_PIO);
> >>>
> >>> -    s->manual_dirty_log_protect =
> >>> +    dirty_log_manual_caps =
> >>>          kvm_check_extension(s, KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2);
> >>> -    if (s->manual_dirty_log_protect) {
> >>> -        ret = kvm_vm_enable_cap(s, KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2, 0, 1);
> >>> +    dirty_log_manual_caps &= (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE |
> >>> +                              KVM_DIRTY_LOG_INITIALLY_SET);
> >>> +    s->manual_dirty_log_protect = dirty_log_manual_caps;
> >>> +    if (dirty_log_manual_caps) {
> >>> +        ret = kvm_vm_enable_cap(s, KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2, 0,
> >>> +                                dirty_log_manual_caps);
> >>>          if (ret) {
> >>> -            warn_report("Trying to enable KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 "
> >>> -                        "but failed.  Falling back to the legacy mode. ");
> >>> -            s->manual_dirty_log_protect = false;
> >>> +            warn_report("Trying to enable capability %"PRIu64" of "
> >>> +                        "KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 but failed. "
> >>> +                        "Falling back to the legacy mode. ",
> >>> +                        dirty_log_manual_caps);
> >>> +            s->manual_dirty_log_protect = 0;
> >>>          }
> >>>      }
> >>>
> >>> diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
> >>> index 265099100e..3cb71c2b19 100644
> >>> --- a/linux-headers/linux/kvm.h
> >>> +++ b/linux-headers/linux/kvm.h
> >>> @@ -1628,4 +1628,7 @@ struct kvm_hyperv_eventfd {
> >>>  #define KVM_HYPERV_CONN_ID_MASK          0x00ffffff
> >>>  #define KVM_HYPERV_EVENTFD_DEASSIGN      (1 << 0)
> >>>
> >>> +#define KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE    (1 << 0)
> >>> +#define KVM_DIRTY_LOG_INITIALLY_SET            (1 << 1)
> >>> +
> >>>  #endif /* __LINUX_KVM_H */
> >>>
> >>
> >> Queued, thanks.
> >>
> >
> > Hi Paolo,
> >
> > It seems that this patch isn't included in your last pull request...
> > If there's something else to be done, please let me know.
> 
> Sorry, I thought mistakenly that it was a 5.8 feature (so it would have to 
> wait for
> the 5.8-rc1 release and header update).  It's still queued though.

Okay, never mind, :)

Regards,
Jay Zhou


RE: [PATCH] kvm: support to get/set dirty log initial-all-set capability

2020-06-11 Thread Zhoujian (jay)



> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Wednesday, March 18, 2020 6:48 PM
> To: Zhoujian (jay) ; qemu-devel@nongnu.org;
> k...@vger.kernel.org
> Cc: m...@redhat.com; coh...@redhat.com; pet...@redhat.com; wangxin (U)
> ; Huangweidong (C)
> ; Liujinsong (Paul) 
> Subject: Re: [PATCH] kvm: support to get/set dirty log initial-all-set 
> capability
> 
> On 04/03/20 03:55, Jay Zhou wrote:
> > Since the new capability KVM_DIRTY_LOG_INITIALLY_SET of
> > KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 has been introduced in the kernel,
> > tweak the userspace side to detect and enable this capability.
> >
> > Signed-off-by: Jay Zhou 
> > ---
> >  accel/kvm/kvm-all.c       | 21 +++++++++++++-------
> >  linux-headers/linux/kvm.h |  3 +++
> >  2 files changed, 17 insertions(+), 7 deletions(-)
> >
> > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> > index 439a4efe52..45ab25be63 100644
> > --- a/accel/kvm/kvm-all.c
> > +++ b/accel/kvm/kvm-all.c
> > @@ -100,7 +100,7 @@ struct KVMState
> >      bool kernel_irqchip_required;
> >      OnOffAuto kernel_irqchip_split;
> >      bool sync_mmu;
> > -    bool manual_dirty_log_protect;
> > +    uint64_t manual_dirty_log_protect;
> >      /* The man page (and posix) say ioctl numbers are signed int, but
> >       * they're not.  Linux, glibc and *BSD all treat ioctl numbers as
> >       * unsigned, and treating them as signed here can break things */
> > @@ -1882,6 +1882,7 @@ static int kvm_init(MachineState *ms)
> >      int ret;
> >      int type = 0;
> >      const char *kvm_type;
> > +    uint64_t dirty_log_manual_caps;
> >
> >      s = KVM_STATE(ms->accelerator);
> >
> > @@ -2007,14 +2008,20 @@ static int kvm_init(MachineState *ms)
> >      s->coalesced_pio = s->coalesced_mmio &&
> >                         kvm_check_extension(s, KVM_CAP_COALESCED_PIO);
> >
> > -    s->manual_dirty_log_protect =
> > +    dirty_log_manual_caps =
> >          kvm_check_extension(s, KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2);
> > -    if (s->manual_dirty_log_protect) {
> > -        ret = kvm_vm_enable_cap(s, KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2, 0, 1);
> > +    dirty_log_manual_caps &= (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE |
> > +                              KVM_DIRTY_LOG_INITIALLY_SET);
> > +    s->manual_dirty_log_protect = dirty_log_manual_caps;
> > +    if (dirty_log_manual_caps) {
> > +        ret = kvm_vm_enable_cap(s, KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2, 0,
> > +                                dirty_log_manual_caps);
> >          if (ret) {
> > -            warn_report("Trying to enable KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 "
> > -                        "but failed.  Falling back to the legacy mode. ");
> > -            s->manual_dirty_log_protect = false;
> > +            warn_report("Trying to enable capability %"PRIu64" of "
> > +                        "KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 but failed. "
> > +                        "Falling back to the legacy mode. ",
> > +                        dirty_log_manual_caps);
> > +            s->manual_dirty_log_protect = 0;
> >          }
> >      }
> >
> > diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
> > index 265099100e..3cb71c2b19 100644
> > --- a/linux-headers/linux/kvm.h
> > +++ b/linux-headers/linux/kvm.h
> > @@ -1628,4 +1628,7 @@ struct kvm_hyperv_eventfd {
> >  #define KVM_HYPERV_CONN_ID_MASK          0x00ffffff
> >  #define KVM_HYPERV_EVENTFD_DEASSIGN      (1 << 0)
> >
> > +#define KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE    (1 << 0)
> > +#define KVM_DIRTY_LOG_INITIALLY_SET            (1 << 1)
> > +
> >  #endif /* __LINUX_KVM_H */
> >
> 
> Queued, thanks.
> 

Hi Paolo,

It seems that this patch isn't included in your last pull request...
If there's something else to be done, please let me know.

Regards,
Jay Zhou



Question about vhost-user-gpu on Arm64

2020-06-04 Thread Zhoujian (jay)
Hi,

I wanted to test vhost-user-gpu on arm64 recently and followed the steps listed
in the section "vhost-user virtio gpu"[1].
So I compiled qemu with CONFIG_VIRTIO_VGA=y and CONFIG_VHOST_USER_VGA=y forced
on; the vhost-user-gpu emulation process and the qemu process started
successfully, but qemu repeatedly reported:

"qemu-system-aarch64: console doesn't support GL!"
"qemu-system-aarch64: console doesn't support dmabuf!"

Something seems wrong. Reading the qemu source code, the parent of
TYPE_VHOST_USER_VGA is TYPE_VIRTIO_VGA_BASE, which seems not to be supported
on arm ("On arm systems display devices with a pci memory bar do not work",
as also documented in [1]).

I may have missed something; may I ask whether vhost-user-gpu is supported
on arm64? Thanks,

Jay Zhou

[1] https://www.kraxel.org/blog/2019/09/display-devices-in-qemu/



RE: RFC: Split EPT huge pages in advance of dirty logging

2020-03-02 Thread Zhoujian (jay)


> -Original Message-
> From: Peter Feiner [mailto:pfei...@google.com]
> Sent: Tuesday, March 3, 2020 12:29 AM
> To: Zhoujian (jay) 
> Cc: Ben Gardon ; Peter Xu ;
> k...@vger.kernel.org; qemu-devel@nongnu.org; pbonz...@redhat.com;
> dgilb...@redhat.com; quint...@redhat.com; Liujinsong (Paul)
> ; linfeng (M) ; wangxin (U)
> ; Huangweidong (C) ;
> Junaid Shahid 
> Subject: Re: RFC: Split EPT huge pages in advance of dirty logging

[...]

>Hi Jay,
>I've been sick since I sent my last email, so I haven't gotten to this patch 
>set yet. I'll send it in the next week or two. 

OK, please take care of yourself.


Regards,
Jay Zhou


RE: RFC: Split EPT huge pages in advance of dirty logging

2020-03-02 Thread Zhoujian (jay)


> -Original Message-
> From: Peter Feiner [mailto:pfei...@google.com]
> Sent: Saturday, February 22, 2020 8:19 AM
> To: Junaid Shahid 
> Cc: Ben Gardon ; Zhoujian (jay)
> ; Peter Xu ;
> k...@vger.kernel.org; qemu-devel@nongnu.org; pbonz...@redhat.com;
> dgilb...@redhat.com; quint...@redhat.com; Liujinsong (Paul)
> ; linfeng (M) ; wangxin (U)
> ; Huangweidong (C)
> 
> Subject: Re: RFC: Split EPT huge pages in advance of dirty logging
> 
> On Fri, Feb 21, 2020 at 2:08 PM Junaid Shahid  wrote:
> >
> > On 2/20/20 9:34 AM, Ben Gardon wrote:
> > >
> > > FWIW, we currently do this eager splitting at Google for live
> > > migration. When the log-dirty-memory flag is set on a memslot we
> > > eagerly split all pages in the slot down to 4k granularity.
> > > As Jay said, this does not cause crippling lock contention because
> > > the vCPU page faults generated by write protection / splitting can
> > > be resolved in the fast page fault path without acquiring the MMU lock.
> > > I believe +Junaid Shahid tried to upstream this approach at some
> > > point in the past, but the patch set didn't make it in. (This was
> > > before my time, so I'm hoping he has a link.) I haven't done the
> > > analysis to know if eager splitting is more or less efficient with
> > > parallel slow-path page faults, but it's definitely faster under the
> > > MMU lock.
> > >
> >
> > I am not sure if we ever posted those patches upstream. Peter Feiner
> > would know for sure. One notable difference in what we do compared to
> > the approach outlined by Jay is that we don't rely on tdp_page_fault()
> > to do the splitting. So we don't have to create a dummy VCPU and the
> > specialized split function is also much faster.
> 
> We've been carrying these patches since 2015. I've never posted them.
> Getting them in shape for upstream consumption will take some work. I can look
> into this next week.

Hi Peter Feiner,

May I ask if there are any new updates on your plan? Sorry to disturb you.

Regards,
Jay Zhou


RE: [edk2-devel] A problem with live migration of UEFI virtual machines

2020-02-27 Thread Zhoujian (jay)
Hi Laszlo,

> -Original Message-
> From: Qemu-devel
> [mailto:qemu-devel-bounces+jianjay.zhou=huawei@nongnu.org] On Behalf
> Of Laszlo Ersek
> Sent: Wednesday, February 26, 2020 5:42 PM
> To: Andrew Fish ; de...@edk2.groups.io
> Cc: berra...@redhat.com; qemu-devel@nongnu.org; Dr. David Alan Gilbert
> ; zhoujianjay ; discuss
> ; Alex Bennée ;
> wuchenye1995 
> Subject: Re: [edk2-devel] A problem with live migration of UEFI virtual 
> machines
> 
> Hi Andrew,
> 
> On 02/25/20 22:35, Andrew Fish wrote:
> 
> > Laszlo,
> >
> > The FLASH offsets changing breaking things makes sense.
> >
> > I now realize this is like updating the EFI ROM without rebooting the
> > system.  Thus changes in how the new EFI code works is not the issue.
> >
> > Is this migration event visible to the firmware? Traditionally the
> > NVRAM is a region in the FD so if you update the FD you have to skip
> > NVRAM region or save and restore it. Is that activity happening in
> > this case? Even if the ROM layout does not change how do you not lose
> > the contents of the NVRAM store when the live migration happens? Sorry
> > if this is a remedial question but I'm trying to learn how this
> > migration works.
> 
> With live migration, the running guest doesn't notice anything. This is a 
> general
> requirement for live migration (regardless of UEFI or flash).
> 
> You are very correct to ask about "skipping" the NVRAM region. With the
> approach that OvmfPkg originally supported, live migration would simply be
> unfeasible. The "build" utility would produce a single (unified) OVMF.fd 
> file, which
> would contain both NVRAM and executable regions, and the guest's variable
> updates would modify the one file that would exist.
> This is inappropriate even without considering live migration, because OVMF
> binary upgrades (package updates) on the virtualization host would force 
> guests
> to lose their private variable stores (NVRAMs).
> 
> Therefore, the "build" utility produces "split" files too, in addition to the 
> unified
> OVMF.fd file. Namely, OVMF_CODE.fd and OVMF_VARS.fd.
> OVMF.fd is simply the concatenation of the latter two.
> 
> $ cat OVMF_VARS.fd OVMF_CODE.fd | cmp - OVMF.fd [prints nothing]
> 
> When you define a new domain (VM) on a virtualization host, the domain
> definition saves a reference (pathname) to the OVMF_CODE.fd file.
> However, the OVMF_VARS.fd file (the variable store *template*) is not directly
> referenced; instead, it is *copied* into a separate (private) file for the 
> domain.
> 
> Furthermore, once booted, guest has two flash chips, one that maps the
> firmware executable OVMF_CODE.fd read-only, and another pflash chip that
> maps its private varstore file read-write.
> 
> This makes it possible to upgrade OVMF_CODE.fd and OVMF_VARS.fd (via
> package upgrades on the virt host) without messing with varstores that were
> earlier instantiated from OVMF_VARS.fd. What's important here is that the
> various constants in the new (upgraded) OVMF_CODE.fd file remain compatible
> with the *old* OVMF_VARS.fd structure, across package upgrades.
> 
> If that's not possible for introducing e.g. a new feature, then the package
> upgrade must not overwrite the OVMF_CODE.fd file in place, but must provide an
> additional firmware binary. This firmware binary can then only be used by 
> freshly
> defined domains (old domains cannot be switched over). Old domains can be
> switched over manually -- and only if the sysadmin decides it is OK to lose 
> the
> current variable store contents. Then the old varstore file for the domain is
> deleted (manually), the domain definition is updated, and then a new 
> (logically
> empty, pristine) varstore can be created from the *new* OVMF_2_VARS.fd that
> matches the *new* OVMF_2_CODE.fd.
> 
> 
> During live migration, the "RAM-like" contents of both pflash chips are 
> migrated
> (the guest-side view of both chips remains the same, including the case when 
> the
> writeable chip happens to be in "programming mode", i.e., during a UEFI 
> variable
> write through the Fault Tolerant Write and Firmware Volume Block(2) 
> protocols).
> 
> Once live migration completes, QEMU dumps the full contents of the writeable
> chip to the backing file (on the destination host). Going forward, flash 
> writes from
> within the guest are reflected to said host-side file on-line, just like it 
> happened
> on the source host before live migration. If the file backing the r/w pflash 
> chip is
> on NFS (shared by both src and dst hosts), then this one-time dumping when the
> migration completes is superfluous, but it's also harmless.
> 
> The interesting question is, what happens when you power down the VM on the
> destination host (= post migration), and launch it again there, from zero. In 
> that
> case, the firmware executable file comes from the *destination host* (it was
> never persistently migrated from the source host, i.e. never written out on 
> the
> dst). It simply comes from 

RE: RFC: Split EPT huge pages in advance of dirty logging

2020-02-23 Thread Zhoujian (jay)


> -Original Message-
> From: Peter Feiner [mailto:pfei...@google.com]
> Sent: Saturday, February 22, 2020 8:19 AM
> To: Junaid Shahid 
> Cc: Ben Gardon ; Zhoujian (jay)
> ; Peter Xu ;
> k...@vger.kernel.org; qemu-devel@nongnu.org; pbonz...@redhat.com;
> dgilb...@redhat.com; quint...@redhat.com; Liujinsong (Paul)
> ; linfeng (M) ; wangxin (U)
> ; Huangweidong (C)
> 
> Subject: Re: RFC: Split EPT huge pages in advance of dirty logging
> 
> On Fri, Feb 21, 2020 at 2:08 PM Junaid Shahid  wrote:
> >
> > On 2/20/20 9:34 AM, Ben Gardon wrote:
> > >
> > > FWIW, we currently do this eager splitting at Google for live
> > > migration. When the log-dirty-memory flag is set on a memslot we
> > > eagerly split all pages in the slot down to 4k granularity.
> > > As Jay said, this does not cause crippling lock contention because
> > > the vCPU page faults generated by write protection / splitting can
> > > be resolved in the fast page fault path without acquiring the MMU lock.
> > > I believe +Junaid Shahid tried to upstream this approach at some
> > > point in the past, but the patch set didn't make it in. (This was
> > > before my time, so I'm hoping he has a link.) I haven't done the
> > > analysis to know if eager splitting is more or less efficient with
> > > parallel slow-path page faults, but it's definitely faster under the
> > > MMU lock.
> > >
> >
> > I am not sure if we ever posted those patches upstream. Peter Feiner
> > would know for sure. One notable difference in what we do compared to
> > the approach outlined by Jay is that we don't rely on tdp_page_fault()
> > to do the splitting. So we don't have to create a dummy VCPU and the
> > specialized split function is also much faster.

I'm curious about and interested in the way you implemented it, especially
since you mentioned that the performance is much faster without a dummy vCPU.

> We've been carrying these patches since 2015. I've never posted them.
> Getting them in shape for upstream consumption will take some work. I can
> look into this next week.

It would be nice if you could post them upstream.

Regards,
Jay Zhou

> 
> Peter


RE: RFC: Split EPT huge pages in advance of dirty logging

2020-02-20 Thread Zhoujian (jay)


> -Original Message-
> From: Peter Xu [mailto:pet...@redhat.com]
> Sent: Friday, February 21, 2020 2:17 AM
> To: Ben Gardon 
> Cc: Zhoujian (jay) ; Junaid Shahid
> ; k...@vger.kernel.org; qemu-devel@nongnu.org;
> pbonz...@redhat.com; dgilb...@redhat.com; quint...@redhat.com;
> Liujinsong (Paul) ; linfeng (M)
> ; wangxin (U) ;
> Huangweidong (C) 
> Subject: Re: RFC: Split EPT huge pages in advance of dirty logging
> 
> On Thu, Feb 20, 2020 at 09:34:52AM -0800, Ben Gardon wrote:
> > On Thu, Feb 20, 2020 at 5:53 AM Zhoujian (jay) 
> wrote:
> > >
> > >
> > >
> > > > -Original Message-
> > > > From: Peter Xu [mailto:pet...@redhat.com]
> > > > Sent: Thursday, February 20, 2020 1:19 AM
> > > > To: Zhoujian (jay) 
> > > > Cc: k...@vger.kernel.org; qemu-devel@nongnu.org;
> > > > pbonz...@redhat.com; dgilb...@redhat.com; quint...@redhat.com;
> > > > Liujinsong (Paul) ; linfeng (M)
> > > > ; wangxin (U)
> ;
> > > > Huangweidong (C) 
> > > > Subject: Re: RFC: Split EPT huge pages in advance of dirty logging
> > > >
> > > > On Wed, Feb 19, 2020 at 01:19:08PM +, Zhoujian (jay) wrote:
> > > > > Hi Peter,
> > > > >
> > > > > > -Original Message-
> > > > > > From: Peter Xu [mailto:pet...@redhat.com]
> > > > > > Sent: Wednesday, February 19, 2020 1:43 AM
> > > > > > To: Zhoujian (jay) 
> > > > > > Cc: k...@vger.kernel.org; qemu-devel@nongnu.org;
> > > > pbonz...@redhat.com;
> > > > > > dgilb...@redhat.com; quint...@redhat.com; Liujinsong (Paul)
> > > > > > ; linfeng (M) ;
> > > > > > wangxin (U) ; Huangweidong (C)
> > > > > > 
> > > > > > Subject: Re: RFC: Split EPT huge pages in advance of dirty
> > > > > > logging
> > > > > >
> > > > > > On Tue, Feb 18, 2020 at 01:13:47PM +, Zhoujian (jay) wrote:
> > > > > > > Hi all,
> > > > > > >
> > > > > > > We found that the guest will be soft-lockup occasionally
> > > > > > > when live migrating a 60 vCPU, 512GiB huge page and memory
> > > > > > > sensitive VM. The reason is clear, almost all of the vCPUs
> > > > > > > are waiting for the KVM MMU spin-lock to create 4K SPTEs
> > > > > > > when the huge pages are write protected. This
> > > > > > phenomenon is also described in this patch set:
> > > > > > > https://patchwork.kernel.org/cover/11163459/
> > > > > > > which aims to handle page faults in parallel more efficiently.
> > > > > > >
> > > > > > > Our idea is to use the migration thread to touch all of the
> > > > > > > guest memory in the granularity of 4K before enabling dirty
> > > > > > > logging. To be more specific, we split all the PDPE_LEVEL
> > > > > > > SPTEs into DIRECTORY_LEVEL SPTEs as the first step, and then
> > > > > > > split all the DIRECTORY_LEVEL SPTEs into
> > > > > > PAGE_TABLE_LEVEL SPTEs as the following step.
> > > > > >
> > > > > > IIUC, QEMU will prefer to use huge pages for all the anonymous
> > > > > > ramblocks (please refer to ram_block_add):
> > > > > >
> > > > > > qemu_madvise(new_block->host, new_block->max_length,
> > > > > > QEMU_MADV_HUGEPAGE);
> > > > >
> > > > > Yes, you're right
> > > > >
> > > > > >
> > > > > > Another alternative I can think of is to add an extra
> > > > > > parameter to QEMU to explicitly disable huge pages (so that
> > > > > > can even be MADV_NOHUGEPAGE instead of MADV_HUGEPAGE).
> > > > > > However that
> > > > should also
> > > > > > drag down the performance for the whole lifecycle of the VM.
> > > > >
> > > > > From the performance point of view, it is better to keep the
> > > > > huge pages when the VM is not in the live migration state.
> > > > >
> > > > > > A 3rd option is to make a QMP
> > > > > > command to dynamically turn huge pages on/off for ramblocks
> globally.
> > > > >
> > > > > We're searching a dynamic method too.
> >

RE: RFC: Split EPT huge pages in advance of dirty logging

2020-02-20 Thread Zhoujian (jay)


> -Original Message-
> From: Peter Xu [mailto:pet...@redhat.com]
> Sent: Thursday, February 20, 2020 1:19 AM
> To: Zhoujian (jay) 
> Cc: k...@vger.kernel.org; qemu-devel@nongnu.org; pbonz...@redhat.com;
> dgilb...@redhat.com; quint...@redhat.com; Liujinsong (Paul)
> ; linfeng (M) ; wangxin (U)
> ; Huangweidong (C)
> 
> Subject: Re: RFC: Split EPT huge pages in advance of dirty logging
> 
> On Wed, Feb 19, 2020 at 01:19:08PM +, Zhoujian (jay) wrote:
> > Hi Peter,
> >
> > > -Original Message-
> > > From: Peter Xu [mailto:pet...@redhat.com]
> > > Sent: Wednesday, February 19, 2020 1:43 AM
> > > To: Zhoujian (jay) 
> > > Cc: k...@vger.kernel.org; qemu-devel@nongnu.org;
> pbonz...@redhat.com;
> > > dgilb...@redhat.com; quint...@redhat.com; Liujinsong (Paul)
> > > ; linfeng (M) ;
> > > wangxin (U) ; Huangweidong (C)
> > > 
> > > Subject: Re: RFC: Split EPT huge pages in advance of dirty logging
> > >
> > > On Tue, Feb 18, 2020 at 01:13:47PM +, Zhoujian (jay) wrote:
> > > > Hi all,
> > > >
> > > > We found that the guest will be soft-lockup occasionally when live
> > > > migrating a 60 vCPU, 512GiB huge page and memory sensitive VM. The
> > > > reason is clear, almost all of the vCPUs are waiting for the KVM
> > > > MMU spin-lock to create 4K SPTEs when the huge pages are write
> > > > protected. This
> > > phenomenon is also described in this patch set:
> > > > https://patchwork.kernel.org/cover/11163459/
> > > > which aims to handle page faults in parallel more efficiently.
> > > >
> > > > Our idea is to use the migration thread to touch all of the guest
> > > > memory in the granularity of 4K before enabling dirty logging. To
> > > > be more specific, we split all the PDPE_LEVEL SPTEs into
> > > > DIRECTORY_LEVEL SPTEs as the first step, and then split all the
> > > > DIRECTORY_LEVEL SPTEs into
> > > PAGE_TABLE_LEVEL SPTEs as the following step.
> > >
> > > IIUC, QEMU will prefer to use huge pages for all the anonymous
> > > ramblocks (please refer to ram_block_add):
> > >
> > > qemu_madvise(new_block->host, new_block->max_length,
> > > QEMU_MADV_HUGEPAGE);
> >
> > Yes, you're right
> >
> > >
> > > Another alternative I can think of is to add an extra parameter to
> > > QEMU to explicitly disable huge pages (so that can even be
> > > MADV_NOHUGEPAGE instead of MADV_HUGEPAGE).  However that
> should also
> > > drag down the performance for the whole lifecycle of the VM.
> >
> > From the performance point of view, it is better to keep the huge
> > pages when the VM is not in the live migration state.
> >
> > > A 3rd option is to make a QMP
> > > command to dynamically turn huge pages on/off for ramblocks globally.
> >
> > We're searching a dynamic method too.
> > We plan to add two new flags for each memory slot, say
> > KVM_MEM_FORCE_PT_DIRECTORY_PAGES and
> > KVM_MEM_FORCE_PT_PAGE_TABLE_PAGES. These flags can be set through
> > KVM_SET_USER_MEMORY_REGION ioctl.
> >
> > The mapping_level which is called by tdp_page_fault in the kernel side
> > will return PT_DIRECTORY_LEVEL if the
> KVM_MEM_FORCE_PT_DIRECTORY_PAGES
> > flag of the memory slot is set, and return PT_PAGE_TABLE_LEVEL if the
> > KVM_MEM_FORCE_PT_PAGE_TABLE_PAGES flag is set.
> >
> > The key steps to split the huge pages in advance of enabling dirty log
> > is as follows:
> > 1. The migration thread in user space uses
> KVM_SET_USER_MEMORY_REGION
> > ioctl to set the KVM_MEM_FORCE_PT_DIRECTORY_PAGES flag for each
> memory
> > slot.
> > 2. The migration thread continues to use the KVM_SPLIT_HUGE_PAGES
> > ioctl (which is newly added) to do the splitting of large pages in the
> > kernel side.
> > 3. A new vCPU is created temporally(do some initialization but will
> > not
> > run) to help to do the work, i.e. as the parameter of the tdp_page_fault.
> > 4. Collect the GPA ranges of all the memory slots with the
> > KVM_MEM_FORCE_PT_DIRECTORY_PAGES flag set.
> > 5. Split the 1G huge pages(collected in step 4) into 2M by calling
> > tdp_page_fault, since the mapping_level will return
> > PT_DIRECTORY_LEVEL. Here is the main difference from the usual path
> > which is caused by the Guest side(EPT violation/misconfig etc), we
> > call it directly in the hypervisor side.
> > 6. Do some cleanups, i.e. free the vCPU re

RE: RFC: Split EPT huge pages in advance of dirty logging

2020-02-19 Thread Zhoujian (jay)
Hi Peter,

> -Original Message-
> From: Peter Xu [mailto:pet...@redhat.com]
> Sent: Wednesday, February 19, 2020 1:43 AM
> To: Zhoujian (jay) 
> Cc: k...@vger.kernel.org; qemu-devel@nongnu.org; pbonz...@redhat.com;
> dgilb...@redhat.com; quint...@redhat.com; Liujinsong (Paul)
> ; linfeng (M) ; wangxin (U)
> ; Huangweidong (C)
> 
> Subject: Re: RFC: Split EPT huge pages in advance of dirty logging
> 
> On Tue, Feb 18, 2020 at 01:13:47PM +, Zhoujian (jay) wrote:
> > Hi all,
> >
> > We found that the guest will be soft-lockup occasionally when live
> > migrating a 60 vCPU, 512GiB huge page and memory sensitive VM. The
> > reason is clear, almost all of the vCPUs are waiting for the KVM MMU
> > spin-lock to create 4K SPTEs when the huge pages are write protected. This
> phenomenon is also described in this patch set:
> > https://patchwork.kernel.org/cover/11163459/
> > which aims to handle page faults in parallel more efficiently.
> >
> > Our idea is to use the migration thread to touch all of the guest
> > memory in the granularity of 4K before enabling dirty logging. To be
> > more specific, we split all the PDPE_LEVEL SPTEs into DIRECTORY_LEVEL
> > SPTEs as the first step, and then split all the DIRECTORY_LEVEL SPTEs into
> PAGE_TABLE_LEVEL SPTEs as the following step.
> 
> IIUC, QEMU will prefer to use huge pages for all the anonymous ramblocks
> (please refer to ram_block_add):
> 
> qemu_madvise(new_block->host, new_block->max_length,
> QEMU_MADV_HUGEPAGE);

Yes, you're right.

> 
> Another alternative I can think of is to add an extra parameter to QEMU to
> explicitly disable huge pages (so that can even be MADV_NOHUGEPAGE
> instead of MADV_HUGEPAGE).  However that should also drag down the
> performance for the whole lifecycle of the VM.  

From the performance point of view, it is better to keep the huge pages
when the VM is not in the live migration state.

> A 3rd option is to make a QMP
> command to dynamically turn huge pages on/off for ramblocks globally.

We're looking for a dynamic approach too.
We plan to add two new flags for each memory slot, say
KVM_MEM_FORCE_PT_DIRECTORY_PAGES and
KVM_MEM_FORCE_PT_PAGE_TABLE_PAGES. These flags can be set
through KVM_SET_USER_MEMORY_REGION ioctl.

The mapping_level function, which is called by tdp_page_fault on the kernel
side, will return PT_DIRECTORY_LEVEL if the
KVM_MEM_FORCE_PT_DIRECTORY_PAGES flag of the memory slot is
set, and return PT_PAGE_TABLE_LEVEL if the
KVM_MEM_FORCE_PT_PAGE_TABLE_PAGES flag is set.
 
The key steps to split the huge pages in advance of enabling dirty log is
as follows:
1. The migration thread in user space uses
KVM_SET_USER_MEMORY_REGION ioctl to set the
KVM_MEM_FORCE_PT_DIRECTORY_PAGES flag for each memory slot.
2. The migration thread continues to use the KVM_SPLIT_HUGE_PAGES
ioctl (which is newly added) to do the splitting of large pages in the
kernel side.
3. A new vCPU is created temporarily (it does some initialization but will
not run) to help do the work, i.e. it is passed as the parameter of
tdp_page_fault.
4. Collect the GPA ranges of all the memory slots with the
KVM_MEM_FORCE_PT_DIRECTORY_PAGES flag set.
5. Split the 1G huge pages (collected in step 4) into 2M by calling
tdp_page_fault, since the mapping_level will return
PT_DIRECTORY_LEVEL. Here is the main difference from the usual
path, which is triggered by the guest side (EPT violation/misconfig etc.):
we call it directly on the hypervisor side.
6. Do some cleanups, i.e. free the vCPU related resources.
7. The KVM_SPLIT_HUGE_PAGES ioctl returns to the user space side.
8. Use KVM_MEM_FORCE_PT_PAGE_TABLE_PAGES instead of
KVM_MEM_FORCE_PT_DIRECTORY_PAGES and repeat step 1 ~ step 7;
in step 5 the 2M huge pages will be split into 4K pages.
9. Clear the KVM_MEM_FORCE_PT_DIRECTORY_PAGES and
KVM_MEM_FORCE_PT_PAGE_TABLE_PAGES flags for each memory slot.
10. Then the migration thread calls the log_start ioctl to enable dirty
logging, and the rest remains the same.

What's your take on this, thanks.

Regards,
Jay Zhou

> Haven't thought deep into any of them, but seems doable.
> 
> Thanks,
> 
> --
> Peter Xu



RFC: Split EPT huge pages in advance of dirty logging

2020-02-18 Thread Zhoujian (jay)
Hi all,

We found that the guest will occasionally soft-lockup when live migrating a
60 vCPU, 512 GiB huge-page-backed and memory-sensitive VM. The reason is
clear: almost all of the vCPUs are waiting for the KVM MMU spin-lock to
create 4K SPTEs when the huge pages are write protected. This phenomenon is
also described in this patch set:
https://patchwork.kernel.org/cover/11163459/
which aims to handle page faults in parallel more efficiently.

Our idea is to use the migration thread to touch all of the guest memory in the
granularity of 4K before enabling dirty logging. To be more specific, we split 
all the
PDPE_LEVEL SPTEs into DIRECTORY_LEVEL SPTEs as the first step, and then split 
all
the DIRECTORY_LEVEL SPTEs into PAGE_TABLE_LEVEL SPTEs as the following step.
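As a rough sanity check of the scale involved (assuming the 512 GiB guest
above starts fully mapped with 1 GiB pages), the number of last-level SPTEs
each split stage ends up with can be computed directly:

```c
/* Back-of-the-envelope SPTE counts for a 512 GiB guest, assuming it is
 * entirely backed by huge pages to begin with. */
#include <stdint.h>

static uint64_t sptes_at_level(uint64_t mem_bytes, unsigned page_shift)
{
    /* page_shift: 30 for 1G pages, 21 for 2M, 12 for 4K */
    return mem_bytes >> page_shift;
}
```

Pass 1 (1G to 2M) creates on the order of 2^18 SPTEs, while pass 2 (2M to 4K)
creates 2^27 of them; performing that work in the migration thread, instead of
in every vCPU's fault path under the MMU spin-lock, is what this proposal is
after.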

However, there is a side effect. It takes more time to clear the D-bits of
the last-level SPTEs when enabling dirty logging, which holds the QEMU BQL
and the KVM mmu-lock simultaneously. To solve this issue, the idea of
enabling dirty logging gradually in small chunks is proposed too; here is
the link for v1:
https://patchwork.kernel.org/patch/11388227/

On an Intel(R) Xeon(R) Gold 6161 CPU @ 2.20GHz environment, some tests have
been done with a 60-vCPU, 256 GiB VM which enables NUMA balancing, using the
demo we wrote. We start a process with 60 threads that randomly touches most
of the memory in the VM, and meanwhile count the function execution time in
the VM during live migration. change_prot_numa() is chosen since it will not
release the CPU unless its work has finished. Here are the numbers:

                     Original                   The demo we wrote
[1]                  > 9s (most of the time)    ~5ms
Hypervisor cost      > 90%                      ~3%

[1]: execution time of the change_prot_numa() function

If the time in [1] is bigger than 20s, it will result in a soft lockup.

I know it is a little hacky to do so, but my question is: is it worth
trying to split EPT huge pages in advance of dirty logging?

Any advice will be appreciated, thanks.

Regards,
Jay Zhou



Re: [Qemu-devel] [RFC PATCH] vl: fix migration when watchdog expires

2018-08-16 Thread Zhoujian (jay)
> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Tuesday, August 14, 2018 9:07 PM
> To: Zhoujian (jay) ; Dr. David Alan Gilbert
> 
> Cc: qemu-devel@nongnu.org; quint...@redhat.com; wangxin (U)
> 
> Subject: Re: [RFC PATCH] vl: fix migration when watchdog expires
> 
> On 14/08/2018 15:03, Zhoujian (jay) wrote:
> >> -Original Message-
> >> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> >> Sent: Tuesday, August 14, 2018 8:02 PM
> >> To: Dr. David Alan Gilbert 
> >> Cc: Zhoujian (jay) ; qemu-devel@nongnu.org;
> >> quint...@redhat.com; wangxin (U) 
> >> Subject: Re: [RFC PATCH] vl: fix migration when watchdog expires
> >>
> >> On 14/08/2018 13:52, Dr. David Alan Gilbert wrote:
> >>>  a) Should the watchdog expire when the VM is stopped; I think it
> >>> shouldn't - hw/acpi/tco.c uses a virtual timer as does i6300esb; so
> >>> is the bug here that the watchdog being used didn't use a virtual timer?
> >>
> >> All watchdogs do.
> >>
> >>>  b) If the watchdog expires just before the VM gets stopped, is
> >>> there a race which could hit this?  Possibly.
> >>
> >> Yes, I think it is a race that happens just before vm_stop, but I
> >> don't understand why the "qemu_clock_enable" in pause_all_vcpus does not
> prevent it.
> >
> > Hi Paolo,
> > The sequence is like this I think
> >
> >  |
> >  |  <-  watchdog expired, which set reset_requested to
> SHUTDOWN_CAUSE_GUEST_RESET
> >  |
> >  |  <-  migration thread sets to RUN_STATE_FINISH_MIGRATE, it
> will disable QEMU_CLOCK_VIRTUAL clock,
> >  |  but it is done after the setting of reset_requested
> 
> So the fix would be to process the reset request here?  (In do_vm_stop or
> pause_all_vcpus).  The code is currently in main_loop_should_exit().

After a second thought, I think we should keep the reset request processing
in main_loop_should_exit(): since pause_all_vcpus() (or do_vm_stop()) is not
in a loop, it can't detect all the reset requests immediately.
Processing the reset request both in main_loop_should_exit() and in
do_vm_stop() or pause_all_vcpus() would lead to a race on the global
variable 'reset_requested'.
Could we add the check !runstate_check(RUN_STATE_FINISH_MIGRATE) before
setting RUN_STATE_PRELAUNCH, just like the existing
!runstate_check(RUN_STATE_RUNNING) and !runstate_check(RUN_STATE_INMIGRATE)
checks? But I'm not sure whether this will cause any side effects.
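A minimal model of the guard being proposed here can be sketched as follows.
The runstate names follow QEMU's, but the after_reset() helper and the flat
state enum are hypothetical simplifications, not the actual vl.c code:

```c
/* Model of the proposed guard: after a reset request is processed, the
 * transition to PRELAUNCH is skipped not only while RUNNING or INMIGRATE
 * (the existing checks), but also while FINISH_MIGRATE, so the migration
 * thread's FINISH_MIGRATE -> POSTMIGRATE transition stays legal. */

enum run_state {
    RUN_STATE_RUNNING,
    RUN_STATE_INMIGRATE,
    RUN_STATE_FINISH_MIGRATE,
    RUN_STATE_PRELAUNCH,
    RUN_STATE_POSTMIGRATE,
};

static enum run_state after_reset(enum run_state s)
{
    if (s != RUN_STATE_RUNNING &&
        s != RUN_STATE_INMIGRATE &&
        s != RUN_STATE_FINISH_MIGRATE)   /* the proposed extra check */
        return RUN_STATE_PRELAUNCH;
    return s;                            /* leave the runstate alone */
}
```

With the extra check, a watchdog reset that fires while the runstate is
FINISH_MIGRATE no longer moves the VM to PRELAUNCH, which is the invalid
'prelaunch' -> 'postmigrate' transition reported in this thread.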

Regards,
Jay Zhou

> 
> Paolo
> 
> >  |  <-  main loop thread sets to RUN_STATE_PRELAUNCH since it
> detected a reset request
> >  |
> >  |  <-  migration thread sets to RUN_STATE_POSTMIGRATE
> >
> >
> > Regards,
> > Jay Zhou
> >
> >>
> >> It should be possible to write a deterministic testcase with qtest...
> >>
> >> Paolo



Re: [Qemu-devel] [RFC PATCH] vl: fix migration when watchdog expires

2018-08-14 Thread Zhoujian (jay)
> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Tuesday, August 14, 2018 9:07 PM
> To: Zhoujian (jay) ; Dr. David Alan Gilbert
> 
> Cc: qemu-devel@nongnu.org; quint...@redhat.com; wangxin (U)
> 
> Subject: Re: [RFC PATCH] vl: fix migration when watchdog expires
> 
> On 14/08/2018 15:03, Zhoujian (jay) wrote:
> >> -Original Message-
> >> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> >> Sent: Tuesday, August 14, 2018 8:02 PM
> >> To: Dr. David Alan Gilbert 
> >> Cc: Zhoujian (jay) ; qemu-devel@nongnu.org;
> >> quint...@redhat.com; wangxin (U) 
> >> Subject: Re: [RFC PATCH] vl: fix migration when watchdog expires
> >>
> >> On 14/08/2018 13:52, Dr. David Alan Gilbert wrote:
> >>>  a) Should the watchdog expire when the VM is stopped; I think it
> >>> shouldn't - hw/acpi/tco.c uses a virtual timer as does i6300esb; so
> >>> is the bug here that the watchdog being used didn't use a virtual timer?
> >>
> >> All watchdogs do.
> >>
> >>>  b) If the watchdog expires just before the VM gets stopped, is
> >>> there a race which could hit this?  Possibly.
> >>
> >> Yes, I think it is a race that happens just before vm_stop, but I
> >> don't understand why the "qemu_clock_enable" in pause_all_vcpus does not
> prevent it.
> >
> > Hi Paolo,
> > The sequence is like this I think
> >
> >  |
> >  |  <-  watchdog expired, which set reset_requested to
> SHUTDOWN_CAUSE_GUEST_RESET
> >  |
> >  |  <-  migration thread sets to RUN_STATE_FINISH_MIGRATE, it
> will disable QEMU_CLOCK_VIRTUAL clock,
> >  |  but it is done after the setting of reset_requested
> 
> So the fix would be to process the reset request here?  (In do_vm_stop or
> pause_all_vcpus).  The code is currently in main_loop_should_exit().

I think this makes sense: process the reset request after QEMU_CLOCK_VIRTUAL
is disabled. I will have a try.

Regards,
Jay Zhou

> 
> Paolo
> 
> >  |  <-  main loop thread sets to RUN_STATE_PRELAUNCH since it
> detected a reset request
> >  |
> >  |  <-  migration thread sets to RUN_STATE_POSTMIGRATE
> >
> >
> > Regards,
> > Jay Zhou
> >
> >>
> >> It should be possible to write a deterministic testcase with qtest...
> >>
> >> Paolo



Re: [Qemu-devel] [RFC PATCH] vl: fix migration when watchdog expires

2018-08-14 Thread Zhoujian (jay)
> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Tuesday, August 14, 2018 8:02 PM
> To: Dr. David Alan Gilbert 
> Cc: Zhoujian (jay) ; qemu-devel@nongnu.org;
> quint...@redhat.com; wangxin (U) 
> Subject: Re: [RFC PATCH] vl: fix migration when watchdog expires
> 
> On 14/08/2018 13:52, Dr. David Alan Gilbert wrote:
> >  a) Should the watchdog expire when the VM is stopped; I think it
> > shouldn't - hw/acpi/tco.c uses a virtual timer as does i6300esb; so is
> > the bug here that the watchdog being used didn't use a virtual timer?
> 
> All watchdogs do.
> 
> >  b) If the watchdog expires just before the VM gets stopped, is there
> > a race which could hit this?  Possibly.
> 
> Yes, I think it is a race that happens just before vm_stop, but I don't
> understand why the "qemu_clock_enable" in pause_all_vcpus does not prevent it.

Hi Paolo,
The sequence is like this, I think:

 |
 |  <-  watchdog expired, which set reset_requested to 
SHUTDOWN_CAUSE_GUEST_RESET
 | 
 |  <-  migration thread sets to RUN_STATE_FINISH_MIGRATE, it will 
disable QEMU_CLOCK_VIRTUAL clock,
 |  but it is done after the setting of reset_requested
 |
 |  <-  main loop thread sets to RUN_STATE_PRELAUNCH since it 
detected a reset request
 |
 |  <-  migration thread sets to RUN_STATE_POSTMIGRATE


Regards,
Jay Zhou

> 
> It should be possible to write a deterministic testcase with qtest...
> 
> Paolo


Re: [Qemu-devel] [RFC PATCH] vl: fix migration when watchdog expires

2018-08-14 Thread Zhoujian (jay)
> -Original Message-
> From: Dr. David Alan Gilbert [mailto:dgilb...@redhat.com]
> Sent: Tuesday, August 14, 2018 7:52 PM
> To: Paolo Bonzini 
> Cc: Zhoujian (jay) ; qemu-devel@nongnu.org;
> quint...@redhat.com; wangxin (U) 
> Subject: Re: [RFC PATCH] vl: fix migration when watchdog expires
> 
> * Paolo Bonzini (pbonz...@redhat.com) wrote:
> > On 14/08/2018 12:48, Jay Zhou wrote:
> > > I got the following error when migrating a VM with watchdog
> > > device:
> > >
> > > {"timestamp": {"seconds": 1533884471, "microseconds": 668099},
> > > "event": "WATCHDOG", "data": {"action": "reset"}}
> > > {"timestamp": {"seconds": 1533884471, "microseconds": 677658},
> > > "event": "RESET", "data": {"guest": true}}
> > > {"timestamp": {"seconds": 1533884471, "microseconds": 677874},
> > > "event": "STOP"}
> > > qemu-system-x86_64: invalid runstate transition: 'prelaunch' ->
> 'postmigrate'
> > > Aborted
> > >
> > > The run state transition is RUN_STATE_FINISH_MIGRATE to
> > > RUN_STATE_PRELAUNCH, then the migration thread aborted when it tries to
> set RUN_STATE_POSTMIGRATE.
> > > There is a race between the main loop thread and the migration thread I
> think.
> >
> > In that case I think you shouldn't go to POSTMIGRATE at all, because
> > the VM has been reset.
> 
> Migration has the VM stopped; it's not expecting the state to change at that
> point.
> 
> > Alternatively, when the watchdog fires in RUN_STATE_FINISH_MIGRATE
> > state, it might delay the action until after the "cont" command is
> > invoked on the source, but I'm not sure what's the best way to achieve
> > that...
> 
> Jay: Which watchdog were you using?

Hi Dave,
it is i6300esb, which uses QEMU_CLOCK_VIRTUAL.

> 
>  a) Should the watchdog expire when the VM is stopped; I think it shouldn't -
> hw/acpi/tco.c uses a virtual timer as does i6300esb; so is the bug here that
> the watchdog being used didn't use a virtual timer?
> 
>  b) If the watchdog expires just before the VM gets stopped, is there a race
> which could hit this?  Possibly.

This is the case I hit, I think.

Regards,
Jay Zhou

> 
>  c) Could main_loop_should_exit guard all the 'request's by something that
> checks whether the VM is stopped?
> 
> Dave
> 
> 
> > Paolo
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH] cryptodev: remove dead code

2018-07-30 Thread Zhoujian (jay)
Hi Paolo,

I've posted a patch before, pls see
https://lists.gnu.org/archive/html/qemu-devel/2018-04/msg05215.html

which was reviewed by Stefan
https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg01549.html

It seems that it hasn't been merged. Could you pick it up pls?

Regards,
Jay

> -Original Message-
> From: Qemu-devel [mailto:qemu-devel-
> bounces+jianjay.zhou=huawei@nongnu.org] On Behalf Of Paolo Bonzini
> Sent: Monday, July 30, 2018 4:51 PM
> To: qemu-devel@nongnu.org
> Subject: [Qemu-devel] [PATCH] cryptodev: remove dead code
> 
> Reported by Coverity as CID 1390600.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  backends/cryptodev-vhost-user.c | 5 -
>  1 file changed, 5 deletions(-)
> 
> diff --git a/backends/cryptodev-vhost-user.c b/backends/cryptodev-vhost-
> user.c index d52daccfcd..d539f14d59 100644
> --- a/backends/cryptodev-vhost-user.c
> +++ b/backends/cryptodev-vhost-user.c
> @@ -157,7 +157,6 @@ static void cryptodev_vhost_user_event(void *opaque, int
> event)  {
>  CryptoDevBackendVhostUser *s = opaque;
>  CryptoDevBackend *b = CRYPTODEV_BACKEND(s);
> -Error *err = NULL;
>  int queues = b->conf.peers.queues;
> 
>  assert(queues < MAX_CRYPTO_QUEUE_NUM); @@ -174,10 +173,6 @@ static void
> cryptodev_vhost_user_event(void *opaque, int event)
>  cryptodev_vhost_user_stop(queues, s);
>  break;
>  }
> -
> -if (err) {
> -error_report_err(err);
> -}
>  }
> 
>  static void cryptodev_vhost_user_init(
> --
> 2.17.1
> 




Re: [Qemu-devel] [PATCH RESEND v10] vhost: used_memslots refactoring

2018-05-01 Thread Zhoujian (jay)
Hi Michael and Igor,
I'm afraid this patch has been missed; could you help to review it?

PS: In order to avoid side effects, some code redundancy does exist between
vhost_user_set_used_memslots(), vhost_user_set_mem_table() and
vhost_user_set_mem_table_postcopy(); I haven't found a good solution to
remove it, due to my limited knowledge. But the counting of vhost-user's
and vhost-kernel's used_memslots is correct now, I think. Please let me know
if there are issues to be dealt with.
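The counting scheme the patch introduces can be sketched as follows. This is
a deliberately simplified model: the real patch walks dev->mem->regions in
hw/virtio/vhost-backend.c and hw/virtio/vhost-user.c, whereas here a region
is reduced to just its backing file descriptor:

```c
/* Sketch of the used_memslots split: vhost-kernel counts every memory
 * region (dev->mem->nregions), while vhost-user counts only regions that
 * are backed by a file descriptor, since only those consume its memslots. */

struct region {
    int fd;    /* backing fd, or -1 if the region has none */
};

static unsigned vhost_kernel_count(const struct region *r, unsigned n)
{
    (void)r;
    return n;                      /* all regions count for vhost-kernel */
}

static unsigned vhost_user_count(const struct region *r, unsigned n)
{
    unsigned i, used = 0;
    for (i = 0; i < n; i++)
        if (r[i].fd >= 0)          /* only fd-backed regions matter */
            used++;
    return used;
}

/* vhost_has_free_slot() analogue: hotplug succeeds while used < limit. */
static int has_free_slot(unsigned used, unsigned limit)
{
    return used < limit;
}
```

With 9 regions of which only 3 are fd-backed and a vhost-user limit of 8, the
old shared counter would refuse a DIMM hotplug while the per-backend counter
correctly allows it, which is exactly the scenario in the commit message.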

Regards,
Jay

> -Original Message-
> From: Zhoujian (jay)
> Sent: Tuesday, March 27, 2018 12:14 PM
> To: qemu-devel@nongnu.org
> Cc: m...@redhat.com; imamm...@redhat.com; Huangweidong (C)
> <weidong.hu...@huawei.com>; wangxin (U) <wangxinxin.w...@huawei.com>; Gonglei
> (Arei) <arei.gong...@huawei.com>; Zhoujian (jay) <jianjay.z...@huawei.com>;
> Liuzhe (Ahriy, Euler) <liuzh...@huawei.com>
> Subject: [PATCH RESEND v10] vhost: used_memslots refactoring
> 
> Used_memslots is shared by vhost kernel and user, it is equal to
> dev->mem->nregions, which is correct for vhost kernel, but not for
> vhost user, the latter one uses memory regions that have file descriptor. E.g.
> a VM has a vhost-user NIC and 8(vhost user memslot upper limit) memory slots,
> it will be failed to hotplug a new DIMM device since vhost_has_free_slot()
> finds no free slot left. It should be successful if only part of memory slots
> have file descriptor, so setting used memslots for vhost-user and vhost-
> kernel respectively.
> 
> Signed-off-by: Igor Mammedov <imamm...@redhat.com>
> Signed-off-by: Jay Zhou <jianjay.z...@huawei.com>
> Signed-off-by: Liuzhe <liuzh...@huawei.com>
> ---
> 
> v10:
>  - fix misaligned access to structures
>  - refine on setting used_memslots for vhost-user to
>avoid side effect
>  - distinguish "has free memslots" and "out of memslots"
> v7 ... v9:
>  - rebased on the master
> v2 ... v6:
>  - delete the "used_memslots" global variable, and add it
>for vhost-user and vhost-kernel separately
>  - refine the function, commit log
>  - used_memslots refactoring
> 
>  hw/virtio/vhost-backend.c | 21 -
>  hw/virtio/vhost-user.c| 36 +---
>  hw/virtio/vhost.c | 13 ++---
>  include/hw/virtio/vhost-backend.h |  8 ++--
>  4 files changed, 65 insertions(+), 13 deletions(-)
> 
> diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c index
> 7f09efa..3539b5f 100644
> --- a/hw/virtio/vhost-backend.c
> +++ b/hw/virtio/vhost-backend.c
> @@ -15,6 +15,8 @@
>  #include "hw/virtio/vhost-backend.h"
>  #include "qemu/error-report.h"
> 
> +static unsigned int vhost_kernel_used_memslots;
> +
>  static int vhost_kernel_call(struct vhost_dev *dev, unsigned long int
> request,
>   void *arg)  { @@ -62,6 +64,21 @@ static int
> vhost_kernel_memslots_limit(struct vhost_dev *dev)
>  return limit;
>  }
> 
> +static void vhost_kernel_set_used_memslots(struct vhost_dev *dev) {
> +vhost_kernel_used_memslots = dev->mem->nregions; }
> +
> +static bool vhost_kernel_has_free_memslots(struct vhost_dev *dev) {
> +return vhost_kernel_used_memslots <
> +vhost_kernel_memslots_limit(dev); }
> +
> +static bool vhost_kernel_out_of_memslots(struct vhost_dev *dev) {
> +return vhost_kernel_used_memslots >
> +vhost_kernel_memslots_limit(dev); }
> +
>  static int vhost_kernel_net_set_backend(struct vhost_dev *dev,
>  struct vhost_vring_file *file)  { @@
> -237,7 +254,9 @@ static const VhostOps kernel_ops = {
>  .backend_type = VHOST_BACKEND_TYPE_KERNEL,
>  .vhost_backend_init = vhost_kernel_init,
>  .vhost_backend_cleanup = vhost_kernel_cleanup,
> -.vhost_backend_memslots_limit = vhost_kernel_memslots_limit,
> +.vhost_set_used_memslots = vhost_kernel_set_used_memslots,
> +.vhost_backend_has_free_memslots = vhost_kernel_has_free_memslots,
> +.vhost_out_of_memslots = vhost_kernel_out_of_memslots,
>  .vhost_net_set_backend = vhost_kernel_net_set_backend,
>  .vhost_scsi_set_endpoint = vhost_kernel_scsi_set_endpoint,
>  .vhost_scsi_clear_endpoint = vhost_kernel_scsi_clear_endpoint, diff
> --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index
> 44aea5c..9691806 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -170,6 +170,8 @@ static VhostUserMsg m __attribute__ ((unused));
>  /* The version of the protocol we support */
>  #define VHOST_USER_VERSION(0x1)
> 
> +static unsigned int vhost_user_use

Re: [Qemu-devel] [PULL 10/13] cryptodev: add vhost-user as a new cryptodev backend

2018-04-27 Thread Zhoujian (jay)
Hi Peter,

> -Original Message-
> From: Peter Maydell [mailto:peter.mayd...@linaro.org]
> Sent: Saturday, April 28, 2018 12:15 AM
> To: Michael S. Tsirkin <m...@redhat.com>
> Cc: QEMU Developers <qemu-devel@nongnu.org>; Gonglei (Arei)
> <arei.gong...@huawei.com>; longpeng <longpe...@huawei.com>; Zhoujian (jay)
> <jianjay.z...@huawei.com>; Paolo Bonzini <pbonz...@redhat.com>
> Subject: Re: [PULL 10/13] cryptodev: add vhost-user as a new cryptodev
> backend
> 
> On 1 March 2018 at 16:46, Michael S. Tsirkin <m...@redhat.com> wrote:
> > From: Gonglei <arei.gong...@huawei.com>
> >
> > Usage:
> >  -chardev socket,id=charcrypto0,path=/path/to/your/socket
> >  -object cryptodev-vhost-user,id=cryptodev0,chardev=charcrypto0
> >  -device virtio-crypto-pci,id=crypto0,cryptodev=cryptodev0
> >
> > Signed-off-by: Gonglei <arei.gong...@huawei.com>
> > Signed-off-by: Longpeng(Mike) <longpe...@huawei.com>
> > Signed-off-by: Jay Zhou <jianjay.z...@huawei.com>
> > Reviewed-by: Michael S. Tsirkin <m...@redhat.com>
> > Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> 
> Hi; Coverity (CID 1390600) points out that there's dead code in this function:
> 
> > +static void cryptodev_vhost_user_event(void *opaque, int event) {
> > +CryptoDevBackendVhostUser *s = opaque;
> > +CryptoDevBackend *b = CRYPTODEV_BACKEND(s);
> > +Error *err = NULL;
> 
> We set err to NULL here...
> 
> > +int queues = b->conf.peers.queues;
> > +
> > +assert(queues < MAX_CRYPTO_QUEUE_NUM);
> > +
> > +switch (event) {
> > +case CHR_EVENT_OPENED:
> > +if (cryptodev_vhost_user_start(queues, s) < 0) {
> > +exit(1);
> > +}
> > +b->ready = true;
> > +break;
> > +case CHR_EVENT_CLOSED:
> > +b->ready = false;
> > +cryptodev_vhost_user_stop(queues, s);
> > +break;
> > +}
> 
> ...and nothing here does anything with err...
> 
> > +
> > +if (err) {
> > +error_report_err(err);
> > +}
> 
> ...so this if() is all dead code and we could remove err entirely.

Thanks for reporting this. I'll send a patch to fix it.

Regards,
Jay


Re: [Qemu-devel] [PATCH v9] vhost: used_memslots refactoring

2018-03-20 Thread Zhoujian (jay)


> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: Tuesday, March 20, 2018 8:36 PM
> To: Zhoujian (jay) <jianjay.z...@huawei.com>
> Cc: qemu-devel@nongnu.org; imamm...@redhat.com; Huangweidong (C)
> <weidong.hu...@huawei.com>; wangxin (U) <wangxinxin.w...@huawei.com>; Gonglei
> (Arei) <arei.gong...@huawei.com>; Liuzhe (Ahriy, Euler) <liuzh...@huawei.com>
> Subject: Re: [PATCH v9] vhost: used_memslots refactoring
> 
> On Tue, Mar 20, 2018 at 03:39:17AM +, Zhoujian (jay) wrote:
> >
> >
> > > -Original Message-
> > > From: Michael S. Tsirkin [mailto:m...@redhat.com]
> > > Sent: Tuesday, March 20, 2018 10:51 AM
> > > To: Zhoujian (jay) <jianjay.z...@huawei.com>
> > > Cc: qemu-devel@nongnu.org; imamm...@redhat.com; Huangweidong (C)
> > > <weidong.hu...@huawei.com>; wangxin (U)
> > > <wangxinxin.w...@huawei.com>; Gonglei
> > > (Arei) <arei.gong...@huawei.com>; Liuzhe (Ahriy, Euler)
> > > <liuzh...@huawei.com>
> > > Subject: Re: [PATCH v9] vhost: used_memslots refactoring
> > >
> > > On Tue, Mar 20, 2018 at 02:09:34AM +, Zhoujian (jay) wrote:
> > > > Hi Michael,
> > > >
> > > > > -Original Message-
> > > > > From: Michael S. Tsirkin [mailto:m...@redhat.com]
> > > > > Sent: Tuesday, March 20, 2018 9:34 AM
> > > > > To: Zhoujian (jay) <jianjay.z...@huawei.com>
> > > > > Cc: qemu-devel@nongnu.org; imamm...@redhat.com; Huangweidong (C)
> > > > > <weidong.hu...@huawei.com>; wangxin (U)
> > > > > <wangxinxin.w...@huawei.com>; Gonglei
> > > > > (Arei) <arei.gong...@huawei.com>; Liuzhe (Ahriy, Euler)
> > > > > <liuzh...@huawei.com>
> > > > > Subject: Re: [PATCH v9] vhost: used_memslots refactoring
> > > > >
> > > > > On Mon, Mar 05, 2018 at 05:12:49PM +0800, Jay Zhou wrote:
> > > > > > Used_memslots is shared by vhost kernel and user, it is equal
> > > > > > to
> > > > > > dev->mem->nregions, which is correct for vhost kernel, but not
> > > > > > dev->mem->for
> > > > > > vhost user, the latter one uses memory regions that have file
> > > > > > descriptor. E.g. a VM has a vhost-user NIC and 8(vhost user
> > > > > > memslot upper limit) memory slots, it will be failed to
> > > > > > hotplug a new DIMM device since vhost_has_free_slot() finds no
> > > > > > free slot left. It should be successful if only part of memory
> > > > > > slots have file descriptor, so setting used memslots for
> > > > > > vhost-user and vhost-
> > > kernel respectively.
> > > > >
> > > > >
> > > > > Below should go after ---
> > > >
> > > > Thanks for reminding.
> > > >
> > > > >
> > > > > > v7 ... v9:
> > > > > >  - rebased on the master
> > > > > > v2 ... v6:
> > > > > >  - delete the "used_memslots" global variable, and add it
> > > > > >for vhost-user and vhost-kernel separately
> > > > > >  - refine the function, commit log
> > > > > >  - used_memslots refactoring
> > > > > >
> > > > > > Signed-off-by: Igor Mammedov <imamm...@redhat.com>
> > > > > > Signed-off-by: Jay Zhou <jianjay.z...@huawei.com>
> > > > > > Signed-off-by: Liuzhe <liuzh...@huawei.com>
> > > > >
> > > > > When built with clang this causes runtime warnings (during make
> > > > > check) about misaligned access to structures.
> > > > >
> > > > > The issue is that vhost_user_prepare_msg requests
> > > > > VhostUserMemory which compiler assumes but is then used with a
> > > > > pointer into a packed structure - where fields are not aligned.
> > > >
> > > > Sorry I missed the patch you have sent to fix the alignment, I
> > > > have replied to that thread.
> > > >
> > > > >
> > > > >
> > > > > > ---
> > > > > >  hw/virtio/vhost-backend.c | 15 +++-
> > > > > >  hw/virtio/vhost-user.c| 77 ++-
> 
> > > 
> > > > > 
> > > >

Re: [Qemu-devel] [PATCH v9] vhost: used_memslots refactoring

2018-03-19 Thread Zhoujian (jay)


> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: Tuesday, March 20, 2018 11:14 AM
> To: Zhoujian (jay) <jianjay.z...@huawei.com>
> Cc: qemu-devel@nongnu.org; imamm...@redhat.com; Huangweidong (C)
> <weidong.hu...@huawei.com>; wangxin (U) <wangxinxin.w...@huawei.com>; Gonglei
> (Arei) <arei.gong...@huawei.com>; Liuzhe (Ahriy, Euler) <liuzh...@huawei.com>
> Subject: Re: [PATCH v9] vhost: used_memslots refactoring
> 
> On Tue, Mar 20, 2018 at 02:09:34AM +, Zhoujian (jay) wrote:
> > Hi Michael,
> >
> > > -Original Message-
> > > From: Michael S. Tsirkin [mailto:m...@redhat.com]
> > > Sent: Tuesday, March 20, 2018 9:34 AM
> > > To: Zhoujian (jay) <jianjay.z...@huawei.com>
> > > Cc: qemu-devel@nongnu.org; imamm...@redhat.com; Huangweidong (C)
> > > <weidong.hu...@huawei.com>; wangxin (U)
> > > <wangxinxin.w...@huawei.com>; Gonglei
> > > (Arei) <arei.gong...@huawei.com>; Liuzhe (Ahriy, Euler)
> > > <liuzh...@huawei.com>
> > > Subject: Re: [PATCH v9] vhost: used_memslots refactoring
> > >
> > > On Mon, Mar 05, 2018 at 05:12:49PM +0800, Jay Zhou wrote:
> > > > Used_memslots is shared by vhost kernel and user, it is equal to
> > > > dev->mem->nregions, which is correct for vhost kernel, but not for
> > > > vhost user, the latter one uses memory regions that have file
> > > > descriptor. E.g. a VM has a vhost-user NIC and 8(vhost user
> > > > memslot upper limit) memory slots, it will be failed to hotplug a
> > > > new DIMM device since vhost_has_free_slot() finds no free slot
> > > > left. It should be successful if only part of memory slots have
> > > > file descriptor, so setting used memslots for vhost-user and vhost-
> kernel respectively.
> > >
> > >
> > > Below should go after ---
> >
> > Thanks for reminding.
> >
> > >
> > > > v7 ... v9:
> > > >  - rebased on the master
> > > > v2 ... v6:
> > > >  - delete the "used_memslots" global variable, and add it
> > > >for vhost-user and vhost-kernel separately
> > > >  - refine the function, commit log
> > > >  - used_memslots refactoring
> > > >
> > > > Signed-off-by: Igor Mammedov <imamm...@redhat.com>
> > > > Signed-off-by: Jay Zhou <jianjay.z...@huawei.com>
> > > > Signed-off-by: Liuzhe <liuzh...@huawei.com>
> > >
> > > When built with clang this causes runtime warnings (during make
> > > check) about misaligned access to structures.
> > >
> > > The issue is that vhost_user_prepare_msg requests VhostUserMemory
> > > which compiler assumes but is then used with a pointer into a packed
> > > structure - where fields are not aligned.
> >
> > Sorry I missed the patch you have sent to fix the alignment, I have
> > replied to that thread.
> 
> I've dropped this from the pull for now.  Sorry about that.  Once next pull
> is merged, pls rebase and post a version fixing up the alignment issues.
> 
> Thanks for your effort!

Hi Michael,
Sorry for the trouble; that's okay for me, I will fix the alignment issues
in the next version.

Regards,
Jay

> 
> --
> MST



Re: [Qemu-devel] [PATCH v9] vhost: used_memslots refactoring

2018-03-19 Thread Zhoujian (jay)


> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: Tuesday, March 20, 2018 10:51 AM
> To: Zhoujian (jay) <jianjay.z...@huawei.com>
> Cc: qemu-devel@nongnu.org; imamm...@redhat.com; Huangweidong (C)
> <weidong.hu...@huawei.com>; wangxin (U) <wangxinxin.w...@huawei.com>; Gonglei
> (Arei) <arei.gong...@huawei.com>; Liuzhe (Ahriy, Euler) <liuzh...@huawei.com>
> Subject: Re: [PATCH v9] vhost: used_memslots refactoring
> 
> On Tue, Mar 20, 2018 at 02:09:34AM +, Zhoujian (jay) wrote:
> > Hi Michael,
> >
> > > -Original Message-
> > > From: Michael S. Tsirkin [mailto:m...@redhat.com]
> > > Sent: Tuesday, March 20, 2018 9:34 AM
> > > To: Zhoujian (jay) <jianjay.z...@huawei.com>
> > > Cc: qemu-devel@nongnu.org; imamm...@redhat.com; Huangweidong (C)
> > > <weidong.hu...@huawei.com>; wangxin (U)
> > > <wangxinxin.w...@huawei.com>; Gonglei
> > > (Arei) <arei.gong...@huawei.com>; Liuzhe (Ahriy, Euler)
> > > <liuzh...@huawei.com>
> > > Subject: Re: [PATCH v9] vhost: used_memslots refactoring
> > >
> > > On Mon, Mar 05, 2018 at 05:12:49PM +0800, Jay Zhou wrote:
> > > > Used_memslots is shared by vhost kernel and user, it is equal to
> > > > dev->mem->nregions, which is correct for vhost kernel, but not for
> > > > vhost user, the latter one uses memory regions that have file
> > > > descriptor. E.g. a VM has a vhost-user NIC and 8(vhost user
> > > > memslot upper limit) memory slots, it will be failed to hotplug a
> > > > new DIMM device since vhost_has_free_slot() finds no free slot
> > > > left. It should be successful if only part of memory slots have
> > > > file descriptor, so setting used memslots for vhost-user and vhost-
> kernel respectively.
> > >
> > >
> > > Below should go after ---
> >
> > Thanks for reminding.
> >
> > >
> > > > v7 ... v9:
> > > >  - rebased on the master
> > > > v2 ... v6:
> > > >  - delete the "used_memslots" global variable, and add it
> > > >for vhost-user and vhost-kernel separately
> > > >  - refine the function, commit log
> > > >  - used_memslots refactoring
> > > >
> > > > Signed-off-by: Igor Mammedov <imamm...@redhat.com>
> > > > Signed-off-by: Jay Zhou <jianjay.z...@huawei.com>
> > > > Signed-off-by: Liuzhe <liuzh...@huawei.com>
> > >
> > > When built with clang this causes runtime warnings (during make
> > > check) about misaligned access to structures.
> > >
> > > The issue is that vhost_user_prepare_msg requests VhostUserMemory
> > > which compiler assumes but is then used with a pointer into a packed
> > > structure - where fields are not aligned.
> >
> > Sorry, I missed the patch you sent to fix the alignment; I have
> > replied to that thread.
> >
> > >
> > >
> > > > ---
> > > >  hw/virtio/vhost-backend.c | 15 +++-
> > > >  hw/virtio/vhost-user.c| 77 ++-
> > > >  hw/virtio/vhost.c | 13 +++
> > > >  include/hw/virtio/vhost-backend.h |  6 ++-
> > > >  4 files changed, 75 insertions(+), 36 deletions(-)
> > > >
> > > > diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
> > > > index 7f09efa..59def69 100644
> > > > --- a/hw/virtio/vhost-backend.c
> > > > +++ b/hw/virtio/vhost-backend.c
> > > > @@ -15,6 +15,8 @@
> > > >  #include "hw/virtio/vhost-backend.h"
> > > >  #include "qemu/error-report.h"
> > > >
> > > > +static unsigned int vhost_kernel_used_memslots;
> > > > +
> > > >  static int vhost_kernel_call(struct vhost_dev *dev, unsigned long
> > > > int
> > > request,
> > > >   void *arg)  { @@ -62,6 +64,11 @@
> > > > static int vhost_kernel_memslots_limit(struct vhost_dev *dev)
> > > >  return limit;
> > > >  }
> > > >
> > > > +static bool vhost_kernel_has_free_memslots(struct vhost_dev *dev) {
> > > > +return vhost_kernel_used_memslots <
> > > > +vhost_kernel_memslots_limit(dev); }
> > > > +
> > > >  static int vhost_kernel_net_set_backend(struct vhost_dev *dev,

Re: [Qemu-devel] [PATCH v9] vhost: used_memslots refactoring

2018-03-19 Thread Zhoujian (jay)
Hi Michael,

> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: Tuesday, March 20, 2018 9:34 AM
> To: Zhoujian (jay) <jianjay.z...@huawei.com>
> Cc: qemu-devel@nongnu.org; imamm...@redhat.com; Huangweidong (C)
> <weidong.hu...@huawei.com>; wangxin (U) <wangxinxin.w...@huawei.com>; Gonglei
> (Arei) <arei.gong...@huawei.com>; Liuzhe (Ahriy, Euler) <liuzh...@huawei.com>
> Subject: Re: [PATCH v9] vhost: used_memslots refactoring
> 
> On Mon, Mar 05, 2018 at 05:12:49PM +0800, Jay Zhou wrote:
> > used_memslots is shared by vhost-kernel and vhost-user; it is equal to
> > dev->mem->nregions, which is correct for vhost-kernel but not for
> > vhost-user, since the latter only uses memory regions that have a file
> > descriptor. E.g. if a VM has a vhost-user NIC and 8 memory slots (the
> > vhost-user memslot upper limit), hotplugging a new DIMM device fails
> > because vhost_has_free_slot() finds no free slot left. It should
> > succeed if only part of the memory slots have a file descriptor, so
> > track used memslots for vhost-user and vhost-kernel separately.
> 
> 
> Below should go after ---

Thanks for the reminder.

> 
> > v7 ... v9:
> >  - rebased on the master
> > v2 ... v6:
> >  - delete the "used_memslots" global variable, and add it
> >for vhost-user and vhost-kernel separately
> >  - refine the function, commit log
> >  - used_memslots refactoring
> >
> > Signed-off-by: Igor Mammedov <imamm...@redhat.com>
> > Signed-off-by: Jay Zhou <jianjay.z...@huawei.com>
> > Signed-off-by: Liuzhe <liuzh...@huawei.com>
> 
> When built with clang this causes runtime warnings (during make check) about
> misaligned access to structures.
> 
> The issue is that vhost_user_prepare_msg requests VhostUserMemory which
> compiler assumes but is then used with a pointer into a packed structure -
> where fields are not aligned.

Sorry, I missed the patch you sent to fix the alignment; I have replied
to that thread.

> 
> 
> > ---
> >  hw/virtio/vhost-backend.c | 15 +++-
> >  hw/virtio/vhost-user.c| 77 ++-
> >  hw/virtio/vhost.c | 13 +++
> >  include/hw/virtio/vhost-backend.h |  6 ++-
> >  4 files changed, 75 insertions(+), 36 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
> > index 7f09efa..59def69 100644
> > --- a/hw/virtio/vhost-backend.c
> > +++ b/hw/virtio/vhost-backend.c
> > @@ -15,6 +15,8 @@
> >  #include "hw/virtio/vhost-backend.h"
> >  #include "qemu/error-report.h"
> >
> > +static unsigned int vhost_kernel_used_memslots;
> > +
> >  static int vhost_kernel_call(struct vhost_dev *dev, unsigned long int
> request,
> >   void *arg)  { @@ -62,6 +64,11 @@ static
> > int vhost_kernel_memslots_limit(struct vhost_dev *dev)
> >  return limit;
> >  }
> >
> > +static bool vhost_kernel_has_free_memslots(struct vhost_dev *dev) {
> > +return vhost_kernel_used_memslots <
> > +vhost_kernel_memslots_limit(dev); }
> > +
> >  static int vhost_kernel_net_set_backend(struct vhost_dev *dev,
> >  struct vhost_vring_file
> > *file)  { @@ -233,11 +240,16 @@ static void
> > vhost_kernel_set_iotlb_callback(struct vhost_dev *dev,
> >  qemu_set_fd_handler((uintptr_t)dev->opaque, NULL, NULL,
> > NULL);  }
> >
> > +static void vhost_kernel_set_used_memslots(struct vhost_dev *dev) {
> > +vhost_kernel_used_memslots = dev->mem->nregions; }
> > +
> >  static const VhostOps kernel_ops = {
> >  .backend_type = VHOST_BACKEND_TYPE_KERNEL,
> >  .vhost_backend_init = vhost_kernel_init,
> >  .vhost_backend_cleanup = vhost_kernel_cleanup,
> > -.vhost_backend_memslots_limit = vhost_kernel_memslots_limit,
> > +.vhost_backend_has_free_memslots =
> > + vhost_kernel_has_free_memslots,
> >  .vhost_net_set_backend = vhost_kernel_net_set_backend,
> >  .vhost_scsi_set_endpoint = vhost_kernel_scsi_set_endpoint,
> >  .vhost_scsi_clear_endpoint =
> > vhost_kernel_scsi_clear_endpoint, @@ -264,6 +276,7 @@ static const
> > VhostOps kernel_ops = {  #endif /* CONFIG_VHOST_VSOCK */
> >  .vhost_set_iotlb_callback = vhost_kernel_set_iotlb_callback,
> >  .vhost_send_device_iotlb_msg =
> > vhost_kernel_send_device_iotlb_msg,
> > +.vhost_set_used_memslots = vhos

Re: [Qemu-devel] [PATCH] vhost-user: avoid misaligned access

2018-03-19 Thread Zhoujian (jay)


> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: Saturday, March 17, 2018 2:20 AM
> To: qemu-devel@nongnu.org
> Cc: Zhoujian (jay) <jianjay.z...@huawei.com>
> Subject: [PATCH] vhost-user: avoid misaligned access
> 
> We can't pass a pointer to memory field directly since it's within a packed
> structure, so isn't aligned.
> Pass a pointer on stack and copy.
> 
> Fixes: 30c4cc7 ("vhost: used_memslots refactoring")
> Cc: Jay Zhou <jianjay.z...@huawei.com>
> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> ---
> 
> I had to apply this to fix make check errors with clang.
> Pls review, test and ack.
> 
> Thanks!
> 
>  hw/virtio/vhost-user.c | 14 +++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index
> c12fdd9..a44ee7f 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -396,6 +396,7 @@ static int vhost_user_set_mem_table_postcopy(struct
> vhost_dev *dev,
>  bool reply_supported = virtio_has_feature(dev->protocol_features,
> 
> VHOST_USER_PROTOCOL_F_REPLY_ACK);
>  VhostUserMsg msg_reply;
> +VhostUserMemory memory = {};
>  int region_i, msg_i;
> 
>  VhostUserMsg msg = {
> @@ -407,10 +408,11 @@ static int vhost_user_set_mem_table_postcopy(struct
> vhost_dev *dev,
>  msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
>  }
> 
> -if (vhost_user_prepare_msg(dev, &msg.payload.memory, fds) < 0) {
> +if (vhost_user_prepare_msg(dev, &memory, fds) < 0) {
>  error_report("Failed preparing vhost-user memory table msg");
>  return -1;
>  }
> +msg.payload.memory = memory;
> 
>  fd_num = msg.payload.memory.nregions;
> 
> @@ -549,16 +551,19 @@ static int vhost_user_set_mem_table(struct vhost_dev
> *dev,
>  .hdr.request = VHOST_USER_SET_MEM_TABLE,
>  .hdr.flags = VHOST_USER_VERSION,
>  };
> +VhostUserMemory memory = {};
> 
>  if (reply_supported) {
>  msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
>  }
> 
> -if (vhost_user_prepare_msg(dev, &msg.payload.memory, fds) < 0) {
> +if (vhost_user_prepare_msg(dev, &memory, fds) < 0) {
>  error_report("Failed preparing vhost-user memory table msg");
>  return -1;
>  }
> 
> +msg.payload.memory = memory;
> +
>  fd_num = msg.payload.memory.nregions;
> 
>  if (!fd_num) {
> @@ -1575,8 +1580,11 @@ static void vhost_user_set_used_memslots(struct
> vhost_dev *dev)  {
>  int fds[VHOST_MEMORY_MAX_NREGIONS];
>  VhostUserMsg msg;
> +VhostUserMemory memory = {};
> +
> +vhost_user_prepare_msg(dev, &memory, fds);
> 
> -vhost_user_prepare_msg(dev, &msg.payload.memory, fds);
> +msg.payload.memory = memory;
>  }

Hi Michael, it should be like this:


static void vhost_user_set_used_memslots(struct vhost_dev *dev)
 {
 int fds[VHOST_MEMORY_MAX_NREGIONS];
-VhostUserMsg msg;
+VhostUserMemory memory = {};
 
-vhost_user_prepare_msg(dev, &msg.payload.memory, fds);
+vhost_user_prepare_msg(dev, &memory, fds);
 }


Regards,
Jay

> 
>  const VhostOps user_ops = {
> --
> MST



Re: [Qemu-devel] [PATCH v8 2/2] vhost: used_memslots refactoring

2018-03-05 Thread Zhoujian (jay)


> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: Friday, March 02, 2018 12:17 AM
> To: Zhoujian (jay) <jianjay.z...@huawei.com>
> Cc: qemu-devel@nongnu.org; imamm...@redhat.com; Huangweidong (C)
> <weidong.hu...@huawei.com>; wangxin (U) <wangxinxin.w...@huawei.com>; Gonglei
> (Arei) <arei.gong...@huawei.com>; Liuzhe (Ahriy, Euler) <liuzh...@huawei.com>
> Subject: Re: [PATCH v8 2/2] vhost: used_memslots refactoring
> 
> On Tue, Feb 27, 2018 at 03:10:05PM +0800, Jay Zhou wrote:
> > used_memslots is shared by vhost-kernel and vhost-user; it is equal to
> > dev->mem->nregions, which is correct for vhost-kernel but not for
> > vhost-user, since the latter only uses memory regions that have a file
> > descriptor. E.g. if a VM has a vhost-user NIC and 8 memory slots (the
> > vhost-user memslot upper limit), hotplugging a new DIMM device fails
> > because vhost_has_free_slot() finds no free slot left. It should
> > succeed if only part of the memory slots have a file descriptor, so
> > track used memslots for vhost-user and vhost-kernel separately.
> >
> > Signed-off-by: Igor Mammedov <imamm...@redhat.com>
> > Signed-off-by: Jay Zhou <jianjay.z...@huawei.com>
> > Signed-off-by: Liuzhe <liuzh...@huawei.com>
> 
> make check fails with this patch; I dropped it for now.

Hi Michael, please see the reason inline.

> 
> > ---
> >  hw/virtio/vhost-backend.c | 15 +++-
> >  hw/virtio/vhost-user.c| 77 ++-
> >  hw/virtio/vhost.c | 13 +++
> >  include/hw/virtio/vhost-backend.h |  6 ++-
> >  4 files changed, 75 insertions(+), 36 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
> > index 7f09efa..59def69 100644
> > --- a/hw/virtio/vhost-backend.c
> > +++ b/hw/virtio/vhost-backend.c
> > @@ -15,6 +15,8 @@
> >  #include "hw/virtio/vhost-backend.h"
> >  #include "qemu/error-report.h"
> >
> > +static unsigned int vhost_kernel_used_memslots;
> > +
> >  static int vhost_kernel_call(struct vhost_dev *dev, unsigned long int
> request,
> >   void *arg)  { @@ -62,6 +64,11 @@ static
> > int vhost_kernel_memslots_limit(struct vhost_dev *dev)
> >  return limit;
> >  }
> >
> > +static bool vhost_kernel_has_free_memslots(struct vhost_dev *dev) {
> > +return vhost_kernel_used_memslots <
> > +vhost_kernel_memslots_limit(dev); }
> > +
> >  static int vhost_kernel_net_set_backend(struct vhost_dev *dev,
> >  struct vhost_vring_file
> > *file)  { @@ -233,11 +240,16 @@ static void
> > vhost_kernel_set_iotlb_callback(struct vhost_dev *dev,
> >  qemu_set_fd_handler((uintptr_t)dev->opaque, NULL, NULL,
> > NULL);  }
> >
> > +static void vhost_kernel_set_used_memslots(struct vhost_dev *dev) {
> > +vhost_kernel_used_memslots = dev->mem->nregions; }
> > +
> >  static const VhostOps kernel_ops = {
> >  .backend_type = VHOST_BACKEND_TYPE_KERNEL,
> >  .vhost_backend_init = vhost_kernel_init,
> >  .vhost_backend_cleanup = vhost_kernel_cleanup,
> > -.vhost_backend_memslots_limit = vhost_kernel_memslots_limit,
> > +.vhost_backend_has_free_memslots =
> > + vhost_kernel_has_free_memslots,
> >  .vhost_net_set_backend = vhost_kernel_net_set_backend,
> >  .vhost_scsi_set_endpoint = vhost_kernel_scsi_set_endpoint,
> >  .vhost_scsi_clear_endpoint =
> > vhost_kernel_scsi_clear_endpoint, @@ -264,6 +276,7 @@ static const
> > VhostOps kernel_ops = {  #endif /* CONFIG_VHOST_VSOCK */
> >  .vhost_set_iotlb_callback = vhost_kernel_set_iotlb_callback,
> >  .vhost_send_device_iotlb_msg =
> > vhost_kernel_send_device_iotlb_msg,
> > +.vhost_set_used_memslots = vhost_kernel_set_used_memslots,
> >  };
> >
> >  int vhost_set_backend_type(struct vhost_dev *dev, VhostBackendType
> > backend_type) diff --git a/hw/virtio/vhost-user.c
> > b/hw/virtio/vhost-user.c index 6eb9798..f732c80 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -147,6 +147,8 @@ static VhostUserMsg m __attribute__ ((unused));
> >  /* The version of the protocol we support */
> >  #define VHOST_USER_VERSION(0x1)
> >
> > +static bool vhost_user_free_memslots = true;
> > +
> >  struct vhost_user {
> >  CharBackend *chr;
> >  int slave_fd;

Re: [Qemu-devel] [PATCH v4] tap: setting error appropriately when calling net_init_tap_one()

2018-03-02 Thread Zhoujian (jay)


> -Original Message-
> From: Jason Wang [mailto:jasow...@redhat.com]
> Sent: Friday, March 02, 2018 2:30 PM
> To: Zhoujian (jay) <jianjay.z...@huawei.com>; qemu-devel@nongnu.org
> Cc: Huangweidong (C) <weidong.hu...@huawei.com>; m...@redhat.com; Gonglei
> (Arei) <arei.gong...@huawei.com>; imamm...@redhat.com; wangxin (U)
> <wangxinxin.w...@huawei.com>
> Subject: Re: [Qemu-devel] [PATCH v4] tap: setting error appropriately when
> calling net_init_tap_one()
> 
> 
> 
> On 2018-02-06 20:53, Jay Zhou wrote:
> > If netdev_add tap,id=net0,...,vhost=on fails in net_init_tap_one(),
> > the follow-up device_add virtio-net-pci,netdev=net0 will fail too,
> > printing:
> >
> > TUNSETOFFLOAD ioctl() failed: Bad file descriptor
> > TUNSETOFFLOAD ioctl() failed: Bad file descriptor
> >
> > The reason is that the fd of the tap is closed when an error occurs
> > after calling net_init_tap_one().
> >
> > The fd should be closed when net_init_tap_one() fails:
> > - if tap_set_sndbuf() failed
> > - if tap_set_sndbuf() succeeded but vhost failed to open or
> >   initialize with the vhostforce flag on
> > The fd should not be closed just because vhost failed to open or
> > initialize without the vhostforce flag, so the follow-up device_add
> > can fall back to userspace virtio successfully.
> >
> > Suggested-by: Michael S. Tsirkin <m...@redhat.com>
> > Suggested-by: Igor Mammedov <imamm...@redhat.com>
> > Suggested-by: Jason Wang <jasow...@redhat.com>
> > Signed-off-by: Jay Zhou <jianjay.z...@huawei.com>
> > ---
> > v4: - reduce duplication
> >  - close the fd by caller
> >  - tweak the title
> >
> > v3: - set errp appropriately
> > ---
> >   include/net/vhost_net.h |  3 +++
> >   net/tap.c   | 24 ++--
> >   2 files changed, 21 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h index
> > afc1499..77e4739 100644
> > --- a/include/net/vhost_net.h
> > +++ b/include/net/vhost_net.h
> > @@ -4,6 +4,9 @@
> >   #include "net/net.h"
> >   #include "hw/virtio/vhost-backend.h"
> >
> > +#define VHOST_NET_INIT_FAILED \
> > +"vhost-net requested but could not be initialized"
> > +
> >   struct vhost_net;
> >   typedef struct vhost_net VHostNetState;
> >
> > diff --git a/net/tap.c b/net/tap.c
> > index 979e622..14d230f 100644
> > --- a/net/tap.c
> > +++ b/net/tap.c
> > @@ -686,14 +686,23 @@ static void net_init_tap_one(const NetdevTapOptions
> *tap, NetClientState *peer,
> >   if (vhostfdname) {
> >   vhostfd = monitor_fd_param(cur_mon, vhostfdname, &err);
> >   if (vhostfd == -1) {
> > -error_propagate(errp, err);
> > +if (tap->has_vhostforce && tap->vhostforce) {
> > +error_propagate(errp, err);
> > +} else {
> > +warn_report_err(err);
> > +}
> >   return;
> >   }
> >   } else {
> >   vhostfd = open("/dev/vhost-net", O_RDWR);
> >   if (vhostfd < 0) {
> > -error_setg_errno(errp, errno,
> > - "tap: open vhost char device failed");
> > +if (tap->has_vhostforce && tap->vhostforce) {
> > +error_setg_errno(errp, errno,
> > + "tap: open vhost char device failed");
> > +} else {
> > +warn_report("tap: open vhost char device failed: %s",
> > +strerror(errno));
> > +}
> >   return;
> >   }
> >   fcntl(vhostfd, F_SETFL, O_NONBLOCK); @@ -702,12 +711,15
> > @@ static void net_init_tap_one(const NetdevTapOptions *tap,
> > NetClientState *peer,
> >
> >   s->vhost_net = vhost_net_init(&options);
> >   if (!s->vhost_net) {
> > -error_setg(errp,
> > -   "vhost-net requested but could not be initialized");
> > +if (tap->has_vhostforce && tap->vhostforce) {
> > +error_setg(errp, VHOST_NET_INIT_FAILED);
> > +} else {
> > +warn_report(VHOST_NET_INIT_FAILED);
> > +}
> >   return;
> >   }
> >   } else if (vhostfdname) {
> > -error_setg(errp, "vhostfd(s)= is not valid without vhost");
> > +warn_report("vhostfd(s)= is not valid without vhost");
> 
> Do we need to keep the error here consider it was a wrong command line
> parameter?

Fair enough, thanks!

Regards,
Jay

> 
> Thanks
> 
> >   }
> >   }
> >



Re: [Qemu-devel] [PATCH v8 2/2] vhost: used_memslots refactoring

2018-03-01 Thread Zhoujian (jay)


> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: Friday, March 02, 2018 12:17 AM
> To: Zhoujian (jay) <jianjay.z...@huawei.com>
> Cc: qemu-devel@nongnu.org; imamm...@redhat.com; Huangweidong (C)
> <weidong.hu...@huawei.com>; wangxin (U) <wangxinxin.w...@huawei.com>; Gonglei
> (Arei) <arei.gong...@huawei.com>; Liuzhe (Ahriy, Euler) <liuzh...@huawei.com>
> Subject: Re: [PATCH v8 2/2] vhost: used_memslots refactoring
> 
> On Tue, Feb 27, 2018 at 03:10:05PM +0800, Jay Zhou wrote:
> > used_memslots is shared by vhost-kernel and vhost-user; it is equal to
> > dev->mem->nregions, which is correct for vhost-kernel but not for
> > vhost-user, since the latter only uses memory regions that have a file
> > descriptor. E.g. if a VM has a vhost-user NIC and 8 memory slots (the
> > vhost-user memslot upper limit), hotplugging a new DIMM device fails
> > because vhost_has_free_slot() finds no free slot left. It should
> > succeed if only part of the memory slots have a file descriptor, so
> > track used memslots for vhost-user and vhost-kernel separately.
> >
> > Signed-off-by: Igor Mammedov <imamm...@redhat.com>
> > Signed-off-by: Jay Zhou <jianjay.z...@huawei.com>
> > Signed-off-by: Liuzhe <liuzh...@huawei.com>
> 
> make check fails with this patch; I dropped it for now.

Maybe something updated on the master tree affects this patch; I will
look into it and resolve the issue.

Regards,
Jay

> 
> > ---
> >  hw/virtio/vhost-backend.c | 15 +++-
> >  hw/virtio/vhost-user.c| 77 ++-
> >  hw/virtio/vhost.c | 13 +++
> >  include/hw/virtio/vhost-backend.h |  6 ++-
> >  4 files changed, 75 insertions(+), 36 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
> > index 7f09efa..59def69 100644
> > --- a/hw/virtio/vhost-backend.c
> > +++ b/hw/virtio/vhost-backend.c
> > @@ -15,6 +15,8 @@
> >  #include "hw/virtio/vhost-backend.h"
> >  #include "qemu/error-report.h"
> >
> > +static unsigned int vhost_kernel_used_memslots;
> > +
> >  static int vhost_kernel_call(struct vhost_dev *dev, unsigned long int
> request,
> >   void *arg)  { @@ -62,6 +64,11 @@ static
> > int vhost_kernel_memslots_limit(struct vhost_dev *dev)
> >  return limit;
> >  }
> >
> > +static bool vhost_kernel_has_free_memslots(struct vhost_dev *dev) {
> > +return vhost_kernel_used_memslots <
> > +vhost_kernel_memslots_limit(dev); }
> > +
> >  static int vhost_kernel_net_set_backend(struct vhost_dev *dev,
> >  struct vhost_vring_file
> > *file)  { @@ -233,11 +240,16 @@ static void
> > vhost_kernel_set_iotlb_callback(struct vhost_dev *dev,
> >  qemu_set_fd_handler((uintptr_t)dev->opaque, NULL, NULL,
> > NULL);  }
> >
> > +static void vhost_kernel_set_used_memslots(struct vhost_dev *dev) {
> > +vhost_kernel_used_memslots = dev->mem->nregions; }
> > +
> >  static const VhostOps kernel_ops = {
> >  .backend_type = VHOST_BACKEND_TYPE_KERNEL,
> >  .vhost_backend_init = vhost_kernel_init,
> >  .vhost_backend_cleanup = vhost_kernel_cleanup,
> > -.vhost_backend_memslots_limit = vhost_kernel_memslots_limit,
> > +.vhost_backend_has_free_memslots =
> > + vhost_kernel_has_free_memslots,
> >  .vhost_net_set_backend = vhost_kernel_net_set_backend,
> >  .vhost_scsi_set_endpoint = vhost_kernel_scsi_set_endpoint,
> >  .vhost_scsi_clear_endpoint =
> > vhost_kernel_scsi_clear_endpoint, @@ -264,6 +276,7 @@ static const
> > VhostOps kernel_ops = {  #endif /* CONFIG_VHOST_VSOCK */
> >  .vhost_set_iotlb_callback = vhost_kernel_set_iotlb_callback,
> >  .vhost_send_device_iotlb_msg =
> > vhost_kernel_send_device_iotlb_msg,
> > +.vhost_set_used_memslots = vhost_kernel_set_used_memslots,
> >  };
> >
> >  int vhost_set_backend_type(struct vhost_dev *dev, VhostBackendType
> > backend_type) diff --git a/hw/virtio/vhost-user.c
> > b/hw/virtio/vhost-user.c index 6eb9798..f732c80 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -147,6 +147,8 @@ static VhostUserMsg m __attribute__ ((unused));
> >  /* The version of the protocol we support */
> >  #define VHOST_USER_VERSION(0x1)
> >
> > +static bool vhost_user_free_memslots = true;
> > +
> >  str

Re: [Qemu-devel] [PATCH v8 0/4] cryptodev: add vhost support

2018-03-01 Thread Zhoujian (jay)
Fan Zhang from Intel (CCed) reported a config issue when enabling and testing
vhost_crypto, so I am resending this version. Thanks!

Regards,
Jay

> -Original Message-
> From: Zhoujian (jay)
> Sent: Tuesday, February 27, 2018 2:33 PM
> To: qemu-devel@nongnu.org
> Cc: m...@redhat.com; pbonz...@redhat.com; Huangweidong (C)
> <weidong.hu...@huawei.com>; stefa...@redhat.com; Zhoujian (jay)
> <jianjay.z...@huawei.com>; pa...@linux.vnet.ibm.com; longpeng
> <longpe...@huawei.com>; xin.z...@intel.com; roy.fan.zh...@intel.com; Gonglei
> (Arei) <arei.gong...@huawei.com>; wangxin (U) <wangxinxin.w...@huawei.com>
> Subject: [PATCH v8 0/4] cryptodev: add vhost support
> 
> From: Gonglei <arei.gong...@huawei.com>
> 
> I posted the RFC version a few months ago for the DPDK vhost-crypto
> implementation, and now it's time to send the formal version, because we need
> a user-space scheme for better performance.
> 
> The vhost-user crypto server-side patches have been sent to the DPDK
> community; please see
> 
> [RFC PATCH 0/6] lib/librte_vhost: introduce new vhost_user crypto backend
> support http://dpdk.org/ml/archives/dev/2017-November/081048.html
> 
> You also can get virtio-crypto polling mode driver from:
> 
> [PATCH v2 0/7] crypto: add virtio poll mode driver
> http://dpdk.org/ml/archives/dev/2018-February/091410.html
> 
> v8:
>   - make cryptodev-vhost.c compile independently of Linux
> v7:
>   - make virtio crypto enabled on non-Linux
>   - fix format-string issues
>   - fix error reported by clang
>   - fix a typo when setting length of cipher key
>   - rebased on the master
> v6:
>   - Fix compile error about backends/cryptodev-vhost-user.o and rebase on
> the master
> v5:
>   - squash [PATCH v4 5/5] into previous patches [Michael]
> v4:
>   - "[PATCH v4 5/5] cryptodev-vhost-user: depend on CONFIG_VHOST_CRYPTO
> and CONFIG_VHOST_USER" newly added to fix compilation dependency [Michael]
> v3:
>   - New added vhost user messages should be sent only when feature
> has been successfully negotiated [Michael]
> v2:
>   - Fix compile error on mingw32
> 
> Gonglei (4):
>   cryptodev: add vhost-user as a new cryptodev backend
>   cryptodev: add vhost support
>   cryptodev-vhost-user: add crypto session handler
>   cryptodev-vhost-user: set the key length
> 
>  backends/Makefile.objs|   6 +
>  backends/cryptodev-builtin.c  |   1 +
>  backends/cryptodev-vhost-user.c   | 377 ++
>  backends/cryptodev-vhost.c| 347 +++
>  configure |  15 ++
>  docs/interop/vhost-user.txt   |  26 +++
>  hw/virtio/vhost-user.c| 104 ++
>  hw/virtio/virtio-crypto.c |  70 +++
>  include/hw/virtio/vhost-backend.h |   8 +
>  include/hw/virtio/virtio-crypto.h |   1 +
>  include/sysemu/cryptodev-vhost-user.h |  47 +
>  include/sysemu/cryptodev-vhost.h  | 154 ++
>  include/sysemu/cryptodev.h|   8 +
>  qemu-options.hx   |  21 ++
>  vl.c  |   6 +
>  15 files changed, 1191 insertions(+)
>  create mode 100644 backends/cryptodev-vhost-user.c  create mode 100644
> backends/cryptodev-vhost.c  create mode 100644 include/sysemu/cryptodev-
> vhost-user.h
>  create mode 100644 include/sysemu/cryptodev-vhost.h
> 
> --
> 1.8.3.1
> 




Re: [Qemu-devel] [PATCH v4] tap: setting error appropriately when calling net_init_tap_one()

2018-02-28 Thread Zhoujian (jay)
Ping ...

> -Original Message-
> From: Zhoujian (jay)
> Sent: Tuesday, February 06, 2018 8:54 PM
> To: qemu-devel@nongnu.org
> Cc: jasow...@redhat.com; m...@redhat.com; imamm...@redhat.com; Huangweidong (C)
> <weidong.hu...@huawei.com>; wangxin (U) <wangxinxin.w...@huawei.com>; Gonglei
> (Arei) <arei.gong...@huawei.com>; Zhoujian (jay) <jianjay.z...@huawei.com>
> Subject: [PATCH v4] tap: setting error appropriately when calling
> net_init_tap_one()
> 
> If netdev_add tap,id=net0,...,vhost=on fails in net_init_tap_one(), the
> follow-up device_add virtio-net-pci,netdev=net0 will fail too, printing:
> 
>TUNSETOFFLOAD ioctl() failed: Bad file descriptor
>TUNSETOFFLOAD ioctl() failed: Bad file descriptor
> 
> The reason is that the fd of the tap is closed when an error occurs after
> calling net_init_tap_one().
> 
> The fd should be closed when net_init_tap_one() fails:
>- if tap_set_sndbuf() failed
>- if tap_set_sndbuf() succeeded but vhost failed to open or
>  initialize with the vhostforce flag on
> The fd should not be closed just because vhost failed to open or initialize
> without the vhostforce flag, so the follow-up device_add can fall back to
> userspace virtio successfully.
> 
> Suggested-by: Michael S. Tsirkin <m...@redhat.com>
> Suggested-by: Igor Mammedov <imamm...@redhat.com>
> Suggested-by: Jason Wang <jasow...@redhat.com>
> Signed-off-by: Jay Zhou <jianjay.z...@huawei.com>
> ---
> v4: - reduce duplication
> - close the fd by caller
> - tweak the title
> 
> v3: - set errp appropriately
> ---
>  include/net/vhost_net.h |  3 +++
>  net/tap.c   | 24 ++--
>  2 files changed, 21 insertions(+), 6 deletions(-)
> 
> diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h index
> afc1499..77e4739 100644
> --- a/include/net/vhost_net.h
> +++ b/include/net/vhost_net.h
> @@ -4,6 +4,9 @@
>  #include "net/net.h"
>  #include "hw/virtio/vhost-backend.h"
> 
> +#define VHOST_NET_INIT_FAILED \
> +"vhost-net requested but could not be initialized"
> +
>  struct vhost_net;
>  typedef struct vhost_net VHostNetState;
> 
> diff --git a/net/tap.c b/net/tap.c
> index 979e622..14d230f 100644
> --- a/net/tap.c
> +++ b/net/tap.c
> @@ -686,14 +686,23 @@ static void net_init_tap_one(const NetdevTapOptions
> *tap, NetClientState *peer,
>  if (vhostfdname) {
>  vhostfd = monitor_fd_param(cur_mon, vhostfdname, &err);
>  if (vhostfd == -1) {
> -error_propagate(errp, err);
> +if (tap->has_vhostforce && tap->vhostforce) {
> +error_propagate(errp, err);
> +} else {
> +warn_report_err(err);
> +}
>  return;
>  }
>  } else {
>  vhostfd = open("/dev/vhost-net", O_RDWR);
>  if (vhostfd < 0) {
> -error_setg_errno(errp, errno,
> - "tap: open vhost char device failed");
> +if (tap->has_vhostforce && tap->vhostforce) {
> +error_setg_errno(errp, errno,
> + "tap: open vhost char device failed");
> +} else {
> +warn_report("tap: open vhost char device failed: %s",
> +strerror(errno));
> +}
>  return;
>  }
>  fcntl(vhostfd, F_SETFL, O_NONBLOCK); @@ -702,12 +711,15 @@
> static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState
> *peer,
> 
>  s->vhost_net = vhost_net_init(&options);
>  if (!s->vhost_net) {
> -error_setg(errp,
> -   "vhost-net requested but could not be initialized");
> +if (tap->has_vhostforce && tap->vhostforce) {
> +error_setg(errp, VHOST_NET_INIT_FAILED);
> +} else {
> +warn_report(VHOST_NET_INIT_FAILED);
> +}
>  return;
>  }
>  } else if (vhostfdname) {
> -error_setg(errp, "vhostfd(s)= is not valid without vhost");
> +warn_report("vhostfd(s)= is not valid without vhost");
>  }
>  }
> 
> --
> 1.8.3.1
> 




Re: [Qemu-devel] [PATCH v6 2/4] cryptodev: add vhost support

2018-02-13 Thread Zhoujian (jay)
> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: Wednesday, February 14, 2018 12:44 AM
> To: Zhoujian (jay) <jianjay.z...@huawei.com>
> Cc: qemu-devel@nongnu.org; pbonz...@redhat.com; Huangweidong (C)
> <weidong.hu...@huawei.com>; stefa...@redhat.com; pa...@linux.vnet.ibm.com;
> longpeng <longpe...@huawei.com>; xin.z...@intel.com; roy.fan.zh...@intel.com;
> Gonglei (Arei) <arei.gong...@huawei.com>; wangxin (U)
> <wangxinxin.w...@huawei.com>
> Subject: Re: [PATCH v6 2/4] cryptodev: add vhost support
> 
> On Sun, Jan 21, 2018 at 08:54:48PM +0800, Jay Zhou wrote:
> > diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs index
> > 765d363..c65dca2 100644
> > --- a/hw/virtio/Makefile.objs
> > +++ b/hw/virtio/Makefile.objs
> > @@ -7,7 +7,7 @@ common-obj-y += virtio-mmio.o  obj-y += virtio.o
> > virtio-balloon.o
> >  obj-$(CONFIG_LINUX) += vhost.o vhost-backend.o vhost-user.o
> >  obj-$(CONFIG_VHOST_VSOCK) += vhost-vsock.o -obj-y += virtio-crypto.o
> > +obj-$(CONFIG_LINUX) += virtio-crypto.o
> >  obj-$(CONFIG_VIRTIO_PCI) += virtio-crypto-pci.o  endif
> >
> 
> This disables virtio crypto completely on non-Linux, which is not nice. We
> should not break working configs.

So if I understand correctly, the virtio-crypto device should be compiled
in unconditionally, i.e. like this:

obj-y += virtio-crypto.o

> 
> In particular this causes test failures on non-Linux hosts. Peter Maydell was
> kind enough to debug this and reported this backtrace:
> 
> gdb --args ./aarch64-softmmu/qemu-system-aarch64 -device virtio-crypto-pci -
> machine virt [...]
> #0  0x7f7ff450e6fa in _lwp_kill () from /usr/lib/libc.so.12
> #1  0x7f7ff450e385 in abort () from /usr/lib/libc.so.12
> #2  0x7f7ff5c65da2 in g_assertion_message () from /usr/pkg/lib/libglib-
> 2.0.so.0
> #3  0x7f7ff5c65e11 in g_assertion_message_expr () from
> /usr/pkg/lib/libglib-2.0.so.0
> #4  0x0074dc16 in object_initialize_with_type
> (data=data@entry=0x7f7ff33a2170, size=, type=0x0)
> at /root/qemu/qom/object.c:372
> #5  0x0074de33 in object_initialize (data=data@entry=0x7f7ff33a2170,
> size=, typename=)
> at /root/qemu/qom/object.c:392
> #6  0x004d2293 in virtio_instance_init_common
> (proxy_obj=0x7f7ff339a000, data=0x7f7ff33a2170, vdev_size=,
> vdev_name=) at /root/qemu/hw/virtio/virtio.c:2232
> #7  0x0074db0d in object_initialize_with_type
> (data=data@entry=0x7f7ff339a000, size=33664, type=type@entry=0x7f7ff7b79a80)
> at /root/qemu/qom/object.c:384
> #8  0x0074dc66 in object_new_with_type (type=0x7f7ff7b79a80) at
> /root/qemu/qom/object.c:492
> #9  0x0074deb9 in object_new (typename=typename@entry=0x7f7ff7b454e0
> "virtio-crypto-pci") at /root/qemu/qom/object.c:502
> #10 0x005924d6 in qdev_device_add (opts=0x7f7ff7b4c070,
> errp=errp@entry=0x7f7fda10) at /root/qemu/qdev-monitor.c:615
> #11 0x00594d31 in device_init_func (opaque=,
> opts=, errp=) at /root/qemu/vl.c:2373
> #12 0x00826e56 in qemu_opts_foreach (list=,
> func=func@entry=0x594d0c , opaque=opaque@entry=0x0,
> errp=errp@entry=0x0) at /root/qemu/util/qemu-option.c:1073
> #13 0x008b723d in main (argc=, argv=,
> envp=) at /root/qemu/vl.c:4642
> 
> 
> He explained:
> 
> 
>  ... this is almost certainly the classic "device A depends on device
> B, device B is conditionally compiled but device A isn't".
>  The type that is missing is virtio-crypto-device. virtio-crypto.o is
> built only if CONFIG_LINUX, but virtio-crypto-pci is in
> virtio-crypto-pci.c, which is built if CONFIG_VIRTIO_PCI.
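The dependency Peter describes can be sketched in Kbuild-style Makefile terms (a hypothetical fragment, not the actual QEMU Makefile.objs):

```make
# Device B (the core virtio-crypto-device) is conditional on Linux...
obj-$(CONFIG_LINUX)      += virtio-crypto.o
# ...but device A (the PCI proxy that instantiates it) only depends on
# CONFIG_VIRTIO_PCI, so on non-Linux hosts it references a missing type.
obj-$(CONFIG_VIRTIO_PCI) += virtio-crypto-pci.o

# Fix discussed in the thread: build the core device unconditionally,
# so the proxy can always resolve "virtio-crypto-device".
# obj-y += virtio-crypto.o
```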

Okay, I see. Thanks for Peter's help.

Regards,
Jay

> 
> 
> --
> MST



Re: [Qemu-devel] [PATCH v6 0/4] cryptodev: add vhost support

2018-02-13 Thread Zhoujian (jay)
> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: Wednesday, February 14, 2018 12:47 AM
> To: Zhoujian (jay) <jianjay.z...@huawei.com>
> Cc: qemu-devel@nongnu.org; pbonz...@redhat.com; Huangweidong (C)
> <weidong.hu...@huawei.com>; stefa...@redhat.com; pa...@linux.vnet.ibm.com;
> longpeng <longpe...@huawei.com>; xin.z...@intel.com; roy.fan.zh...@intel.com;
> Gonglei (Arei) <arei.gong...@huawei.com>; wangxin (U)
> <wangxinxin.w...@huawei.com>
> Subject: Re: [PATCH v6 0/4] cryptodev: add vhost support
> 
> On Sun, Jan 21, 2018 at 08:54:46PM +0800, Jay Zhou wrote:
> > From: Gonglei <arei.gong...@huawei.com>
> >
> > I posted the RFC version a few months ago for the DPDK vhost-crypto
> > implementation, and now it's time to send the formal version, because we
> > need a user-space scheme for better performance.
> >
> > The vhost user crypto server side patches had been sent to DPDK
> > community, pls see
> 
> I dropped the patchset from the latest pull request.
> Please address the issues found, test the stop path some more and resubmit.

Hi Michael,
Thanks for your help, I'll respin this patchset when the issues are solved.

Regards,
Jay

> 
> Thanks!
> 
> > [RFC PATCH 0/6] lib/librte_vhost: introduce new vhost_user crypto
> > backend support
> > http://dpdk.org/ml/archives/dev/2017-November/081048.html
> >
> > You also can get virtio-crypto polling mode driver from:
> >
> > [PATCH] virtio: add new driver for crypto devices
> > http://dpdk.org/ml/archives/dev/2017-November/081985.html
> >
> > v5 -> v6:
> >   Fix compile error about backends/cryptodev-vhost-user.o and rebase on
> >   the master
> > v4 -> v5:
> >   squash [PATCH v4 5/5] into previous patches [Michael]
> > v3 -> v4:
> >   "[PATCH v4 5/5] cryptodev-vhost-user: depend on CONFIG_VHOST_CRYPTO
> >   and CONFIG_VHOST_USER" newly added to fix compilation dependency
> > [Michael]
> > v2 -> v3:
> >   New added vhost user messages should be sent only when feature
> >   has been successfully negotiated [Michael]
> > v1 -> v2:
> >   Fix compile error on mingw32
> >
> > Gonglei (4):
> >   cryptodev: add vhost-user as a new cryptodev backend
> >   cryptodev: add vhost support
> >   cryptodev-vhost-user: add crypto session handler
> >   cryptodev-vhost-user: set the key length
> >
> >  backends/Makefile.objs|   6 +
> >  backends/cryptodev-builtin.c  |   1 +
> >  backends/cryptodev-vhost-user.c   | 379
> ++
> >  backends/cryptodev-vhost.c| 347
> +++
> >  configure |  15 ++
> >  docs/interop/vhost-user.txt   |  26 +++
> >  hw/virtio/Makefile.objs   |   2 +-
> >  hw/virtio/vhost-user.c| 104 ++
> >  hw/virtio/virtio-crypto.c |  70 +++
> >  include/hw/virtio/vhost-backend.h |   8 +
> >  include/hw/virtio/virtio-crypto.h |   1 +
> >  include/sysemu/cryptodev-vhost-user.h |  47 +
> >  include/sysemu/cryptodev-vhost.h  | 154 ++
> >  include/sysemu/cryptodev.h|   8 +
> >  qemu-options.hx   |  21 ++
> >  vl.c  |   6 +
> >  16 files changed, 1194 insertions(+), 1 deletion(-)  create mode
> > 100644 backends/cryptodev-vhost-user.c  create mode 100644
> > backends/cryptodev-vhost.c  create mode 100644
> > include/sysemu/cryptodev-vhost-user.h
> >  create mode 100644 include/sysemu/cryptodev-vhost.h
> >
> > --
> > 1.8.3.1
> >



Re: [Qemu-devel] [PATCH v6 1/4] cryptodev: add vhost-user as a new cryptodev backend

2018-02-13 Thread Zhoujian (jay)
> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: Wednesday, February 14, 2018 12:46 AM
> To: Zhoujian (jay) <jianjay.z...@huawei.com>
> Cc: qemu-devel@nongnu.org; pbonz...@redhat.com; Huangweidong (C)
> <weidong.hu...@huawei.com>; stefa...@redhat.com; pa...@linux.vnet.ibm.com;
> longpeng <longpe...@huawei.com>; xin.z...@intel.com; roy.fan.zh...@intel.com;
> Gonglei (Arei) <arei.gong...@huawei.com>; wangxin (U)
> <wangxinxin.w...@huawei.com>
> Subject: Re: [PATCH v6 1/4] cryptodev: add vhost-user as a new cryptodev
> backend
> 
> On Sun, Jan 21, 2018 at 08:54:47PM +0800, Jay Zhou wrote:
> > diff --git a/backends/cryptodev-vhost-user.c b/backends/cryptodev-vhost-user.c
> > new file mode 100644
> > index 000..4e63ece
> > --- /dev/null
> > +++ b/backends/cryptodev-vhost-user.c
> > @@ -0,0 +1,333 @@
> > +/*
> > + * QEMU Cryptodev backend for QEMU cipher APIs
> > + *
> > + * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
> > + *
> > + * Authors:
> > + *Gonglei <arei.gong...@huawei.com>
> > + *
> > + * This library is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2 of the License, or (at your option) any later version.
> > + *
> > + * This library is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with this library; if not, see
> <http://www.gnu.org/licenses/>.
> > + *
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "hw/boards.h"
> > +#include "qapi/error.h"
> > +#include "qapi/qmp/qerror.h"
> > +#include "qemu/error-report.h"
> > +#include "standard-headers/linux/virtio_crypto.h"
> > +#include "sysemu/cryptodev-vhost.h"
> > +#include "chardev/char-fe.h"
> > +
> > +
> > +/**
> > + * @TYPE_CRYPTODEV_BACKEND_VHOST_USER:
> > + * name of backend that uses vhost user server  */ #define
> > +TYPE_CRYPTODEV_BACKEND_VHOST_USER "cryptodev-vhost-user"
> > +
> > +#define CRYPTODEV_BACKEND_VHOST_USER(obj) \
> > +OBJECT_CHECK(CryptoDevBackendVhostUser, \
> > + (obj), TYPE_CRYPTODEV_BACKEND_VHOST_USER)
> > +
> > +
> > +typedef struct CryptoDevBackendVhostUser {
> > +CryptoDevBackend parent_obj;
> > +
> > +CharBackend chr;
> > +char *chr_name;
> > +bool opened;
> > +CryptoDevBackendVhost *vhost_crypto[MAX_CRYPTO_QUEUE_NUM];
> > +} CryptoDevBackendVhostUser;
> > +
> > +static int
> > +cryptodev_vhost_user_running(
> > + CryptoDevBackendVhost *crypto) {
> > +return crypto ? 1 : 0;
> > +}
> > +
> > +static void cryptodev_vhost_user_stop(int queues,
> > +  CryptoDevBackendVhostUser *s) {
> > +size_t i;
> > +
> > +for (i = 0; i < queues; i++) {
> > +if (!cryptodev_vhost_user_running(s->vhost_crypto[i])) {
> > +continue;
> > +}
> > +
> > +if (s->vhost_crypto) {
> > +cryptodev_vhost_cleanup(s->vhost_crypto[i]);
> > +s->vhost_crypto[i] = NULL;
> > +}
> > +}
> > +}
> 
> This test is problematic: clang build triggers an error:
> > /home/petmay01/linaro/qemu-for-merges/backends/cryptodev-vhost-user.c:86:16:
> > error: address of array 's->vhost_crypto' will always evaluate to
> > 'true' [-Werror,-Wpointer-bool-conversion]
> > if (s->vhost_crypto) {
> > ~~  ~~~^~~~

This line should be

if (s->vhost_crypto[i]) {
> 
> I really don't see how this could do the right thing, which makes me suspect
> that either you did not test stop, or you always have all queues enabled.
> 
> Pls test a config with some queues disabled.
> 
> In particular this machinery needs some unit tests to catch errors like this.

Okay, will do more tests, sorry about that.

Regards,
Jay

> 
> 
> --
> MST




Re: [Qemu-devel] [PULL 00/26] virtio, vhost, pci, pc: features, fixes and cleanups

2018-02-13 Thread Zhoujian (jay)
> -Original Message-
> From: Qemu-devel [mailto:qemu-devel-
> bounces+jianjay.zhou=huawei@nongnu.org] On Behalf Of Michael S. Tsirkin
> Sent: Wednesday, February 14, 2018 12:52 AM
> To: Peter Maydell 
> Cc: QEMU Developers 
> Subject: Re: [Qemu-devel] [PULL 00/26] virtio, vhost, pci, pc: features,
> fixes and cleanups
> 
> On Tue, Feb 13, 2018 at 04:33:14PM +, Peter Maydell wrote:
> > On 12 February 2018 at 09:35, Peter Maydell 
> wrote:
> > > This asserts in 'make check' for NetBSD, FreeBSD, OpenBSD, OSX:
> > >
> > > TEST: tests/device-introspect-test... (pid=19530)
> > >   /aarch64/device/introspect/list: OK
> > >   /aarch64/device/introspect/list-fields:  OK
> > >   /aarch64/device/introspect/none: OK
> > >   /aarch64/device/introspect/abstract: OK
> > >   /aarch64/device/introspect/concrete: **
> > > ERROR:/root/qemu/qom/object.c:372:object_initialize_with_type:
> > > assertion failed: (type != NULL)
> > > Broken pipe
> > > FAIL
> > > GTester: last random seed: R02Sd4e2c04f6ac00d843a31ccac4de0d914
> > > (pid=12686)
> > >   /aarch64/device/introspect/abstract-interfaces:  OK
> > > FAIL: tests/device-introspect-test
> > >
> > > (other archs fail too, aarch64 is just the first one we hit). This
> > > error is often "device A instantiates device B, but device B is
> > > conditionally compiled in and A is always compiled; so test fails
> > > trying to create device A on hosts where device B isn't built" I think.
> >
> > This part is indeed that problem -- the 'virtio-crypto-device' object
> > is built only if CONFIG_LINUX, but 'virtio-crypto-pci' is built if
> > CONFIG_VIRTIO_PCI, so on systems where the latter is true but not the
> > former you get a virtio-crypto-pci device that crashes when instantiated.
> >
> > The fix should be
> > --- a/hw/virtio/Makefile.objs
> > +++ b/hw/virtio/Makefile.objs
> > @@ -8,7 +8,7 @@ obj-y += virtio.o virtio-balloon.o
> >  obj-$(CONFIG_LINUX) += vhost.o vhost-backend.o vhost-user.o
> >  obj-$(CONFIG_VHOST_VSOCK) += vhost-vsock.o
> >  obj-$(CONFIG_LINUX) += virtio-crypto.o
> > -obj-$(CONFIG_VIRTIO_PCI) += virtio-crypto-pci.o
> > +obj-$(call land,$(CONFIG_LINUX),$(CONFIG_VIRTIO_PCI)) += virtio-crypto-pci.o
> >  endif
> >
> >  common-obj-$(call lnot,$(CONFIG_LINUX)) += vhost-stub.o
> >
> >
> > thanks
> > -- PMM
> 
> Thanks for your help with this!
> 
> I think the root cause is that one of the patches disables the virtio-crypto
> build on non-Linux, which used to be enabled.
> 
> This has been silently done by a patch which was supposed to merely add a
> vhost crypto device, I'm sorry I missed that in the review.

Hi Michael, Peter

Sorry for the late response and the trouble I have caused; I just returned
from a vacation.
Thanks for your help and for pointing out the root cause. I should have
explicitly put the config change of virtio-crypto into a separate patch to
make review much easier.

> 
> I've dropped the crypto vhost patches from the pull request for now.

Will fix the issue in the next version, sorry again for the inconvenience.

Regards,
Jay

> 
> Pushed with the same name - should be fine now.
> 
> --
> MST




Re: [Qemu-devel] [PATCH v3] tap: close fd conditionally when error occured

2018-02-06 Thread Zhoujian (jay)
Hi Jason,

> -Original Message-
> From: Jason Wang [mailto:jasow...@redhat.com]
> Sent: Tuesday, February 06, 2018 5:29 PM
> To: Zhoujian (jay) <jianjay.z...@huawei.com>; qemu-devel@nongnu.org
> Cc: m...@redhat.com; imamm...@redhat.com; Huangweidong (C)
> <weidong.hu...@huawei.com>; wangxin (U) <wangxinxin.w...@huawei.com>; Gonglei
> (Arei) <arei.gong...@huawei.com>
> Subject: Re: [PATCH v3] tap: close fd conditionally when error occured
> 
> 
> 
On 2018-02-02 17:50, Jay Zhou wrote:
> > If netdev_add tap,id=net0,...,vhost=on failed in net_init_tap_one(),
> > the followed up device_add virtio-net-pci,netdev=net0 will fail too,
> > prints:
> >
> > TUNSETOFFLOAD ioctl() failed: Bad file descriptor TUNSETOFFLOAD
> > ioctl() failed: Bad file descriptor
> >
> > The reason is that the fd of tap is closed when error occured after
> > calling net_init_tap_one().
> >
> > The fd should be closed when calling net_init_tap_one failed:
> > - if tap_set_sndbuf() failed
> > - if tap_set_sndbuf() succeeded but vhost failed to open or
> >   initialize with vhostforce flag on The fd should not be closed
> > just because vhost failed to open or initialize but without vhostforce
> > flag. So the followed up device_add can fall back to userspace virtio
> > successfully.
> >
> > Suggested-by: Michael S. Tsirkin <m...@redhat.com>
> > Suggested-by: Igor Mammedov <imamm...@redhat.com>
> > Suggested-by: Jason Wang <jasow...@redhat.com>
> > Signed-off-by: Jay Zhou <jianjay.z...@huawei.com>
> > ---
> >   net/tap.c | 44 
> >   1 file changed, 32 insertions(+), 12 deletions(-)
> >
> > diff --git a/net/tap.c b/net/tap.c
> > index 979e622..c46ab14 100644
> > --- a/net/tap.c
> > +++ b/net/tap.c
> > @@ -651,7 +651,7 @@ static void net_init_tap_one(const NetdevTapOptions
> *tap, NetClientState *peer,
> >   tap_set_sndbuf(s->fd, tap, &err);
> >   if (err) {
> >   error_propagate(errp, err);
> > -return;
> > +goto error;
> >   }
> >
> >   if (tap->has_fd || tap->has_fds) { @@ -686,15 +686,26 @@ static
> > void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
> >   if (vhostfdname) {
> >   vhostfd = monitor_fd_param(cur_mon, vhostfdname, &err);
> >   if (vhostfd == -1) {
> > -error_propagate(errp, err);
> > -return;
> > +if (tap->has_vhostforce && tap->vhostforce) {
> > +error_propagate(errp, err);
> > +goto error;
> > +} else {
> > +warn_report_err(err);
> > +return;
> > +}
> >   }
> >   } else {
> >   vhostfd = open("/dev/vhost-net", O_RDWR);
> >   if (vhostfd < 0) {
> > -error_setg_errno(errp, errno,
> > - "tap: open vhost char device failed");
> > -return;
> > +if (tap->has_vhostforce && tap->vhostforce) {
> > +error_setg_errno(errp, errno,
> > + "tap: open vhost char device failed");
> > +goto error;
> > +} else {
> > +warn_report("tap: open vhost char device failed: %s",
> > +strerror(errno));
> > +return;
> > +}
> >   }
> >   fcntl(vhostfd, F_SETFL, O_NONBLOCK);
> >   }
> > @@ -702,12 +713,23 @@ static void net_init_tap_one(const
> > NetdevTapOptions *tap, NetClientState *peer,
> >
> >   s->vhost_net = vhost_net_init(&options);
> >   if (!s->vhost_net) {
> > -error_setg(errp,
> > -   "vhost-net requested but could not be initialized");
> > -return;
> > +if (tap->has_vhostforce && tap->vhostforce) {
> > +error_setg(errp,
> > +   "vhost-net requested but could not be
> initialized");
> > +goto error;
> > +} else {
> > +warn_report("vhost-net requested but could not be
> initialized");
> > + 

Re: [Qemu-devel] [PATCH v6 0/4] cryptodev: add vhost support

2018-02-01 Thread Zhoujian (jay)
Hi Michael,
I am wondering whether this version is OK for you.
Any comments would be appreciated, thanks.

Regards,
Jay

> -Original Message-
> From: Zhoujian (jay)
> Sent: Sunday, January 21, 2018 8:55 PM
> To: qemu-devel@nongnu.org
> Cc: m...@redhat.com; pbonz...@redhat.com; Huangweidong (C)
> <weidong.hu...@huawei.com>; stefa...@redhat.com; Zhoujian (jay)
> <jianjay.z...@huawei.com>; pa...@linux.vnet.ibm.com; longpeng
> <longpe...@huawei.com>; xin.z...@intel.com; roy.fan.zh...@intel.com; Gonglei
> (Arei) <arei.gong...@huawei.com>; wangxin (U) <wangxinxin.w...@huawei.com>
> Subject: [PATCH v6 0/4] cryptodev: add vhost support
> 
> From: Gonglei <arei.gong...@huawei.com>
> 
> I posted the RFC version a few months ago for the DPDK vhost-crypto
> implementation, and now it's time to send the formal version, because we need
> a user-space scheme for better performance.
> 
> The vhost user crypto server side patches had been sent to DPDK community,
> pls see
> 
> [RFC PATCH 0/6] lib/librte_vhost: introduce new vhost_user crypto backend
> support http://dpdk.org/ml/archives/dev/2017-November/081048.html
> 
> You also can get virtio-crypto polling mode driver from:
> 
> [PATCH] virtio: add new driver for crypto devices
> http://dpdk.org/ml/archives/dev/2017-November/081985.html
> 
> v5 -> v6:
>   Fix compile error about backends/cryptodev-vhost-user.o and rebase on
>   the master
> v4 -> v5:
>   squash [PATCH v4 5/5] into previous patches [Michael]
> v3 -> v4:
>   "[PATCH v4 5/5] cryptodev-vhost-user: depend on CONFIG_VHOST_CRYPTO
>   and CONFIG_VHOST_USER" newly added to fix compilation dependency [Michael]
> v2 -> v3:
>   New added vhost user messages should be sent only when feature
>   has been successfully negotiated [Michael]
> v1 -> v2:
>   Fix compile error on mingw32
> 
> Gonglei (4):
>   cryptodev: add vhost-user as a new cryptodev backend
>   cryptodev: add vhost support
>   cryptodev-vhost-user: add crypto session handler
>   cryptodev-vhost-user: set the key length
> 
>  backends/Makefile.objs|   6 +
>  backends/cryptodev-builtin.c  |   1 +
>  backends/cryptodev-vhost-user.c   | 379
> ++
>  backends/cryptodev-vhost.c| 347 +++
>  configure |  15 ++
>  docs/interop/vhost-user.txt   |  26 +++
>  hw/virtio/Makefile.objs   |   2 +-
>  hw/virtio/vhost-user.c| 104 ++
>  hw/virtio/virtio-crypto.c |  70 +++
>  include/hw/virtio/vhost-backend.h |   8 +
>  include/hw/virtio/virtio-crypto.h |   1 +
>  include/sysemu/cryptodev-vhost-user.h |  47 +
>  include/sysemu/cryptodev-vhost.h  | 154 ++
>  include/sysemu/cryptodev.h|   8 +
>  qemu-options.hx   |  21 ++
>  vl.c  |   6 +
>  16 files changed, 1194 insertions(+), 1 deletion(-)  create mode 100644
> backends/cryptodev-vhost-user.c  create mode 100644 backends/cryptodev-
> vhost.c  create mode 100644 include/sysemu/cryptodev-vhost-user.h
>  create mode 100644 include/sysemu/cryptodev-vhost.h
> 
> --
> 1.8.3.1
> 




Re: [Qemu-devel] [PATCH v6 0/3] vhost: two fixes and used_memslots refactoring

2018-01-21 Thread Zhoujian (jay)
> So please simply rebase on top of master now.
> 

Hi Michael,
Having done that and pls help to review again.

Regards,
Jay



Re: [Qemu-devel] [PATCH v5 0/4] cryptodev: add vhost support

2018-01-19 Thread Zhoujian (jay)
[...]

> Configure options:
> --enable-werror --target-list=x86_64-softmmu,aarch64-softmmu --prefix=/tmp/qemu-test/install
[...]

> KVM support   yes
> HAX support   no
> HVF support   no
> TCG support   yes
> TCG debug enabled no
> TCG interpreter   no
> malloc trim support yes
> RDMA support  yes
> fdt support   yes
> preadv supportyes
> fdatasync yes
> madvise   yes
> posix_madvise yes
> libcap-ng support no
> vhost-net support yes
> vhost-crypto support yes
> vhost-scsi support yes
> vhost-vsock support yes
> vhost-user support yes
[...]

> make[1]: *** No rule to make target `../backends/cryptodev-vhost-user.o',
> needed by `qemu-system-x86_64'.  Stop.
> make[1]: *** Waiting for unfinished jobs

But it compiles successfully using CentOS 7 and mingw32 locally, and I couldn't
reproduce this. Is it related to Docker? (I don't have a Docker environment
at hand.)

Regards,
Jay




Re: [Qemu-devel] [PATCH v3 1/4] cryptodev: add vhost-user as a new cryptodev backend

2018-01-16 Thread Zhoujian (jay)
> -Original Message-
> From: Qemu-devel [mailto:qemu-devel-
> bounces+jianjay.zhou=huawei@nongnu.org] On Behalf Of Michael S. Tsirkin
> Sent: Wednesday, January 17, 2018 12:41 AM
> To: Zhoujian (jay) <jianjay.z...@huawei.com>
> Cc: pa...@linux.vnet.ibm.com; Huangweidong (C) <weidong.hu...@huawei.com>;
> xin.z...@intel.com; qemu-devel@nongnu.org; Gonglei (Arei)
> <arei.gong...@huawei.com>; roy.fan.zh...@intel.com; stefa...@redhat.com;
> pbonz...@redhat.com; longpeng <longpe...@huawei.com>
> Subject: Re: [Qemu-devel] [PATCH v3 1/4] cryptodev: add vhost-user as a new
> cryptodev backend
> 
> On Tue, Jan 16, 2018 at 10:06:50PM +0800, Jay Zhou wrote:
> > From: Gonglei <arei.gong...@huawei.com>
> >
> > Usage:
> >  -chardev socket,id=charcrypto0,path=/path/to/your/socket
> >  -object cryptodev-vhost-user,id=cryptodev0,chardev=charcrypto0
> >  -device virtio-crypto-pci,id=crypto0,cryptodev=cryptodev0
> >
> > Signed-off-by: Gonglei <arei.gong...@huawei.com>
> > Signed-off-by: Longpeng(Mike) <longpe...@huawei.com>
> > Signed-off-by: Jay Zhou <jianjay.z...@huawei.com>
> > ---
> >  backends/Makefile.objs   |   4 +
> >  backends/cryptodev-vhost-user.c  | 333
> +++
> >  backends/cryptodev-vhost.c   |  73 +
> >  include/sysemu/cryptodev-vhost.h | 154 ++
> >  qemu-options.hx  |  21 +++
> >  vl.c |   4 +
> >  6 files changed, 589 insertions(+)
> >  create mode 100644 backends/cryptodev-vhost-user.c  create mode
> > 100644 backends/cryptodev-vhost.c  create mode 100644
> > include/sysemu/cryptodev-vhost.h
> >
> > diff --git a/backends/Makefile.objs b/backends/Makefile.objs
> > index 0400799..9e1fb76 100644
> > --- a/backends/Makefile.objs
> > +++ b/backends/Makefile.objs
> > @@ -8,3 +8,7 @@ common-obj-$(CONFIG_LINUX) += hostmem-file.o
> >
> >  common-obj-y += cryptodev.o
> >  common-obj-y += cryptodev-builtin.o
> > +
> > +ifeq ($(CONFIG_VIRTIO),y)
> > +common-obj-$(CONFIG_LINUX) += cryptodev-vhost.o cryptodev-vhost-user.o
> > +endif
> 
> Shouldn't this depend on CONFIG_VHOST_USER?

Yes, you're right. Will fix it soon.

Regards,
Jay

> 
> 
> > diff --git a/backends/cryptodev-vhost-user.c
> > b/backends/cryptodev-vhost-user.c new file mode 100644 index
> > 000..4e63ece
> > --- /dev/null
> > +++ b/backends/cryptodev-vhost-user.c
> > @@ -0,0 +1,333 @@
> > +/*
> > + * QEMU Cryptodev backend for QEMU cipher APIs
> > + *
> > + * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
> > + *
> > + * Authors:
> > + *Gonglei <arei.gong...@huawei.com>
> > + *
> > + * This library is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2 of the License, or (at your option) any later version.
> > + *
> > + * This library is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with this library; if not, see
> <http://www.gnu.org/licenses/>.
> > + *
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "hw/boards.h"
> > +#include "qapi/error.h"
> > +#include "qapi/qmp/qerror.h"
> > +#include "qemu/error-report.h"
> > +#include "standard-headers/linux/virtio_crypto.h"
> > +#include "sysemu/cryptodev-vhost.h"
> > +#include "chardev/char-fe.h"
> > +
> > +
> > +/**
> > + * @TYPE_CRYPTODEV_BACKEND_VHOST_USER:
> > + * name of backend that uses vhost user server  */ #define
> > +TYPE_CRYPTODEV_BACKEND_VHOST_USER "cryptodev-vhost-user"
> > +
> > +#define CRYPTODEV_BACKEND_VHOST_USER(obj) \
> > +OBJECT_CHECK(CryptoDevBackendVhostUser, \
> > + (obj), TYPE_CRYPTODEV_BACKEND_VHOST_USER)
> > +
> > +
> > +typedef struct CryptoDevBackendVhostUser {
> > +CryptoDevBackend parent_obj;
> > +
> > +CharBackend chr;
> > +char *chr_name;
> > +bool opened;
> > +CryptoDevBackendVhost *vhost_crypto[MAX_CRYPTO_QUEUE_NUM];

Re: [Qemu-devel] [PATCH] memory: update comments and fix some typos

2018-01-16 Thread Zhoujian (jay)
> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Tuesday, January 16, 2018 10:44 PM
> To: Zhoujian (jay) <jianjay.z...@huawei.com>; qemu-devel@nongnu.org
> Cc: Huangweidong (C) <weidong.hu...@huawei.com>; wangxin (U)
> <wangxinxin.w...@huawei.com>
> Subject: Re: [PATCH] memory: update comments and fix some typos
> 
> On 16/01/2018 09:32, Zhoujian (jay) wrote:
> > Hi Paolo,
> > Maybe it is a little boring to review updated comments, but I think it
> > is the right thing to do, so could you have a look when you're free?
> 
> On the contrary, it's useful!  I just missed it among the email after new
> year's break.
> 
> I queued it now.

Thanks,

Jay

> 
> Paolo
> 
> > Regards,
> > Jay
> >
> >> -Original Message-
> >> From: Zhoujian (jay)
> >> Sent: Thursday, January 04, 2018 1:30 PM
> >> To: qemu-devel@nongnu.org
> >> Cc: pbonz...@redhat.com; Huangweidong (C) <weidong.hu...@huawei.com>;
> >> wangxin
> >> (U) <wangxinxin.w...@huawei.com>; Zhoujian (jay)
> >> <jianjay.z...@huawei.com>
> >> Subject: [PATCH] memory: update comments and fix some typos
> >>
> >> Signed-off-by: Jay Zhou <jianjay.z...@huawei.com>
> >> ---
> >>  include/exec/memory.h | 27 +++
> >>  1 file changed, 15 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/include/exec/memory.h b/include/exec/memory.h
> >> index a4cabdf..6e5684d 100644
> >> --- a/include/exec/memory.h
> >> +++ b/include/exec/memory.h
> >> @@ -324,7 +324,7 @@ FlatView *address_space_to_flatview(AddressSpace *as);
> >>   * MemoryRegionSection: describes a fragment of a #MemoryRegion
> >>   *
> >>   * @mr: the region, or %NULL if empty
> >> - * @address_space: the address space the region is mapped in
> >> + * @fv: the flat view of the address space the region is mapped in
> >>   * @offset_within_region: the beginning of the section, relative to
> >> @mr's start
> >>   * @size: the size of the section; will not exceed @mr's boundaries
> >>   * @offset_within_address_space: the address of the first byte of
> >> the section
> >> @@ -607,6 +607,7 @@ void memory_region_init_rom_nomigrate(MemoryRegion *mr,
> >>   * @mr: the #MemoryRegion to be initialized.
> >>   * @owner: the object that tracks the region's reference count
> >>   * @ops: callbacks for write access handling (must not be NULL).
> >> + * @opaque: passed to the read and write callbacks of the @ops structure.
> >>   * @name: Region name, becomes part of RAMBlock name used in
> >> migration stream
> >>   *must be unique within any device
> >>   * @size: size of the region.
> >> @@ -650,11 +651,10 @@ static inline void memory_region_init_reservation(MemoryRegion *mr,
> >>   * An IOMMU region translates addresses and forwards accesses to a target
> >>   * memory region.
> >>   *
> >> - * @typename: QOM class name
> >>   * @_iommu_mr: the #IOMMUMemoryRegion to be initialized
> >>   * @instance_size: the IOMMUMemoryRegion subclass instance size
> >> + * @mrtypename: the type name of the #IOMMUMemoryRegion
> >>   * @owner: the object that tracks the region's reference count
> >> - * @ops: a function that translates addresses into the @target region
> >>   * @name: used for debugging; not visible to the user or ABI
> >>   * @size: size of the region.
> >>   */
> >> @@ -824,8 +824,8 @@ static inline IOMMUMemoryRegion *memory_region_get_iommu(MemoryRegion *mr)
> >>   * memory_region_get_iommu_class_nocheck: returns iommu memory region
> class
> >>   *   if an iommu or NULL if not
> >>   *
> >> - * Returns pointer to IOMMUMemoryRegioniClass if a memory region is
> >> an iommu,
> >> - * otherwise NULL. This is fast path avoinding QOM checking, use
> >> with caution.
> >> + * Returns pointer to IOMMUMemoryRegionClass if a memory region is
> >> + an iommu,
> >> + * otherwise NULL. This is fast path avoiding QOM checking, use with
> caution.
> >>   *
> >>   * @mr: the memory region being queried
> >>   */
> >> @@ -990,7 +990,8 @@ int memory_region_get_fd(MemoryRegion *mr);
> >>   * protecting the pointer, such as a reference to the region that
> includes
> >>   * the incoming ram_addr_t.
> >>   *
> >> - * @mr: the memory region being queried.
> >

Re: [Qemu-devel] [PATCH v3 0/4] cryptodev: add vhost support

2018-01-16 Thread Zhoujian (jay)
VHOST_USER_CREATE_CRYPTO_SESSION and VHOST_USER_CLOSE_CRYPTO_SESSION are newly
added messages; they should be sent only when the
VHOST_USER_PROTOCOL_F_CRYPTO_SESSION feature has been successfully negotiated.

The differences between v2 and v3 are listed below, please review, thanks!

---
diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
index f43c63d..3aec685 100644
--- a/docs/interop/vhost-user.txt
+++ b/docs/interop/vhost-user.txt
@@ -327,6 +327,7 @@ Protocol features
 #define VHOST_USER_PROTOCOL_F_MTU4
 #define VHOST_USER_PROTOCOL_F_SLAVE_REQ  5
 #define VHOST_USER_PROTOCOL_F_CROSS_ENDIAN   6
+#define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7
 
 Master message types
 
@@ -605,6 +606,9 @@ Master message types
 
   Create a session for crypto operation. The server side must return the
   session id, 0 or positive for success, negative for failure.
+  This request should be sent only when VHOST_USER_PROTOCOL_F_CRYPTO_SESSION
+  feature has been successfully negotiated.
+  It's a required feature for crypto devices.
 
  * VHOST_USER_CLOSE_CRYPTO_SESSION
 
@@ -614,6 +618,9 @@ Master message types
 
   Close a session for crypto operation which was previously
   created by VHOST_USER_CREATE_CRYPTO_SESSION.
+  This request should be sent only when VHOST_USER_PROTOCOL_F_CRYPTO_SESSION
+  feature has been successfully negotiated.
+  It's a required feature for crypto devices.
 
 Slave message types
 ---
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 7865c6d..f779512 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -35,6 +35,7 @@ enum VhostUserProtocolFeature {
 VHOST_USER_PROTOCOL_F_NET_MTU = 4,
 VHOST_USER_PROTOCOL_F_SLAVE_REQ = 5,
 VHOST_USER_PROTOCOL_F_CROSS_ENDIAN = 6,
+VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7,
 
 VHOST_USER_PROTOCOL_F_MAX
 };
@@ -941,6 +942,8 @@ static int vhost_user_crypto_create_session(struct vhost_dev *dev,
   void *session_info,
   uint64_t *session_id)
 {
+bool crypto_session = virtio_has_feature(dev->protocol_features,
+   VHOST_USER_PROTOCOL_F_CRYPTO_SESSION);
 CryptoDevBackendSymSessionInfo *sess_info = session_info;
 VhostUserMsg msg = {
 .request = VHOST_USER_CREATE_CRYPTO_SESSION,
@@ -950,6 +953,11 @@ static int vhost_user_crypto_create_session(struct vhost_dev *dev,
 
 assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
 
+if (!crypto_session) {
+error_report("vhost-user trying to send unhandled ioctl");
+return -1;
+}
+
memcpy(&msg.payload.session.session_setup_data, sess_info,
   sizeof(CryptoDevBackendSymSessionInfo));
 if (sess_info->key_len) {
@@ -994,6 +1002,8 @@ static int vhost_user_crypto_create_session(struct vhost_dev *dev,
 static int
 vhost_user_crypto_close_session(struct vhost_dev *dev, uint64_t session_id)
 {
+bool crypto_session = virtio_has_feature(dev->protocol_features,
+   VHOST_USER_PROTOCOL_F_CRYPTO_SESSION);
 VhostUserMsg msg = {
 .request = VHOST_USER_CLOSE_CRYPTO_SESSION,
 .flags = VHOST_USER_VERSION,
@@ -1001,6 +1011,11 @@ vhost_user_crypto_close_session(struct vhost_dev *dev, uint64_t session_id)
 };
 msg.payload.u64 = session_id;
 
+if (!crypto_session) {
+error_report("vhost-user trying to send unhandled ioctl");
+return -1;
+}
+
if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
 error_report("vhost_user_write() return -1, close session failed");
 return -1;


> -Original Message-
> From: Zhoujian (jay)
> Sent: Tuesday, January 16, 2018 10:07 PM
> To: qemu-devel@nongnu.org
> Cc: m...@redhat.com; pbonz...@redhat.com; Huangweidong (C)
> <weidong.hu...@huawei.com>; stefa...@redhat.com; Zhoujian (jay)
> <jianjay.z...@huawei.com>; pa...@linux.vnet.ibm.com; longpeng
> <longpe...@huawei.com>; xin.z...@intel.com; roy.fan.zh...@intel.com; Gonglei
> (Arei) <arei.gong...@huawei.com>
> Subject: [PATCH v3 0/4] cryptodev: add vhost support
> 
> From: Gonglei <arei.gong...@huawei.com>
> 
> I posted the RFC version a few months ago for the DPDK vhost-crypto
> implementation, and now it's time to send the formal version, because we need
> a user-space scheme for better performance.
> 
> The vhost user crypto server side patches had been sent to DPDK community,
> pls see
> 
> [RFC PATCH 0/6] lib/librte_vhost: introduce new vhost_user crypto backend
> support http://dpdk.org/ml/archives/dev/2017-November/081048.html
> 
> You also can get virtio-crypto polling mode driver from:
> 
> [PATCH] virtio: add new driver for crypto devices
> http://dpdk.org/ml/archives/dev/2017-November/081985.html

Re: [Qemu-devel] [PATCH v2 3/4] cryptodev-vhost-user: add crypto session handler

2018-01-16 Thread Zhoujian (jay)
> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: Tuesday, January 16, 2018 12:24 PM
> To: Zhoujian (jay) <jianjay.z...@huawei.com>
> Cc: qemu-devel@nongnu.org; pbonz...@redhat.com; Huangweidong (C)
> <weidong.hu...@huawei.com>; stefa...@redhat.com; pa...@linux.vnet.ibm.com;
> longpeng <longpe...@huawei.com>; xin.z...@intel.com; roy.fan.zh...@intel.com;
> Gonglei (Arei) <arei.gong...@huawei.com>
> Subject: Re: [PATCH v2 3/4] cryptodev-vhost-user: add crypto session handler
> 
> On Sat, Dec 30, 2017 at 04:52:12PM +0800, Jay Zhou wrote:
> > diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
> > index 954771d..f43c63d 100644
> > --- a/docs/interop/vhost-user.txt
> > +++ b/docs/interop/vhost-user.txt
> > @@ -596,6 +596,25 @@ Master message types
> >and expect this message once (per VQ) during device configuration
> >(ie. before the master starts the VQ).
> >
> > + * VHOST_USER_CREATE_CRYPTO_SESSION
> > +
> > +  Id: 23
> > +  Equivalent ioctl: N/A
> > +  Master payload: crypto session description
> > +  Slave payload: crypto session description
> > +
> > +  Create a session for crypto operation. The server side must return
> the
> > +  session id, 0 or positive for success, negative for failure.
> > +
> > + * VHOST_USER_CLOSE_CRYPTO_SESSION
> > +
> > +  Id: 24
> > +  Equivalent ioctl: N/A
> > +  Master payload: u64
> > +
> > +  Close a session for crypto operation which was previously
> > +  created by VHOST_USER_CREATE_CRYPTO_SESSION.
> > +
> >  Slave message types
> >  ---
> >
> 
> 
> Sorry about a delayed response.
> So an issue with this patchset is that you are adding new messages
> unconditionally.
> 
> You must add a protocol bit whenever you add new messages.
> If appropriate, you can document that it's a required feature for crypto
> devices.

OK, will do.

Regards,
Jay

> 
> 
> --
> MST



Re: [Qemu-devel] [PATCH] memory: update comments and fix some typos

2018-01-16 Thread Zhoujian (jay)
Hi Paolo,
Maybe it is a little boring to review updated comments, but I think it is the
right thing to do, so could you have a look when you're free?

Regards,
Jay

> -Original Message-
> From: Zhoujian (jay)
> Sent: Thursday, January 04, 2018 1:30 PM
> To: qemu-devel@nongnu.org
> Cc: pbonz...@redhat.com; Huangweidong (C) <weidong.hu...@huawei.com>; wangxin
> (U) <wangxinxin.w...@huawei.com>; Zhoujian (jay) <jianjay.z...@huawei.com>
> Subject: [PATCH] memory: update comments and fix some typos
> 
> Signed-off-by: Jay Zhou <jianjay.z...@huawei.com>
> ---
>  include/exec/memory.h | 27 +++
>  1 file changed, 15 insertions(+), 12 deletions(-)
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h index
> a4cabdf..6e5684d 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -324,7 +324,7 @@ FlatView *address_space_to_flatview(AddressSpace *as);
>   * MemoryRegionSection: describes a fragment of a #MemoryRegion
>   *
>   * @mr: the region, or %NULL if empty
> - * @address_space: the address space the region is mapped in
> + * @fv: the flat view of the address space the region is mapped in
>   * @offset_within_region: the beginning of the section, relative to @mr's
> start
>   * @size: the size of the section; will not exceed @mr's boundaries
>   * @offset_within_address_space: the address of the first byte of the
> section @@ -607,6 +607,7 @@ void
> memory_region_init_rom_nomigrate(MemoryRegion *mr,
>   * @mr: the #MemoryRegion to be initialized.
>   * @owner: the object that tracks the region's reference count
>   * @ops: callbacks for write access handling (must not be NULL).
> + * @opaque: passed to the read and write callbacks of the @ops structure.
>   * @name: Region name, becomes part of RAMBlock name used in migration
> stream
>   *must be unique within any device
>   * @size: size of the region.
> @@ -650,11 +651,10 @@ static inline void
> memory_region_init_reservation(MemoryRegion *mr,
>   * An IOMMU region translates addresses and forwards accesses to a target
>   * memory region.
>   *
> - * @typename: QOM class name
>   * @_iommu_mr: the #IOMMUMemoryRegion to be initialized
>   * @instance_size: the IOMMUMemoryRegion subclass instance size
> + * @mrtypename: the type name of the #IOMMUMemoryRegion
>   * @owner: the object that tracks the region's reference count
> - * @ops: a function that translates addresses into the @target region
>   * @name: used for debugging; not visible to the user or ABI
>   * @size: size of the region.
>   */
> @@ -824,8 +824,8 @@ static inline IOMMUMemoryRegion
> *memory_region_get_iommu(MemoryRegion *mr)
>   * memory_region_get_iommu_class_nocheck: returns iommu memory region class
>   *   if an iommu or NULL if not
>   *
> - * Returns pointer to IOMMUMemoryRegioniClass if a memory region is an iommu,
> - * otherwise NULL. This is fast path avoinding QOM checking, use with
> caution.
> + * Returns pointer to IOMMUMemoryRegionClass if a memory region is an
> + iommu,
> + * otherwise NULL. This is fast path avoiding QOM checking, use with caution.
>   *
>   * @mr: the memory region being queried
>   */
> @@ -990,7 +990,8 @@ int memory_region_get_fd(MemoryRegion *mr);
>   * protecting the pointer, such as a reference to the region that includes
>   * the incoming ram_addr_t.
>   *
> - * @mr: the memory region being queried.
> + * @ptr: the host pointer to be converted
> + * @offset: the offset within memory region
>   */
>  MemoryRegion *memory_region_from_host(void *ptr, ram_addr_t *offset);
> 
> @@ -1267,7 +1268,7 @@ void memory_region_clear_global_locking(MemoryRegion
> *mr);
>   * @size: the size of the access to trigger the eventfd
>   * @match_data: whether to match against @data, instead of just @addr
>   * @data: the data to match against the guest write
> - * @fd: the eventfd to be triggered when @addr, @size, and @data all match.
> + * @e: event notifier to be triggered when @addr, @size, and @data all match.
>   **/
>  void memory_region_add_eventfd(MemoryRegion *mr,
> hwaddr addr, @@ -1287,7 +1288,7 @@ void
> memory_region_add_eventfd(MemoryRegion *mr,
>   * @size: the size of the access to trigger the eventfd
>   * @match_data: whether to match against @data, instead of just @addr
>   * @data: the data to match against the guest write
> - * @fd: the eventfd to be triggered when @addr, @size, and @data all match.
> + * @e: event notifier to be triggered when @addr, @size, and @data all match.
>   */
>  void memory_region_del_eventfd(MemoryRegion *mr,
> hwaddr addr, @@ -1523,7 +1524,7 @@ bool
> memory_region_request_mmi

Re: [Qemu-devel] [PATCH v6 0/3] vhost: two fixes and used_memslots refactoring

2018-01-16 Thread Zhoujian (jay)
> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: Tuesday, January 16, 2018 12:42 PM
> To: Zhoujian (jay) <jianjay.z...@huawei.com>
> Cc: qemu-devel@nongnu.org; imamm...@redhat.com; Huangweidong (C)
> <weidong.hu...@huawei.com>; wangxin (U) <wangxinxin.w...@huawei.com>; Gonglei
> (Arei) <arei.gong...@huawei.com>; Liuzhe (Ahriy, Euler) <liuzh...@huawei.com>
> Subject: Re: [PATCH v6 0/3] vhost: two fixes and used_memslots refactoring
> 
> On Fri, Jan 12, 2018 at 10:47:56AM +0800, Jay Zhou wrote:
> > Jay Zhou (3):
> >   vhost: remove assertion to prevent crash
> >   vhost: fix memslot limit check
> >   vhost: used_memslots refactoring
> 
> This looks good to me, but needs to be rebased on top of vhost mem slot
> management refactoring by dgilbert is merged. Pls post the rebase, and I'll
> merge.

Hi Michael,
I have tested, but there are no conflicts, which is strange.

The patches I applied locally in sequence:

[PULL 21/33] vhost: Build temporary section list and deref after commit
[PULL 22/33] vhost: Move log_dirty check
[PULL 23/33] vhost: Simplify ring verification checks
[PULL 24/33] vhost: Merge sections added to temporary list

And then,

[PATCH v6 1/3] vhost: remove assertion to prevent crash
[PATCH v6 2/3] vhost: fix memslot limit check
[PATCH v6 3/3] vhost: used_memslots refactoring

Could you point out which patches from Dave (or others) conflict with mine?

Regards,
Jay

> 
> Thanks!
> 
> 
> 
> >  hw/virtio/vhost-backend.c | 15 +++-
> >  hw/virtio/vhost-user.c| 74 +++
> 
> >  hw/virtio/vhost.c | 30 +---
> >  include/hw/virtio/vhost-backend.h |  6 ++--
> >  4 files changed, 86 insertions(+), 39 deletions(-)
> >
> > --
> > 1.8.3.1
> >



Re: [Qemu-devel] [PATCH] tap: close fd conditionally when error occured

2018-01-15 Thread Zhoujian (jay)
> -Original Message-
> From: Jason Wang [mailto:jasow...@redhat.com]
> Sent: Monday, January 15, 2018 3:21 PM
> To: Zhoujian (jay) <jianjay.z...@huawei.com>; qemu-devel@nongnu.org
> Cc: m...@redhat.com; imamm...@redhat.com; Huangweidong (C)
> <weidong.hu...@huawei.com>; wangxin (U) <wangxinxin.w...@huawei.com>; Gonglei
> (Arei) <arei.gong...@huawei.com>
> Subject: Re: [PATCH] tap: close fd conditionally when error occured
> 
> 
> 
> On 2018年01月12日 15:30, Jay Zhou wrote:
> > If netdev_add tap,id=net0,vhost=on fails in net_init_tap_one(), the
> > followed-up device_add virtio-net-pci,netdev=net0 will fail too,
> > printing:
> >
> >TUNSETOFFLOAD ioctl() failed: Bad file descriptor
> >TUNSETOFFLOAD ioctl() failed: Bad file descriptor
> >
> > The reason is that the fd of the tap is closed when an error occurred
> > after calling net_init_tap_one().
> >
> > I think the fd should be closed in these two cases:
> >1.tap_set_sndbuf() failed
> >2.tap_set_sndbuf() succeeded but vhost failed to initialize with
> >  vhostforce flag on
> > Meanwhile, the fd should not be closed just because vhost failed to
> > initialize without the vhostforce flag, so the followed-up device_add
> > can fall back to userspace virtio successfully.
> >
> > Suggested-by: Michael S. Tsirkin <m...@redhat.com>
> > Suggested-by: Igor Mammedov <imamm...@redhat.com>
> > Suggested-by: Jason Wang <jasow...@redhat.com>
> > Signed-off-by: Jay Zhou <jianjay.z...@huawei.com>
> > ---
> >   net/tap.c | 18 +-
> >   1 file changed, 13 insertions(+), 5 deletions(-)
> >
> > diff --git a/net/tap.c b/net/tap.c
> > index 979e622..3ed72eb 100644
> > --- a/net/tap.c
> > +++ b/net/tap.c
> > @@ -651,6 +651,9 @@ static void net_init_tap_one(const NetdevTapOptions
> *tap, NetClientState *peer,
> >   tap_set_sndbuf(s->fd, tap, &err);
> >   if (err) {
> >   error_propagate(errp, err);
> > +if (!tap->has_fd && !tap->has_fds) {
> > +close(fd);
> > +}
> >   return;
> >   }
> >
> > @@ -687,14 +690,14 @@ static void net_init_tap_one(const NetdevTapOptions
> *tap, NetClientState *peer,
> >   vhostfd = monitor_fd_param(cur_mon, vhostfdname, &err);
> >   if (vhostfd == -1) {
> >   error_propagate(errp, err);
> > -return;
> > +goto cleanup;
> >   }
> >   } else {
> >   vhostfd = open("/dev/vhost-net", O_RDWR);
> >   if (vhostfd < 0) {
> >   error_setg_errno(errp, errno,
> >"tap: open vhost char device failed");
> > -return;
> > +goto cleanup;
> >   }
> >   fcntl(vhostfd, F_SETFL, O_NONBLOCK);
> >   }
> > @@ -704,11 +707,18 @@ static void net_init_tap_one(const NetdevTapOptions
> *tap, NetClientState *peer,
> >   if (!s->vhost_net) {
> >   error_setg(errp,
> >  "vhost-net requested but could not be
> > initialized");
> 
> So error_setg() is not appropriate here, considering it was not treated as an
> error. We probably just need some warning.

Should it be a warning too if the vhostforce flag is on?

> 
> > -return;
> > +goto cleanup;
> >   }
> >   } else if (vhostfdname) {
> >   error_setg(errp, "vhostfd(s)= is not valid without vhost");
> >   }
> > +
> > +cleanup:
> > +if (!tap->has_fd && !tap->has_fds && tap->has_vhostforce &&
> > +tap->vhostforce) {
> > +close(fd);
> 
> I would still let caller to decide whether or not to close the fd.

How about moving tap_set_sndbuf() out of net_init_tap_one()? Then the caller
could easily decide whether or not to close the fd (the reason is explained in
the commit message). Any better suggestions?

Regards,
Jay

> 
> Thanks
> 
> > +}
> > +return;
> >   }
> >
> >   static int get_fds(char *str, char *fds[], int max) @@ -877,7 +887,6
> > @@ free_fail:
> >vnet_hdr, fd, &err);
> >   if (err) {
> >   error_propagate(errp, err);
> > -close(fd);
> >   return -1;
> >   }
> >   } else {
> > @@ -916,7 +925,6 @@ free_fail:
> >vhostfdname, vnet_hdr, fd, &err);
> >   if (err) {
> >   error_propagate(errp, err);
> > -close(fd);
> >   return -1;
> >   }
> >   }



Re: [Qemu-devel] [PATCH v5 0/4] vhost: two fixes and used_memslots refactoring

2018-01-11 Thread Zhoujian (jay)


> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: Friday, January 12, 2018 3:56 AM
> To: Zhoujian (jay) <jianjay.z...@huawei.com>
> Cc: Igor Mammedov <imamm...@redhat.com>; qemu-devel@nongnu.org; Huangweidong
> (C) <weidong.hu...@huawei.com>; wangxin (U) <wangxinxin.w...@huawei.com>;
> Gonglei (Arei) <arei.gong...@huawei.com>; Liuzhe (Ahriy, Euler)
> <liuzh...@huawei.com>
> Subject: Re: [PATCH v5 0/4] vhost: two fixes and used_memslots refactoring
> 
> On Thu, Jan 11, 2018 at 01:55:38PM +, Zhoujian (jay) wrote:
> > Hi Igor,
> >
> > > -Original Message-
> > > From: Igor Mammedov [mailto:imamm...@redhat.com]
> > > Sent: Thursday, January 11, 2018 9:05 PM
> > > To: Zhoujian (jay) <jianjay.z...@huawei.com>
> > > Cc: qemu-devel@nongnu.org; m...@redhat.com; Huangweidong (C)
> > > <weidong.hu...@huawei.com>; wangxin (U)
> > > <wangxinxin.w...@huawei.com>; Gonglei
> > > (Arei) <arei.gong...@huawei.com>; Liuzhe (Ahriy, Euler)
> > > <liuzh...@huawei.com>
> > > Subject: Re: [PATCH v5 0/4] vhost: two fixes and used_memslots
> > > refactoring
> > >
> > > On Wed, 10 Jan 2018 00:39:02 +0800
> > > Jay Zhou <jianjay.z...@huawei.com> wrote:
> > >
> > > Jay,
> > > considering how non trivial touched code is, would you mind first
> > > adding 'make check' testcases for success/failure paths that you are
> touching?
> > > It would help with merging and ensure that future changes to vhost
> > > won't break memslots handling.
> >
> > Will look into the testcases. Maybe it would take some time since I'm
> > not very familiar with them.
> >
> > Regards,
> > Jay
> 
> 
> ok can you repost whatever's ready please? And please use proper threading
> (e.g. generated with git format-patch --thread=shallow).
> 

Will send a v6 version soon.

PS: Since the vhost memslots series and the patch "tap: do not close fd
if only vhost failed to initialize" are two separate issues, I will send
them separately.

Regards,
Jay

> > >
> > > > v4 -> v5:
> > > >   Making the followed up device_add to fall back to userspace
> > > >   virtio when netdev_add fails if vhost force flag does not set.
> > > >
> > > > Jay Zhou (4):
> > > >   vhost: remove assertion to prevent crash
> > > >   tap: do not close fd if only vhost failed to initialize
> > > >   vhost: fix memslot limit check
> > > >   vhost: used_memslots refactoring
> > > >
> > > >  hw/virtio/vhost-backend.c | 15 +++-
> > > >  hw/virtio/vhost-user.c| 74 +++
> 
> > > 
> > > >  hw/virtio/vhost.c | 30 +---
> > > >  include/hw/virtio/vhost-backend.h |  6 ++--
> > > >  net/tap.c | 25 +
> > > >  5 files changed, 104 insertions(+), 46 deletions(-)
> > > >
> > > > --
> > > > 1.8.3.1
> > > >
> > > >



Re: [Qemu-devel] [PATCH v5 0/4] vhost: two fixes and used_memslots refactoring

2018-01-11 Thread Zhoujian (jay)
Hi Igor,

> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: Thursday, January 11, 2018 9:05 PM
> To: Zhoujian (jay) <jianjay.z...@huawei.com>
> Cc: qemu-devel@nongnu.org; m...@redhat.com; Huangweidong (C)
> <weidong.hu...@huawei.com>; wangxin (U) <wangxinxin.w...@huawei.com>; Gonglei
> (Arei) <arei.gong...@huawei.com>; Liuzhe (Ahriy, Euler) <liuzh...@huawei.com>
> Subject: Re: [PATCH v5 0/4] vhost: two fixes and used_memslots refactoring
> 
> On Wed, 10 Jan 2018 00:39:02 +0800
> Jay Zhou <jianjay.z...@huawei.com> wrote:
> 
> Jay,
> considering how non trivial touched code is, would you mind first adding
> 'make check' testcases for success/failure paths that you are touching?
> It would help with merging and ensure that future changes to vhost won't
> break memslots handling.

Will look into the testcases. Maybe it would take some time since I'm not
very familiar with them.

Regards,
Jay

> 
> > v4 -> v5:
> >   Making the followed up device_add to fall back to userspace
> >   virtio when netdev_add fails if vhost force flag does not set.
> >
> > Jay Zhou (4):
> >   vhost: remove assertion to prevent crash
> >   tap: do not close fd if only vhost failed to initialize
> >   vhost: fix memslot limit check
> >   vhost: used_memslots refactoring
> >
> >  hw/virtio/vhost-backend.c | 15 +++-
> >  hw/virtio/vhost-user.c| 74 +++
> 
> >  hw/virtio/vhost.c | 30 +---
> >  include/hw/virtio/vhost-backend.h |  6 ++--
> >  net/tap.c | 25 +
> >  5 files changed, 104 insertions(+), 46 deletions(-)
> >
> > --
> > 1.8.3.1
> >
> >




Re: [Qemu-devel] [PATCH v5 2/4][RFC] tap: do not close fd if only vhost failed to initialize

2018-01-11 Thread Zhoujian (jay)
> -Original Message-
> From: Jason Wang [mailto:jasow...@redhat.com]
> Sent: Thursday, January 11, 2018 6:30 PM
> To: Zhoujian (jay) <jianjay.z...@huawei.com>; qemu-devel@nongnu.org
> Cc: Huangweidong (C) <weidong.hu...@huawei.com>; m...@redhat.com; wangxin (U)
> <wangxinxin.w...@huawei.com>; Gonglei (Arei) <arei.gong...@huawei.com>;
> imamm...@redhat.com; Liuzhe (Ahriy, Euler) <liuzh...@huawei.com>
> Subject: Re: [Qemu-devel] [PATCH v5 2/4][RFC] tap: do not close fd if only
> vhost failed to initialize
> 
> 
> 
> On 2018年01月11日 11:54, Zhoujian (jay) wrote:
> > Hi Jason,
> >
> >> -Original Message-
> >> From: Jason Wang [mailto:jasow...@redhat.com]
> >> Sent: Thursday, January 11, 2018 11:35 AM
> >> To: Zhoujian (jay) <jianjay.z...@huawei.com>; qemu-devel@nongnu.org
> >> Cc: Huangweidong (C) <weidong.hu...@huawei.com>; m...@redhat.com;
> >> wangxin (U) <wangxinxin.w...@huawei.com>; Gonglei (Arei)
> >> <arei.gong...@huawei.com>; imamm...@redhat.com; Liuzhe (Ahriy, Euler)
> >> <liuzh...@huawei.com>
> >> Subject: Re: [Qemu-devel] [PATCH v5 2/4][RFC] tap: do not close fd if
> >> only vhost failed to initialize
> >>
> >>
> >>
> >> On 2018年01月10日 15:39, Zhoujian (jay) wrote:
> >>>>>> +*vhost_init_failed = true;
> >>>> Why not simply check s->vhost_net after call net_init_tap_one()?
> >>> s->vhost_net is always NULL if net_init_tap_one() failed, it can't
> >> distinguish
> >>> failure reasons.
> >> On which condition net_init_tap_one() fail but s->vhost_net is set?
> > No, it does not exist, so we cannot use s->vhost_net to decide
> > whether to close the fd of the tap when an error occurs.
> >
> >
> > Maybe the patch below is easier to understand; please have a look.
> >
> > diff --git a/net/tap.c b/net/tap.c
> > index 979e622..a5c5111 100644
> > --- a/net/tap.c
> > +++ b/net/tap.c
> > @@ -642,7 +642,8 @@ static void net_init_tap_one(const NetdevTapOptions
> *tap, NetClientState *peer,
> >const char *model, const char *name,
> >const char *ifname, const char *script,
> >const char *downscript, const char
> *vhostfdname,
> > - int vnet_hdr, int fd, Error **errp)
> > + int vnet_hdr, int fd, Error **errp,
> > + bool *error_is_set_sndbuf)
> >   {
> >   Error *err = NULL;
> >   TAPState *s = net_tap_fd_init(peer, model, name, fd, vnet_hdr);
> > @@ -651,6 +652,9 @@ static void net_init_tap_one(const NetdevTapOptions
> *tap, NetClientState *peer,
> >   tap_set_sndbuf(s->fd, tap, &err);
> >   if (err) {
> >   error_propagate(errp, err);
> > +if (error_is_set_sndbuf) {
> > +*error_is_set_sndbuf = true;
> > +}
> >   return;
> >   }
> >
> > @@ -748,6 +752,7 @@ int net_init_tap(const Netdev *netdev, const char *name,
> >   Error *err = NULL;
> >   const char *vhostfdname;
> >   char ifname[128];
> > +bool error_is_set_sndbuf = false;
> >
> >   assert(netdev->type == NET_CLIENT_DRIVER_TAP);
> >   tap = &netdev->u.tap;
> > @@ -783,7 +788,7 @@ int net_init_tap(const Netdev *netdev, const char
> > *name,
> >
> >   net_init_tap_one(tap, peer, "tap", name, NULL,
> >script, downscript,
> > - vhostfdname, vnet_hdr, fd, &err);
> > + vhostfdname, vnet_hdr, fd, &err, NULL);
> >   if (err) {
> >   error_propagate(errp, err);
> >   return -1;
> > @@ -835,7 +840,7 @@ int net_init_tap(const Netdev *netdev, const char *name,
> >   net_init_tap_one(tap, peer, "tap", name, ifname,
> >script, downscript,
> >tap->has_vhostfds ? vhost_fds[i] : NULL,
> > - vnet_hdr, fd, &err);
> > + vnet_hdr, fd, &err, NULL);
> >   if (err) {
> >   error_propagate(errp, err);
> >   goto free_fail;
> > @@ -874,10 +879,13 @@ free_fail:
> >
> >   net_init_tap_one(tap, peer, "bridge", name, ifname,
> >script, downscript, vhostfdname

Re: [Qemu-devel] [PATCH v5 2/4][RFC] tap: do not close fd if only vhost failed to initialize

2018-01-10 Thread Zhoujian (jay)
Hi Jason,

> -Original Message-
> From: Jason Wang [mailto:jasow...@redhat.com]
> Sent: Thursday, January 11, 2018 11:35 AM
> To: Zhoujian (jay) <jianjay.z...@huawei.com>; qemu-devel@nongnu.org
> Cc: Huangweidong (C) <weidong.hu...@huawei.com>; m...@redhat.com; wangxin (U)
> <wangxinxin.w...@huawei.com>; Gonglei (Arei) <arei.gong...@huawei.com>;
> imamm...@redhat.com; Liuzhe (Ahriy, Euler) <liuzh...@huawei.com>
> Subject: Re: [Qemu-devel] [PATCH v5 2/4][RFC] tap: do not close fd if only
> vhost failed to initialize
> 
> 
> 
> On 2018年01月10日 15:39, Zhoujian (jay) wrote:
> >>>> +*vhost_init_failed = true;
> >> Why not simply check s->vhost_net after call net_init_tap_one()?
> > s->vhost_net is always NULL if net_init_tap_one() failed, it can't
> distinguish
> > failure reasons.
> 
> On which condition net_init_tap_one() fail but s->vhost_net is set?

No, it does not exist, so we cannot use s->vhost_net to decide
whether to close the fd of the tap when an error occurs.


Maybe the patch below is easier to understand; please have a look.

diff --git a/net/tap.c b/net/tap.c
index 979e622..a5c5111 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -642,7 +642,8 @@ static void net_init_tap_one(const NetdevTapOptions *tap, 
NetClientState *peer,
  const char *model, const char *name,
  const char *ifname, const char *script,
  const char *downscript, const char *vhostfdname,
- int vnet_hdr, int fd, Error **errp)
+ int vnet_hdr, int fd, Error **errp,
+ bool *error_is_set_sndbuf)
 {
 Error *err = NULL;
 TAPState *s = net_tap_fd_init(peer, model, name, fd, vnet_hdr);
@@ -651,6 +652,9 @@ static void net_init_tap_one(const NetdevTapOptions *tap, 
NetClientState *peer,
tap_set_sndbuf(s->fd, tap, &err);
 if (err) {
 error_propagate(errp, err);
+if (error_is_set_sndbuf) {
+*error_is_set_sndbuf = true;
+}
 return;
 }
 
@@ -748,6 +752,7 @@ int net_init_tap(const Netdev *netdev, const char *name,
 Error *err = NULL;
 const char *vhostfdname;
 char ifname[128];
+bool error_is_set_sndbuf = false;
 
 assert(netdev->type == NET_CLIENT_DRIVER_TAP);
tap = &netdev->u.tap;
@@ -783,7 +788,7 @@ int net_init_tap(const Netdev *netdev, const char *name,
 
 net_init_tap_one(tap, peer, "tap", name, NULL,
  script, downscript,
- vhostfdname, vnet_hdr, fd, &err);
+ vhostfdname, vnet_hdr, fd, &err, NULL);
 if (err) {
 error_propagate(errp, err);
 return -1;
@@ -835,7 +840,7 @@ int net_init_tap(const Netdev *netdev, const char *name,
 net_init_tap_one(tap, peer, "tap", name, ifname,
  script, downscript,
  tap->has_vhostfds ? vhost_fds[i] : NULL,
- vnet_hdr, fd, &err);
+ vnet_hdr, fd, &err, NULL);
 if (err) {
 error_propagate(errp, err);
 goto free_fail;
@@ -874,10 +879,13 @@ free_fail:
 
 net_init_tap_one(tap, peer, "bridge", name, ifname,
  script, downscript, vhostfdname,
- vnet_hdr, fd, &err);
+ vnet_hdr, fd, &err, &error_is_set_sndbuf);
 if (err) {
 error_propagate(errp, err);
-close(fd);
+if (error_is_set_sndbuf || (tap->has_vhostforce &&
+tap->vhostforce)) {
+close(fd);
+}
 return -1;
 }
 } else {
@@ -913,10 +921,14 @@ free_fail:
 net_init_tap_one(tap, peer, "tap", name, ifname,
  i >= 1 ? "no" : script,
  i >= 1 ? "no" : downscript,
- vhostfdname, vnet_hdr, fd, &err);
+ vhostfdname, vnet_hdr, fd, &err,
+ &error_is_set_sndbuf);
 if (err) {
 error_propagate(errp, err);
-close(fd);
+if (error_is_set_sndbuf || (tap->has_vhostforce &&
+tap->vhostforce)) {
+close(fd);
+}
 return -1;
 }
 }



PS: I am not sure I have expressed the meaning clearly... I can explain it in
Chinese in private if necessary.

Regards,
Jay


Re: [Qemu-devel] [PATCH v5 1/4] vhost: remove assertion to prevent crash

2018-01-10 Thread Zhoujian (jay)
Hi Igor,

> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: Wednesday, January 10, 2018 9:31 PM
> To: Zhoujian (jay) <jianjay.z...@huawei.com>
> Cc: qemu-devel@nongnu.org; Huangweidong (C) <weidong.hu...@huawei.com>;
> m...@redhat.com; wangxin (U) <wangxinxin.w...@huawei.com>; qemu-
> sta...@nongnu.org; Gonglei (Arei) <arei.gong...@huawei.com>; Liuzhe (Ahriy,
> Euler) <liuzh...@huawei.com>
> Subject: Re: [Qemu-devel] [PATCH v5 1/4] vhost: remove assertion to prevent
> crash
> 
> On Wed, 10 Jan 2018 00:39:35 +0800
> Jay Zhou <jianjay.z...@huawei.com> wrote:
> 
> > Start QEMU with more DIMM devices than limit but without any vhost
> > backends and then hotplug a vhost user backend, the VM will be
> > crashed.
> > Instead of asserting in vhost_user_set_mem_table(), error number is
> > used to gracefully prevent device to start. This fixes the crash
> > issue.
> 
> I'd rewrite commit message as following:
> --
> QEMU will assert on vhsot-user backed virtio device hotplug if QEMU is using
> more RAM regions than VHOST_MEMORY_MAX_NREGIONS (for example if it  were
> started with a lot of DIMM devices).
> 
> Fix it by returning error instead of asserting and let callers of
> vhost_set_mem_table() handle error condition gracefully.
> --

s/vhsot-user/vhost-user, otherwise much better than mine, will use it
in the next version, thanks!

Jay

> >
> > Cc: qemu-sta...@nongnu.org
> > Signed-off-by: Igor Mammedov <imamm...@redhat.com>
> > Signed-off-by: Jay Zhou <jianjay.z...@huawei.com>
> > ---
> >  hw/virtio/vhost-user.c | 5 -
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index
> > 093675e..8500562 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -317,11 +317,14 @@ static int vhost_user_set_mem_table(struct vhost_dev
> *dev,
> >   );
> >  fd = memory_region_get_fd(mr);
> >  if (fd > 0) {
> > +if (fd_num == VHOST_MEMORY_MAX_NREGIONS) {
> > +error_report("Failed preparing vhost-user memory table
> msg");
> > +return -1;
> > +}
> >  msg.payload.memory.regions[fd_num].userspace_addr = reg-
> >userspace_addr;
> >  msg.payload.memory.regions[fd_num].memory_size  = reg-
> >memory_size;
> >  msg.payload.memory.regions[fd_num].guest_phys_addr = reg-
> >guest_phys_addr;
> >  msg.payload.memory.regions[fd_num].mmap_offset = offset;
> > -assert(fd_num < VHOST_MEMORY_MAX_NREGIONS);
> >  fds[fd_num++] = fd;
> >  }
> >  }




Re: [Qemu-devel] [PATCH v5 2/4][RFC] tap: do not close fd if only vhost failed to initialize

2018-01-09 Thread Zhoujian (jay)
> -Original Message-
> From: Jason Wang [mailto:jasow...@redhat.com]
> Sent: Wednesday, January 10, 2018 2:02 PM
> To: Zhoujian (jay) <jianjay.z...@huawei.com>; qemu-devel@nongnu.org
> Cc: Huangweidong (C) <weidong.hu...@huawei.com>; m...@redhat.com; wangxin (U)
> <wangxinxin.w...@huawei.com>; Gonglei (Arei) <arei.gong...@huawei.com>;
> imamm...@redhat.com; Liuzhe (Ahriy, Euler) <liuzh...@huawei.com>
> Subject: Re: [Qemu-devel] [PATCH v5 2/4][RFC] tap: do not close fd if only
> vhost failed to initialize
> 
> 
> 
> On 2018年01月10日 12:18, Zhoujian (jay) wrote:
> > Sorry about missing to cc Jason.
> >
> >
> > Closing the fd of the tap unconditionally when netdev_add
> > tap,id=net0,vhost=on fails in net_init_tap_one() makes the followed-up
> > device_add virtio-net-pci,netdev=net0 fail too, which prints:
> >
> >  TUNSETOFFLOAD ioctl() failed: Bad file descriptor
> >  TUNSETOFFLOAD ioctl() failed: Bad file descriptor
> >
> > This patch makes the followed-up device_add able to fall back to
> > userspace virtio when netdev_add fails with vhost turned on but the
> > vhost force flag not set.
> >
> > Here is the original discussion:
> > https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05205.html
> >
> >
> > This is a RFC version, if I'm wrong, please let me know, thanks!
> >
> > PS: there're two places updated, see below.
> >
> >
> >> -Original Message-
> >> From: Zhoujian (jay)
> >> Sent: Wednesday, January 10, 2018 12:40 AM
> >> To: qemu-devel@nongnu.org
> >> Cc: m...@redhat.com; imamm...@redhat.com; Huangweidong (C)
> >> <weidong.hu...@huawei.com>; wangxin (U) <wangxinxin.w...@huawei.com>;
> >> Gonglei
> >> (Arei) <arei.gong...@huawei.com>; Liuzhe (Ahriy, Euler)
> >> <liuzh...@huawei.com>; Zhoujian (jay) <jianjay.z...@huawei.com>
> >> Subject: [PATCH v5 2/4][RFC] tap: do not close fd if only vhost
> >> failed to initialize
> >>
> >> Make the followed-up device_add fall back to userspace virtio
> >> when netdev_add fails, if the vhost force flag is not set.
> >>
> >> Suggested-by: Michael S. Tsirkin <m...@redhat.com>
> >> Suggested-by: Igor Mammedov <imamm...@redhat.com>
> >> Signed-off-by: Jay Zhou <jianjay.z...@huawei.com>
> >> ---
> >>   net/tap.c | 25 ++---
> >>   1 file changed, 18 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/net/tap.c b/net/tap.c
> >> index 979e622..03f226f 100644
> >> --- a/net/tap.c
> >> +++ b/net/tap.c
> >> @@ -642,7 +642,8 @@ static void net_init_tap_one(const
> >> NetdevTapOptions *tap, NetClientState *peer,
> >>const char *model, const char *name,
> >>const char *ifname, const char *script,
> >>const char *downscript, const char
> *vhostfdname,
> >> - int vnet_hdr, int fd, Error **errp)
> >> + int vnet_hdr, int fd, Error **errp,
> >> + bool *vhost_init_failed)
> >>   {
> >>   Error *err = NULL;
> >>   TAPState *s = net_tap_fd_init(peer, model, name, fd, vnet_hdr);
> >> @@ -
> >> 687,6 +688,7 @@ static void net_init_tap_one(const NetdevTapOptions
> >> *tap, NetClientState *peer,
> >>   vhostfd = monitor_fd_param(cur_mon, vhostfdname, &err);
> >>   if (vhostfd == -1) {
> >>   error_propagate(errp, err);
> >> +*vhost_init_failed = true;
> >>   return;
> >>   }
> >>   } else {
> >> @@ -694,6 +696,7 @@ static void net_init_tap_one(const
> >> NetdevTapOptions *tap, NetClientState *peer,
> >>   if (vhostfd < 0) {
> >>   error_setg_errno(errp, errno,
> >>"tap: open vhost char device
> >> failed");
> >> +*vhost_init_failed = true;
> >>   return;
> >>   }
> >>   fcntl(vhostfd, F_SETFL, O_NONBLOCK); @@ -704,6 +707,7
> >> @@ static void net_init_tap_one(const NetdevTapOptions *tap,
> NetClientState *peer,
> >>   if (!s->vhost_net) {
> >>   error_setg(errp,
> >>  "vhost-net reques

Re: [Qemu-devel] [PATCH v5 2/4][RFC] tap: do not close fd if only vhost failed to initialize

2018-01-09 Thread Zhoujian (jay)
Sorry about missing to cc Jason.


Closing the fd of the tap unconditionally when netdev_add tap,id=net0,vhost=on
fails in net_init_tap_one() makes the followed-up device_add
virtio-net-pci,netdev=net0 fail too, which prints:

TUNSETOFFLOAD ioctl() failed: Bad file descriptor
TUNSETOFFLOAD ioctl() failed: Bad file descriptor

This patch makes the followed-up device_add able to fall back to userspace
virtio when netdev_add fails with vhost turned on but the vhost force flag
not set.

Here is the original discussion:
https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg05205.html


This is a RFC version, if I'm wrong, please let me know, thanks!

PS: there're two places updated, see below.


> -Original Message-
> From: Zhoujian (jay)
> Sent: Wednesday, January 10, 2018 12:40 AM
> To: qemu-devel@nongnu.org
> Cc: m...@redhat.com; imamm...@redhat.com; Huangweidong (C)
> <weidong.hu...@huawei.com>; wangxin (U) <wangxinxin.w...@huawei.com>; Gonglei
> (Arei) <arei.gong...@huawei.com>; Liuzhe (Ahriy, Euler) <liuzh...@huawei.com>;
> Zhoujian (jay) <jianjay.z...@huawei.com>
> Subject: [PATCH v5 2/4][RFC] tap: do not close fd if only vhost failed to
> initialize
> 
> Make the followed-up device_add fall back to userspace virtio when
> netdev_add fails, if the vhost force flag is not set.
> 
> Suggested-by: Michael S. Tsirkin <m...@redhat.com>
> Suggested-by: Igor Mammedov <imamm...@redhat.com>
> Signed-off-by: Jay Zhou <jianjay.z...@huawei.com>
> ---
>  net/tap.c | 25 ++---
>  1 file changed, 18 insertions(+), 7 deletions(-)
> 
> diff --git a/net/tap.c b/net/tap.c
> index 979e622..03f226f 100644
> --- a/net/tap.c
> +++ b/net/tap.c
> @@ -642,7 +642,8 @@ static void net_init_tap_one(const NetdevTapOptions *tap,
> NetClientState *peer,
>   const char *model, const char *name,
>   const char *ifname, const char *script,
>   const char *downscript, const char *vhostfdname,
> - int vnet_hdr, int fd, Error **errp)
> + int vnet_hdr, int fd, Error **errp,
> + bool *vhost_init_failed)
>  {
>  Error *err = NULL;
>  TAPState *s = net_tap_fd_init(peer, model, name, fd, vnet_hdr); @@ -
> 687,6 +688,7 @@ static void net_init_tap_one(const NetdevTapOptions *tap,
> NetClientState *peer,
> >  vhostfd = monitor_fd_param(cur_mon, vhostfdname, &err);
>  if (vhostfd == -1) {
>  error_propagate(errp, err);
> +*vhost_init_failed = true;
>  return;
>  }
>  } else {
> @@ -694,6 +696,7 @@ static void net_init_tap_one(const NetdevTapOptions *tap,
> NetClientState *peer,
>  if (vhostfd < 0) {
>  error_setg_errno(errp, errno,
>   "tap: open vhost char device failed");
> +*vhost_init_failed = true;
>  return;
>  }
>  fcntl(vhostfd, F_SETFL, O_NONBLOCK); @@ -704,6 +707,7 @@ static
> void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
>  if (!s->vhost_net) {
>  error_setg(errp,
> "vhost-net requested but could not be initialized");
> +*vhost_init_failed = true;
>  return;
>  }
>  } else if (vhostfdname) {
> @@ -748,6 +752,7 @@ int net_init_tap(const Netdev *netdev, const char *name,
>  Error *err = NULL;
>  const char *vhostfdname;
>  char ifname[128];
> +bool vhost_init_failed = false;
> 
>  assert(netdev->type == NET_CLIENT_DRIVER_TAP);
> >  tap = &netdev->u.tap;
> @@ -783,7 +788,7 @@ int net_init_tap(const Netdev *netdev, const char *name,
> 
>  net_init_tap_one(tap, peer, "tap", name, NULL,
>   script, downscript,
> > - vhostfdname, vnet_hdr, fd, &err);
> > + vhostfdname, vnet_hdr, fd, &err,
> > + &vhost_init_failed);
>  if (err) {
>  error_propagate(errp, err);
>  return -1;
> @@ -835,7 +840,7 @@ int net_init_tap(const Netdev *netdev, const char *name,
>  net_init_tap_one(tap, peer, "tap", name, ifname,
>   script, downscript,
>   tap->has_vhostfds ? vhost_fds[i] : NULL,
> > - vnet_hdr, fd, &err);
> > + vnet_hdr, fd, &err, &vhost_init_failed);
>  if (err) {
>  error_propagate(errp, err);
>  

Re: [Qemu-devel] [PATCH v2 2/2] vhost: double check used memslots number

2018-01-03 Thread Zhoujian (jay)
> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: Thursday, December 28, 2017 7:29 PM
> To: Michael S. Tsirkin <m...@redhat.com>
> Cc: Huangweidong (C) <weidong.hu...@huawei.com>; wangxin (U)
> <wangxinxin.w...@huawei.com>; qemu-devel@nongnu.org; Liuzhe (Cloud Open
> Labs, NFV) <gary.liu...@huawei.com>; dgilb...@redhat.com; Gonglei (Arei)
> <arei.gong...@huawei.com>; Zhoujian (jay) <jianjay.z...@huawei.com>
> Subject: Re: [Qemu-devel] [PATCH v2 2/2] vhost: double check used memslots
> number
> 
> On Fri, 22 Dec 2017 23:15:09 +0200
> "Michael S. Tsirkin" <m...@redhat.com> wrote:
> 
> > On Fri, Dec 22, 2017 at 07:48:55PM +0100, Igor Mammedov wrote:
> > > On Fri, 15 Dec 2017 16:45:55 +0800
> > > Jay Zhou <jianjay.z...@huawei.com> wrote:
> > >
> > > > If the VM already has N(N>8) available memory slots for vhost
> > > > user, the VM will be crashed in vhost_user_set_mem_table if we try
> > > > to hotplug the first vhost user NIC.
> > > > This patch checks if memslots number exceeded or not after
> > > > updating vhost_user_used_memslots.
> > > Can't understand commit message, pls rephrase (what is being fixed,
> > > and how it's fixed) also include reproducing steps for crash and
> > > maybe describe call flow/backtrace that triggers crash.
> > >
> > > PS:
> > > I wasn't able to reproduce crash
> > >
> > > >
> > > > Signed-off-by: Jay Zhou <jianjay.z...@huawei.com>
> > > > ---
> > > >  hw/virtio/vhost.c | 27 +++
> > > >  1 file changed, 23 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index
> > > > 59a32e9..e45f5e2 100644
> > > > --- a/hw/virtio/vhost.c
> > > > +++ b/hw/virtio/vhost.c
> > > > @@ -1234,6 +1234,18 @@ static void vhost_virtqueue_cleanup(struct
> vhost_virtqueue *vq)
> > > > >  event_notifier_cleanup(&vq->masked_notifier);
> > > >  }
> > > >
> > > > +static bool vhost_dev_used_memslots_is_exceeded(struct vhost_dev
> > > > +*hdev) {
> > > > +if (hdev->vhost_ops->vhost_get_used_memslots() >
> > > > +hdev->vhost_ops->vhost_backend_memslots_limit(hdev)) {
> > > > +error_report("vhost backend memory slots limit is less"
> > > > +" than current number of present memory slots");
> > > > +return true;
> > > > +}
> > > > +
> > > > +return false;
> > > > +}
> > > > +
> > > >  int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
> > > > VhostBackendType backend_type, uint32_t
> > > > busyloop_timeout)  { @@ -1252,10 +1264,7 @@ int
> > > > vhost_dev_init(struct vhost_dev *hdev, void *opaque,
> > > >  goto fail;
> > > >  }
> > > >
> > > > -if (hdev->vhost_ops->vhost_get_used_memslots() >
> > > > -hdev->vhost_ops->vhost_backend_memslots_limit(hdev)) {
> > > > -error_report("vhost backend memory slots limit is less"
> > > > -" than current number of present memory slots");
> > > > +if (vhost_dev_used_memslots_is_exceeded(hdev)) {
> > > why do you keep this check?
> > > it seems always be false
> > >
> > >
> > >
> > > >  r = -1;
> > > >  goto fail;
> > > >  }
> > > > @@ -1341,6 +1350,16 @@ int vhost_dev_init(struct vhost_dev *hdev,
> void *opaque,
> > > >  hdev->memory_changed = false;
> > > >  memory_listener_register(&hdev->memory_listener,
> &address_space_memory);
> > > >  QLIST_INSERT_HEAD(&vhost_devices, hdev, entry);
> > > > +
> > > > +if (vhost_dev_used_memslots_is_exceeded(hdev)) {
> > > > +r = -1;
> > > > +if (busyloop_timeout) {
> > > > +goto fail_busyloop;
> > > > +} else {
> > > > +goto fail;
> > > > +}
> > > > +}
> > > seem to be right thing to do, since after registering listener for
> > > the first time used_memslots will be updated to actual value.
> > >
> > >
> > > I did some testing and without this hunk/patch
>

Re: [Qemu-devel] [PATCH RESEND v4] vhost: set used memslots for vhost-user and vhost-kernel respectively

2018-01-02 Thread Zhoujian (jay)
Hi Igor,

> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: Tuesday, January 02, 2018 11:46 PM
> To: Zhoujian (jay) <jianjay.z...@huawei.com>
> Cc: qemu-devel@nongnu.org; Huangweidong (C) <weidong.hu...@huawei.com>;
> m...@redhat.com; wangxin (U) <wangxinxin.w...@huawei.com>; Gonglei (Arei)
> <arei.gong...@huawei.com>; Liuzhe (Ahriy, Euler) <liuzh...@huawei.com>
> Subject: Re: [Qemu-devel] [PATCH RESEND v4] vhost: set used memslots for
> vhost-user and vhost-kernel respectively
> 
> On Sat, 30 Dec 2017 14:36:51 +0800
> Jay Zhou <jianjay.z...@huawei.com> wrote:
> 
> > Used_memslots is equal to dev->mem->nregions now, it is true for vhost
> > kernel, but not for vhost user, which uses the memory regions that
> > have file descriptor. In fact, not all of the memory regions have file
> > descriptor.
> 
> > It is usefully in some scenarios, e.g. used_memslots is 8, and only
> > 5 memory slots can be used by vhost user, it is failed to hotplug a
> > new DIMM memory because vhost_has_free_slot just returned false,
> > however we can hotplug it safely in fact.
> This sentence is too long. Please split and rephrase it so that it is clear
> what is happening.

Will do.

> 
> > Meanwhile, instead of asserting in vhost_user_set_mem_table(), error
> > number is used to gracefully prevent device to start. This fixed the
> > VM crash issue.
> The fix should be in a separate patch, as I mentioned in the previous review.
> It would be useful for the stable tree, so you should CC it as well.
> 

OK.

> 
> > Suggested-by: Igor Mammedov <imamm...@redhat.com>
> > Signed-off-by: Jay Zhou <jianjay.z...@huawei.com>
> > Signed-off-by: Liuzhe <liuzh...@huawei.com>
> > ---
> >  hw/virtio/vhost-backend.c | 15 +++-
> >  hw/virtio/vhost-user.c| 74 +++-
> ---
> >  hw/virtio/vhost.c | 18 +-
> >  include/hw/virtio/vhost-backend.h |  6 ++--
> >  4 files changed, 78 insertions(+), 35 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
> > index 7f09efa..59def69 100644
> > --- a/hw/virtio/vhost-backend.c
> > +++ b/hw/virtio/vhost-backend.c
> > @@ -15,6 +15,8 @@
> >  #include "hw/virtio/vhost-backend.h"
> >  #include "qemu/error-report.h"
> >
> > +static unsigned int vhost_kernel_used_memslots;
> > +
> >  static int vhost_kernel_call(struct vhost_dev *dev, unsigned long int
> request,
> >   void *arg)  { @@ -62,6 +64,11 @@ static
> > int vhost_kernel_memslots_limit(struct vhost_dev *dev)
> >  return limit;
> >  }
> >
> > +static bool vhost_kernel_has_free_memslots(struct vhost_dev *dev) {
> > +return vhost_kernel_used_memslots <
> > +vhost_kernel_memslots_limit(dev); }
> > +
> >  static int vhost_kernel_net_set_backend(struct vhost_dev *dev,
> >  struct vhost_vring_file
> > *file)  { @@ -233,11 +240,16 @@ static void
> > vhost_kernel_set_iotlb_callback(struct vhost_dev *dev,
> >  qemu_set_fd_handler((uintptr_t)dev->opaque, NULL, NULL,
> > NULL);  }
> >
> > +static void vhost_kernel_set_used_memslots(struct vhost_dev *dev) {
> > +vhost_kernel_used_memslots = dev->mem->nregions; }
> > +
> >  static const VhostOps kernel_ops = {
> >  .backend_type = VHOST_BACKEND_TYPE_KERNEL,
> >  .vhost_backend_init = vhost_kernel_init,
> >  .vhost_backend_cleanup = vhost_kernel_cleanup,
> > -.vhost_backend_memslots_limit = vhost_kernel_memslots_limit,
> > +.vhost_backend_has_free_memslots =
> > + vhost_kernel_has_free_memslots,
> >  .vhost_net_set_backend = vhost_kernel_net_set_backend,
> >  .vhost_scsi_set_endpoint = vhost_kernel_scsi_set_endpoint,
> >  .vhost_scsi_clear_endpoint =
> > vhost_kernel_scsi_clear_endpoint, @@ -264,6 +276,7 @@ static const
> > VhostOps kernel_ops = {  #endif /* CONFIG_VHOST_VSOCK */
> >  .vhost_set_iotlb_callback = vhost_kernel_set_iotlb_callback,
> >  .vhost_send_device_iotlb_msg =
> > vhost_kernel_send_device_iotlb_msg,
> > +.vhost_set_used_memslots = vhost_kernel_set_used_memslots,
> >  };
> >
> >  int vhost_set_backend_type(struct vhost_dev *dev, VhostBackendType
> > backend_type) diff --git a/hw/virtio/vhost-user.c
> > b/hw/virtio/vhost-user.c index 093675e..11c7d52 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw

Re: [Qemu-devel] [PATCH v4] vhost: set used memslots for vhost-user and vhost-kernel respectively

2017-12-29 Thread Zhoujian (jay)


> -Original Message-
> From: Zhoujian (jay)
> Sent: Saturday, December 30, 2017 3:06 AM
> To: qemu-devel@nongnu.org
> Cc: m...@redhat.com; imamm...@redhat.com; Huangweidong (C)
> <weidong.hu...@huawei.com>; wangxin (U) <wangxinxin.w...@huawei.com>;
> Gonglei (Arei) <arei.gong...@huawei.com>; Zhoujian (jay)
> <jianjay.z...@huawei.com>; Liuzhe (Cloud Open Labs, NFV)
> <gary.liu...@huawei.com>
> Subject: [PATCH v4] vhost: set used memslots for vhost-user and vhost-
> kernel respectively
> 
> Used_memslots is equal to dev->mem->nregions now, it is true for vhost
> kernel, but not for vhost user, which uses the memory regions that have
> file descriptor. In fact, not all of the memory regions have file
> descriptor.
> It is usefully in some scenarios, e.g. used_memslots is 8, and only
> 5 memory slots can be used by vhost user, it is failed to hotplug a new
> DIMM memory because vhost_has_free_slot just returned false, however we
> can hotplug it safely in fact.
> 
> Meanwhile, instead of asserting in vhost_user_set_mem_table(), error
> number is used to gracefully prevent device to start. This fixed the VM
> crash issue.
> 
> Suggested-by: Igor Mammedov <imamm...@redhat.com>
> Signed-off-by: Jay Zhou <jianjay.z...@huawei.com>
> Signed-off-by: Zhe Liu <gary.liu...@huawei.com>

The email address of liuzhe will be updated to liuzh...@huawei.com

[...]

> 
> +static int vhost_user_prepare_msg(struct vhost_dev *dev, VhostUserMemory
> *mem,
> +  int *fds) {
> +int i, fd;
> +

Sorry, I forgot to add "vhost_user_free_memslots = true;" here.
Indeed, it is necessary: if vhost_user_free_memslots is false now,
it should become true again after we hot-unplug some DIMMs. I will resend v4.

Regards,
Jay

> +for (i = 0, mem->nregions = 0; i < dev->mem->nregions; ++i) {
> +struct vhost_memory_region *reg = dev->mem->regions + i;
> +ram_addr_t offset;
> +MemoryRegion *mr;
> +
> +assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
> +mr = memory_region_from_host((void *)(uintptr_t)reg-
> >userspace_addr,
> + );
> +fd = memory_region_get_fd(mr);
> +if (fd > 0) {
> +if (mem->nregions == VHOST_MEMORY_MAX_NREGIONS) {
> +vhost_user_free_memslots = false;
> +return -1;
> +}
> +
> +mem->regions[mem->nregions].userspace_addr = reg-
> >userspace_addr;
> +mem->regions[mem->nregions].memory_size = reg->memory_size;
> +mem->regions[mem->nregions].guest_phys_addr = reg-
> >guest_phys_addr;
> +mem->regions[mem->nregions].mmap_offset = offset;
> +fds[mem->nregions++] = fd;
> +}
> +}
> +
> +return 0;
> +}
> +

[...]




Re: [Qemu-devel] [PATCH v3] vhost: add used memslot number for vhost-user and vhost-kernel separately

2017-12-29 Thread Zhoujian (jay)


> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: Friday, December 29, 2017 11:08 PM
> To: Zhoujian (jay) <jianjay.z...@huawei.com>
> Cc: qemu-devel@nongnu.org; Huangweidong (C) <weidong.hu...@huawei.com>;
> m...@redhat.com; wangxin (U) <wangxinxin.w...@huawei.com>; Liuzhe (Cloud
> Open Labs, NFV) <gary.liu...@huawei.com>; Gonglei (Arei)
> <arei.gong...@huawei.com>
> Subject: Re: [Qemu-devel] [PATCH v3] vhost: add used memslot number for
> vhost-user and vhost-kernel separately
> 
> On Fri, 29 Dec 2017 12:54:44 +
> "Zhoujian (jay)" <jianjay.z...@huawei.com> wrote:
> 
> >
> >
> > > -Original Message-
> > > From: Igor Mammedov [mailto:imamm...@redhat.com]
> > > Sent: Friday, December 29, 2017 7:22 PM
> > > To: Zhoujian (jay) <jianjay.z...@huawei.com>
> > > Cc: qemu-devel@nongnu.org; Huangweidong (C)
> > > <weidong.hu...@huawei.com>; m...@redhat.com; wangxin (U)
> > > <wangxinxin.w...@huawei.com>; Liuzhe (Cloud Open Labs, NFV)
> > > <gary.liu...@huawei.com>; Gonglei (Arei) <arei.gong...@huawei.com>
> > > Subject: Re: [Qemu-devel] [PATCH v3] vhost: add used memslot number
> > > for vhost-user and vhost-kernel separately
> > >
> > > On Fri, 29 Dec 2017 10:37:40 +
> > > "Zhoujian (jay)" <jianjay.z...@huawei.com> wrote:
> > >
> > > > Hi Igor,
> > > >
> > > > > -Original Message-
> > > > > From: Igor Mammedov [mailto:imamm...@redhat.com]
> > > > > Sent: Friday, December 29, 2017 5:31 PM
> > > > > To: Zhoujian (jay) <jianjay.z...@huawei.com>
> > > > > Cc: qemu-devel@nongnu.org; Huangweidong (C)
> > > > > <weidong.hu...@huawei.com>; m...@redhat.com; wangxin (U)
> > > > > <wangxinxin.w...@huawei.com>; Liuzhe (Cloud Open Labs, NFV)
> > > > > <gary.liu...@huawei.com>; Gonglei (Arei)
> > > > > <arei.gong...@huawei.com>
> > > > > Subject: Re: [Qemu-devel] [PATCH v3] vhost: add used memslot
> > > > > number for vhost-user and vhost-kernel separately
> > > > >
> > > > > On Fri, 29 Dec 2017 10:35:11 +0800 Jay Zhou
> > > > > <jianjay.z...@huawei.com> wrote:
> > > > >
> > > > > > Used_memslots is equal to dev->mem->nregions now, it is true
> > > > > > for vhost kernel, but not for vhost user, which uses the
> > > > > > memory regions that have file descriptor. In fact, not all of
> > > > > > the memory regions have file descriptor.
> > > > > > It is usefully in some scenarios, e.g. used_memslots is 8, and
> > > > > > only
> > > > > > 5 memory slots can be used by vhost user, it is failed to
> > > > > > hotplug a new DIMM memory because vhost_has_free_slot just
> > > > > > returned false, however we can hotplug it safely in fact.
> > > > > >
> > > > > > Meanwhile, instead of asserting in vhost_user_set_mem_table(),
> > > > > > error number is used to gracefully prevent device to start.
> > > > > > This fixed the VM crash issue.
> > > > >
> > > > > below mostly style related comments, otherwise patch looks good
> > > > > to me
> > > > > >
> > > > > > Suggested-by: Igor Mammedov <imamm...@redhat.com>
> > > > > > Signed-off-by: Jay Zhou <jianjay.z...@huawei.com>
> > > > > > Signed-off-by: Zhe Liu <gary.liu...@huawei.com>
> > > > > > ---
> > > > > >  hw/virtio/vhost-backend.c | 14 +++
> > > > > >  hw/virtio/vhost-user.c| 84
> > > +---
> > > > > ---
> > > > > >  hw/virtio/vhost.c | 16 
> > > > > >  include/hw/virtio/vhost-backend.h |  4 ++
> > > > > >  4 files changed, 91 insertions(+), 27 deletions(-)
> > > > > >
> > > > > > diff --git a/hw/virtio/vhost-backend.c
> > > > > > b/hw/virtio/vhost-backend.c index 7f09efa..866718c 100644
> > > > > > --- a/hw/virtio/vhost-backend.c
> > > > > > +++ b/hw/virtio/vhost-backend.c
> > > > > > @@ -15,6 +15,8 @@
> > > > > >  #include "hw/virtio/vhost-backend.h"

Re: [Qemu-devel] [PATCH v3] vhost: add used memslot number for vhost-user and vhost-kernel separately

2017-12-29 Thread Zhoujian (jay)


> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: Friday, December 29, 2017 7:22 PM
> To: Zhoujian (jay) <jianjay.z...@huawei.com>
> Cc: qemu-devel@nongnu.org; Huangweidong (C) <weidong.hu...@huawei.com>;
> m...@redhat.com; wangxin (U) <wangxinxin.w...@huawei.com>; Liuzhe (Cloud
> Open Labs, NFV) <gary.liu...@huawei.com>; Gonglei (Arei)
> <arei.gong...@huawei.com>
> Subject: Re: [Qemu-devel] [PATCH v3] vhost: add used memslot number for
> vhost-user and vhost-kernel separately
> 
> On Fri, 29 Dec 2017 10:37:40 +
> "Zhoujian (jay)" <jianjay.z...@huawei.com> wrote:
> 
> > Hi Igor,
> >
> > > -Original Message-
> > > From: Igor Mammedov [mailto:imamm...@redhat.com]
> > > Sent: Friday, December 29, 2017 5:31 PM
> > > To: Zhoujian (jay) <jianjay.z...@huawei.com>
> > > Cc: qemu-devel@nongnu.org; Huangweidong (C)
> > > <weidong.hu...@huawei.com>; m...@redhat.com; wangxin (U)
> > > <wangxinxin.w...@huawei.com>; Liuzhe (Cloud Open Labs, NFV)
> > > <gary.liu...@huawei.com>; Gonglei (Arei) <arei.gong...@huawei.com>
> > > Subject: Re: [Qemu-devel] [PATCH v3] vhost: add used memslot number
> > > for vhost-user and vhost-kernel separately
> > >
> > > On Fri, 29 Dec 2017 10:35:11 +0800
> > > Jay Zhou <jianjay.z...@huawei.com> wrote:
> > >
> > > > Used_memslots is equal to dev->mem->nregions now, it is true for
> > > > vhost kernel, but not for vhost user, which uses the memory
> > > > regions that have file descriptor. In fact, not all of the memory
> > > > regions have file descriptor.
> > > > It is usefully in some scenarios, e.g. used_memslots is 8, and
> > > > only
> > > > 5 memory slots can be used by vhost user, it is failed to hotplug
> > > > a new DIMM memory because vhost_has_free_slot just returned false,
> > > > however we can hotplug it safely in fact.
> > > >
> > > > Meanwhile, instead of asserting in vhost_user_set_mem_table(),
> > > > error number is used to gracefully prevent device to start. This
> > > > fixed the VM crash issue.
> > >
> > > below mostly style related comments, otherwise patch looks good to
> > > me
> > > >
> > > > Suggested-by: Igor Mammedov <imamm...@redhat.com>
> > > > Signed-off-by: Jay Zhou <jianjay.z...@huawei.com>
> > > > Signed-off-by: Zhe Liu <gary.liu...@huawei.com>
> > > > ---
> > > >  hw/virtio/vhost-backend.c | 14 +++
> > > >  hw/virtio/vhost-user.c| 84
> +---
> > > ---
> > > >  hw/virtio/vhost.c | 16 
> > > >  include/hw/virtio/vhost-backend.h |  4 ++
> > > >  4 files changed, 91 insertions(+), 27 deletions(-)
> > > >
> > > > diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
> > > > index 7f09efa..866718c 100644
> > > > --- a/hw/virtio/vhost-backend.c
> > > > +++ b/hw/virtio/vhost-backend.c
> > > > @@ -15,6 +15,8 @@
> > > >  #include "hw/virtio/vhost-backend.h"
> > > >  #include "qemu/error-report.h"
> > > >
> > > > +static unsigned int vhost_kernel_used_memslots;
> > > > +
> > > >  static int vhost_kernel_call(struct vhost_dev *dev, unsigned long
> > > > int
> > > request,
> > > >   void *arg)  { @@ -233,6 +235,16 @@
> > > > static void vhost_kernel_set_iotlb_callback(struct vhost_dev *dev,
> > > >  qemu_set_fd_handler((uintptr_t)dev->opaque, NULL, NULL,
> > > > NULL);  }
> > > >
> > > > +static void vhost_kernel_set_used_memslots(struct vhost_dev *dev) {
> > > > +vhost_kernel_used_memslots = dev->mem->nregions; }
> > > > +
> > > > +static unsigned int vhost_kernel_get_used_memslots(void)
> > > > +{
> > > > +return vhost_kernel_used_memslots; }
> > > > +
> > > >  static const VhostOps kernel_ops = {
> > > >  .backend_type = VHOST_BACKEND_TYPE_KERNEL,
> > > >  .vhost_backend_init = vhost_kernel_init, @@ -264,6 +276,8
> > > > @@ static const VhostOps kernel_ops = {  #endif /*
> CONFIG_VHOST_VSOCK */
> > > >  .vhost_set_iotlb_callback = vhost_kernel_set_iotlb_callback,
&

Re: [Qemu-devel] [PATCH v3] vhost: add used memslot number for vhost-user and vhost-kernel separately

2017-12-29 Thread Zhoujian (jay)
Hi Igor,

> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: Friday, December 29, 2017 5:31 PM
> To: Zhoujian (jay) <jianjay.z...@huawei.com>
> Cc: qemu-devel@nongnu.org; Huangweidong (C) <weidong.hu...@huawei.com>;
> m...@redhat.com; wangxin (U) <wangxinxin.w...@huawei.com>; Liuzhe (Cloud
> Open Labs, NFV) <gary.liu...@huawei.com>; Gonglei (Arei)
> <arei.gong...@huawei.com>
> Subject: Re: [Qemu-devel] [PATCH v3] vhost: add used memslot number for
> vhost-user and vhost-kernel separately
> 
> On Fri, 29 Dec 2017 10:35:11 +0800
> Jay Zhou <jianjay.z...@huawei.com> wrote:
> 
> > Used_memslots is equal to dev->mem->nregions now, it is true for vhost
> > kernel, but not for vhost user, which uses the memory regions that
> > have file descriptor. In fact, not all of the memory regions have file
> > descriptor.
> > It is usefully in some scenarios, e.g. used_memslots is 8, and only
> > 5 memory slots can be used by vhost user, it is failed to hotplug a
> > new DIMM memory because vhost_has_free_slot just returned false,
> > however we can hotplug it safely in fact.
> >
> > Meanwhile, instead of asserting in vhost_user_set_mem_table(), error
> > number is used to gracefully prevent device to start. This fixed the
> > VM crash issue.
> 
> below mostly style related comments,
> otherwise patch looks good to me
> >
> > Suggested-by: Igor Mammedov <imamm...@redhat.com>
> > Signed-off-by: Jay Zhou <jianjay.z...@huawei.com>
> > Signed-off-by: Zhe Liu <gary.liu...@huawei.com>
> > ---
> >  hw/virtio/vhost-backend.c | 14 +++
> >  hw/virtio/vhost-user.c| 84 +---
> ---
> >  hw/virtio/vhost.c | 16 
> >  include/hw/virtio/vhost-backend.h |  4 ++
> >  4 files changed, 91 insertions(+), 27 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
> > index 7f09efa..866718c 100644
> > --- a/hw/virtio/vhost-backend.c
> > +++ b/hw/virtio/vhost-backend.c
> > @@ -15,6 +15,8 @@
> >  #include "hw/virtio/vhost-backend.h"
> >  #include "qemu/error-report.h"
> >
> > +static unsigned int vhost_kernel_used_memslots;
> > +
> >  static int vhost_kernel_call(struct vhost_dev *dev, unsigned long int
> request,
> >   void *arg)  { @@ -233,6 +235,16 @@
> > static void vhost_kernel_set_iotlb_callback(struct vhost_dev *dev,
> >  qemu_set_fd_handler((uintptr_t)dev->opaque, NULL, NULL,
> > NULL);  }
> >
> > +static void vhost_kernel_set_used_memslots(struct vhost_dev *dev) {
> > +vhost_kernel_used_memslots = dev->mem->nregions; }
> > +
> > +static unsigned int vhost_kernel_get_used_memslots(void)
> > +{
> > +return vhost_kernel_used_memslots; }
> > +
> >  static const VhostOps kernel_ops = {
> >  .backend_type = VHOST_BACKEND_TYPE_KERNEL,
> >  .vhost_backend_init = vhost_kernel_init, @@ -264,6 +276,8 @@
> > static const VhostOps kernel_ops = {  #endif /* CONFIG_VHOST_VSOCK */
> >  .vhost_set_iotlb_callback = vhost_kernel_set_iotlb_callback,
> >  .vhost_send_device_iotlb_msg =
> > vhost_kernel_send_device_iotlb_msg,
> > +.vhost_set_used_memslots = vhost_kernel_set_used_memslots,
> > +.vhost_get_used_memslots = vhost_kernel_get_used_memslots,
> >  };
> >
> >  int vhost_set_backend_type(struct vhost_dev *dev, VhostBackendType
> > backend_type) diff --git a/hw/virtio/vhost-user.c
> > b/hw/virtio/vhost-user.c index 093675e..0f913be 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -122,6 +122,8 @@ static VhostUserMsg m __attribute__ ((unused));
> >  /* The version of the protocol we support */
> >  #define VHOST_USER_VERSION(0x1)
> >
> > +static unsigned int vhost_user_used_memslots;
> > +
> >  struct vhost_user {
> >  CharBackend *chr;
> >  int slave_fd;
> > @@ -289,12 +291,53 @@ static int vhost_user_set_log_base(struct
> vhost_dev *dev, uint64_t base,
> >  return 0;
> >  }
> >
> > +static int vhost_user_prepare_msg(struct vhost_dev *dev, VhostUserMsg
> *msg,
> > +  int *fds) {
> > +int r = 0;
> > +int i, fd;
> > +size_t fd_num = 0;
> fd_num is redundant
> you can use msg->payload.memory.nregions as a counter

If using msg->payload.memory.nregions as a counter, referencing
the member of msg->pa

Re: [Qemu-devel] [PATCH v2 1/2] vhost: add used memslot number for vhost-user and vhost-kernel separately

2017-12-28 Thread Zhoujian (jay)


> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: Thursday, December 28, 2017 7:12 PM
> To: Zhoujian (jay) <jianjay.z...@huawei.com>
> Cc: Huangweidong (C) <weidong.hu...@huawei.com>; m...@redhat.com; wangxin
> (U) <wangxinxin.w...@huawei.com>; dgilb...@redhat.com; Liuzhe (Cloud Open
> Labs, NFV) <gary.liu...@huawei.com>; qemu-devel@nongnu.org; Gonglei (Arei)
> <arei.gong...@huawei.com>
> Subject: Re: [Qemu-devel] [PATCH v2 1/2] vhost: add used memslot number
> for vhost-user and vhost-kernel separately
> 
> On Sat, 23 Dec 2017 07:09:51 +
> "Zhoujian (jay)" <jianjay.z...@huawei.com> wrote:
> 
> > Hi Igor,
> >
> > > -Original Message-
> > > From: Igor Mammedov [mailto:imamm...@redhat.com]
> > > Sent: Saturday, December 23, 2017 3:03 AM
> > > To: Zhoujian (jay) <jianjay.z...@huawei.com>
> > > Cc: Huangweidong (C) <weidong.hu...@huawei.com>; m...@redhat.com;
> > > wangxin
> > > (U) <wangxinxin.w...@huawei.com>; qemu-devel@nongnu.org; Liuzhe
> > > (Cloud Open Labs, NFV) <gary.liu...@huawei.com>;
> > > dgilb...@redhat.com; Gonglei
> > > (Arei) <arei.gong...@huawei.com>
> > > Subject: Re: [Qemu-devel] [PATCH v2 1/2] vhost: add used memslot
> > > number for vhost-user and vhost-kernel separately
> > >
> > > On Fri, 22 Dec 2017 17:25:13 +0100
> > > Igor Mammedov <imamm...@redhat.com> wrote:
> > >
> > > > On Fri, 15 Dec 2017 16:45:54 +0800 Jay Zhou
> > > > <jianjay.z...@huawei.com> wrote:
> > > [...]
> > >
> > > > > +static void vhost_user_set_used_memslots(struct vhost_dev *dev) {
> > > > > +int counter = 0;
> > > > > +int i;
> > > > > +
> > > > > +for (i = 0; i < dev->mem->nregions; ++i) {
> > > > > +struct vhost_memory_region *reg = dev->mem->regions + i;
> > > > > +ram_addr_t offset;
> > > > > +MemoryRegion *mr;
> > > > > +int fd;
> > > > > +
> > > > > +assert((uintptr_t)reg->userspace_addr == reg-
> >userspace_addr);
> > > > > +mr = memory_region_from_host((void *)(uintptr_t)reg-
> > > >userspace_addr,
> > > > > +&offset);
> > > > > +fd = memory_region_get_fd(mr);
> > > > > +if (fd > 0) {
> > > > > +counter++;
> > > > > +}
> > > > > +}
> > > > vhost_user_set_mem_table() already does the same counting, there
> > > > is no point in duplicating it, just drop vhost_set_used_memslots
> > > > callback
> > > >
> > > > > +vhost_user_used_memslots = counter;
> > > > and update this value from vhost_user_set_mem_table()
> > > never mind, updating it from vhost_user_set_mem_table() as it's
> > > called only when device is started which is not the case when
> > > backend is initialized, so the way you did it should work for both
> > > cases
> > >
> >
> > How about do it like this(not sure whether the best way, any idea?):
> >
> > Define a new function, e.g.
> >
> > static void vhost_user_set_used_memslots_and_msgs(struct vhost_dev *dev,
> > VhostUserMsg *msg, size_t *fd_num, int
> > *fds)
> s/vhost_user_set_used_memslots_and_msgs/vhost_user_prepare_msg/

Ok

> 
> > {
> > int num = 0;
> > int i;
> >
> > for (i = 0; i < dev->mem->nregions; ++i) {
> > struct vhost_memory_region *reg = dev->mem->regions + i;
> > ram_addr_t offset;
> > MemoryRegion *mr;
> > int fd;
> >
> > assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
> > mr = memory_region_from_host((void *)(uintptr_t)reg-
> >userspace_addr,
> > &offset);
> > fd = memory_region_get_fd(mr);
> > if (fd > 0) {
> > if (msg && fd_num && fds) {
> > msg->payload.memory.regions[num].userspace_addr
> > = reg->userspace_addr;
> > msg->payload.memory.regions[num].memory_size
> > = reg->memory_size;
> > msg->payload.memory.regions[num].guest

Re: [Qemu-devel] [PATCH v2 2/2] vhost: double check used memslots number

2017-12-23 Thread Zhoujian (jay)


[...]

> > ---
> >  hw/virtio/vhost.c | 27 +++
> >  1 file changed, 23 insertions(+), 4 deletions(-)
> >
> > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index
> > 59a32e9..e45f5e2 100644
> > --- a/hw/virtio/vhost.c
> > +++ b/hw/virtio/vhost.c
> > @@ -1234,6 +1234,18 @@ static void vhost_virtqueue_cleanup(struct
> vhost_virtqueue *vq)
> >  event_notifier_cleanup(&vq->masked_notifier);
> >  }
> >
> > +static bool vhost_dev_used_memslots_is_exceeded(struct vhost_dev
> > +*hdev) {
> > +if (hdev->vhost_ops->vhost_get_used_memslots() >
> > +hdev->vhost_ops->vhost_backend_memslots_limit(hdev)) {
> > +error_report("vhost backend memory slots limit is less"
> > +" than current number of present memory slots");
> > +return true;
> > +}
> > +
> > +return false;
> > +}
> > +
> >  int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
> > VhostBackendType backend_type, uint32_t
> > busyloop_timeout)  { @@ -1252,10 +1264,7 @@ int vhost_dev_init(struct
> > vhost_dev *hdev, void *opaque,
> >  goto fail;
> >  }
> >
> > -if (hdev->vhost_ops->vhost_get_used_memslots() >
> > -hdev->vhost_ops->vhost_backend_memslots_limit(hdev)) {
> > -error_report("vhost backend memory slots limit is less"
> > -" than current number of present memory slots");
> > +if (vhost_dev_used_memslots_is_exceeded(hdev)) {
> why do you keep this check?
> it seems always be false
> 

If a vhost device has already been added successfully, i.e. its memory
listener has been registered and
hdev->vhost_ops->vhost_set_used_memslots() has been called (used_memslots
is updated there), then if we hotplug another vhost device with the same
backend type, hdev->vhost_ops->vhost_get_used_memslots() will not be 0
(used_memslots is shared among vhost devices of the same backend type),
so the check will not always be false.

Regards,
Jay



Re: [Qemu-devel] [PATCH v2 2/2] vhost: double check used memslots number

2017-12-23 Thread Zhoujian (jay)


> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: Saturday, December 23, 2017 2:49 AM
> To: Zhoujian (jay) <jianjay.z...@huawei.com>
> Cc: qemu-devel@nongnu.org; m...@redhat.com; Huangweidong (C)
> <weidong.hu...@huawei.com>; Gonglei (Arei) <arei.gong...@huawei.com>;
> wangxin (U) <wangxinxin.w...@huawei.com>; Liuzhe (Cloud Open Labs, NFV)
> <gary.liu...@huawei.com>; dgilb...@redhat.com
> Subject: Re: [PATCH v2 2/2] vhost: double check used memslots number
> 
> On Fri, 15 Dec 2017 16:45:55 +0800
> Jay Zhou <jianjay.z...@huawei.com> wrote:
> 
> > If the VM already has N(N>8) available memory slots for vhost user,
> > the VM will be crashed in vhost_user_set_mem_table if we try to
> > hotplug the first vhost user NIC.
> > This patch checks if memslots number exceeded or not after updating
> > vhost_user_used_memslots.
> Can't understand commit message, pls rephrase (what is being fixed, and
> how it's fixed) also include reproducing steps for crash and maybe
> describe call flow/backtrace that triggers crash.

Sorry about my poor English

> 
> PS:
> I wasn't able to reproduce crash

Steps to reproduce:
(1) start up a VM successfully without any vhost device
(2) hotplug 8 DIMMs successfully
(3) hotplug a vhost-user NIC; the VM crashes, hitting the assertion
at the line
assert(fd_num < VHOST_MEMORY_MAX_NREGIONS);
in vhost_user_set_mem_table()

Regards,
Jay

> >
> > Signed-off-by: Jay Zhou <jianjay.z...@huawei.com>
> > ---
> >  hw/virtio/vhost.c | 27 +++
> >  1 file changed, 23 insertions(+), 4 deletions(-)
> >
> > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index
> > 59a32e9..e45f5e2 100644
> > --- a/hw/virtio/vhost.c
> > +++ b/hw/virtio/vhost.c
> > @@ -1234,6 +1234,18 @@ static void vhost_virtqueue_cleanup(struct
> vhost_virtqueue *vq)
> >  event_notifier_cleanup(&vq->masked_notifier);
> >  }
> >
> > +static bool vhost_dev_used_memslots_is_exceeded(struct vhost_dev
> > +*hdev) {
> > +if (hdev->vhost_ops->vhost_get_used_memslots() >
> > +hdev->vhost_ops->vhost_backend_memslots_limit(hdev)) {
> > +error_report("vhost backend memory slots limit is less"
> > +" than current number of present memory slots");
> > +return true;
> > +}
> > +
> > +return false;
> > +}
> > +
> >  int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
> > VhostBackendType backend_type, uint32_t
> > busyloop_timeout)  { @@ -1252,10 +1264,7 @@ int vhost_dev_init(struct
> > vhost_dev *hdev, void *opaque,
> >  goto fail;
> >  }
> >
> > -if (hdev->vhost_ops->vhost_get_used_memslots() >
> > -hdev->vhost_ops->vhost_backend_memslots_limit(hdev)) {
> > -error_report("vhost backend memory slots limit is less"
> > -" than current number of present memory slots");
> > +if (vhost_dev_used_memslots_is_exceeded(hdev)) {
> why do you keep this check?
> it seems always be false
> 
> 
> 
> >  r = -1;
> >  goto fail;
> >  }
> > @@ -1341,6 +1350,16 @@ int vhost_dev_init(struct vhost_dev *hdev, void
> *opaque,
> >  hdev->memory_changed = false;
> > >  memory_listener_register(&hdev->memory_listener,
> > >  &address_space_memory);
> > >  QLIST_INSERT_HEAD(&vhost_devices, hdev, entry);
> > +
> > +if (vhost_dev_used_memslots_is_exceeded(hdev)) {
> > +r = -1;
> > +if (busyloop_timeout) {
> > +goto fail_busyloop;
> > +} else {
> > +goto fail;
> > +}
> > +}
> seems to be the right thing to do, since after registering the listener
> for the first time used_memslots will be updated to the actual value.
> 
> 
> I did some testing and without this hunk/patch
> 
> on 'device_add  virtio-net-pci,netdev=net0' qemu prints:
> 
> qemu-system-x86_64: vhost_set_mem_table failed: Argument list too long (7)
> qemu-system-x86_64: unable to start vhost net: 7: falling back on
> userspace virtio
> 
> and network is operational in guest, but with this patch
> 
> "netdev_add ...,vhost-on" prints:
> 
> vhost backend memory slots limit is less than current number of present
> memory slots vhost-net requested but could not be initialized
> 
> and following "device_add  virtio-net-pci,netdev=net0" prints:
> 
> TUNSETOFFLOAD ioctl() failed: Bad file descriptor TUNSETOFFLOAD ioctl()
> failed: Bad file descriptor
> 
> adapter is still hot-plugged but guest networking is broken (can't get IP
> address via DHCP)
> 
> so the patch seems to introduce a regression, or something is broken
> elsewhere and this exposes the issue; not sure what qemu's reaction
> should be in this case, i.e. when netdev_add fails:
> 1: should we fail followed up device_add or
> 2: make it fall back to userspace virtio
> 
> I'd go for #2,
> Michael what's your take on it?
> 
> > +
> >  return 0;
> >
> >  fail_busyloop:




Re: [Qemu-devel] [PATCH v2 1/2] vhost: add used memslot number for vhost-user and vhost-kernel separately

2017-12-22 Thread Zhoujian (jay)
Hi Igor,

> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: Saturday, December 23, 2017 3:03 AM
> To: Zhoujian (jay) <jianjay.z...@huawei.com>
> Cc: Huangweidong (C) <weidong.hu...@huawei.com>; m...@redhat.com; wangxin
> (U) <wangxinxin.w...@huawei.com>; qemu-devel@nongnu.org; Liuzhe (Cloud
> Open Labs, NFV) <gary.liu...@huawei.com>; dgilb...@redhat.com; Gonglei
> (Arei) <arei.gong...@huawei.com>
> Subject: Re: [Qemu-devel] [PATCH v2 1/2] vhost: add used memslot number
> for vhost-user and vhost-kernel separately
> 
> On Fri, 22 Dec 2017 17:25:13 +0100
> Igor Mammedov <imamm...@redhat.com> wrote:
> 
> > On Fri, 15 Dec 2017 16:45:54 +0800
> > Jay Zhou <jianjay.z...@huawei.com> wrote:
> [...]
> 
> > > +static void vhost_user_set_used_memslots(struct vhost_dev *dev) {
> > > +int counter = 0;
> > > +int i;
> > > +
> > > +for (i = 0; i < dev->mem->nregions; ++i) {
> > > +struct vhost_memory_region *reg = dev->mem->regions + i;
> > > +ram_addr_t offset;
> > > +MemoryRegion *mr;
> > > +int fd;
> > > +
> > > +assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
> > > +mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
> > > +                             &offset);
> > > +fd = memory_region_get_fd(mr);
> > > +if (fd > 0) {
> > > +counter++;
> > > +}
> > > +}
> > vhost_user_set_mem_table() already does the same counting, there is no
> > point in duplicating it, just drop vhost_set_used_memslots callback
> >
> > > +vhost_user_used_memslots = counter;
> > and update this value from vhost_user_set_mem_table()
> never mind, updating it from vhost_user_set_mem_table() as it's called
> only when device is started which is not the case when backend is
> initialized, so the way you did it should work for both cases
> 

How about doing it like this (not sure whether it's the best way, any ideas?):

Define a new function, e.g.

static void vhost_user_set_used_memslots_and_msgs(struct vhost_dev *dev,
    VhostUserMsg *msg, size_t *fd_num, int *fds)
{
    int num = 0;
    int i;

    for (i = 0; i < dev->mem->nregions; ++i) {
        struct vhost_memory_region *reg = dev->mem->regions + i;
        ram_addr_t offset;
        MemoryRegion *mr;
        int fd;

        assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
        mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
                                     &offset);
        fd = memory_region_get_fd(mr);
        if (fd > 0) {
            if (msg && fd_num && fds) {
                assert(num < VHOST_MEMORY_MAX_NREGIONS);
                msg->payload.memory.regions[num].userspace_addr
                    = reg->userspace_addr;
                msg->payload.memory.regions[num].memory_size
                    = reg->memory_size;
                msg->payload.memory.regions[num].guest_phys_addr
                    = reg->guest_phys_addr;
                msg->payload.memory.regions[num].mmap_offset = offset;
                fds[num] = fd;
            }
            /* count fd-backed regions even when only counting is wanted,
             * so vhost_user_used_memslots is correct for both callers */
            num++;
        }
    }
    vhost_user_used_memslots = num;
    if (fd_num) {
        *fd_num = num;
    }
}

And call it when the backend is initialized and the device is started
respectively,

static void vhost_user_set_used_memslots(struct vhost_dev *dev)
{
    vhost_user_set_used_memslots_and_msgs(dev, NULL, NULL, NULL);
}

static int vhost_user_set_mem_table(struct vhost_dev *dev,
                                    struct vhost_memory *mem)
{
    [...]

    vhost_user_set_used_memslots_and_msgs(dev, &msg, &fd_num, fds);

[...]
}


Regards,
Jay

> 
> >
> > > +}
> > > +
> [...]



Re: [Qemu-devel] [PATCH v2 1/2] vhost: add used memslot number for vhost-user and vhost-kernel separately

2017-12-22 Thread Zhoujian (jay)
> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: Saturday, December 23, 2017 12:25 AM
> To: Zhoujian (jay) <jianjay.z...@huawei.com>
> Cc: qemu-devel@nongnu.org; m...@redhat.com; Huangweidong (C)
> <weidong.hu...@huawei.com>; Gonglei (Arei) <arei.gong...@huawei.com>;
> wangxin (U) <wangxinxin.w...@huawei.com>; Liuzhe (Cloud Open Labs, NFV)
> <gary.liu...@huawei.com>; dgilb...@redhat.com
> Subject: Re: [PATCH v2 1/2] vhost: add used memslot number for vhost-user
> and vhost-kernel separately
> 
> On Fri, 15 Dec 2017 16:45:54 +0800
> Jay Zhou <jianjay.z...@huawei.com> wrote:
> 
> > Used_memslots is equal to dev->mem->nregions now, it is true for vhost
> > kernel, but not for vhost user, which uses the memory regions that
> > have file descriptor. In fact, not all of the memory regions have file
> > descriptor.
> > This matters in some scenarios, e.g. used_memslots is 8 and only
> > 5 memory slots can be used by vhost-user: hot plugging new memory
> > RAM fails because vhost_has_free_slot just returned false, but in
> > fact we can hot plug it safely.
> 
> a bit of rant:
> it seems vhost is chasing its tail (1-2 regions that are not actually
> RAM); while this approach will allow hotplugging 1-2 extra DIMMs,
> vhost-user will still be limited to 8 regions max.
> 
> 
> vhost-user protocol should be extended to allow qemu query limit value
> from backend (which in turn should be tunable), so that backend could be
> configured to meet user needs.

Michael mentioned it can be extended with a protocol flag; here is the
discussion thread:
https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg04656.html
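As a sketch of that idea (hypothetical names, not the actual protocol extension; a later QEMU extension went in this direction with VHOST_USER_GET_MAX_MEM_SLOTS): the front-end checks a negotiated protocol feature bit, queries the backend's slot limit when the bit is set, and otherwise falls back to the legacy hard-coded 8.

```c
#include <assert.h>
#include <stdint.h>

#define VHOST_MEMORY_MAX_NREGIONS 8

/* Hypothetical protocol feature bit for this sketch. */
#define SKETCH_F_MAX_MEM_SLOTS (1ULL << 4)

struct sketch_backend {
    uint64_t protocol_features;  /* negotiated protocol feature bits */
    uint64_t max_mem_slots;      /* valid only if the bit is negotiated */
};

/* Effective memslot limit: query the backend when it advertises the
 * extension, otherwise fall back to the legacy hard-coded limit. */
static uint64_t memslot_limit(const struct sketch_backend *b)
{
    if (b->protocol_features & SKETCH_F_MAX_MEM_SLOTS) {
        return b->max_mem_slots;
    }
    return VHOST_MEMORY_MAX_NREGIONS;
}
```

This keeps old backends working unchanged while letting a tunable backend raise the limit above 8.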

> 
> Jay,
> would you like to look into it?

Yes, will look into it.

> 
> Otherwise it looks like patch will do its job, see for a minor suggestion
> below.
> 
> > Signed-off-by: Jay Zhou <jianjay.z...@huawei.com>
> > Signed-off-by: Zhe Liu <gary.liu...@huawei.com>
> > ---
> >  hw/virtio/vhost-backend.c | 14 ++
> >  hw/virtio/vhost-user.c| 31 +++
> >  hw/virtio/vhost.c | 16 +---
> >  include/hw/virtio/vhost-backend.h |  4 
> >  4 files changed, 58 insertions(+), 7 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
> > index 7f09efa..866718c 100644
> > --- a/hw/virtio/vhost-backend.c
> > +++ b/hw/virtio/vhost-backend.c
> > @@ -15,6 +15,8 @@
> >  #include "hw/virtio/vhost-backend.h"
> >  #include "qemu/error-report.h"
> >
> > +static unsigned int vhost_kernel_used_memslots;
> > +
> >  static int vhost_kernel_call(struct vhost_dev *dev, unsigned long int
> request,
> >   void *arg)  { @@ -233,6 +235,16 @@
> > static void vhost_kernel_set_iotlb_callback(struct vhost_dev *dev,
> >  qemu_set_fd_handler((uintptr_t)dev->opaque, NULL, NULL,
> > NULL);  }
> >
> > +static void vhost_kernel_set_used_memslots(struct vhost_dev *dev) {
> > +vhost_kernel_used_memslots = dev->mem->nregions; }
> > +
> > +static unsigned int vhost_kernel_get_used_memslots(void)
> > +{
> > +return vhost_kernel_used_memslots; }
> > +
> >  static const VhostOps kernel_ops = {
> >  .backend_type = VHOST_BACKEND_TYPE_KERNEL,
> >  .vhost_backend_init = vhost_kernel_init, @@ -264,6 +276,8 @@
> > static const VhostOps kernel_ops = {  #endif /* CONFIG_VHOST_VSOCK */
> >  .vhost_set_iotlb_callback = vhost_kernel_set_iotlb_callback,
> >  .vhost_send_device_iotlb_msg =
> > vhost_kernel_send_device_iotlb_msg,
> > +.vhost_set_used_memslots = vhost_kernel_set_used_memslots,
> > +.vhost_get_used_memslots = vhost_kernel_get_used_memslots,
> >  };
> >
> >  int vhost_set_backend_type(struct vhost_dev *dev, VhostBackendType
> > backend_type) diff --git a/hw/virtio/vhost-user.c
> > b/hw/virtio/vhost-user.c index 093675e..58a4cec 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -122,6 +122,8 @@ static VhostUserMsg m __attribute__ ((unused));
> >  /* The version of the protocol we support */
> >  #define VHOST_USER_VERSION(0x1)
> >
> > +static unsigned int vhost_user_used_memslots;
> > +
> >  struct vhost_user {
> >  CharBackend *chr;
> >  int slave_fd;
> > @@ -922,6 +924,33 @@ static void vhost_user_set_iotlb_callback(struct
> vhost_dev *dev, int enabled)
>

Re: [Qemu-devel] [RFC PATCH] vhost-user: quit infinite loop while used memslots is more than the backend limit

2017-12-19 Thread Zhoujian (jay)


> -Original Message-
> From: Jason Wang [mailto:jasow...@redhat.com]
> Sent: Tuesday, December 19, 2017 5:24 PM
> To: Zhoujian (jay) <jianjay.z...@huawei.com>; qemu-devel@nongnu.org
> Cc: m...@redhat.com; Gonglei (Arei) <arei.gong...@huawei.com>;
> marcandre.lur...@redhat.com; Huangweidong (C) <weidong.hu...@huawei.com>;
> wangxin (U) <wangxinxin.w...@huawei.com>
> Subject: Re: [RFC PATCH] vhost-user: quit infinite loop while used
> memslots is more than the backend limit
> 
> 
> 
> On 2017年12月13日 15:04, Jay Zhou wrote:
> > Steps to 100% reproduce:
> > (1) start a VM with two VFIO 82599 PCI NICs successfully
> > (2) hot plug a RAM, which is attached successfully
> > (3) hot plug another RAM, which is attached successfully
> > (4) hot plug a vhost-user NIC, which is attached successfully
> > (5) hot plug another vhost-user NIC, which is hanged
> >
> > then the qemu process infinitely and quickly print the logs below:
> >  [...]
> >  vhost backend memory slots limit is less than current number of
> present memory slots
> >  failed to init vhost_net for queue 0
> >  vhost backend memory slots limit is less than current number of
> present memory slots
> >  failed to init vhost_net for queue 0
> >  vhost backend memory slots limit is less than current number of
> present memory slots
> >  failed to init vhost_net for queue 0
> >  [...]
> >
> > and the cpu utilization of the systemd-journal process on host is
> > 100%, the reason is clear:
> >  "used_memslots > hdev->vhost_ops-
> >vhost_backend_memslots_limit(hdev)"
> > is true in vhost_dev_init, and the main thread is stuck in the infinite
> > do-while loop in net_vhost_user_init.
> >
> > One way is to increase the vhost backend memory slots limit, some
> > discussions are here:
> >
> > https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg04572.html
> >
> > I think if
> >  "used_memslots > hdev->vhost_ops-
> >vhost_backend_memslots_limit(hdev)"
> > is true, it does not make sense to try to initialize the vhost-user
> > again and again, because the main thread cannot handle other QMP commands
> > at all, which means it cannot reduce the number of memslots, so it must
> > fail in the end.
> 
> You should include the reason why we hit infinite loop in the commit log,
> and what's more important, is this issue specific to memslots only?
> What if you hit errors other than this in vhost_dev_init() for vhost-user?

If the newly hot-plugged vhost-user NIC fails to initialize in vhost_dev_init,
s->started in net_vhost_user_init is always false, so we hit the infinite loop
in net_vhost_user_init().
It is not specific to memslots, e.g. if the backend lacks some feature
masks, it will hit the infinite loop too.

Is it possible to distinguish the cases that MAY fail from those that MUST fail
at initialization?
- If it must fail, quit the loop at once.
- If it may fail, then we should keep looping to wait for the connection to be
  established, i.e. wait until s->started becomes true. Some sleep time
  may be added to reduce the CPU utilization.
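That distinction could be expressed roughly as follows (a sketch only, with made-up result codes; this is not the actual net_vhost_user_init() logic): classify each init attempt as fatal or transient, retry only the transient failures, and bail out of the loop immediately on a fatal one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of one init attempt for this sketch. */
enum init_result {
    INIT_OK,          /* connection established, s->started == true */
    INIT_TRANSIENT,   /* backend not connected yet: worth retrying */
    INIT_FATAL,       /* e.g. memslot limit exceeded: retrying cannot help */
};

/* Drive the retry loop: returns true on success, false when a fatal
 * error occurs or the retry budget is exhausted. Real code would also
 * sleep between transient retries to reduce CPU usage. */
static bool init_with_retries(enum init_result (*try_init)(void *),
                              void *opaque, int max_retries)
{
    for (int i = 0; i < max_retries; i++) {
        switch (try_init(opaque)) {
        case INIT_OK:
            return true;
        case INIT_FATAL:
            return false;        /* quit the loop at once */
        case INIT_TRANSIENT:
            break;               /* retry (sleep elided in this sketch) */
        }
    }
    return false;
}

/* Sample probes used to exercise the loop. */
static enum init_result fatal_probe(void *o)
{
    (void)o;
    return INIT_FATAL;
}

static enum init_result flaky_probe(void *o)
{
    int *left = o;               /* succeeds once *left reaches 0 */
    return (*left)-- > 0 ? INIT_TRANSIENT : INIT_OK;
}
```

A memslot-limit error would map to INIT_FATAL, while "backend socket not yet connected" would map to INIT_TRANSIENT.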

> Is this better to disable event source in this case?

By "event source" do you mean the printed logs?
If it is disabled, the reason why initialization failed is not explicit;
this is the drawback.


Regards,
Jay

> 
> Thanks
> 
> >
> > Signed-off-by: Jay Zhou <jianjay.z...@huawei.com>
> > ---
> >   hw/virtio/vhost.c | 3 +++
> >   include/hw/virtio/vhost.h | 2 ++
> >   net/vhost-user.c  | 5 +
> >   3 files changed, 10 insertions(+)
> >
> > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index
> > e4290ce..1cc4a18 100644
> > --- a/hw/virtio/vhost.c
> > +++ b/hw/virtio/vhost.c
> > @@ -47,6 +47,8 @@ static unsigned int used_memslots;
> >   static QLIST_HEAD(, vhost_dev) vhost_devices =
> >   QLIST_HEAD_INITIALIZER(vhost_devices);
> >
> > +bool used_memslots_exceeded;
> > +
> >   bool vhost_has_free_slot(void)
> >   {
> >   unsigned int slots_limit = ~0U;
> > @@ -1254,6 +1256,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void
> *opaque,
> >   if (used_memslots > hdev->vhost_ops-
> >vhost_backend_memslots_limit(hdev)) {
> >   error_report("vhost backend memory slots limit is less"
> >   " than current number of present memory slots");
> > +used_memslots_exceeded = true;
> >   r = -1;
> >   goto fail;
> >   }
> > diff -

Re: [Qemu-devel] [PATCH] docs/devel/migration.txt: keep functions consistent with the code

2017-12-17 Thread Zhoujian (jay)
> -Original Message-
> From: Dr. David Alan Gilbert [mailto:dgilb...@redhat.com]
> Sent: Friday, December 15, 2017 11:15 PM
> To: Daniel P. Berrange <berra...@redhat.com>
> Cc: Zhoujian (jay) <jianjay.z...@huawei.com>; qemu-devel@nongnu.org;
> quint...@redhat.com; Huangweidong (C) <weidong.hu...@huawei.com>; wangxin
> (U) <wangxinxin.w...@huawei.com>
> Subject: Re: [PATCH] docs/devel/migration.txt: keep functions consistent
> with the code
> 
> * Daniel P. Berrange (berra...@redhat.com) wrote:
> > On Tue, Dec 05, 2017 at 07:58:16PM +, Dr. David Alan Gilbert wrote:
> > > * Jay Zhou (jianjay.z...@huawei.com) wrote:
> > > > Since the commit 11808bb0c422134bf09119f4aa22c59b0ce84bf3 removed
> > > > the put_buffer callback and using an iovec based write handler
> > > > instead, the docs should be sync with the code too.
> > > >
> > > > Signed-off-by: Jay Zhou <jianjay.z...@huawei.com>
> > >
> > > Lets check with Dan (added to cc) since he wrote 11808bb; it might
> > > be best just to rever to migration/qemu-file.h for an explanation of
> > > each function.
> >
> > The updates look ok, but I tend to think this entire section of
> > migrate.txt should be deleted, in favour of the inline APIs
> > docs in mijgration/qemu-file.h   As a developer the header file
> > is where I would look for this kind of API description. The
> > migration.txt can just point epople to the header file for API docs.

Yes, agreed.

> 
> OK, I'll do that on top of my rework to rst.
> 

It's OK for me. :)

Regards,
Jay



Re: [Qemu-devel] [PATCH 0/2] vhost: two fixes

2017-12-14 Thread Zhoujian (jay)
Hi Michael,

> -Original Message-
> From: Zhoujian (jay)
> Sent: Friday, December 15, 2017 12:52 PM
> To: 'Michael S. Tsirkin' <m...@redhat.com>
> Cc: Dr. David Alan Gilbert <dgilb...@redhat.com>; qemu-devel@nongnu.org;
> Huangweidong (C) <weidong.hu...@huawei.com>; Gonglei (Arei)
> <arei.gong...@huawei.com>; wangxin (U) <wangxinxin.w...@huawei.com>;
> Liuzhe (Cloud Open Labs, NFV) <gary.liu...@huawei.com>; Igor Mammedov
> <imamm...@redhat.com>
> Subject: RE: [PATCH 0/2] vhost: two fixes
> 
> Hi Michael,
> 
> > -Original Message-
> > From: Michael S. Tsirkin [mailto:m...@redhat.com]
> > Sent: Friday, December 15, 2017 12:36 PM
> > To: Zhoujian (jay) <jianjay.z...@huawei.com>
> > Cc: Dr. David Alan Gilbert <dgilb...@redhat.com>;
> > qemu-devel@nongnu.org; Huangweidong (C) <weidong.hu...@huawei.com>;
> > Gonglei (Arei) <arei.gong...@huawei.com>; wangxin (U)
> > <wangxinxin.w...@huawei.com>; Liuzhe (Cloud Open Labs, NFV)
> > <gary.liu...@huawei.com>; Igor Mammedov <imamm...@redhat.com>
> > Subject: Re: [PATCH 0/2] vhost: two fixes
> >
> > On Fri, Dec 15, 2017 at 02:38:35AM +, Zhoujian (jay) wrote:
> > > Hi Dave,
> > >
> > > > -Original Message-
> > > > From: Dr. David Alan Gilbert [mailto:dgilb...@redhat.com]
> > > > Sent: Friday, December 15, 2017 3:49 AM
> > > > To: Michael S. Tsirkin <m...@redhat.com>
> > > > Cc: Zhoujian (jay) <jianjay.z...@huawei.com>;
> > > > qemu-devel@nongnu.org; Huangweidong (C)
> > > > <weidong.hu...@huawei.com>; Gonglei (Arei)
> > > > <arei.gong...@huawei.com>; wangxin (U)
> > > > <wangxinxin.w...@huawei.com>; Liuzhe (Cloud Open Labs, NFV)
> > > > <gary.liu...@huawei.com>; Igor Mammedov <imamm...@redhat.com>
> > > > Subject: Re: [PATCH 0/2] vhost: two fixes
> > > >
> > > > * Michael S. Tsirkin (m...@redhat.com) wrote:
> > > > > On Fri, Dec 15, 2017 at 12:36:30AM +0800, Jay Zhou wrote:
> > > > > > Jay Zhou (2):
> > > > > >   vhost: add used memslot number for vhost-user
> > > > > >   vhost: double check memslot number
> > > > > >
> > > > > >  hw/virtio/vhost-user.c| 31
> +
> > > > > >  hw/virtio/vhost.c | 49
> > > > ++-
> > > > > >  include/hw/virtio/vhost-backend.h |  4 
> > > > > >  3 files changed, 78 insertions(+), 6 deletions(-)
> > > > >
> > > > > Cc two developers working on these files right now.
> > > >
> > > > I have to admit to not understanding the 'used_memslots' variable.
> > > >
> > > > * It's a global in vhost.c
> > > > * but set by vhost_set_memory that's called from the listener
> > associated
> > > >   with each individual vhost
> > > > * While they're probably always the same, the merging code calls
> > > >   the vhost_backend_can_merge method for each device, so the number
> > > >   of regions can be different.
> > > >
> > >
> > > Your mean for some devices the new added MemoryRegionSection can be
> > > merged, but for others it can not be merged? IIUC the vhost_mem for
> > > each vhost_dev is the same.
> > >
> > > Meanwhile, I think it is more reasonable to add globals in
> > > vhost-backend.c and vhost-user.c respectively instead of
> > > 'used_memslots'. The reason is explained in patch 1. What do you think?
> > >
> > > Regards,
> > > Jay
> >
> > I'd rather avoid globals completely if possible.
> >
> 
> It is possible, we could add a 'used_memslots' variable in struct
> vhost_dev for per device. I will try to do it in v2.
> 

If the globals don't exist, the disadvantage I found is that the check
"if memslots number exceeds" will be moved from the beginning to the end
of vhost_dev_init; is that acceptable? Or are there other ideas to avoid
globals?

To be honest, I prefer to add globals in vhost-backend.c and vhost-user.c
respectively, since the value of used_memslots is the same for all backends
of the same type.

Regards,
Jay



Re: [Qemu-devel] [PATCH 0/2] vhost: two fixes

2017-12-14 Thread Zhoujian (jay)
Hi Michael,

> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: Friday, December 15, 2017 12:36 PM
> To: Zhoujian (jay) <jianjay.z...@huawei.com>
> Cc: Dr. David Alan Gilbert <dgilb...@redhat.com>; qemu-devel@nongnu.org;
> Huangweidong (C) <weidong.hu...@huawei.com>; Gonglei (Arei)
> <arei.gong...@huawei.com>; wangxin (U) <wangxinxin.w...@huawei.com>;
> Liuzhe (Cloud Open Labs, NFV) <gary.liu...@huawei.com>; Igor Mammedov
> <imamm...@redhat.com>
> Subject: Re: [PATCH 0/2] vhost: two fixes
> 
> On Fri, Dec 15, 2017 at 02:38:35AM +, Zhoujian (jay) wrote:
> > Hi Dave,
> >
> > > -Original Message-
> > > From: Dr. David Alan Gilbert [mailto:dgilb...@redhat.com]
> > > Sent: Friday, December 15, 2017 3:49 AM
> > > To: Michael S. Tsirkin <m...@redhat.com>
> > > Cc: Zhoujian (jay) <jianjay.z...@huawei.com>; qemu-devel@nongnu.org;
> > > Huangweidong (C) <weidong.hu...@huawei.com>; Gonglei (Arei)
> > > <arei.gong...@huawei.com>; wangxin (U) <wangxinxin.w...@huawei.com>;
> > > Liuzhe (Cloud Open Labs, NFV) <gary.liu...@huawei.com>; Igor
> > > Mammedov <imamm...@redhat.com>
> > > Subject: Re: [PATCH 0/2] vhost: two fixes
> > >
> > > * Michael S. Tsirkin (m...@redhat.com) wrote:
> > > > On Fri, Dec 15, 2017 at 12:36:30AM +0800, Jay Zhou wrote:
> > > > > Jay Zhou (2):
> > > > >   vhost: add used memslot number for vhost-user
> > > > >   vhost: double check memslot number
> > > > >
> > > > >  hw/virtio/vhost-user.c| 31 +
> > > > >  hw/virtio/vhost.c | 49
> > > ++-
> > > > >  include/hw/virtio/vhost-backend.h |  4 
> > > > >  3 files changed, 78 insertions(+), 6 deletions(-)
> > > >
> > > > Cc two developers working on these files right now.
> > >
> > > I have to admit to not understanding the 'used_memslots' variable.
> > >
> > > * It's a global in vhost.c
> > > * but set by vhost_set_memory that's called from the listener
> associated
> > >   with each individual vhost
> > > * While they're probably always the same, the merging code calls
> > >   the vhost_backend_can_merge method for each device, so the number
> > >   of regions can be different.
> > >
> >
> > Your mean for some devices the new added MemoryRegionSection can be
> > merged, but for others it can not be merged? IIUC the vhost_mem for
> > each vhost_dev is the same.
> >
> > Meanwhile, I think it is more reasonable to add globals in
> > vhost-backend.c and vhost-user.c respectively instead of
> > 'used_memslots'. The reason is explained in patch 1. What do you think?
> >
> > Regards,
> > Jay
> 
> I'd rather avoid globals completely if possible.
> 

It is possible; we could add a 'used_memslots' variable in struct vhost_dev,
one per device. I will try to do it in v2.

Regards,
Jay



Re: [Qemu-devel] [PATCH 0/2] vhost: two fixes

2017-12-14 Thread Zhoujian (jay)
Hi Dave,

> -Original Message-
> From: Dr. David Alan Gilbert [mailto:dgilb...@redhat.com]
> Sent: Friday, December 15, 2017 3:49 AM
> To: Michael S. Tsirkin <m...@redhat.com>
> Cc: Zhoujian (jay) <jianjay.z...@huawei.com>; qemu-devel@nongnu.org;
> Huangweidong (C) <weidong.hu...@huawei.com>; Gonglei (Arei)
> <arei.gong...@huawei.com>; wangxin (U) <wangxinxin.w...@huawei.com>;
> Liuzhe (Cloud Open Labs, NFV) <gary.liu...@huawei.com>; Igor Mammedov
> <imamm...@redhat.com>
> Subject: Re: [PATCH 0/2] vhost: two fixes
> 
> * Michael S. Tsirkin (m...@redhat.com) wrote:
> > On Fri, Dec 15, 2017 at 12:36:30AM +0800, Jay Zhou wrote:
> > > Jay Zhou (2):
> > >   vhost: add used memslot number for vhost-user
> > >   vhost: double check memslot number
> > >
> > >  hw/virtio/vhost-user.c| 31 +
> > >  hw/virtio/vhost.c | 49
> ++-
> > >  include/hw/virtio/vhost-backend.h |  4 
> > >  3 files changed, 78 insertions(+), 6 deletions(-)
> >
> > Cc two developers working on these files right now.
> 
> I have to admit to not understanding the 'used_memslots' variable.
> 
> * It's a global in vhost.c
> * but set by vhost_set_memory that's called from the listener associated
>   with each individual vhost
> * While they're probably always the same, the merging code calls
>   the vhost_backend_can_merge method for each device, so the number
>   of regions can be different.
> 

You mean for some devices the newly added MemoryRegionSection can be merged,
but for others it cannot be merged? IIUC the vhost_mem for each vhost_dev
is the same.

Meanwhile, I think it is more reasonable to add globals in vhost-backend.c
and vhost-user.c respectively instead of 'used_memslots'. The reason
is explained in patch 1. What do you think?

Regards,
Jay



Re: [Qemu-devel] About QEMU BQL and dirty log switch in Migration

2017-05-16 Thread Zhoujian (jay)
Hi Wanpeng,

> > On 11/05/2017 14:07, Zhoujian (jay) wrote:
> >> -* Scan sptes if dirty logging has been stopped, dropping those
> >> -* which can be collapsed into a single large-page spte.  Later
> >> -* page faults will create the large-page sptes.
> >> +* Reset each vcpu's mmu, then page faults will create the
> large-page
> >> +* sptes later.
> >>  */
> >> if ((change != KVM_MR_DELETE) &&
> >> (old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
> >> -   !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
> >> -   kvm_mmu_zap_collapsible_sptes(kvm, new);
> 
> This is an unlikely branch (unless guest live migration fails and the guest
> continues to run on the source machine) instead of a hot path; do you have
> any performance numbers for your real workloads?
> 

Sorry to bother you again.

Recently, I have tested the performance before migration and after migration
failure using SPEC CPU2006 (https://www.spec.org/cpu2006/), which is a
standard performance evaluation tool.

These are the results:
**
Before migration the score is 153, and the TLB miss statistics of the qemu
process are:
linux-sjrfac:/mnt/zhoujian # perf stat -e dTLB-load-misses,dTLB-loads,\
dTLB-store-misses,dTLB-stores,iTLB-load-misses,iTLB-loads -p 26463 sleep 10

Performance counter stats for process id '26463':

       698,938      dTLB-load-misses   # 0.13% of all dTLB cache hits  (50.46%)
   543,303,875      dTLB-loads                                         (50.43%)
       199,597      dTLB-store-misses                                  (16.51%)
    60,128,561      dTLB-stores                                        (16.67%)
        69,986      iTLB-load-misses   # 6.17% of all iTLB cache hits  (16.67%)
     1,134,097      iTLB-loads                                         (33.33%)

  10.000684064 seconds time elapsed

After migration failure the score is 149, and the TLB miss statistics of the
qemu process are:
linux-sjrfac:/mnt/zhoujian # perf stat -e dTLB-load-misses,dTLB-loads,\
dTLB-store-misses,dTLB-stores,iTLB-load-misses,iTLB-loads -p 26463 sleep 10

Performance counter stats for process id '26463':

       765,400      dTLB-load-misses   # 0.14% of all dTLB cache hits  (50.50%)
   540,972,144      dTLB-loads                                         (50.47%)
       207,670      dTLB-store-misses                                  (16.50%)
    58,363,787      dTLB-stores                                        (16.67%)
       109,772      iTLB-load-misses   # 9.52% of all iTLB cache hits  (16.67%)
     1,152,784      iTLB-loads                                         (33.32%)

  10.000703078 seconds time elapsed
**
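As a sanity check on the reported percentages (pure arithmetic on the raw counters above, no KVM involved): the iTLB miss ratio is misses divided by loads, and it rises from about 6.17% before migration to about 9.52% after migration failure.

```c
/* Miss ratio in percent from raw perf counters. */
static double miss_pct(double misses, double loads)
{
    return 100.0 * misses / loads;
}
```

The roughly 50% relative increase in iTLB misses is consistent with the lower 429.mcf score after the large-page sptes are gone.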

These are the steps:
==
(1) The version of kmod is 4.4.11 (slightly modified) and the version of qemu
is 2.6.0 (slightly modified); the kmod is applied with the following patch
according to Paolo's advice:

diff --git a/source/x86/x86.c b/source/x86/x86.c
index 054a7d3..75a4bb3 100644
--- a/source/x86/x86.c
+++ b/source/x86/x86.c
@@ -8550,8 +8550,10 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 */
if ((change != KVM_MR_DELETE) &&
(old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
-   !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
-   kvm_mmu_zap_collapsible_sptes(kvm, new);
+   !(new->flags & KVM_MEM_LOG_DIRTY_PAGES)) {
+   printk(KERN_ERR "zj make KVM_REQ_MMU_RELOAD request\n");
+   kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
+   }
 
/*
 * Set up write protection and/or dirty logging for the new slot.

(2) I started up a memory-preoccupied 10G VM (suse11sp3), which means its "RES"
column in top is 10G, in order to set up the EPT table in advance.
(3) Then I ran the test case 429.mcf of SPEC CPU2006 before migration and
after migration failure. 429.mcf is a memory-intensive workload, and the
migration failure is constructed deliberately with the following patch to qemu:

diff --git a/migration/migration.c b/migration/migration.c
index 5d725d0..88dfc59 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -625,6 +625,9 @@ static void process_incoming_migration_co(void *opaque)
   MIGRATION_STATUS_ACTIVE);
 ret = qemu_loadvm_state(f);
 
+// deliberately construct the migration failure
+exit(EXIT_FAILURE); 
+
 ps = postcopy_state_get();
 trace_process_incoming_migration_co_end(ret, ps);
 if (ps != POSTCOPY_INC

Re: [Qemu-devel] About QEMU BQL and dirty log switch in Migration

2017-05-11 Thread Zhoujian (jay)
Hi Wanpeng,

> 2017-05-11 21:43 GMT+08:00 Wanpeng Li <kernel...@gmail.com>:
> > 2017-05-11 20:24 GMT+08:00 Paolo Bonzini <pbonz...@redhat.com>:
> >>
> >>
> >> On 11/05/2017 14:07, Zhoujian (jay) wrote:
> >>> -* Scan sptes if dirty logging has been stopped, dropping
> those
> >>> -* which can be collapsed into a single large-page spte.
> Later
> >>> -* page faults will create the large-page sptes.
> >>> +* Reset each vcpu's mmu, then page faults will create the
> large-page
> >>> +* sptes later.
> >>>  */
> >>> if ((change != KVM_MR_DELETE) &&
> >>> (old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
> >>> -   !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
> >>> -   kvm_mmu_zap_collapsible_sptes(kvm, new);
> >
> > This is an unlikely branch(unless guest live migration fails and
> > continue to run on the source machine) instead of hot path, do you
> > have any performance number for your real workloads?
> 
> I find the original discussion by google.
> https://lists.nongnu.org/archive/html/qemu-devel/2017-04/msg04143.html
> You will not go to this branch if the guest live migration successfully.

 In our tests, this branch is taken even when live migration is successful.
 AFAIK, the kmod does not know whether live migration succeeded or not
 when dealing with the KVM_SET_USER_MEMORY_REGION ioctl. Am I missing something?

Regards,
Jay Zhou


Re: [Qemu-devel] About QEMU BQL and dirty log switch in Migration

2017-05-11 Thread Zhoujian (jay)
Hi all,

After applying the patch below, the time the
memory_global_dirty_log_stop() function takes is down to milliseconds
for a 4T memory guest, but I'm not sure whether this patch will trigger
other problems. Does this patch make sense?

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 464da93..fe26ee5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8313,6 +8313,8 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
enum kvm_mr_change change)
 {
int nr_mmu_pages = 0;
+   int i;
+   struct kvm_vcpu *vcpu;
 
if (!kvm->arch.n_requested_mmu_pages)
nr_mmu_pages = kvm_mmu_calculate_mmu_pages(kvm);
@@ -8328,14 +8330,15 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 * in the source machine (for example if live migration fails), small
 * sptes will remain around and cause bad performance.
 *
-* Scan sptes if dirty logging has been stopped, dropping those
-* which can be collapsed into a single large-page spte.  Later
-* page faults will create the large-page sptes.
+* Reset each vcpu's mmu, then page faults will create the large-page
+* sptes later.
 */
if ((change != KVM_MR_DELETE) &&
(old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
-   !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
-   kvm_mmu_zap_collapsible_sptes(kvm, new);
+   !(new->flags & KVM_MEM_LOG_DIRTY_PAGES)) {
+   kvm_for_each_vcpu(i, vcpu, kvm)
+   kvm_mmu_reset_context(vcpu);
+   }
 
/*
 * Set up write protection and/or dirty logging for the new slot.

> * Yang Hongyang (yanghongy...@huawei.com) wrote:
> >
> >
> > On 2017/4/24 20:06, Juan Quintela wrote:
> > > Yang Hongyang  wrote:
> > >> Hi all,
> > >>
> > >> We found dirty log switch costs more than 13 seconds while
> > >> migrating a 4T memory guest, and dirty log switch is currently
> > >> protected by QEMU BQL. This causes guest freeze for a long time
> > >> when switching dirty log on, and the migration downtime is
> unacceptable.
> > >> Are there any chance to optimize the time cost for dirty log switch
> operation?
> > >> Or move the time consuming operation out of the QEMU BQL?
> > >
> > > Hi
> > >
> > > Could you specify what do you mean by dirty log switch?
> > > The one inside kvm?
> > > The merge between kvm one and migration bitmap?
> >
> > The call of the following functions:
> > memory_global_dirty_log_start/stop();
> 
> I suppose there's a few questions;
>   a) Do we actually need the BQL - and if so why
>   b) What actually takes 13s?  It's probably worth figuring out where it
> goes,  the whole bitmap is only 1GB isn't it even on a 4TB machine, and
> even the simplest way to fill that takes way less than 13s.
> 
> Dave
> 
> >
> > >
> > > Thanks, Juan.
> > >
> >
> > --
> > Thanks,
> > Yang
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
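For reference, the bitmap size Dave asks about in (b) can be worked out directly (a back-of-the-envelope sketch assuming 4KiB pages and one bit per page; KVM's per-slot bitmaps add some overhead, but the order of magnitude holds): a 4TB guest needs well under 1GB of dirty bitmap.

```c
#include <stdint.h>

/* Size in bytes of a one-bit-per-page dirty bitmap covering
 * `ram_bytes` of guest RAM with the given page size. */
static uint64_t dirty_bitmap_bytes(uint64_t ram_bytes, uint64_t page_size)
{
    uint64_t pages = ram_bytes / page_size;
    return (pages + 7) / 8;     /* one bit per page, rounded up */
}
```

So 4TiB of RAM at 4KiB pages is 2^30 pages, i.e. a 128MiB bitmap; filling that alone cannot account for the 13 seconds.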

Regards,
Jay Zhou