Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as migration

2018-02-26 Thread Tan, Jianfeng


> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Monday, February 26, 2018 10:43 PM
> To: Igor Mammedov; Tan, Jianfeng
> Cc: Jason Wang; Maxime Coquelin; qemu-devel@nongnu.org; Michael S .
> Tsirkin
> Subject: Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as
> migration
> 
> On 26/02/2018 13:55, Igor Mammedov wrote:
> >>> So how about just adding a new option --mem-share to decide if that's a
> >>> private memory or shared memory? That seems much straightforward
> way
> > Above options are legacy (which we can't remove for compat reasons),
> > their replacement is 'memory-backend-file' backend which has all of
> > the above including 'share' property.
> 
> More precisely, we have added "-object memory-backend-file" to avoid
> proliferation of options related to memory.  Besides unifying the cases
> of 1 and >1 NUMA node, using -object also has the advantage of
> supporting memory hotplug.
> 
> You wrote "I find adding a backend for nonnuma pc RAM is roundabout way"
> but basically the command line says "this VM has only one NUMA node,
> backed by this memory object" which is a precise description of what the
> VM memory looks like.
> 
> > So just add 'memdev' property to machine and reuse memory-backend-file
> > with it instead of duplicating functionality in the legacy code.
> 
> That would however also have a different RAMBlock id, effectively
> producing the same output as "-numa node,memdev=...".

Is it possible that we force that RAMBlock id to be "pc.ram"?

> 
> I think this should be solved at the libvirt level.  Libvirt should
> write in the migration XML cookie whether the VM is using -object or
> -mem-path to declare its memory, and newly-started VMs should always use
> -object.  This won't fix the problem for VMs that are already running,
> but it will fix it the next time they are started.

In this case, we return to the start point :-)

Thanks,
Jianfeng


Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as migration

2018-02-26 Thread Tan, Jianfeng


> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: Monday, February 26, 2018 8:56 PM
> To: Tan, Jianfeng
> Cc: Paolo Bonzini; Jason Wang; Maxime Coquelin; qemu-devel@nongnu.org;
> Michael S . Tsirkin
> Subject: Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as
> migration
> 
> On Sat, 24 Feb 2018 03:11:30 +
> "Tan, Jianfeng" <jianfeng@intel.com> wrote:
> 
> > > -Original Message-
> > > From: Tan, Jianfeng
> > > Sent: Saturday, February 24, 2018 11:08 AM
> > > To: 'Igor Mammedov'
> > > Cc: Paolo Bonzini; Jason Wang; Maxime Coquelin; qemu-
> de...@nongnu.org;
> > > Michael S . Tsirkin
> > > Subject: RE: [Qemu-devel] [RFC] exec: eliminate ram naming issue as
> > > migration
> > >
> > > Hi Igor and all,
> > >
> > > > -Original Message-
> > > > From: Igor Mammedov [mailto:imamm...@redhat.com]
> > > > Sent: Thursday, February 8, 2018 7:30 PM
> > > > To: Tan, Jianfeng
> > > > Cc: Paolo Bonzini; Jason Wang; Maxime Coquelin; qemu-
> > > de...@nongnu.org;
> > > > Michael S . Tsirkin
> > > > Subject: Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as
> > > > migration
> > > >
> > > [...]
> > > > > > It could be solved by adding memdev option to machine,
> > > > > > which would allow to specify backend object. And then on
> > > > > > top make -mem-path alias new option to clean thing up.
> > > > >
> > > > > Do you mean?
> > > > >
> > > > > src vm: -m xG
> > > > > dst vm: -m xG,memdev=pc.ram -object memory-backend-
> > > file,id=pc.ram,size=xG,mem-path=xxx,share=on ...
> > > > Yep, I've meant something like it
> > > >
> > > > src vm: -m xG,memdev=SHARED_RAM -object memory-backend-
> > > file,id=SHARED_RAM,size=xG,mem-path=xxx,share=on
> > > > dst vm: -m xG,memdev=SHARED_RAM -object memory-backend-
> > > file,id=SHARED_RAM,size=xG,mem-path=xxx,share=on
> > >
> > > After a second thought, I find adding a backend for nonnuma pc RAM is
> > > roundabout way.
> > >
> > > And we actually have an existing way to add a file-backed RAM: commit
> > > c902760fb25f ("Add option to use file backed guest memory"). Basically,
> this
> > > commit adds two options, --mem-path and --mem-prealloc, without
> specify
> > > a backend explicitly.
> > >
> > > So how about just adding a new option --mem-share to decide if that's a
> > > private memory or shared memory? That seems much straightforward
> way
> Above options are legacy (which we can't remove for compat reasons),
> their replacement is 'memory-backend-file' backend which has all of
> the above including 'share' property.

OK, such options are legacy. I've no idea of that. Thanks! That makes sense.

> 
> So just add 'memdev' property to machine and reuse memory-backend-file
> with it instead of duplicating functionality in the legacy code.

To "-m" or "-machine"?




Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as migration

2018-02-23 Thread Tan, Jianfeng


> -Original Message-
> From: Tan, Jianfeng
> Sent: Saturday, February 24, 2018 11:08 AM
> To: 'Igor Mammedov'
> Cc: Paolo Bonzini; Jason Wang; Maxime Coquelin; qemu-devel@nongnu.org;
> Michael S . Tsirkin
> Subject: RE: [Qemu-devel] [RFC] exec: eliminate ram naming issue as
> migration
> 
> Hi Igor and all,
> 
> > -Original Message-
> > From: Igor Mammedov [mailto:imamm...@redhat.com]
> > Sent: Thursday, February 8, 2018 7:30 PM
> > To: Tan, Jianfeng
> > Cc: Paolo Bonzini; Jason Wang; Maxime Coquelin; qemu-
> de...@nongnu.org;
> > Michael S . Tsirkin
> > Subject: Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as
> > migration
> >
> [...]
> > > > It could be solved by adding memdev option to machine,
> > > > which would allow to specify backend object. And then on
> > > > top make -mem-path alias new option to clean thing up.
> > >
> > > Do you mean?
> > >
> > > src vm: -m xG
> > > dst vm: -m xG,memdev=pc.ram -object memory-backend-
> file,id=pc.ram,size=xG,mem-path=xxx,share=on ...
> > Yep, I've meant something like it
> >
> > src vm: -m xG,memdev=SHARED_RAM -object memory-backend-
> file,id=SHARED_RAM,size=xG,mem-path=xxx,share=on
> > dst vm: -m xG,memdev=SHARED_RAM -object memory-backend-
> file,id=SHARED_RAM,size=xG,mem-path=xxx,share=on
> 
> After a second thought, I find adding a backend for nonnuma pc RAM is
> roundabout way.
> 
> And we actually have an existing way to add a file-backed RAM: commit
> c902760fb25f ("Add option to use file backed guest memory"). Basically, this
> commit adds two options, --mem-path and --mem-prealloc, without specify
> a backend explicitly.
> 
> So how about just adding a new option --mem-share to decide if that's a
> private memory or shared memory? That seems much straightforward way
> to me; after this change we can migrate like:
> 
> src vm: -m xG
> dst vm: -m xG --mem-path xxx --mem-share
> 

Attach the patch FYI. Look forward to your thoughts.

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 31612ca..5eaf367 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -127,6 +127,7 @@ extern bool enable_mlock;
 extern uint8_t qemu_extra_params_fw[2];
 extern QEMUClockType rtc_clock;
 extern const char *mem_path;
+extern int mem_share;
 extern int mem_prealloc;
 
 #define MAX_NODES 128
diff --git a/numa.c b/numa.c
index 7b9c33a..322289f 100644
--- a/numa.c
+++ b/numa.c
@@ -456,7 +456,7 @@ static void allocate_system_memory_nonnuma(MemoryRegion 
*mr, Object *owner,
 if (mem_path) {
 #ifdef __linux__
 Error *err = NULL;
-memory_region_init_ram_from_file(mr, owner, name, ram_size, false,
+memory_region_init_ram_from_file(mr, owner, name, ram_size, mem_share,
  mem_path, );
 if (err) {
 error_report_err(err);
diff --git a/qemu-options.hx b/qemu-options.hx
index 678181c..c968c53 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -389,6 +389,15 @@ STEXI
 Allocate guest RAM from a temporarily created file in @var{path}.
 ETEXI
 
+DEF("mem-share", 0, QEMU_OPTION_memshare,
+"-mem-share   make guest memory shareable (use with -mem-path)\n",
+QEMU_ARCH_ALL)
+STEXI
+@item -mem-share
+@findex -mem-share
+Make file-backed guest RAM shareable when using -mem-path.
+ETEXI
+
 DEF("mem-prealloc", 0, QEMU_OPTION_mem_prealloc,
 "-mem-prealloc   preallocate guest memory (use with -mem-path)\n",
 QEMU_ARCH_ALL)
diff --git a/vl.c b/vl.c
index 444b750..0ff06c2 100644
--- a/vl.c
+++ b/vl.c
@@ -140,6 +140,7 @@ int display_opengl;
 const char* keyboard_layout = NULL;
 ram_addr_t ram_size;
 const char *mem_path = NULL;
+int mem_share = 0;
 int mem_prealloc = 0; /* force preallocation of physical target memory */
 bool enable_mlock = false;
 int nb_nics;
@@ -3395,6 +3396,9 @@ int main(int argc, char **argv, char **envp)
 case QEMU_OPTION_mempath:
 mem_path = optarg;
 break;
+case QEMU_OPTION_memshare:
+mem_share = 1;
+break;
 case QEMU_OPTION_mem_prealloc:
 mem_prealloc = 1;
 break;



Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as migration

2018-02-23 Thread Tan, Jianfeng
Hi Igor and all,

> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: Thursday, February 8, 2018 7:30 PM
> To: Tan, Jianfeng
> Cc: Paolo Bonzini; Jason Wang; Maxime Coquelin; qemu-devel@nongnu.org;
> Michael S . Tsirkin
> Subject: Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as
> migration
> 
[...]
> > > It could be solved by adding memdev option to machine,
> > > which would allow to specify backend object. And then on
> > > top make -mem-path alias new option to clean thing up.
> >
> > Do you mean?
> >
> > src vm: -m xG
> > dst vm: -m xG,memdev=pc.ram -object 
> > memory-backend-file,id=pc.ram,size=xG,mem-path=xxx,share=on ...
> Yep, I've meant something like it
> 
> src vm: -m xG,memdev=SHARED_RAM -object 
> memory-backend-file,id=SHARED_RAM,size=xG,mem-path=xxx,share=on
> dst vm: -m xG,memdev=SHARED_RAM -object 
> memory-backend-file,id=SHARED_RAM,size=xG,mem-path=xxx,share=on

After a second thought, I find adding a backend for nonnuma pc RAM is 
roundabout way.

And we actually have an existing way to add a file-backed RAM: commit 
c902760fb25f ("Add option to use file backed guest memory"). Basically, this 
commit adds two options, --mem-path and --mem-prealloc, without specify a 
backend explicitly.

So how about just adding a new option --mem-share to decide if that's a private 
memory or shared memory? That seems much straightforward way to me; after this 
change we can migrate like:

src vm: -m xG
dst vm: -m xG --mem-path xxx --mem-share

Thanks,
Jianfeng

> 
> or it could be -machine FOO,inital_ram_memdev=...
> maybe making -M optional in this case as size is specified by backend
> 
> PS:
> it's not a good idea to use QEMU's internal id 'pc.ram'
> for user specified objects as it might cause problems.



Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as migration

2018-02-08 Thread Tan, Jianfeng



On 2/8/2018 5:51 PM, Igor Mammedov wrote:

On Thu, 8 Feb 2018 09:20:45 +0800
"Tan, Jianfeng" <jianfeng@intel.com> wrote:


On 2/7/2018 8:06 PM, Igor Mammedov wrote:

On Wed, 7 Feb 2018 07:49:58 +0000
"Tan, Jianfeng" <jianfeng@intel.com> wrote:
  

-Original Message-
From: Paolo Bonzini [mailto:pbonz...@redhat.com]
Sent: Tuesday, February 6, 2018 1:32 AM
To: Igor Mammedov
Cc: Tan, Jianfeng; qemu-devel@nongnu.org; Jason Wang; Maxime Coquelin;
Michael S . Tsirkin
Subject: Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as
migration

On 05/02/2018 18:15, Igor Mammedov wrote:

Then we would have both ram block named pc.ram:
Block NamePSize
pc.ram 4 KiB
/objects/pc.ram2 MiB

But I assume it's a corner case which not really happen.

Yeah, you're right. :/  I hadn't thought of hotplug.  It can happen indeed.

perhaps we should fail object_add memory-backend-foo if it resulted
in creating ramblock with duplicate id

Note that it would only be duplicated with Jianfeng's patch.  So I'm
worried that his patch is worse than what we have now, because it may
create conflicts with system RAMBlock names are not necessarily
predictable.  Right now, -object creates RAMBlock names that are nicely
constrained within /object/.

So we are trading off between the benefit it takes and the bad effect it brings.

I'm wondering if the above example is the only failed case this patch leads to, i.e, only there is 
a ram named "pc.ram" and "/object/pc.ram" in the src VM?

Please also consider the second option, that adding an alias name for RAMBlock; 
I'm not a big fan for that one, as it just pushes the problem to 
OpenStack/Libvirt.

looking at provided CLI examples it's configuration issue on src and dst,
one shall not mix numa and non numa variants.

Aha, that's another thing we also want to change. We now add numa at dst
node, only because without -numa, we cannot set up the file-baked memory
with share=on.

then shouldn't you start src with the same -numa to begin with,
changing such things on the fly is not supported.


Yes, you are describing the best practice. But we are originally trying 
to migrate without any changes to QEMU.



General rule is that machine on dst has to be the same as on src.


OK.


(with backend not visible to guest it possible might be changed
but it's hard to tell if something would break due to that
or would continue working in future since doesn't go along with above rule)


For example, "-m xG -mem-path xxx" can set up a file-baked memory, but
the file is not share-able.

It could be solved by adding memdev option to machine,
which would allow to specify backend object. And then on
top make -mem-path alias new option to clean thing up.


Do you mean?

src vm: -m xG
dst vm: -m xG,memdev=pc.ram -object 
memory-backend-file,id=pc.ram,size=xG,mem-path=xxx,share=on ...





But then again, You'd need to start both src and dst
with the same option.


Yeah, got it :-)



Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as migration

2018-02-07 Thread Tan, Jianfeng



On 2/7/2018 8:06 PM, Igor Mammedov wrote:

On Wed, 7 Feb 2018 07:49:58 +
"Tan, Jianfeng" <jianfeng@intel.com> wrote:


-Original Message-
From: Paolo Bonzini [mailto:pbonz...@redhat.com]
Sent: Tuesday, February 6, 2018 1:32 AM
To: Igor Mammedov
Cc: Tan, Jianfeng; qemu-devel@nongnu.org; Jason Wang; Maxime Coquelin;
Michael S . Tsirkin
Subject: Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as
migration

On 05/02/2018 18:15, Igor Mammedov wrote:

Then we would have both ram block named pc.ram:
   Block NamePSize
   pc.ram 4 KiB
   /objects/pc.ram2 MiB

But I assume it's a corner case which not really happen.

Yeah, you're right. :/  I hadn't thought of hotplug.  It can happen indeed.

perhaps we should fail object_add memory-backend-foo if it resulted
in creating ramblock with duplicate id

Note that it would only be duplicated with Jianfeng's patch.  So I'm
worried that his patch is worse than what we have now, because it may
create conflicts with system RAMBlock names are not necessarily
predictable.  Right now, -object creates RAMBlock names that are nicely
constrained within /object/.

So we are trading off between the benefit it takes and the bad effect it brings.

I'm wondering if the above example is the only failed case this patch leads to, i.e, only there is 
a ram named "pc.ram" and "/object/pc.ram" in the src VM?

Please also consider the second option, that adding an alias name for RAMBlock; 
I'm not a big fan for that one, as it just pushes the problem to 
OpenStack/Libvirt.

looking at provided CLI examples it's configuration issue on src and dst,
one shall not mix numa and non numa variants.


Aha, that's another thing we also want to change. We now add numa at dst 
node, only because without -numa, we cannot set up the file-baked memory 
with share=on.


For example, "-m xG -mem-path xxx" can set up a file-baked memory, but 
the file is not share-able.





Or any other suggestions?

Fix configuration, namely dst side of it (i.e. use the same -m only variant
without -numa as it's on src).

BTW, what are you trying to achieve adding -numa on dst?


See above reply.

Thanks,
Jianfeng



Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as migration

2018-02-06 Thread Tan, Jianfeng


> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Tuesday, February 6, 2018 1:32 AM
> To: Igor Mammedov
> Cc: Tan, Jianfeng; qemu-devel@nongnu.org; Jason Wang; Maxime Coquelin;
> Michael S . Tsirkin
> Subject: Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as
> migration
> 
> On 05/02/2018 18:15, Igor Mammedov wrote:
> >>>
> >>> Then we would have both ram block named pc.ram:
> >>>   Block Name    PSize
> >>>   pc.ram 4 KiB
> >>>   /objects/pc.ram    2 MiB
> >>>
> >>> But I assume it's a corner case which not really happen.
> >> Yeah, you're right. :/  I hadn't thought of hotplug.  It can happen indeed.
> >
> > perhaps we should fail object_add memory-backend-foo if it resulted
> > in creating ramblock with duplicate id
> 
> Note that it would only be duplicated with Jianfeng's patch.  So I'm
> worried that his patch is worse than what we have now, because it may
> create conflicts with system RAMBlock names are not necessarily
> predictable.  Right now, -object creates RAMBlock names that are nicely
> constrained within /object/.

So we are trading off between the benefit it takes and the bad effect it brings.

I'm wondering if the above example is the only failed case this patch leads to, 
i.e, only there is a ram named "pc.ram" and "/object/pc.ram" in the src VM?

Please also consider the second option, that adding an alias name for RAMBlock; 
I'm not a big fan for that one, as it just pushes the problem to 
OpenStack/Libvirt.

Or any other suggestions?

Thanks,
Jianfeng


Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as migration

2018-02-05 Thread Tan, Jianfeng



On 2/6/2018 12:19 AM, Paolo Bonzini wrote:

On 05/02/2018 17:12, Tan, Jianfeng wrote:

Hi Paolo,

On 2/5/2018 11:53 PM, Paolo Bonzini wrote:

On 05/02/2018 15:58, Jianfeng Tan wrote:

Here are some options to fix this:

1. When we do ram name comparison, we truncate the prefix as this
patch shows.
It cannot cover the corner case: the source VM could have two ram blocks
with name of "pc.ram" and "/object/pc.ram".

That shouldn't happen ("pc.ram" exists even in the "-numa
node,memdev=..." case, but it has no RAM block).

Suppose we have a VM started with "-m xG", and then hot plugged with a
ram block:
   (qemu) object_add
memory-backend-file,id=pc.ram,size=1G,mem-path=/dev/hugepages
   (qemu) device_add pc-dimm,id=pc.ram,memdev=pc.ram

Then we would have both ram block named pc.ram:
   Block NamePSize
   pc.ram 4 KiB
   /objects/pc.ram2 MiB

But I assume it's a corner case which not really happen.

Yeah, you're right. :/  I hadn't thought of hotplug.  It can happen indeed.


However, note that

-m xG -numa node,memdev=pc.ram \
-object memory-backend-file,id=pc.ram,...

works for both vhost-kernel and vhost-user, so I'd rather consider this
a configuration problem and not do anything.

That configuration indeed works for both. But in the production env,
lots of VMs are already started with previous mem config. If we do
nothing, it will take a long time (shutdown/start for each VM) to
migrate to the new setup. This patch is to make this process more smooth
without any bad effect if possible.

I understand.  However it's not as bad as "there's no possibility at all
to migrate from vhost-kernel to vhost-user".  There are cases that are
more problematic: for example, there's no possibility at all to add
memory NUMA policy during a live migration, unless -object
memory-backend-* was used on the source.


Please help me to understand: Are you saying it's always recommended to 
use -object memory-backend-* configuration even with vhost-kernel 
backend for the reason you mentioned? Or just another more serious 
problem that we shall work on in priority?


Thanks,
Jianfeng



Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as migration

2018-02-05 Thread Tan, Jianfeng



On 2/6/2018 12:29 AM, Igor Mammedov wrote:

On Mon,  5 Feb 2018 14:58:55 +
Jianfeng Tan <jianfeng@intel.com> wrote:


Existing VMs with virtio devices and vhost-kernel as the backend
are always started with mem config:

   "-m xG"
   (with a ram block named "pc.ram")

while new VMs with virtio devices and vhost-user as the backend
are always started with mem config:

   "-m xG -numa node,memdev=pc.ram -object memory-backend-file,id=pc.ram,..."
   (with a ram block named "/object/pc.ram")

could you elaborate more on what src command line migrating to what dst command 
line?


The src cmdline:
$QEMU -enable-kvm -cpu host -smp 4 /path/to/img \
  -m 2G \
  -netdev tap,id=mynet1,vhost=on \
  -device virtio-net-pci,netdev=mynet1,mac=52:54:00:12:34:58 ...

The dst cmdline:
$QEMU -enable-kvm -cpu host -smp 4 /path/to/img \
  -m 2G -numa node,memdev=pc.ram -mem-prealloc \
  -object 
memory-backend-file,id=pc.ram,size=2G,mem-path=/dev/hugepages,share=on \

  -chardev socket,id=char0,path=/tmp/sock0 \
  -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
  -device virtio-net-pci,netdev=mynet1,mac=52:54:00:12:34:58 \
  -incoming tcp:0: ...

Thanks,
Jianfeng



Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as migration

2018-02-05 Thread Tan, Jianfeng

Hi Paolo,

On 2/5/2018 11:53 PM, Paolo Bonzini wrote:

On 05/02/2018 15:58, Jianfeng Tan wrote:

Here are some options to fix this:

1. When we do ram name comparison, we truncate the prefix as this patch shows.
It cannot cover the corner case: the source VM could have two ram blocks
with name of "pc.ram" and "/object/pc.ram".

That shouldn't happen ("pc.ram" exists even in the "-numa
node,memdev=..." case, but it has no RAM block).


Suppose we have a VM started with "-m xG", and then hot plugged with a 
ram block:
  (qemu) object_add 
memory-backend-file,id=pc.ram,size=1G,mem-path=/dev/hugepages

  (qemu) device_add pc-dimm,id=pc.ram,memdev=pc.ram

Then we would have both ram block named pc.ram:
  Block NamePSize
  pc.ram 4 KiB
  /objects/pc.ram2 MiB

But I assume it's a corner case which not really happen.




  RAMBLOCK_FOREACH(block) {
-if (!strcmp(name, block->idstr)) {
+name2 = strdup(block->idstr);
+   id2 = basename(name2);
+if (!strcmp(id1, id2)) {
+free(name1);
+   free(name2);
  return block;
  }
+   free(name2);

Instead of this, perhaps just skip "/object/" in both id1 and
block->idstr?  This also removes the need for strdup/free.


Not familiar with this before, so there would not be something like 
/object/xxx/id? In that case, we can just skip "/object/". Thanks!




However, note that

   -m xG -numa node,memdev=pc.ram \
   -object memory-backend-file,id=pc.ram,...

works for both vhost-kernel and vhost-user, so I'd rather consider this
a configuration problem and not do anything.


That configuration indeed works for both. But in the production env, 
lots of VMs are already started with previous mem config. If we do 
nothing, it will take a long time (shutdown/start for each VM) to 
migrate to the new setup. This patch is to make this process more smooth 
without any bad effect if possible.


Thanks,
Jianfeng



Thanks,

Paolo


  }
  
+free(name1);

  return NULL;
  }
  






Re: [Qemu-devel] [PATCH 3/4] cryptodev-vhost-user: add crypto session handler

2017-11-29 Thread Tan, Jianfeng



On 11/29/2017 5:11 PM, Paolo Bonzini wrote:

On 29/11/2017 05:10, Michael S. Tsirkin wrote:

" Interrupt mode for vhost-user is still not supported in current
implementation. But we are evaluating the necessity now.

That's more or less a spec violation. Guest must get interrupts
if it does not disable them. And it must notify host
if host does not disable notifications.


By that, I mean the transmission from virtio to vhost, if virtio is not 
actively sending packets, at vhost side, we can stop polling from that 
vhost port (i.e., virtio); if VM wants to send packets, besides putting 
packets on virtqueue, it also kicks the backend through pio/mmio.


So no need to modify spec.


I understood that as "the vhost-user server is always using poll mode
for the virtqueues" and cannot use e.g. ioeventfds.


We can change to work like this: stop polling, and wait for the eventfd, 
and then get back to polling.


Thanks,
Jianfeng



Thanks,

Paolo





Re: [Qemu-devel] [dpdk-dev] [ovs-dev] [PATCH RFC] netdev-dpdk: Fix device obtain mac address when received first packet in vhost type

2017-11-28 Thread Tan, Jianfeng



On 11/28/2017 1:01 AM, Aaron Conole wrote:

"Tan, Jianfeng" <jianfeng@intel.com> writes:


On 11/27/2017 10:27 PM, Yuanhan Liu wrote:

On Fri, Nov 24, 2017 at 05:59:09PM +0800, Chen Hailin wrote:

Hi Aaron Conole && Jianfeng,

The stp could not work in ovs-dpdk vhostuser.
Because the attached vhost device doesn't have MAC address.

Now we have two ways to solve this problem.
1. The vhost learns MAC address from packet like as my first patch.

I do agree with Aaron this is not the right way.

I do think it should be the vswitch's responsibility to learn mac of
vhost port.

Except that it's the only feasible way without modifying the spec
(yuanhan already makes it very clear below), we can treat the vswitch
as a phsical switch, VM as a physical server, virtio/vhost port as a
back-to-back connected NICs, the only way of the physical switch to
know the mac of the NIC on the other side is ARP learning.

Might I ask why you don't think it's a right way?

As a quick example, I think a malicious guest in a multi-tenant
environment could send traffic out to manipulate this feature into
learning an incorrect mac address.


Instead, I think it's not right to stop such mac spoofing behavior 
(suppose someone wants to have such an experiment in an overlay 
networking). And it actually only affects one “LAN", instead of all "LANs".


And it's usually not the switch's responsibility to detect mac spoofing 
behavior IMHO.



To get this right requires doing deep packet inspection, and making sure
to only learn based on certain l2 traffic.



Yes, should learn based on ARP packets. Your concern is the performance? 
I suppose there is not to many such packets.


Thanks,
Jianfeng



Re: [Qemu-devel] [dpdk-dev] [ovs-dev] [PATCH RFC] netdev-dpdk: Fix device obtain mac address when received first packet in vhost type

2017-11-27 Thread Tan, Jianfeng



On 11/28/2017 12:14 AM, Aaron Conole wrote:

Yuanhan Liu  writes:


On Fri, Nov 24, 2017 at 05:59:09PM +0800, Chen Hailin wrote:

Hi Aaron Conole && Jianfeng,

The stp could not work in ovs-dpdk vhostuser.
Because the attached vhost device doesn't have MAC address.

Now we have two ways to solve this problem.
1. The vhost learns MAC address from packet like as my first patch.

I do agree with Aaron this is not the right way.


2. The virtio notifies MAC address actively to vhost user .

Unfortunately, AFAIK, there is no way to achieve that so far. we could
either let virtio/QEMU to expose the CQ to vhost or add a new VHOST_USER
message to carry the mac address. While vhost-user is a generic interface
adding a virtio-net specific message also doesn't seem quite right.
Exposing CQ is probably the best we can do.

Anyway, both need spec change.

There are other possible ways.  libvirt knows about the mac address from
the domain xml file.  Perhaps it's possible to set the mac column in the
database to the correct value when the port is being created in ovs?
This would be an action taken by the orchestration tool.


In OVS db, we can only see vhost port, but not virtio port. That is to 
say, we could use different mac for vhost port from virtio port, 
especially when it works as a vrouter instead of vswitch.




Additionally, there's a mechanism in virtio-net to set the mac address
from the host to the guest.  Is it possible to expose that functionality
through vhost-user?


We can assign mac addr when starting QEMU. After that, I suppose we 
cannot set mac addr any more, let alone setting it from vhost-user side 
(vhost-user protocol does not support it yet).


Thanks,
Jianfeng



Then when the orchestration tool sets the mac, it can be propagated, and
mac_in_use can reflect the appropriate value.  I think that's a workable
solution, but I might have missed something.


--yliu

In my opinions,  if we treat it as a device,  we should allocate
MAC address for the device when the VM started.

Which one do you think better?



Best Regards,
Chen Hailin
che...@arraynetworks.com.cn
  
From: Aaron Conole

Date: 2017-11-18 10:00
To: Hailin Chen
CC: ovs-...@openvswitch.org; Maxime Coquelin; cl...@arraynetworks.com.cn
Subject: Re: [ovs-dev] [PATCH RFC] netdev-dpdk: Fix device obtain
mac address when received first packet in vhost type
Hi Hailin,
  
Hailin Chen  writes:
  

The stp could not work on netdev-dpdk if network is loop.
Because the stp protocol negotiates designate port by sending
BPDU packets which contains MAC address.
However the device doesn't have MAC address in vhostuser type.
Thus, function send_bpdu_cb would not send BPDU packets.

This patch will set the MAC for device when received first packet.

Signed-off-by: Hailin Chen 
---
  
Thanks for the patch.
  
In general, I don't think this is the right approach to deal with this

type of issue.  I believe the problem statement is that OvS bridge is
unaware of the guest MAC address - did I get it right?  In that case, I
would think that a better way to solve this would be to have virtio tell
the mac address of the guest.  I don't recall right now if that's
allowed in the virtio spec, but I do remember some kind of negotiation
features.
  
I've CC'd Maxime, who is one of the maintainers of the virtio code from

DPDK side.  Perhaps there is an alternate way to solve this.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev





Re: [Qemu-devel] [dpdk-dev] [ovs-dev] [PATCH RFC] netdev-dpdk: Fix device obtain mac address when received first packet in vhost type

2017-11-27 Thread Tan, Jianfeng



On 11/27/2017 10:27 PM, Yuanhan Liu wrote:

On Fri, Nov 24, 2017 at 05:59:09PM +0800, Chen Hailin wrote:

Hi Aaron Conole && Jianfeng,

The stp could not work in ovs-dpdk vhostuser.
Because the attached vhost device doesn't have MAC address.

Now we have two ways to solve this problem.
1. The vhost learns MAC address from packet like as my first patch.

I do agree with Aaron this is not the right way.


I do think it should be the vswitch's responsibility to learn mac of 
vhost port.


Except that it's the only feasible way without modifying the spec 
(yuanhan already makes it very clear below), we can treat the vswitch as 
a phsical switch, VM as a physical server, virtio/vhost port as a 
back-to-back connected NICs, the only way of the physical switch to know 
the mac of the NIC on the other side is ARP learning.


Might I ask why you don't think it's a right way?

Thanks,
Jianfeng




2. The virtio notifies MAC address actively to vhost user .

Unfortunately, AFAIK, there is no way to achieve that so far. we could
either let virtio/QEMU to expose the CQ to vhost or add a new VHOST_USER
message to carry the mac address. While vhost-user is a generic interface
adding a virtio-net specific message also doesn't seem quite right.
Exposing CQ is probably the best we can do.

Anyway, both need spec change.

--yliu

In my opinions,  if we treat it as a device,  we should allocate
MAC address for the device when the VM started.

Which one do you think better?



Best Regards,
Chen Hailin
che...@arraynetworks.com.cn
  
From: Aaron Conole

Date: 2017-11-18 10:00
To: Hailin Chen
CC: ovs-...@openvswitch.org; Maxime Coquelin; cl...@arraynetworks.com.cn
Subject: Re: [ovs-dev] [PATCH RFC] netdev-dpdk: Fix device obtain mac address 
when received first packet in vhost type
Hi Hailin,
  
Hailin Chen  writes:
  

The stp could not work on netdev-dpdk if network is loop.
Because the stp protocol negotiates designate port by sending
BPDU packets which contains MAC address.
However the device doesn't have MAC address in vhostuser type.
Thus, function send_bpdu_cb would not send BPDU packets.

This patch will set the MAC for device when received first packet.

Signed-off-by: Hailin Chen 
---
  
Thanks for the patch.
  
In general, I don't think this is the right approach to deal with this

type of issue.  I believe the problem statement is that OvS bridge is
unaware of the guest MAC address - did I get it right?  In that case, I
would think that a better way to solve this would be to have virtio tell
the mac address of the guest.  I don't recall right now if that's
allowed in the virtio spec, but I do remember some kind of negotiation
features.
  
I've CC'd Maxime, who is one of the maintainers of the virtio code from

DPDK side.  Perhaps there is an alternate way to solve this.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev