Re: [PATCH] KVM: APIC: avoid instruction emulation for EOI writes

2011-09-11 Thread Avi Kivity

On 09/10/2011 11:41 AM, ya su wrote:
> 0x80637b85:  testl $0x1000, 0xfffe0300
> 0x80637b8f:  jne 0x80637b85
> 0x80637b91:  mov %ecx, 0xfffe0300
> 0x80637b97:  testl $0x1000, 0xfffe0300
> 0x80637ba1:  jne 0x80637b97
>
> I wonder why testl operation will also cause a ICR write, from the
> asm code, there should only issue one IPI, but from trace-cmd, it
> issued 3 IPI, is there something wrong?

It's a bug in test insn emulation; coincidentally, I wrote a patch to fix
it yesterday, not imagining that it actually happens in practice.
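
For readers following the trace, here is a rough C rendering of the five quoted instructions. It is only a sketch: `struct fake_icr`, `icr_read`, `icr_write` and `send_ipi` are hypothetical names standing in for the real MMIO register at 0xfffe0300, and the model assumes delivery completes immediately. The point is that the sequence contains two polling reads and exactly one write, so only one IPI should be sent.

```c
#include <stdint.h>

#define ICR_DELIVERY_PENDING 0x1000u   /* bit 12, the mask used by testl */

/* Hypothetical stand-in for the APIC ICR; counts how often it is written. */
struct fake_icr {
    uint32_t value;
    unsigned writes;
};

static uint32_t icr_read(const struct fake_icr *icr)
{
    return icr->value;                 /* a read has no side effects */
}

static void icr_write(struct fake_icr *icr, uint32_t v)
{
    /* Assumption of this model: delivery completes at once. */
    icr->value = v & ~ICR_DELIVERY_PENDING;
    icr->writes++;                     /* each write would send one IPI */
}

/* Same control flow as the quoted disassembly. */
static void send_ipi(struct fake_icr *icr, uint32_t cmd)
{
    while (icr_read(icr) & ICR_DELIVERY_PENDING)
        ;                              /* testl/jne: wait until idle */
    icr_write(icr, cmd);               /* mov %ecx, ICR */
    while (icr_read(icr) & ICR_DELIVERY_PENDING)
        ;                              /* testl/jne: wait for delivery */
}
```

If an emulator turns each of the two polling reads into a write as well, the counter ends up at 3, which matches the three IPIs seen in the trace.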



> Is it also possible to optimize ICR write emulation, from the
> result, winxp vm will produce a lot of ICR writes



Unfortunately not.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: APIC: avoid instruction emulation for EOI writes

2011-09-11 Thread Jan Kiszka
On 2011-09-11 09:11, Avi Kivity wrote:
> On 09/10/2011 11:41 AM, ya su wrote:
> > 0x80637b85:  testl $0x1000, 0xfffe0300
> > 0x80637b8f:  jne 0x80637b85
> > 0x80637b91:  mov %ecx, 0xfffe0300
> > 0x80637b97:  testl $0x1000, 0xfffe0300
> > 0x80637ba1:  jne 0x80637b97
> >
> > I wonder why testl operation will also cause a ICR write, from the
> > asm code, there should only issue one IPI, but from trace-cmd, it
> > issued 3 IPI, is there something wrong?
>
> It's a bug in test insn emulation, coincidentally I wrote a patch to fix
> it yesterday, not imagining that it actually happens in practice.
>
> > Is it also possible to optimize ICR write emulation, from the
> > result, winxp vm will produce a lot of ICR writes
>
> Unfortunately not.

I'm just hoping we'll see hardware-assisted APIC, ideally also IOAPIC
virtualization soon.

Jan






[PATCH 1/2] KVM: Clean up unneeded void pointer casts

2011-09-11 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 virt/kvm/assigned-dev.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index 4e9eaeb..ea1bf5d 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -143,7 +143,7 @@ static void deassign_host_irq(struct kvm *kvm,
 
 		for (i = 0; i < assigned_dev->entries_nr; i++)
 			free_irq(assigned_dev->host_msix_entries[i].vector,
-				 (void *)assigned_dev);
+				 assigned_dev);
 
 		assigned_dev->entries_nr = 0;
 		kfree(assigned_dev->host_msix_entries);
@@ -153,7 +153,7 @@ static void deassign_host_irq(struct kvm *kvm,
 		/* Deal with MSI and INTx */
 		disable_irq(assigned_dev->host_irq);
 
-		free_irq(assigned_dev->host_irq, (void *)assigned_dev);
+		free_irq(assigned_dev->host_irq, assigned_dev);
 
 		if (assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSI)
 			pci_disable_msi(assigned_dev->dev);
@@ -237,7 +237,7 @@ static int assigned_device_enable_host_intx(struct kvm *kvm,
 	 * are going to be long delays in accepting, acking, etc.
 	 */
 	if (request_threaded_irq(dev->host_irq, NULL, kvm_assigned_dev_thread,
-				 IRQF_ONESHOT, dev->irq_name, (void *)dev))
+				 IRQF_ONESHOT, dev->irq_name, dev))
 		return -EIO;
 	return 0;
 }
@@ -256,7 +256,7 @@ static int assigned_device_enable_host_msi(struct kvm *kvm,
 
 	dev->host_irq = dev->dev->irq;
 	if (request_threaded_irq(dev->host_irq, NULL, kvm_assigned_dev_thread,
-				 0, dev->irq_name, (void *)dev)) {
+				 0, dev->irq_name, dev)) {
 		pci_disable_msi(dev->dev);
 		return -EIO;
 	}
@@ -283,7 +283,7 @@ static int assigned_device_enable_host_msix(struct kvm *kvm,
 	for (i = 0; i < dev->entries_nr; i++) {
 		r = request_threaded_irq(dev->host_msix_entries[i].vector,
 					 NULL, kvm_assigned_dev_thread,
-					 0, dev->irq_name, (void *)dev);
+					 0, dev->irq_name, dev);
 		if (r)
 			goto err;
 	}
@@ -291,7 +291,7 @@ static int assigned_device_enable_host_msix(struct kvm *kvm,
 	return 0;
 err:
 	for (i -= 1; i >= 0; i--)
-		free_irq(dev->host_msix_entries[i].vector, (void *)dev);
+		free_irq(dev->host_msix_entries[i].vector, dev);
 	pci_disable_msix(dev->dev);
 	return r;
 }
-- 
1.7.3.4
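
A note on why these casts can simply go away: in C, any object pointer converts to and from `void *` implicitly, so an opaque cookie argument such as the `dev_id` of free_irq() accepts the device pointer as is. The snippet below is a userspace sketch of that rule only; `struct device_state`, `register_cookie` and `cookie_matches` are made-up names, not kernel APIs.

```c
#include <stddef.h>

struct device_state {
    int irq;
};

/* Hypothetical registry holding one opaque cookie, like a dev_id argument. */
static void *saved_cookie;

static void register_cookie(void *dev_id)
{
    saved_cookie = dev_id;
}

static int cookie_matches(struct device_state *dev)
{
    register_cookie(dev);                       /* implicit conversion to void * */
    struct device_state *back = saved_cookie;   /* and back, again without a cast */
    return back == dev;
}
```

The round trip yields the original pointer, so the explicit `(void *)` adds nothing and can hide a wrong argument type from the compiler.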


[PATCH 2/2] KVM: Avoid needless registrations of IRQ ack notifier for assigned devices

2011-09-11 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

We only perform work in kvm_assigned_dev_ack_irq if the guest IRQ is of
INTx type. This completely avoids the callback invocation in non-INTx
cases by registering the IRQ ack notifier only for INTx.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---

Part of my old INTx sharing series, but actually not depending on it.

 virt/kvm/assigned-dev.c |   18 --
 1 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index ea1bf5d..84ead54 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -86,13 +86,9 @@ static irqreturn_t kvm_assigned_dev_thread(int irq, void *dev_id)
 /* Ack the irq line for an assigned device */
 static void kvm_assigned_dev_ack_irq(struct kvm_irq_ack_notifier *kian)
 {
-	struct kvm_assigned_dev_kernel *dev;
-
-	if (kian->gsi == -1)
-		return;
-
-	dev = container_of(kian, struct kvm_assigned_dev_kernel,
-			   ack_notifier);
+	struct kvm_assigned_dev_kernel *dev =
+		container_of(kian, struct kvm_assigned_dev_kernel,
+			     ack_notifier);
 
 	kvm_set_irq(dev->kvm, dev->irq_source_id, dev->guest_irq, 0);
 
@@ -110,8 +106,9 @@ static void kvm_assigned_dev_ack_irq(struct kvm_irq_ack_notifier *kian)
 static void deassign_guest_irq(struct kvm *kvm,
 			       struct kvm_assigned_dev_kernel *assigned_dev)
 {
-	kvm_unregister_irq_ack_notifier(kvm, &assigned_dev->ack_notifier);
-	assigned_dev->ack_notifier.gsi = -1;
+	if (assigned_dev->ack_notifier.gsi != -1)
+		kvm_unregister_irq_ack_notifier(kvm,
+						&assigned_dev->ack_notifier);
 
 	kvm_set_irq(assigned_dev->kvm, assigned_dev->irq_source_id,
 		    assigned_dev->guest_irq, 0);
@@ -404,7 +401,8 @@ static int assign_guest_irq(struct kvm *kvm,
 
 	if (!r) {
 		dev->irq_requested_type |= guest_irq_type;
-		kvm_register_irq_ack_notifier(kvm, &dev->ack_notifier);
+		if (dev->ack_notifier.gsi != -1)
+			kvm_register_irq_ack_notifier(kvm, &dev->ack_notifier);
 	} else
 		kvm_free_irq_source_id(kvm, dev->irq_source_id);
 
-- 
1.7.3.4
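
The shape of this change can be shown with a small userspace model. Everything here is a simplified assumption for illustration: the structs are cut-down stand-ins for the kernel ones, `registered` replaces the real notifier list, and `container_of` is re-implemented with `offsetof` without the kernel's type checking. Once registration is skipped for `gsi == -1`, the callback no longer needs its early return.

```c
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of(), without type checks. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct ack_notifier {
    int gsi;
};

struct assigned_dev {
    int guest_irq;
    struct ack_notifier ack_notifier;
    int registered;   /* hypothetical flag standing in for the notifier list */
};

/* Register only when a GSI is set, mirroring the check the patch adds. */
static void assign_guest_irq(struct assigned_dev *dev)
{
    if (dev->ack_notifier.gsi != -1)
        dev->registered = 1;
}

/* The callback recovers its device unconditionally from the embedded member. */
static int ack_irq(struct ack_notifier *kian)
{
    struct assigned_dev *dev =
        container_of(kian, struct assigned_dev, ack_notifier);
    return dev->guest_irq;
}
```

A device whose notifier has `gsi == -1` (the non-INTx case in the patch) is never registered, so `ack_irq` is only ever reached for INTx devices.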


[PATCH] KVM: x86 emulator: disable writeback for TEST

2011-09-11 Thread Avi Kivity
The TEST instruction doesn't write its destination operand.  This
could cause problems if an MMIO register was accessed using the TEST
instruction.  Recently Windows XP was observed to use TEST against
the APIC ICR; this can cause spurious IPIs.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/x86/kvm/emulate.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c636ee7..c37f67e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1670,6 +1670,8 @@ static int em_grp3(struct x86_emulate_ctxt *ctxt)
 	switch (ctxt->modrm_reg) {
 	case 0 ... 1:	/* test */
 		emulate_2op_SrcV(ctxt, "test");
+		/* Disable writeback. */
+		ctxt->dst.type = OP_NONE;
 		break;
 	case 2:	/* not */
 		ctxt->dst.val = ~ctxt->dst.val;
@@ -2513,6 +2515,8 @@ static int em_cmp(struct x86_emulate_ctxt *ctxt)
 static int em_test(struct x86_emulate_ctxt *ctxt)
 {
 	emulate_2op_SrcV(ctxt, "test");
+	/* Disable writeback. */
+	ctxt->dst.type = OP_NONE;
 	return X86EMUL_CONTINUE;
 }
 
 
-- 
1.7.6.1
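
To make the failure mode concrete, here is a toy model of an emulator with a generic writeback stage. It is a sketch under loose assumptions, not the KVM code: `struct mini_ctxt`, `struct mmio_reg`, `writeback` and `emulate_test` are invented for illustration, and only the idea of setting the destination type to `OP_NONE` is taken from the patch.

```c
#include <stdint.h>

enum op_type { OP_NONE, OP_MEM };

/* Hypothetical miniature of an emulator context with one destination operand. */
struct mini_ctxt {
    enum op_type dst_type;
    uint32_t dst_val;
    int zero_flag;
};

/* Stand-in for an MMIO register that counts the writes it receives. */
struct mmio_reg {
    uint32_t value;
    unsigned writes;
};

/* TEST: AND the operands, set flags, discard the result. */
static void em_test(struct mini_ctxt *c, uint32_t src, int disable_writeback)
{
    c->zero_flag = (c->dst_val & src) == 0;
    if (disable_writeback)
        c->dst_type = OP_NONE;
}

/* Generic writeback stage: stores dst unless the handler opted out. */
static void writeback(const struct mini_ctxt *c, struct mmio_reg *reg)
{
    if (c->dst_type == OP_NONE)
        return;
    reg->value = c->dst_val;
    reg->writes++;
}

/* Emulate "testl $src, reg" and return how many MMIO writes it caused. */
static unsigned emulate_test(struct mmio_reg *reg, uint32_t src, int fixed)
{
    struct mini_ctxt c = { OP_MEM, reg->value, 0 };
    unsigned before = reg->writes;

    em_test(&c, src, fixed);
    writeback(&c, reg);
    return reg->writes - before;
}
```

With `fixed == 0` the unchanged value is stored back, which is harmless for RAM but counts as a fresh command on a register like the ICR. With `fixed == 1` no write reaches the device.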



Re: About hotplug multifunction

2011-09-11 Thread Michael S. Tsirkin
On Sat, Sep 10, 2011 at 02:43:11AM +0900, Isaku Yamahata wrote:
> pci/pcie hot plug needs clean up for multifunction hotplug in long term.
> Only single function device case works. Multifunction case is broken somewhat.
> Especially the current acpi based hotplug should be replaced by
> the standardized hot plug controller in long term.

We'll need to keep supporting windows XP, which IIUC only
supports hotplug through ACPI. So it looks like we'll
need both.

-- 
MST


Re: About hotplug multifunction

2011-09-11 Thread Michael S. Tsirkin
On Fri, Sep 09, 2011 at 03:34:26PM -0300, Marcelo Tosatti wrote:
> > > something I noted when reading our acpi code:
> > > we currently pass eject request for function 0 only:
> > >	Name (_ADR, nr##)
> > > We either need a device per function there (acpi 1.0),
> > > send eject request for them all, or use 
> > > as function number (newer acpi, not sure which version).
> > > Need to see which guests (windows,linux) can handle which form.
> > 
> > I'd guess we need to change that to .
> 
> No need, only make sure function 0 is there and all other functions
> should be removed automatically by the guest on eject notification.

Hmm, the ACPI spec explicitly says:

High word = Device #, Low word = Function #.
(e.g., device 3, function 2 is 0x00030002). To refer
to all the functions on a device #, use a function
number of ).


> ACPI PCI hotplug is based on slots, not on functions. It does not
> support addition/removal of individual functions.

Interesting. Is this just based on general logic,
reading of the linux driver or the ACPI spec?

The ACPI spec itself seems pretty vague. All tables
list devices, where each device has an _ADR entry,
which is built up of PCI device # and function #.

-- 
MST


Re: [net-next-2.6 PATCH 0/3 RFC] macvlan: MAC Address filtering support for passthru mode

2011-09-11 Thread Michael S. Tsirkin
On Fri, Sep 09, 2011 at 09:33:33AM -0700, Roopa Prabhu wrote:
 
 
 
 On 9/8/11 10:55 PM, Michael S. Tsirkin m...@redhat.com wrote:
 
  On Thu, Sep 08, 2011 at 07:53:11PM -0700, Roopa Prabhu wrote:
  Phase 1: Goal: Enable hardware filtering for all macvlan modes
  - In macvlan passthru mode the single guest virtio-nic connected will
receive traffic that he requested for
  - In macvlan non-passthru mode all guest virtio-nics sharing the
physical nic will see all other guest traffic
but the filtering at guest virtio-nic
  
  I don't think guests currently filter anything.
  
  I was referring to Qemu-kvm virtio-net in
  virtion_net_receive-receive_filter. I think It only passes pkts that the
  guest OS is interested. It uses the filter table that I am passing to
  macvtap in this patch.
  
  This happens after userspace thread gets woken up and data
  is copied there. So relying on filtering at that level is
  going to be very inefficient on a system with
  multiple active guests. Further, and for that reason, vhost-net
  doesn't do filtering at all, relying on the backends
  to pass it correct packets.
 
 Ok thanks for the info. So in which case, phase 1 is best for PASSTHRU mode
 and for non-PASSTHRU when there is a single guest connected to a VF.
 For non-PASSTHRU multi guest sharing the same VF, Phase 1 is definitely
 better than putting the VF in promiscuous mode.
 But to address the concern you mention above, in phase 2 when we have more
 than one guest sharing the VF,

It's probably more interesting for a card without SRIOV support.

 we will have to add filter lookup in macvlan
 to filter pkts for each guest.

Any chance to enable hardware filters for that?

 This will need some performance tests too.
 
 Will start investigating the netlink interface comments for phase 1 first.
 
 Thanks!
 -Roopa


Re: [net-next-2.6 PATCH 0/3 RFC] macvlan: MAC Address filtering support for passthru mode

2011-09-11 Thread Michael S. Tsirkin
On Thu, Sep 08, 2011 at 08:00:53PM -0700, Roopa Prabhu wrote:
 
 
 
> On 9/8/11 12:33 PM, Michael S. Tsirkin <m...@redhat.com> wrote:
>
> > On Thu, Sep 08, 2011 at 12:23:56PM -0700, Roopa Prabhu wrote:
> > > >
> > > > I think the main usecase for passthru mode is to assign a SR-IOV VF to
> > > > a single guest.
> > > >
> > > Yes and for the passthru usecase this patch should be enough to enable
> > > filtering in hw (eventually like I indicated before I need to fix vlan
> > > filtering too).
> >
> > So with filtering in hw, and in sriov VF case, VFs
> > actually share a filtering table. How will that
> > be partitioned?
>
> AFAIK, though it might maintain a single filter table space in hw, hw does
> know which filter belongs to which VF. And the OS driver does not need to do
> anything special. The VF driver exposes a VF netdev. And any uc/mc addresses
> registered with a VF netdev are registered with the hw by the driver. And hw
> will filter and send only pkts that the VF has expressed interest in.
>
> No special filter partitioning in hw is required.
>
> Thanks,
> Roopa

Yes, but what I mean is, if the size of the single filter table
is limited, we need to decide how many addresses is
each guest allowed. If we let one guest ask for
as many as it wants, it can lock others out.

-- 
MST


Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking

2011-09-11 Thread Avi Kivity

On 09/08/2011 08:29 PM, Jeremy Fitzhardinge wrote:

> > I don't think it's that expensive, especially compared to the
> > double-context-switch and vmexit of the spinner going to sleep.  On
> > AMD we do have to take an extra vmexit (on IRET) though.
>
> Fair enough - so if the vcpu blocks itself, it ends up being rescheduled
> in the NMI handler, which then returns to the lock slowpath.  And if its
> a normal hlt, then you can also take interrupts if they're enabled while
> spinning.


Yes.  To be clear, just execute 'hlt' and inherit the interrupt enable 
flag from the environment.



> And if you get nested NMIs (since you can get multiple spurious kicks,
> or from other NMI sources), then one NMI will get latched and any others
> will get dropped?


While we're in the NMI handler, any further NMIs will be collapsed and 
queued (so one NMI can be in service and just one other queued behind 
it).  We can detect this condition by checking %rip on stack.
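
The "one in service, one queued" behaviour can be modelled in a few lines. This is only a sketch of the latching rule described above, not of real hardware or KVM internals; `struct nmi_state`, `nmi_raise` and `nmi_iret` are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical model: one NMI in service, at most one more latched. */
struct nmi_state {
    bool in_service;
    bool pending;
    unsigned handled;
};

/* An NMI arrives: start the handler, or latch it if one is already running. */
static void nmi_raise(struct nmi_state *s)
{
    if (s->in_service)
        s->pending = true;    /* further NMIs collapse into this single bit */
    else
        s->in_service = true;
}

/* The handler finishes (IRET): a latched NMI, if any, is delivered next. */
static void nmi_iret(struct nmi_state *s)
{
    s->handled++;
    s->in_service = s->pending;
    s->pending = false;
}
```

Raising four NMIs back to back therefore runs the handler only twice, which is why a kick mechanism built on NMIs has to tolerate collapsed and spurious wakeups.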




> > Well we could have a specialized sleep/wakeup hypercall pair like Xen,
> > but I'd like to avoid it if at all possible.
>
> Yeah, that's something that just falls out of the existing event channel
> machinery, so it isn't something that I specifically added.  But it does
> mean that you simply end up with a hypercall returning on kick, with no
> real complexities.

It also has to return on interrupt, NMI, INIT etc.  "No real
complexities" is a meaningless phrase on x86, though it is fertile
ground for math puns.


--
error compiling committee.c: too many arguments to function



Re: [net-next-2.6 PATCH 0/3 RFC] macvlan: MAC Address filtering support for passthru mode

2011-09-11 Thread Roopa Prabhu



On 9/11/11 2:44 AM, Michael S. Tsirkin m...@redhat.com wrote:

 
> > AFAIK, though it might maintain a single filter table space in hw, hw does
> > know which filter belongs to which VF. And the OS driver does not need to do
> > anything special. The VF driver exposes a VF netdev. And any uc/mc addresses
> > registered with a VF netdev are registered with the hw by the driver. And hw
> > will filter and send only pkts that the VF has expressed interest in.
> >
> > No special filter partitioning in hw is required.
> >
> > Thanks,
> > Roopa
>
> Yes, but what I mean is, if the size of the single filter table
> is limited, we need to decide how many addresses is
> each guest allowed. If we let one guest ask for
> as many as it wants, it can lock others out.

Yes, true. In these cases, i.e. when more unicast addresses are being
registered than the hardware can handle, the VF driver will put the VF in
promiscuous mode (or at least it's supposed to; I think all drivers do
that).
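
As a rough illustration of that fallback, a driver-side filter table could behave like the sketch below. The table size, `struct vf_filter`, `vf_add_uc_addr` and `vf_accepts` are all assumptions made up for this example; real drivers differ in the details.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_UC_FILTERS 4   /* hypothetical hardware limit */

struct vf_filter {
    uint8_t addrs[MAX_UC_FILTERS][6];
    int count;
    bool promisc;
};

/* Add a unicast address; fall back to promiscuous mode when the table is full. */
static void vf_add_uc_addr(struct vf_filter *vf, const uint8_t mac[6])
{
    if (vf->count == MAX_UC_FILTERS) {
        vf->promisc = true;
        return;
    }
    memcpy(vf->addrs[vf->count++], mac, 6);
}

/* Would a frame with this destination address be passed up? */
static bool vf_accepts(const struct vf_filter *vf, const uint8_t mac[6])
{
    if (vf->promisc)
        return true;
    for (int i = 0; i < vf->count; i++)
        if (memcmp(vf->addrs[i], mac, 6) == 0)
            return true;
    return false;
}
```

Once the table overflows, every address is accepted, so the filtering work moves back to software.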


Thanks,
Roopa




Re: [net-next-2.6 PATCH 0/3 RFC] macvlan: MAC Address filtering support for passthru mode

2011-09-11 Thread Roopa Prabhu



On 9/11/11 2:38 AM, Michael S. Tsirkin m...@redhat.com wrote:

 On Fri, Sep 09, 2011 at 09:33:33AM -0700, Roopa Prabhu wrote:
 
 
 
 On 9/8/11 10:55 PM, Michael S. Tsirkin m...@redhat.com wrote:
 
 On Thu, Sep 08, 2011 at 07:53:11PM -0700, Roopa Prabhu wrote:
 Phase 1: Goal: Enable hardware filtering for all macvlan modes
 - In macvlan passthru mode the single guest virtio-nic connected will
   receive traffic that he requested for
 - In macvlan non-passthru mode all guest virtio-nics sharing the
   physical nic will see all other guest traffic
   but the filtering at guest virtio-nic
 
 I don't think guests currently filter anything.
 
 I was referring to Qemu-kvm virtio-net in
 virtion_net_receive-receive_filter. I think It only passes pkts that the
 guest OS is interested. It uses the filter table that I am passing to
 macvtap in this patch.
 
 This happens after userspace thread gets woken up and data
 is copied there. So relying on filtering at that level is
 going to be very inefficient on a system with
 multiple active guests. Further, and for that reason, vhost-net
 doesn't do filtering at all, relying on the backends
 to pass it correct packets.
 
 Ok thanks for the info. So in which case, phase 1 is best for PASSTHRU mode
 and for non-PASSTHRU when there is a single guest connected to a VF.
 For non-PASSTHRU multi guest sharing the same VF, Phase 1 is definitely
 better than putting the VF in promiscuous mode.
 But to address the concern you mention above, in phase 2 when we have more
 than one guest sharing the VF,
 
 It's probably more interesting for a card without SRIOV support.
 
If it's an SRIOV card, I am assuming people are likely using PASSTHRU mode.
Non-SRIOV cards will use any of the non-PASSTHRU modes.


 we will have to add filter lookup in macvlan
 to filter pkts for each guest.
 
 Any chance to enable hardware filters for that?
 
Not AFAIK. I am not sure how you would do it either. It's still a single
device from which the host receives traffic.

Thanks,
Roopa
 



Re: About hotplug multifunction

2011-09-11 Thread Marcelo Tosatti
On Sun, Sep 11, 2011 at 12:23:57PM +0300, Michael S. Tsirkin wrote:
> On Fri, Sep 09, 2011 at 03:34:26PM -0300, Marcelo Tosatti wrote:
> > > > something I noted when reading our acpi code:
> > > > we currently pass eject request for function 0 only:
> > > >	Name (_ADR, nr##)
> > > > We either need a device per function there (acpi 1.0),
> > > > send eject request for them all, or use 
> > > > as function number (newer acpi, not sure which version).
> > > > Need to see which guests (windows,linux) can handle which form.
> > > 
> > > I'd guess we need to change that to .
> > 
> > No need, only make sure function 0 is there and all other functions
> > should be removed automatically by the guest on eject notification.
> 
> Hmm, the ACPI spec explicitly says:
> 
> High word = Device #, Low word = Function #.
> (e.g., device 3, function 2 is 0x00030002). To refer
> to all the functions on a device #, use a function
> number of ).

Right, but this is the _ADR of the device instance in ACPI. 
The communication between QEMU and the ACPI DSL code is all 
based in slots.

> > ACPI PCI hotplug is based on slots, not on functions. It does not
> > support addition/removal of individual functions.
> 
> Interesting. Is this just based on general logic,
> reading of the linux driver or the ACPI spec?

It's based on the SeaBIOS ACPI DSDT implementation and its relationship with
the QEMU implementation in acpi_piix4.c.

> The ACPI spec itself seems pretty vague. All tables
> list devices, where each device has an _ADR entry,
> which is built up of PCI device # and function #.

Yes, it is vague. Given the mandate from the PCI spec that a device _must
contain_ function 0, usage (including hotplug/unplug) of individual
functions other than 0 as separate devices is a no-go.



Re: About hotplug multifunction

2011-09-11 Thread Michael S. Tsirkin
On Sun, Sep 11, 2011 at 12:01:49PM -0300, Marcelo Tosatti wrote:
> On Sun, Sep 11, 2011 at 12:23:57PM +0300, Michael S. Tsirkin wrote:
> > On Fri, Sep 09, 2011 at 03:34:26PM -0300, Marcelo Tosatti wrote:
> > > > > something I noted when reading our acpi code:
> > > > > we currently pass eject request for function 0 only:
> > > > >	Name (_ADR, nr##)
> > > > > We either need a device per function there (acpi 1.0),
> > > > > send eject request for them all, or use 
> > > > > as function number (newer acpi, not sure which version).
> > > > > Need to see which guests (windows,linux) can handle which form.
> > > > 
> > > > I'd guess we need to change that to .
> > > 
> > > No need, only make sure function 0 is there and all other functions
> > > should be removed automatically by the guest on eject notification.
> > 
> > Hmm, the ACPI spec explicitly says:
> > 
> > High word = Device #, Low word = Function #.
> > (e.g., device 3, function 2 is 0x00030002). To refer
> > to all the functions on a device #, use a function
> > number of ).
> 
> Right, but this is the _ADR of the device instance in ACPI.
> The communication between QEMU and the ACPI DSL code is all
> based in slots.

It's easy to extend that if we like though.

> > > ACPI PCI hotplug is based on slots, not on functions. It does not
> > > support addition/removal of individual functions.
> > 
> > Interesting. Is this just based on general logic,
> > reading of the linux driver or the ACPI spec?
> 
> It's based on the SeaBIOS ACPI DSDT implementation and its relationship with
> the QEMU implementation in acpi_piix4.c.
> 
> > The ACPI spec itself seems pretty vague. All tables
> > list devices, where each device has an _ADR entry,
> > which is built up of PCI device # and function #.
> 
> Yes, it is vague. Given the mandate from the PCI spec that a device _must
> contain_ function 0, usage (including hotplug/unplug) of individual
> functions other than 0 as separate devices is a no-go.

It doesn't seem to be a big issue.
We could, for example, keep a stub function 0 around.

-- 
MST


Re: [net-next-2.6 PATCH 0/3 RFC] macvlan: MAC Address filtering support for passthru mode

2011-09-11 Thread Michael S. Tsirkin
On Sun, Sep 11, 2011 at 06:18:02AM -0700, Roopa Prabhu wrote:
 
 
 
 On 9/11/11 2:38 AM, Michael S. Tsirkin m...@redhat.com wrote:
 
  On Fri, Sep 09, 2011 at 09:33:33AM -0700, Roopa Prabhu wrote:
  
  
  
  On 9/8/11 10:55 PM, Michael S. Tsirkin m...@redhat.com wrote:
  
  On Thu, Sep 08, 2011 at 07:53:11PM -0700, Roopa Prabhu wrote:
  Phase 1: Goal: Enable hardware filtering for all macvlan modes
  - In macvlan passthru mode the single guest virtio-nic connected 
  will
receive traffic that he requested for
  - In macvlan non-passthru mode all guest virtio-nics sharing the
physical nic will see all other guest traffic
but the filtering at guest virtio-nic
  
  I don't think guests currently filter anything.
  
  I was referring to Qemu-kvm virtio-net in
  virtion_net_receive-receive_filter. I think It only passes pkts that the
  guest OS is interested. It uses the filter table that I am passing to
  macvtap in this patch.
  
  This happens after userspace thread gets woken up and data
  is copied there. So relying on filtering at that level is
  going to be very inefficient on a system with
  multiple active guests. Further, and for that reason, vhost-net
  doesn't do filtering at all, relying on the backends
  to pass it correct packets.
  
  Ok thanks for the info. So in which case, phase 1 is best for PASSTHRU mode
  and for non-PASSTHRU when there is a single guest connected to a VF.
  For non-PASSTHRU multi guest sharing the same VF, Phase 1 is definitely
  better than putting the VF in promiscuous mode.
  But to address the concern you mention above, in phase 2 when we have more
  than one guest sharing the VF,
  
  It's probably more interesting for a card without SRIOV support.
  
 If its an SRIOV card I am assuming people likely using PASSTHRU mode.
 Non-SRIOV cards will use any of the non-PASSTHRU mode.
 
 
  we will have to add filter lookup in macvlan
  to filter pkts for each guest.
  
  Any chance to enable hardware filters for that?
  
 NAFAIK. Am not sure how you would do it too. Its still a single device from
 where the host receives traffic from.
 
 Thanks,
 Roopa

VMDQ cards might let you program MAC addresses for individual rings.


-- 
MST


Re: [net-next-2.6 PATCH 0/3 RFC] macvlan: MAC Address filtering support for passthru mode

2011-09-11 Thread Michael S. Tsirkin
On Sun, Sep 11, 2011 at 06:18:01AM -0700, Roopa Prabhu wrote:
 
 
 
> On 9/11/11 2:44 AM, Michael S. Tsirkin <m...@redhat.com> wrote:
> 
> > > AFAIK, though it might maintain a single filter table space in hw, hw does
> > > know which filter belongs to which VF. And the OS driver does not need to do
> > > anything special. The VF driver exposes a VF netdev. And any uc/mc addresses
> > > registered with a VF netdev are registered with the hw by the driver. And hw
> > > will filter and send only pkts that the VF has expressed interest in.
> > > 
> > > No special filter partitioning in hw is required.
> > > 
> > > Thanks,
> > > Roopa
> > 
> > Yes, but what I mean is, if the size of the single filter table
> > is limited, we need to decide how many addresses is
> > each guest allowed. If we let one guest ask for
> > as many as it wants, it can lock others out.
> 
> Yes true. In these cases ie when the number of unicast addresses being
> registered is more than it can handle, The VF driver will put the VF  in
> promiscuous mode (Or at least its supposed to do. I think all drivers do
> that).
> 
> 
> Thanks,
> Roopa

Right, so that works at least but likely performs worse
than a hardware filter. So we better allocate it in
some fair way, as a minimum. Maybe a way for
the admin to control that allocation is useful.
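
One possible shape for such an allocation, purely as a sketch: give each guest a quota on the shared table and refuse requests beyond it. The sizes, `struct shared_table`, `table_init_equal` and `table_alloc` are assumptions invented for this example; nothing like this exists in the patches under discussion.

```c
#include <stdbool.h>

#define TABLE_SIZE 8    /* hypothetical capacity of the shared filter table */
#define NUM_GUESTS 4    /* hypothetical number of guests sharing it */

struct shared_table {
    int used[NUM_GUESTS];
    int quota[NUM_GUESTS];   /* could be set by the admin instead */
    int total_used;
};

/* Start with an equal split of the table among all guests. */
static void table_init_equal(struct shared_table *t)
{
    t->total_used = 0;
    for (int g = 0; g < NUM_GUESTS; g++) {
        t->used[g] = 0;
        t->quota[g] = TABLE_SIZE / NUM_GUESTS;
    }
}

/* Try to take one filter slot for a guest; false means the caller must
 * fall back to something slower, such as promiscuous mode. */
static bool table_alloc(struct shared_table *t, int guest)
{
    if (t->used[guest] >= t->quota[guest] || t->total_used >= TABLE_SIZE)
        return false;
    t->used[guest]++;
    t->total_used++;
    return true;
}
```

With an equal split, a guest that exhausts its own quota cannot lock the others out; an admin interface would only need to adjust the `quota` array.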

-- 
MST


Re: [net-next-2.6 PATCH 0/3 RFC] macvlan: MAC Address filtering support for passthru mode

2011-09-11 Thread Sridhar Samudrala

On 9/11/2011 6:18 AM, Roopa Prabhu wrote:
> On 9/11/11 2:44 AM, Michael S. Tsirkin <m...@redhat.com> wrote:
>
> > > AFAIK, though it might maintain a single filter table space in hw, hw does
> > > know which filter belongs to which VF. And the OS driver does not need to do
> > > anything special. The VF driver exposes a VF netdev. And any uc/mc addresses
> > > registered with a VF netdev are registered with the hw by the driver. And hw
> > > will filter and send only pkts that the VF has expressed interest in.
> > >
> > > No special filter partitioning in hw is required.
> > >
> > > Thanks,
> > > Roopa
> >
> > Yes, but what I mean is, if the size of the single filter table
> > is limited, we need to decide how many addresses is
> > each guest allowed. If we let one guest ask for
> > as many as it wants, it can lock others out.
>
> Yes true. In these cases ie when the number of unicast addresses being
> registered is more than it can handle, The VF driver will put the VF  in
> promiscuous mode (Or at least its supposed to do. I think all drivers do
> that).

What does putting a VF in promiscuous mode mean?  How can the NIC decide
which set of MAC addresses are passed to the VF? Does it mean the VF sees
all the packets received by the NIC, including packets destined for other
VFs/PF?

Thanks
Sridhar
