Re: [PATCH v5 6/9] KVM: MMU: fast path of handling guest page fault

2012-05-24 Thread Xiao Guangrong
On 05/23/2012 07:34 PM, Avi Kivity wrote:


  static bool spte_has_volatile_bits(u64 spte)
  {
 +/*
 + * Always atomicly update spte if it can be updated
 + * out of mmu-lock.
 + */
 +if (spte_can_lockless_update(spte))
 +return true;
 
 
 This is a really subtle point, but is it really needed?
 
 Lockless spte updates should always set the dirty and accessed bits, so
 we won't be overwriting any volatile bits there.
 


Avi,

Currently, the spte update/clear paths under mmu-lock treat the Dirty bit as
non-volatile if the spte is read-only, so a Dirty bit set by a lockless
update can be lost.

And, for tlb flush:

|* If we overwrite a writable spte with a read-only one we
|* should flush remote TLBs. Otherwise rmap_write_protect
|* will find a read-only spte, even though the writable spte
|* might be cached on a CPU's TLB.
|*/
|   if (is_writable_pte(entry) && !is_writable_pte(*sptep))
|           kvm_flush_remote_tlbs(vcpu->kvm);

Updating the spte atomically helps us get a stable is_writable_pte() result.
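
To illustrate the concern about a lost Dirty bit, here is a minimal user-space sketch using C11 atomics. It is not the kernel implementation: the bit positions and the sketch_* helper names are assumptions made for illustration. The point is only that an atomic exchange returns the value that was actually replaced, so a Dirty bit set concurrently by a lockless writer is still visible to the updater.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions only; not the kernel's real masks. */
#define SKETCH_WRITABLE_BIT (1ull << 1)
#define SKETCH_DIRTY_BIT    (1ull << 6)

/*
 * Replace the spte atomically and report whether the value that was
 * actually replaced had the Dirty bit set. Because the exchange returns
 * the old contents, a Dirty bit set by another CPU just before the
 * update is still observed by the caller.
 */
static bool sketch_update_spte_atomic(_Atomic uint64_t *sptep, uint64_t new_spte)
{
	uint64_t old_spte = atomic_exchange(sptep, new_spte);

	return (old_spte & SKETCH_DIRTY_BIT) != 0;
}

/* Single-threaded stand-in for a lockless writer setting W and D. */
static void sketch_lockless_set_dirty(_Atomic uint64_t *sptep)
{
	atomic_fetch_or(sptep, SKETCH_WRITABLE_BIT | SKETCH_DIRTY_BIT);
}
```

A caller that gets true back can propagate the dirty state to the page before the old mapping is forgotten; a plain store would have overwritten the bit silently.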


 +
  if (!shadow_accessed_mask)
  return false;

 @@ -498,13 +517,7 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
  return ret;
  }

 -new_spte |= old_spte & shadow_dirty_mask;
 -
 -mask = shadow_accessed_mask;
 -if (is_writable_pte(old_spte))
 -mask |= shadow_dirty_mask;
 -
 -if (!spte_has_volatile_bits(old_spte) || (new_spte & mask) == mask)
 +if (!spte_has_volatile_bits(old_spte))
  __update_clear_spte_fast(sptep, new_spte);
  else
  old_spte = __update_clear_spte_slow(sptep, new_spte);
 
 
 It looks like the old code is bad.. why can we ignore volatile bits in
 the old spte?  Suppose pfn is changing?
 


/* Rules for using mmu_spte_update:
 * Update the state bits, it means the mapped pfn is not changged.

If the pfn changes, we should clear the spte first and then set it to
the new pfn. In kvm_set_pte_rmapp(), we have:

| mmu_spte_clear_track_bits(sptep);
| mmu_spte_set(sptep, new_spte);

 +
 +static bool
 +fast_pf_fix_indirect_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 +  u64 *sptep, u64 spte, gfn_t gfn)
 +{
 +pfn_t pfn;
 +bool ret = false;
 +
 +/*
 + * For the indirect spte, it is hard to get a stable gfn since
 + * we just use a cmpxchg to avoid all the races which is not
 + * enough to avoid the ABA problem: the host can arbitrarily
 + * change spte and the mapping from gfn to pfn.
 + *
 + * What we do is call gfn_to_pfn_atomic to bind the gfn and the
 + * pfn because after the call:
 + * - we have held the refcount of pfn that means the pfn can not
 + *   be freed and be reused for another gfn.
 + * - the pfn is writable that means it can not be shared by different
 + *   gfn.
 + */
 +pfn = gfn_to_pfn_atomic(vcpu->kvm, gfn);
 +
 +/* The host page is swapped out or merged. */
 +if (mmu_invalid_pfn(pfn))
 +goto exit;
 +
 +ret = true;
 +
 +if (pfn != spte_to_pfn(spte))
 +goto exit;
 +
 +if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) == spte)
 +mark_page_dirty(vcpu->kvm, gfn);
 
 Isn't it better to kvm_release_pfn_dirty() here?
 


Right, kvm_release_pfn_dirty is better.
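
The compare-and-exchange step in the quoted hunk can be pictured with the following user-space sketch built on C11 atomics. The helper name and the bit position are hypothetical and chosen only for illustration; the kernel code uses cmpxchg64() and PT_WRITABLE_MASK.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_WRITABLE_BIT (1ull << 1) /* illustrative position */

/*
 * Set the writable bit only if the spte still holds the value the
 * caller sampled earlier. Returns true when the update was applied.
 */
static bool sketch_try_make_writable(_Atomic uint64_t *sptep, uint64_t sampled)
{
	uint64_t expected = sampled;

	return atomic_compare_exchange_strong(sptep, &expected,
					      sampled | SKETCH_WRITABLE_BIT);
}
```

Note that the compare succeeds whenever the bits match, even if the spte was changed and changed back in between. That is the ABA case the patch comment describes, and it is why the patch pins the pfn with gfn_to_pfn_atomic() before attempting the exchange.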

 +
 +exit:
 +kvm_release_pfn_clean(pfn);
 +return ret;
 +}
 +
 + +
 +/*
 + * Return value:
 + * - true: let the vcpu to access on the same address again.
 + * - false: let the real page fault path to fix it.
 + */
 +static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
 +int level, u32 error_code)
 +{
 +struct kvm_shadow_walk_iterator iterator;
 +struct kvm_mmu_page *sp;
 +bool ret = false;
 +u64 spte = 0ull;
 +
 +if (!page_fault_can_be_fast(vcpu, gfn, error_code))
 +return false;
 +
 
 No need to pass gfn here.


Right, will fix it.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 0/9] KVM: fast page fault

2012-05-24 Thread Xiao Guangrong
On 05/23/2012 07:37 PM, Avi Kivity wrote:

 On 05/23/2012 11:51 AM, Xiao Guangrong wrote:
 Changelog:
 These are some changes from Marcelo's review:
 - drop the SPTE_WRITE_PROTECT bit; now only one bit is needed to do
   lockless updates.

 - always update the spte atomically if it can be updated out of mmu-lock.

 - fold the tlb flush decision into mmu_spte_update(), which makes the
   code easier to audit.

 Performance result:
 - autotest migration test
   (smp2.Fedora.16.64.migrate.with_autotest.dbench.unix):
                                                        before   after
 smp2.Fedora.16.64.migrate.unix                         93       91      +2.1%
 smp2.Fedora.16.64.migrate.with_autotest.dbench.unix    218      188     +13.7%

 - the benchmark attached is used to measure the resuming time
   after dirty-log
                             before          after
 Run 10 times, Avg time:     512466818 ns    269231261 ns    +47.5%
 
 Still scary (esp. indirect sptes), but looks pretty good.


Hmm, I do not have a better way to solve the ABA problem on indirect sptes right now. :(
How about only allowing fast page fault to work for direct sptes?



Re: [PATCH v5 0/9] KVM: fast page fault

2012-05-24 Thread Avi Kivity
On 05/24/2012 09:31 AM, Xiao Guangrong wrote:
 On 05/23/2012 07:37 PM, Avi Kivity wrote:
 
 On 05/23/2012 11:51 AM, Xiao Guangrong wrote:
 Changelog:
 These are some changes from Marcelo's review:
 - drop the SPTE_WRITE_PROTECT bit; now only one bit is needed to do
   lockless updates.

 - always update the spte atomically if it can be updated out of mmu-lock.

 - fold the tlb flush decision into mmu_spte_update(), which makes the
   code easier to audit.

 Performance result:
 - autotest migration test
   (smp2.Fedora.16.64.migrate.with_autotest.dbench.unix):
                                                        before   after
 smp2.Fedora.16.64.migrate.unix                         93       91      +2.1%
 smp2.Fedora.16.64.migrate.with_autotest.dbench.unix    218      188     +13.7%

 - the benchmark attached is used to measure the resuming time
   after dirty-log
                             before          after
 Run 10 times, Avg time:     512466818 ns    269231261 ns    +47.5%
 
 Still scary (esp. indirect sptes), but looks pretty good.
 
 
 Hmm, I do not have a better way to solve the ABA problem on indirect sptes
 right now. :(
 How about only allowing fast page fault to work for direct sptes?
 

I'll certainly be more comfortable with that, at least to start with.

-- 
error compiling committee.c: too many arguments to function


[RFC PATCH] PCI: Introduce INTx check mask API

2012-05-24 Thread Alexey Kardashevskiy
[Found while debugging VFIO on POWER but it is platform independent]

There is a feature in PCI (>=2.3?) to mask/unmask INTx via PCI_COMMAND and
PCI_STATUS registers.

And there is some API to support that (commit 
a2e27787f893621c5a6b865acf6b7766f8671328).

I have a network adapter:
0001:00:01.0 Ethernet controller: Chelsio Communications Inc T310 10GbE Single Port Adapter
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

pci_intx_mask_supported() reports that the feature is supported for this adapter,
but the adapter does not set PCI_STATUS_INTERRUPT, so pci_check_and_set_intx_mask()
never changes PCI_COMMAND and INTx does not work when we use it as a VFIO-PCI device.

If I remove the check of this bit, it works fine: the function is called from an
interrupt handler, so the Status bit check is redundant.

I checked the spec:
PCI LOCAL BUS SPECIFICATION, REV. 3.0, Table 6-2: Status Register Bits
===
3   This read-only bit reflects the state of the interrupt in the
device/function. Only when the Interrupt Disable bit in the command
register is a 0 and this Interrupt Status bit is a 1, will the
device’s/function’s INTx# signal be asserted. Setting the Interrupt
   Disable bit to a 1 has no effect on the state of this bit.
===
With this adapter, INTx# is asserted but Status bit is still 0.

Is it mandatory for a device to set Status bit if it supports INTx masking?

To Alex: if it is mandatory, then we need to be able to disable pci_2_3 in
VFIO-PCI somehow.
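
The decision described above can be sketched as a pure function, separated from config-space access and locking. This is an illustrative reduction under stated assumptions, not the kernel function: the sketch_* name is hypothetical, and the constants mirror the Command register Interrupt Disable bit (bit 10) and the Status register Interrupt Status bit (bit 3).

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PCI_COMMAND_INTX_DISABLE 0x400 /* Command register bit 10 */
#define SKETCH_PCI_STATUS_INTERRUPT     0x08  /* Status register bit 3 */

/*
 * cmd_status holds Command in the low 16 bits and Status in the high
 * 16 bits, as returned by one dword read at the Command offset.
 * Returns true and stores the new Command value when the mask state
 * may be changed; returns false when the Status bit disagrees.
 */
static bool sketch_intx_decide(uint32_t cmd_status, bool mask, uint16_t *newcmd)
{
	bool irq_pending = ((cmd_status >> 16) & SKETCH_PCI_STATUS_INTERRUPT) != 0;
	uint16_t origcmd = (uint16_t)cmd_status;

	if (mask != irq_pending)
		return false;

	*newcmd = origcmd & (uint16_t)~SKETCH_PCI_COMMAND_INTX_DISABLE;
	if (mask)
		*newcmd |= SKETCH_PCI_COMMAND_INTX_DISABLE;
	return true;
}
```

With a device that asserts INTx# but leaves the Status bit at 0, a mask request always takes the early return, which matches the behaviour reported for this adapter.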


Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
 drivers/pci/pci.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index ab6c2a6..ee4c804 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2978,60 +2978,60 @@ EXPORT_SYMBOL_GPL(pci_intx_mask_supported);
 
 static bool pci_check_and_set_intx_mask(struct pci_dev *dev, bool mask)
 {
struct pci_bus *bus = dev->bus;
bool mask_updated = true;
u32 cmd_status_dword;
u16 origcmd, newcmd;
unsigned long flags;
bool irq_pending;
 
/*
 * We do a single dword read to retrieve both command and status.
 * Document assumptions that make this possible.
 */
BUILD_BUG_ON(PCI_COMMAND % 4);
BUILD_BUG_ON(PCI_COMMAND + 2 != PCI_STATUS);
 
raw_spin_lock_irqsave(&pci_lock, flags);
 
bus->ops->read(bus, dev->devfn, PCI_COMMAND, 4, &cmd_status_dword);
 
irq_pending = (cmd_status_dword >> 16) & PCI_STATUS_INTERRUPT;
 
/*
 * Check interrupt status register to see whether our device
 * triggered the interrupt (when masking) or the next IRQ is
 * already pending (when unmasking).
 */
-   if (mask != irq_pending) {
+/* if (mask != irq_pending) {
mask_updated = false;
goto done;
-   }
+   }*/
 
origcmd = cmd_status_dword;
newcmd = origcmd & ~PCI_COMMAND_INTX_DISABLE;
if (mask)
newcmd |= PCI_COMMAND_INTX_DISABLE;
if (newcmd != origcmd)
bus->ops->write(bus, dev->devfn, PCI_COMMAND, 2, newcmd);
 
 done:
raw_spin_unlock_irqrestore(&pci_lock, flags);
 
return mask_updated;
 }
 
 /**
  * pci_check_and_mask_intx - mask INTx on pending interrupt
  * @dev: the PCI device to operate on
  *
  * Check if the device dev has its INTx line asserted, mask it and
  * return true in that case. False is returned if not interrupt was
  * pending.
  */
 bool pci_check_and_mask_intx(struct pci_dev *dev)
 {
return pci_check_and_set_intx_mask(dev, true);
 }
 EXPORT_SYMBOL_GPL(pci_check_and_mask_intx);
 
-- 
1.7.7.3


Re: [QEMU 1.1 PATCH v3] Expose CPUID leaf 7 only for -cpu host

2012-05-24 Thread Gleb Natapov
On Mon, May 21, 2012 at 11:27:02AM -0300, Eduardo Habkost wrote:
 Changes v2 -> v3:
   - Check for kvm_enabled() before setting cpuid_7_0_ebx_features
 
 Changes v1 -> v2:
   - Use kvm_arch_get_supported_cpuid() instead of host_cpuid() on
 cpu_x86_fill_host().
 
   We should use GET_SUPPORTED_CPUID for all bits on -cpu host
   eventually, but I am not changing all the other CPUID leaves because
   we may not be able to test such an intrusive change in time for 1.1.
 
 Description of the bug:
 
 Since QEMU 0.15, the CPUID information on CPUID[EAX=7,ECX=0] is being
 returned unfiltered to the guest, directly from the GET_SUPPORTED_CPUID
 return value.
 
 The problem is that this makes the resulting CPU feature flags
 unpredictable and dependent on the host CPU and kernel version. This
 breaks live-migration badly if migrating from a host CPU that supports
 some features on that CPUID leaf (running a recent kernel) to a kernel
 or host CPU that doesn't support it.
 
 Migration also is incorrect (the virtual CPU changes under the guest's
 feet) if you migrate in the opposite direction (from an old CPU/kernel
 to a new CPU/kernel), but with less serious consequences (guests
 normally query CPUID information only once on boot).
 
 Fortunately, the bug affects only users using cpudefs with level >= 7.
 
 The right behavior should be to explicitly enable those features on
 [cpudef] config sections or on the -cpu command-line arguments. Right
 now there is no predefined CPU model on QEMU that has those features:
 the latest Intel model we have is Sandy Bridge.
 
 I would like to get this fixed on 1.1, so I am submitting this patch,
 that enables those features only if -cpu host is being used (as we
 don't have any pre-defined CPU model that actually have those features).
 After 1.1 is released, we can make those features properly configurable
 on [cpudef] and -cpu configuration.
 
 One problem is: with this patch, users with the following setup:
 - Running QEMU 1.0;
 - Using a cpudef having level >= 7;
 - Running a kernel that supports the features on CPUID leaf 7; and
 - Running on a CPU that supports some features on CPUID leaf 7
 won't be able to live-migrate to QEMU 1.1. But for these users
 live-migration is already broken (they can't live-migrate to hosts with
 older CPUs or older kernels, already), I don't see how to avoid this
 problem.
 
 Signed-off-by: Eduardo Habkost ehabk...@redhat.com
Looks good to me.

 ---
  target-i386/cpu.c |   22 +++---
  target-i386/cpu.h |2 ++
  2 files changed, 17 insertions(+), 7 deletions(-)
 
 diff --git a/target-i386/cpu.c b/target-i386/cpu.c
 index 89b4ac7..388bc5c 100644
 --- a/target-i386/cpu.c
 +++ b/target-i386/cpu.c
 @@ -238,6 +238,8 @@ typedef struct x86_def_t {
  /* Store the results of Centaur's CPUID instructions */
  uint32_t ext4_features;
  uint32_t xlevel2;
 +/* The feature bits on CPUID[EAX=7,ECX=0].EBX */
 +uint32_t cpuid_7_0_ebx_features;
  } x86_def_t;
  
  #define I486_FEATURES (CPUID_FP87 | CPUID_VME | CPUID_PSE)
 @@ -521,6 +523,12 @@ static int cpu_x86_fill_host(x86_def_t *x86_cpu_def)
  x86_cpu_def->ext_features = ecx;
  x86_cpu_def->features = edx;
  
 +if (kvm_enabled() && x86_cpu_def->level >= 7) {
 +x86_cpu_def->cpuid_7_0_ebx_features = kvm_arch_get_supported_cpuid(kvm_state, 0x7, 0, R_EBX);
 +} else {
 +x86_cpu_def->cpuid_7_0_ebx_features = 0;
 +}
 +
  host_cpuid(0x8000, 0, &eax, &ebx, &ecx, &edx);
  x86_cpu_def->xlevel = eax;
  
 @@ -1185,6 +1193,7 @@ int cpu_x86_register(X86CPU *cpu, const char *cpu_model)
  env->cpuid_kvm_features = def->kvm_features;
  env->cpuid_svm_features = def->svm_features;
  env->cpuid_ext4_features = def->ext4_features;
 +env->cpuid_7_0_ebx = def->cpuid_7_0_ebx_features;
  env->cpuid_xlevel2 = def->xlevel2;
  object_property_set_int(OBJECT(cpu), (int64_t)def->tsc_khz * 1000,
  "tsc-frequency", &error);
 @@ -1451,13 +1460,12 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
  *edx = 0;
  break;
  case 7:
 -if (kvm_enabled()) {
 -KVMState *s = env->kvm_state;
 -
 -*eax = kvm_arch_get_supported_cpuid(s, 0x7, count, R_EAX);
 -*ebx = kvm_arch_get_supported_cpuid(s, 0x7, count, R_EBX);
 -*ecx = kvm_arch_get_supported_cpuid(s, 0x7, count, R_ECX);
 -*edx = kvm_arch_get_supported_cpuid(s, 0x7, count, R_EDX);
 +/* Structured Extended Feature Flags Enumeration Leaf */
 +if (count == 0) {
 +*eax = 0; /* Maximum ECX value for sub-leaves */
 +*ebx = env->cpuid_7_0_ebx; /* Feature flags */
 +*ecx = 0; /* Reserved */
 +*edx = 0; /* Reserved */
  } else {
  *eax = 0;
  *ebx = 0;
 diff --git a/target-i386/cpu.h b/target-i386/cpu.h
 index b5b9a50..2460f63 100644
 --- a/target-i386/cpu.h
 +++ b/target-i386/cpu.h
 @@ -741,6 

Re: [PATCH v5 6/9] KVM: MMU: fast path of handling guest page fault

2012-05-24 Thread Avi Kivity
On 05/24/2012 09:26 AM, Xiao Guangrong wrote:
 On 05/23/2012 07:34 PM, Avi Kivity wrote:
 
 
  static bool spte_has_volatile_bits(u64 spte)
  {
 +   /*
 +* Always atomicly update spte if it can be updated
 +* out of mmu-lock.
 +*/
 +   if (spte_can_lockless_update(spte))
 +   return true;
 
 
 This is a really subtle point, but is it really needed?
 
 Lockless spte updates should always set the dirty and accessed bits, so
 we won't be overwriting any volatile bits there.
 
 
 
 Avi,
 
 Currently, The spte update/clear paths in mmu-lock think the Dirty bit is
 not volatile if the spte is readonly. Then the Dirty bit caused by
 lockless update can be lost.
 

Maybe it's better to change that.  In fact, changing

    if ((spte & shadow_accessed_mask) &&
          (!is_writable_pte(spte) || (spte & shadow_dirty_mask)))
            return false;

to

    if (~spte & (shadow_accessed_mask | shadow_dirty_mask))
            return false;

is almost the same thing - we miss the case where the page is COW or
shadowed though.

If we release the page as dirty, as below, perhaps the whole thing
doesn't matter; the mm must drop spte.w (or spte.d) before it needs to
access spte.d again.
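
To make the difference between the two tests concrete, here is a small sketch that evaluates both as "no volatile bits" predicates. It rests on assumptions: the bit positions are illustrative, the sketch_* names are hypothetical, and the simplified form is taken to mean "stable only when both Accessed and Dirty are already set", which is one reading of the suggestion above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions; the real masks depend on the MMU mode. */
#define SKETCH_WRITABLE (1ull << 1)
#define SKETCH_ACCESSED (1ull << 5)
#define SKETCH_DIRTY    (1ull << 6)

/* Existing test: Dirty only matters while the spte is writable. */
static bool sketch_stable_current(uint64_t spte)
{
	return (spte & SKETCH_ACCESSED) &&
	       (!(spte & SKETCH_WRITABLE) || (spte & SKETCH_DIRTY));
}

/* Simplified test (assumed polarity): stable only if A and D are both set. */
static bool sketch_stable_simplified(uint64_t spte)
{
	return (~spte & (SKETCH_ACCESSED | SKETCH_DIRTY)) == 0;
}
```

Under these assumptions the two predicates differ only for a read-only spte with Accessed set and Dirty clear, which the simplified form would send down the atomic path. That appears to correspond to the COW/shadowed case mentioned above.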


 And, for tlb flush:
 
 |* If we overwrite a writable spte with a read-only one we
 |* should flush remote TLBs. Otherwise rmap_write_protect
 |* will find a read-only spte, even though the writable spte
 |* might be cached on a CPU's TLB.
 |*/
 |   if (is_writable_pte(entry) && !is_writable_pte(*sptep))
 |           kvm_flush_remote_tlbs(vcpu->kvm);
 
 Atomically update spte can help us to get a stable is_writable_pte().

Why is it unstable? mmu_set_spte() has already cleared SPTE_MMU_WRITEABLE by then, so
the lockless path will keep its hands off *spte.

 
 
 +
 if (!shadow_accessed_mask)
 return false;

 @@ -498,13 +517,7 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
 return ret;
 }

 -   new_spte |= old_spte & shadow_dirty_mask;
 -
 -   mask = shadow_accessed_mask;
 -   if (is_writable_pte(old_spte))
 -   mask |= shadow_dirty_mask;
 -
 -   if (!spte_has_volatile_bits(old_spte) || (new_spte & mask) == mask)
 +   if (!spte_has_volatile_bits(old_spte))
 __update_clear_spte_fast(sptep, new_spte);
 else
 old_spte = __update_clear_spte_slow(sptep, new_spte);
 
 
 It looks like the old code is bad.. why can we ignore volatile bits in
 the old spte?  Suppose pfn is changing?
 
 
 
 /* Rules for using mmu_spte_update:
  * Update the state bits, it means the mapped pfn is not changged.
 
 If pfn is changed, we should clear spte first, then set the spte to
 the new pfn, in kvm_set_pte_rmapp(), we have:
 
 | mmu_spte_clear_track_bits(sptep);
 | mmu_spte_set(sptep, new_spte);

Okay, thanks.

-- 
error compiling committee.c: too many arguments to function


[RfC PATCH 0/2] vga: make ram size configurable

2012-05-24 Thread Gerd Hoffmann
  Hi,

This is part of the plan to eliminate the remaining differences between
qemu and qemu-kvm.  qemu has 8 MB vram, whereas qemu-kvm has 16 MB.

Making it configurable allows us to keep qemu's default for compatibility
reasons while satisfying qemu-kvm users too.  It also opens up new options, like
reducing video memory (1 MB is the minimum) for textmode-only guests or qemu
scalability tests.

Comments?

cheers,
  Gerd

Gerd Hoffmann (2):
  vga: raise xres+yres limits
  vga: make vram size configurable

 hw/cirrus_vga.c |8 ++--
 hw/qxl.c|5 -
 hw/vga-isa.c|8 +++-
 hw/vga-pci.c|8 +++-
 hw/vga.c|   13 ++---
 hw/vga_int.h|8 
 hw/vmware_vga.c |   13 ++---
 7 files changed, 48 insertions(+), 15 deletions(-)



[RfC PATCH 1/2] vga: raise xres+yres limits

2012-05-24 Thread Gerd Hoffmann
The vgabios checks whether any given video mode will fit into the
given video memory before adding it to the list of available modes, so
there is no need to keep xmax * ymax * 32bpp lower than VGA_RAM_SIZE.

Let's raise the limits a bit.  Should be good for a few years, display
sizes are not growing that fast.
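
As a rough illustration of the fit check mentioned above, the sketch below computes the bytes a linear framebuffer needs and compares that with the available video memory. The actual vgabios code is not shown in this thread, so the function names and the exact arithmetic are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bytes needed for a linear framebuffer at the given mode. */
static uint64_t sketch_mode_bytes(uint32_t xres, uint32_t yres, uint32_t bpp)
{
	return (uint64_t)xres * yres * ((bpp + 7) / 8);
}

/* True if the mode fits into vram_bytes of video memory. */
static bool sketch_mode_fits(uint32_t xres, uint32_t yres, uint32_t bpp,
			     uint64_t vram_bytes)
{
	return sketch_mode_bytes(xres, yres, bpp) <= vram_bytes;
}
```

For example, 1600x1200 at 32 bpp needs 7,680,000 bytes and fits into 8 MB, while 1920x1200 at 32 bpp needs 9,216,000 bytes and only fits once the video memory is raised to 16 MB.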

Signed-off-by: Gerd Hoffmann kra...@redhat.com
---
 hw/vga_int.h |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/vga_int.h b/hw/vga_int.h
index 7685b2b..297c2f1 100644
--- a/hw/vga_int.h
+++ b/hw/vga_int.h
@@ -31,8 +31,8 @@
 /* bochs VBE support */
 #define CONFIG_BOCHS_VBE
 
-#define VBE_DISPI_MAX_XRES  1600
-#define VBE_DISPI_MAX_YRES  1200
+#define VBE_DISPI_MAX_XRES  16000
+#define VBE_DISPI_MAX_YRES  12000
 #define VBE_DISPI_MAX_BPP   32
 
 #define VBE_DISPI_INDEX_ID  0x0
-- 
1.7.1



[RfC PATCH 2/2] vga: make vram size configurable

2012-05-24 Thread Gerd Hoffmann
Zap the global VGA_RAM_SIZE #define, make the vga ram size configurable
for standard vga and vmware vga.  cirrus and qxl are left with a fixed
size (and private VGA_RAM_SIZE #define) for now.

qxl needs some non-trivial adjustments in the mode list handling to deal
with a runtime-configurable size, which calls for a separate qxl patch.

cirrus emulates cards which have 2 MB (isa) and 4 MB (pci), so I guess
it would make sense to use these sizes.  That change would break
migration though, so I left it fixed at 8 MB size.  Making it
configurable is pretty pointless for cirrus as we have to match real
hardware.

Signed-off-by: Gerd Hoffmann kra...@redhat.com
---
 hw/cirrus_vga.c |8 ++--
 hw/qxl.c|5 -
 hw/vga-isa.c|8 +++-
 hw/vga-pci.c|8 +++-
 hw/vga.c|   13 ++---
 hw/vga_int.h|4 ++--
 hw/vmware_vga.c |   13 ++---
 7 files changed, 46 insertions(+), 13 deletions(-)

diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c
index afedaa4..623dd68 100644
--- a/hw/cirrus_vga.c
+++ b/hw/cirrus_vga.c
@@ -43,6 +43,8 @@
 //#define DEBUG_CIRRUS
 //#define DEBUG_BITBLT
 
+#define VGA_RAM_SIZE (8192 * 1024)
+
 /***
  *
  *  definitions
@@ -2891,7 +2893,8 @@ static int vga_initfn(ISADevice *dev)
 ISACirrusVGAState *d = DO_UPCAST(ISACirrusVGAState, dev, dev);
 VGACommonState *s = &d->cirrus_vga.vga;
 
-vga_common_init(s, VGA_RAM_SIZE);
+s->vram_size_mb = VGA_RAM_SIZE >> 20;
+vga_common_init(s);
 cirrus_init_common(&d->cirrus_vga, CIRRUS_ID_CLGD5430, 0,
isa_address_space(dev));
 s->ds = graphic_console_init(s->update, s->invalidate,
@@ -2933,7 +2936,8 @@ static int pci_cirrus_vga_initfn(PCIDevice *dev)
  int16_t device_id = pc->device_id;
 
  /* setup VGA */
- vga_common_init(&s->vga, VGA_RAM_SIZE);
+ s->vga.vram_size_mb = VGA_RAM_SIZE >> 20;
+ vga_common_init(&s->vga);
  cirrus_init_common(s, device_id, 1, pci_address_space(dev));
  s->vga.ds = graphic_console_init(s->vga.update, s->vga.invalidate,
   s->vga.screen_dump, s->vga.text_update,
diff --git a/hw/qxl.c b/hw/qxl.c
index 3da3399..9a32f14 100644
--- a/hw/qxl.c
+++ b/hw/qxl.c
@@ -27,6 +27,8 @@
 
 #include "qxl.h"
 
+#define VGA_RAM_SIZE (8192 * 1024)
+
 /*
  * NOTE: SPICE_RING_PROD_ITEM accesses memory on the pci bar and as
  * such can be changed by the guest, so to avoid a guest trigerrable
@@ -1835,7 +1837,8 @@ static int qxl_init_primary(PCIDevice *dev)
 
 qxl->id = 0;
 qxl_init_ramsize(qxl, 32);
-vga_common_init(vga, qxl->vga.vram_size);
+vga->vram_size_mb = qxl->vga.vram_size >> 20;
+vga_common_init(vga);
 vga_init(vga, pci_address_space(dev), pci_address_space_io(dev), false);
 portio_list_init(qxl_vga_port_list, qxl_vga_portio_list, vga, "vga");
 portio_list_add(qxl_vga_port_list, pci_address_space_io(dev), 0x3b0);
diff --git a/hw/vga-isa.c b/hw/vga-isa.c
index 4bcc4db..d290473 100644
--- a/hw/vga-isa.c
+++ b/hw/vga-isa.c
@@ -49,7 +49,7 @@ static int vga_initfn(ISADevice *dev)
 MemoryRegion *vga_io_memory;
 const MemoryRegionPortio *vga_ports, *vbe_ports;
 
-vga_common_init(s, VGA_RAM_SIZE);
+vga_common_init(s);
 s->legacy_address_space = isa_address_space(dev);
 vga_io_memory = vga_init_io(s, &vga_ports, &vbe_ports);
 isa_register_portio_list(dev, 0x3b0, vga_ports, s, "vga");
@@ -69,6 +69,11 @@ static int vga_initfn(ISADevice *dev)
 return 0;
 }
 
+static Property vga_isa_properties[] = {
+DEFINE_PROP_UINT32("vgamem_mb", ISAVGAState, state.vram_size_mb, 8),
+DEFINE_PROP_END_OF_LIST(),
+};
+
 static void vga_class_initfn(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
@@ -76,6 +81,7 @@ static void vga_class_initfn(ObjectClass *klass, void *data)
 ic->init = vga_initfn;
 dc->reset = vga_reset_isa;
 dc->vmsd = &vmstate_vga_common;
+dc->props = vga_isa_properties;
 }
 
 static TypeInfo vga_info = {
diff --git a/hw/vga-pci.c b/hw/vga-pci.c
index 465b643..0848126 100644
--- a/hw/vga-pci.c
+++ b/hw/vga-pci.c
@@ -53,7 +53,7 @@ static int pci_vga_initfn(PCIDevice *dev)
  VGACommonState *s = &d->vga;
 
  // vga + console init
- vga_common_init(s, VGA_RAM_SIZE);
+ vga_common_init(s);
  vga_init(s, pci_address_space(dev), pci_address_space_io(dev), true);
 
  s->ds = graphic_console_init(s->update, s->invalidate,
@@ -75,6 +75,11 @@ DeviceState *pci_vga_init(PCIBus *bus)
 return &pci_create_simple(bus, -1, "VGA")->qdev;
 }
 
+static Property vga_pci_properties[] = {
+DEFINE_PROP_UINT32("vgamem_mb", PCIVGAState, vga.vram_size_mb, 8),
+DEFINE_PROP_END_OF_LIST(),
+};
+
 static void vga_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
@@ -87,6 +92,7 @@ static void vga_class_init(ObjectClass *klass, void *data)
 k->device_id = PCI_DEVICE_ID_QEMU_VGA;
 k->class_id = PCI_CLASS_DISPLAY_VGA;
 dc->vmsd = 

Re: [PATCH v5 6/9] KVM: MMU: fast path of handling guest page fault

2012-05-24 Thread Xiao Guangrong
On 05/24/2012 04:25 PM, Avi Kivity wrote:

 On 05/24/2012 09:26 AM, Xiao Guangrong wrote:
 On 05/23/2012 07:34 PM, Avi Kivity wrote:


  static bool spte_has_volatile_bits(u64 spte)
  {
 +  /*
 +   * Always atomicly update spte if it can be updated
 +   * out of mmu-lock.
 +   */
 +  if (spte_can_lockless_update(spte))
 +  return true;


 This is a really subtle point, but is it really needed?

 Lockless spte updates should always set the dirty and accessed bits, so
 we won't be overwriting any volatile bits there.



 Avi,

 Currently, The spte update/clear paths in mmu-lock think the Dirty bit is
 not volatile if the spte is readonly. Then the Dirty bit caused by
 lockless update can be lost.

 
 Maybe it's better to change that.  In fact, changing
 
    if ((spte & shadow_accessed_mask) &&
          (!is_writable_pte(spte) || (spte & shadow_dirty_mask)))
            return false;
 
 to
 
    if (~spte & (shadow_accessed_mask | shadow_dirty_mask))
   return false;
 


Okay, I like this approach.

 is almost the same thing - we miss the case where the page is COW or
 shadowed though.
 
 If we release the page as dirty, as below, perhaps the whole thing
 doesn't matter; the mm must drop spte.w (or spte.d) before it needs to
 access spte.d again.
 
 
 And, for tlb flush:

 |* If we overwrite a writable spte with a read-only one we
 |* should flush remote TLBs. Otherwise rmap_write_protect
 |* will find a read-only spte, even though the writable spte
 |* might be cached on a CPU's TLB.
 |*/
 |   if (is_writable_pte(entry) && !is_writable_pte(*sptep))
 |           kvm_flush_remote_tlbs(vcpu->kvm);

 Atomically update spte can help us to get a stable is_writable_pte().
 
 Why is it unstable? mmu_set_spte() before cleared SPTE_MMU_WRITEABLE, so
 the lockless path will keep its hands off *spte.
 


Since the dirty-log path does not clear the SPTE_MMU_WRITEABLE bit, the
system can contain sptes that are read-only but still have SPTE_MMU_WRITEABLE
set.

In mmu_set_spte(), we may read a read-only spte (which indicates that the TLB need
not be flushed), but it can then be made writable by the fast page fault path, after
which the TLB is dirty.

If you do not like the approach in this patch, we can change it to:

   if (spte_can_be_writable(entry) && !is_writable_pte(*sptep))
           kvm_flush_remote_tlbs(vcpu->kvm);

Actually, this kind of TLB flush can be delayed until page table protection
happens; we can simply mark the tlb dirty once your patchset that flushes the tlb
out of mmu-lock is in.




which branch of kvm.git should I do regular testing against?

2012-05-24 Thread Ren, Yongjie
Hi Avi,
Our team spends some effort on regular nightly testing against KVM upstream.
We're using the master branch now, and all the test reports I have sent out are
based on the master branch.
Which branch of kvm.git should we do regular testing against?
What do you suggest?

I know the next branch contains the latest KVM patches.
But I found that some recent KVM patches in the master branch never existed in
the next branch.
Also, the next branch is based on linux3.4-rc3, while the master branch is based
on linux3.4-rc7.
Another question: will the patches in the next branch eventually be merged
into the master branch?


Best Regards,
 Yongjie (Jay)



Re: which branch of kvm.git should I do regular testing against?

2012-05-24 Thread Avi Kivity
On 05/24/2012 12:24 PM, Ren, Yongjie wrote:
 Hi Avi,
 Our team spare some effort in regular nightly testing against KVM upstream.
 We're using master branch now and all the test reports I sent out are based 
 on master branch.
 Which branch of kvm.git should we do regular testing against?
 Your suggestion?

kvm.git next.  In theory kvm.git auto-next is even better, but we
sometimes forget to update it.

 
 I know next branch contains latest KVM patches. 
 But I found some recent KVM patches in master branch never existed in next 
 branch.
 And, the next branch is based on linux3.4-rc3, while master branch is based 
 on linux3.4-rc7.

'master' contains fixes, 'next' contains updates queued for the merge
window. 'auto-next' should contain both, plus the latest upstream.


 Another question is whether the patches in next branch will be finally merged 
 into master branch?
 

During the merge window.


-- 
error compiling committee.c: too many arguments to function


RE: which branch of kvm.git should I do regular testing against?

2012-05-24 Thread Ren, Yongjie
 -Original Message-
 From: Avi Kivity [mailto:a...@redhat.com]
 Sent: Thursday, May 24, 2012 5:29 PM
 To: Ren, Yongjie
 Cc: KVM; Marcelo Tosatti
 Subject: Re: which branch of kvm.git should I do regular testing against?
 
 On 05/24/2012 12:24 PM, Ren, Yongjie wrote:
  Hi Avi,
  Our team spare some effort in regular nightly testing against KVM
 upstream.
  We're using master branch now and all the test reports I sent out are
 based on master branch.
  Which branch of kvm.git should we do regular testing against?
  Your suggestion?
 
 kvm.git next.  In theory kvm.git auto-next is even better, but we
 sometimes forget to update it.
 
Thanks, got it. I'll use kvm.git next.
As for qemu-kvm.git, should I use the master or next branch?

 
  I know next branch contains latest KVM patches.
  But I found some recent KVM patches in master branch never existed in
 next branch.
  And, the next branch is based on linux3.4-rc3, while master branch is
 based on linux3.4-rc7.
 
 'master' contains fixes, 'next' contains updates queued for the merge
 window. 'auto-next' should contain both, plus the latest upstream.
 
Got it.
 
  Another question is whether the patches in next branch will be finally
 merged into master branch?
 
 
 During the merge window.
 
 
 --
 error compiling committee.c: too many arguments to function


[PATCH v2] KVM: introduce readonly memory region

2012-05-24 Thread Xiao Guangrong
In the current code, if we map a read-only memory space from host to guest
and the page is not currently mapped in the host, we will get a fault-pfn
and async is not allowed, so the VM will crash.

Following Avi's idea, we introduce a read-only memory region to map ROM/ROMD
to the guest.
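
As a small illustration of how a slot flag like this might be consulted, the sketch below checks the read-only bit before allowing a guest write. The flag values are the ones proposed in this patch; the helper itself is hypothetical and is not part of the patch.

```c
#include <stdbool.h>

/* Flag values as proposed in this patch. */
#define SKETCH_KVM_MEM_LOG_DIRTY_PAGES 1UL
#define SKETCH_KVM_MEM_READ_ONLY       (1UL << 2)

/* Hypothetical helper: may the guest write to a slot with these flags? */
static bool sketch_slot_guest_writable(unsigned long flags)
{
	return (flags & SKETCH_KVM_MEM_READ_ONLY) == 0;
}
```

A write fault on a slot for which this returns false would then be handled as an access to ROM/ROMD rather than as an ordinary page fault.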

Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
---
 Documentation/virtual/kvm/api.txt |9 +--
 include/linux/kvm.h   |5 ++-
 virt/kvm/kvm_main.c   |   43 ++---
 3 files changed, 44 insertions(+), 13 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 9301266..e2a82c3 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -857,7 +857,8 @@ struct kvm_userspace_memory_region {
 };

 /* for kvm_memory_region::flags */
-#define KVM_MEM_LOG_DIRTY_PAGES  1UL
+#define KVM_MEM_LOG_DIRTY_PAGES    1UL
+#define KVM_MEM_READ_ONLY          (1UL << 2)

 This ioctl allows the user to create or modify a guest physical memory
 slot.  When changing an existing slot, it may be moved in the guest
@@ -873,9 +874,11 @@ It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
 be identical.  This allows large pages in the guest to be backed by large
 pages in the host.

-The flags field supports just one flag, KVM_MEM_LOG_DIRTY_PAGES, which
+The flags field supports two flags, KVM_MEM_LOG_DIRTY_PAGES, which
 instructs kvm to keep track of writes to memory within the slot.  See
-the KVM_GET_DIRTY_LOG ioctl.
+the KVM_GET_DIRTY_LOG ioctl. Another flag is KVM_MEM_READ_ONLY, which
+indicates the guest memory is read-only, that means, guest is only allowed
+to read it.

 When the KVM_CAP_SYNC_MMU capability, changes in the backing of the memory
 region are automatically reflected into the guest.  For example, an mmap()
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 09f2b3a..d178e3d 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -102,8 +102,9 @@ struct kvm_userspace_memory_region {
 };

 /* for kvm_memory_region::flags */
-#define KVM_MEM_LOG_DIRTY_PAGES  1UL
-#define KVM_MEMSLOT_INVALID      (1UL << 1)
+#define KVM_MEM_LOG_DIRTY_PAGES    1UL
+#define KVM_MEMSLOT_INVALID        (1UL << 1)
+#define KVM_MEM_READ_ONLY          (1UL << 2)

 /* for KVM_IRQ_LINE */
 struct kvm_irq_level {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7e14068..27283e4 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1009,10 +1009,11 @@ out:
return size;
 }

-static unsigned long gfn_to_hva_many(struct kvm_memory_slot *slot, gfn_t gfn,
-gfn_t *nr_pages)
+static unsigned long __gfn_to_hva_many(struct kvm_memory_slot *slot, gfn_t gfn,
+  gfn_t *nr_pages, bool write)
 {
-   if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
+   if (!slot || slot->flags & KVM_MEMSLOT_INVALID ||
+         ((slot->flags & KVM_MEM_READ_ONLY) && write))
return bad_hva();

if (nr_pages)
@@ -1021,6 +1022,17 @@ static unsigned long gfn_to_hva_many(struct kvm_memory_slot *slot, gfn_t gfn,
return gfn_to_hva_memslot(slot, gfn);
 }

+static unsigned long gfn_to_hva_many(struct kvm_memory_slot *slot, gfn_t gfn,
+gfn_t *nr_pages)
+{
+   return __gfn_to_hva_many(slot, gfn, nr_pages, true);
+}
+
+unsigned long gfn_to_hva_prot(struct kvm *kvm, gfn_t gfn, bool write)
+{
+   return __gfn_to_hva_many(gfn_to_memslot(kvm, gfn), gfn, NULL, write);
+}
+
 unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn)
 {
return gfn_to_hva_many(gfn_to_memslot(kvm, gfn), gfn, NULL);
@@ -1053,6 +1065,21 @@ static inline int check_user_page_hwpoison(unsigned long addr)
return rc == -EHWPOISON;
 }

+static bool vma_is_avalid(struct vm_area_struct *vma, bool write_fault)
+{
+   if (write_fault) {
+       if (unlikely(!(vma->vm_flags & VM_WRITE)))
+           return false;
+
+       return true;
+   }
+
+   if (unlikely(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE))))
+       return false;
+
+   return true;
+}
+
 static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr, bool atomic,
bool *async, bool write_fault, bool *writable)
 {
@@ -1076,7 +1103,6 @@ static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr, bool atomic,

if (writable)
*writable = write_fault;
-
if (async) {
down_read(&current->mm->mmap_sem);
npages = get_user_page_nowait(current, current->mm,
@@ -1123,8 +1149,9 @@ static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr, bool atomic,
vma->vm_pgoff;
BUG_ON(!kvm_is_mmio_pfn(pfn));
} else {
-   if (async  

Re: which branch of kvm.git should I do regular testing against?

2012-05-24 Thread Avi Kivity
On 05/24/2012 12:33 PM, Ren, Yongjie wrote:
 
 On 05/24/2012 12:24 PM, Ren, Yongjie wrote:
  Hi Avi,
  Our team spare some effort in regular nightly testing against KVM
 upstream.
  We're using master branch now and all the test reports I sent out are
 based on master branch.
  Which branch of kvm.git should we do regular testing against?
  Your suggestion?
 
 kvm.git next.  In theory kvm.git auto-next is even better, but we
 sometimes forget to update it.
 
 Thanks. Get it. I'll use kvm.git next.
 As for qemu-kvm.git, master or next branch?

Unlike kvm.git, these should be pretty close.  'next' is untested,
'master' is tested, they are synced after testing.  I suggest 'master'.

-- 
error compiling committee.c: too many arguments to function


RE: which branch of kvm.git should I do regular testing against?

2012-05-24 Thread Ren, Yongjie
 -----Original Message-----
 From: Avi Kivity [mailto:a...@redhat.com]
 Sent: Thursday, May 24, 2012 5:46 PM
 To: Ren, Yongjie
 Cc: KVM; Marcelo Tosatti
 Subject: Re: which branch of kvm.git should I do regular testing against?
 
 On 05/24/2012 12:33 PM, Ren, Yongjie wrote:
 
  On 05/24/2012 12:24 PM, Ren, Yongjie wrote:
   Hi Avi,
   Our team spare some effort in regular nightly testing against KVM
  upstream.
   We're using master branch now and all the test reports I sent out are
  based on master branch.
   Which branch of kvm.git should we do regular testing against?
   Your suggestion?
 
  kvm.git next.  In theory kvm.git auto-next is even better, but we
  sometimes forget to update it.
 
  Thanks. Get it. I'll use kvm.git next.
  As for qemu-kvm.git, master or next branch?
 
 Unlike kvm.git, these should be pretty close.  'next' is untested,
 'master' is tested, they are synced after testing.  I suggest 'master'.
 
Get it. Thanks.


Re: [PATCH v2] KVM: introduce readonly memory region

2012-05-24 Thread Gleb Natapov
On Thu, May 24, 2012 at 05:24:34PM +0800, Xiao Guangrong wrote:
 In current code, if we map a readonly memory space from host to guest
 and the page is not currently mapped in the host, we will get a fault-pfn
 and async is not allowed, then the vm will crash
 
 Address Avi's idea, we introduce readonly memory region to map ROM/ROMD
 to the guest
 
As far as I can tell, this implements only ROMD, i.e. a write access to a
read-only slot will generate an IO exit.

 Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
 ---
  Documentation/virtual/kvm/api.txt |9 +--
  include/linux/kvm.h   |5 ++-
  virt/kvm/kvm_main.c   |   43 ++---
  3 files changed, 44 insertions(+), 13 deletions(-)
 
 diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
 index 9301266..e2a82c3 100644
 --- a/Documentation/virtual/kvm/api.txt
 +++ b/Documentation/virtual/kvm/api.txt
 @@ -857,7 +857,8 @@ struct kvm_userspace_memory_region {
  };
 
  /* for kvm_memory_region::flags */
 -#define KVM_MEM_LOG_DIRTY_PAGES  1UL
 +#define KVM_MEM_LOG_DIRTY_PAGES  1UL
 +#define KVM_MEM_READ_ONLY        (1UL << 2)
 
  This ioctl allows the user to create or modify a guest physical memory
  slot.  When changing an existing slot, it may be moved in the guest
 @@ -873,9 +874,11 @@ It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
  be identical.  This allows large pages in the guest to be backed by large
  pages in the host.
 
 -The flags field supports just one flag, KVM_MEM_LOG_DIRTY_PAGES, which
 +The flags field supports two flags, KVM_MEM_LOG_DIRTY_PAGES, which
  instructs kvm to keep track of writes to memory within the slot.  See
 -the KVM_GET_DIRTY_LOG ioctl.
 +the KVM_GET_DIRTY_LOG ioctl. Another flag is KVM_MEM_READ_ONLY, which
 +indicates the guest memory is read-only, that means, guest is only allowed
 +to read it.
 
  When the KVM_CAP_SYNC_MMU capability, changes in the backing of the memory
  region are automatically reflected into the guest.  For example, an mmap()
 diff --git a/include/linux/kvm.h b/include/linux/kvm.h
 index 09f2b3a..d178e3d 100644
 --- a/include/linux/kvm.h
 +++ b/include/linux/kvm.h
 @@ -102,8 +102,9 @@ struct kvm_userspace_memory_region {
  };
 
  /* for kvm_memory_region::flags */
 -#define KVM_MEM_LOG_DIRTY_PAGES  1UL
 -#define KVM_MEMSLOT_INVALID      (1UL << 1)
 +#define KVM_MEM_LOG_DIRTY_PAGES  1UL
 +#define KVM_MEMSLOT_INVALID      (1UL << 1)
 +#define KVM_MEM_READ_ONLY        (1UL << 2)
 
  /* for KVM_IRQ_LINE */
  struct kvm_irq_level {
 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index 7e14068..27283e4 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -1009,10 +1009,11 @@ out:
   return size;
  }
 
 -static unsigned long gfn_to_hva_many(struct kvm_memory_slot *slot, gfn_t gfn,
 -  gfn_t *nr_pages)
 +static unsigned long __gfn_to_hva_many(struct kvm_memory_slot *slot, gfn_t gfn,
 +gfn_t *nr_pages, bool write)
  {
 - if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
 + if (!slot || slot->flags & KVM_MEMSLOT_INVALID ||
 +   ((slot->flags & KVM_MEM_READ_ONLY) && write))
    return bad_hva();
 
    if (nr_pages)
 @@ -1021,6 +1022,17 @@ static unsigned long gfn_to_hva_many(struct kvm_memory_slot *slot, gfn_t gfn,
   return gfn_to_hva_memslot(slot, gfn);
  }
 
 +static unsigned long gfn_to_hva_many(struct kvm_memory_slot *slot, gfn_t gfn,
 +  gfn_t *nr_pages)
 +{
 + return __gfn_to_hva_many(slot, gfn, nr_pages, true);
 +}
 +
 +unsigned long gfn_to_hva_prot(struct kvm *kvm, gfn_t gfn, bool write)
 +{
 + return __gfn_to_hva_many(gfn_to_memslot(kvm, gfn), gfn, NULL, write);
 +}
 +
  unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn)
  {
   return gfn_to_hva_many(gfn_to_memslot(kvm, gfn), gfn, NULL);
 @@ -1053,6 +1065,21 @@ static inline int check_user_page_hwpoison(unsigned long addr)
   return rc == -EHWPOISON;
  }
 
 +static bool vma_is_avalid(struct vm_area_struct *vma, bool write_fault)
 +{
 + if (write_fault) {
 + if (unlikely(!(vma->vm_flags & VM_WRITE)))
 + return false;
 +
 + return true;
 + }
 +
 + if (unlikely(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE))))
 + return false;
 +
 + return true;
 +}
 +
  static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr, bool atomic,
   bool *async, bool write_fault, bool *writable)
  {
 @@ -1076,7 +1103,6 @@ static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr, bool atomic,
 
   if (writable)
   *writable = write_fault;
 -
   if (async) {
   down_read(&current->mm->mmap_sem);
   npages = get_user_page_nowait(current, current->mm,
 @@ -1123,8 +1149,9 @@ 

Re: [PATCH v2 15/15] net: invoke qemu_can_send_packet only before net queue sending function

2012-05-24 Thread Paolo Bonzini
Il 24/05/2012 06:05, Zhi Yong Wu ha scritto:
 On Thu, May 24, 2012 at 12:00 AM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 23/05/2012 17:14, zwu.ker...@gmail.com ha scritto:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  net/queue.c  |4 ++--
  net/slirp.c  |7 ---
  net/tap.c|2 +-
  slirp/if.c   |5 -
  slirp/libslirp.h |1 -
  5 files changed, 3 insertions(+), 16 deletions(-)

 diff --git a/net/queue.c b/net/queue.c
 index 0afd783..d2e57de 100644
 --- a/net/queue.c
 +++ b/net/queue.c
 @@ -176,7 +176,7 @@ ssize_t qemu_net_queue_send(NetQueue *queue,
  {
  ssize_t ret;

 -    if (queue->delivering) {
 +    if (queue->delivering || !qemu_can_send_packet(sender)) {
          return qemu_net_queue_append(queue, sender, flags, data, size, NULL);
  }

 @@ -200,7 +200,7 @@ ssize_t qemu_net_queue_send_iov(NetQueue *queue,
  {
  ssize_t ret;

 -    if (queue->delivering) {
 +    if (queue->delivering || !qemu_can_send_packet(sender)) {
          return qemu_net_queue_append_iov(queue, sender, flags, iov, iovcnt, NULL);
  }

 diff --git a/net/slirp.c b/net/slirp.c
 index a6ede2b..248f7ff 100644
 --- a/net/slirp.c
 +++ b/net/slirp.c
 @@ -96,13 +96,6 @@ static void slirp_smb_cleanup(SlirpState *s);
  static inline void slirp_smb_cleanup(SlirpState *s) { }
  #endif

 -int slirp_can_output(void *opaque)
 -{
 -SlirpState *s = opaque;
 -
 -    return qemu_can_send_packet(&s->nc);
 -}
 -
  void slirp_output(void *opaque, const uint8_t *pkt, int pkt_len)
  {
  SlirpState *s = opaque;
 diff --git a/net/tap.c b/net/tap.c
 index 65f45b8..7b1992b 100644
 --- a/net/tap.c
 +++ b/net/tap.c
 @@ -210,7 +210,7 @@ static void tap_send(void *opaque)
  if (size == 0) {
  tap_read_poll(s, 0);
  }
 -    } while (size > 0 && qemu_can_send_packet(&s->nc));
 +    } while (size > 0);

 Can you explain this?  Also, have you benchmarked the change to see what
 Its code execution flow is as follows:
 tap_send --> qemu_send_packet_async
 -> qemu_send_packet_async_with_flags -> qemu_net_queue_send
 
 So it will eventually invoke qemu_can_send_packet to determine whether it
 can send packets. This code change only delays that check.

But you will copy packets uselessly.  The code before the patch simply
left them on the tap file descriptor.  This is better because it
involves the kernel in flow control.  You are introducing bufferbloat.

 Also, can you explain why you didn't implement this?
 The hub can now do its own flow control if it provides its own can_receive.

But you didn't add can_receive.

 Why do we need to add counters to track in-flight packets?

To implement can_receive.
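
The in-flight counting idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not QEMU code: the Hub type, HUB_MAX_IN_FLIGHT and all three functions are hypothetical names. The point is that can_receive says "no" while too many forwarded packets are still pending, so the sender leaves further packets where they are (for tap, on the file descriptor) instead of copying them into a queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical limit on packets the hub may have outstanding. */
#define HUB_MAX_IN_FLIGHT 4

/* Hypothetical hub state; not a QEMU structure. */
typedef struct Hub {
    int in_flight;  /* packets forwarded but not yet completed by a peer */
} Hub;

/* can_receive callback: refuse new packets while too many are pending. */
static bool hub_can_receive(const Hub *hub)
{
    return hub->in_flight < HUB_MAX_IN_FLIGHT;
}

/* Forward one packet; returns false if the caller must hold it back. */
static bool hub_receive(Hub *hub)
{
    if (!hub_can_receive(hub)) {
        return false;
    }
    hub->in_flight++;
    return true;
}

/* Completion callback: a peer has finished with one packet. */
static void hub_packet_sent(Hub *hub)
{
    assert(hub->in_flight > 0);
    hub->in_flight--;
}
```

With this shape, the check in tap_send stays cheap and the kernel keeps doing the buffering whenever the hub is saturated.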

Paolo


Re: [PATCH v2] KVM: introduce readonly memory region

2012-05-24 Thread Avi Kivity
On 05/24/2012 12:59 PM, Gleb Natapov wrote:
 On Thu, May 24, 2012 at 05:24:34PM +0800, Xiao Guangrong wrote:
 In current code, if we map a readonly memory space from host to guest
 and the page is not currently mapped in the host, we will get a fault-pfn
 and async is not allowed, then the vm will crash
 
 Address Avi's idea, we introduce readonly memory region to map ROM/ROMD
 to the guest
 
 As far as I can tell this implements only ROMD. i.e write access to read
 only slot will generate IO exit.

Which userspace can then ignore.  The question is whether writes to ROM
are frequent, and whether the performance in that case matters.


-- 
error compiling committee.c: too many arguments to function


Re: [PATCH 2/2] Provide fast path for rep ins emulation if possible.

2012-05-24 Thread Roedel, Joerg
On Wed, May 23, 2012 at 05:49:31PM +0300, Avi Kivity wrote:
 On 05/23/2012 05:40 PM, Avi Kivity wrote:
  On 05/23/2012 05:08 PM, Gleb Natapov wrote:
  If decode assists are not available, we still need to emulate, see 15.33.5.
  
 
 Joerg, the 2010 version of the manual says that the effective segment
 (10:12) is only available with decode assists.  The 2012 version says
 it's unconditional. What's correct?

It is still conditional and only available with decode-assists. I will
talk to the APM authors to clarify this in the documentation.


Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632



Re: [PATCH 2/2] Provide fast path for rep ins emulation if possible.

2012-05-24 Thread Avi Kivity
On 05/24/2012 01:34 PM, Roedel, Joerg wrote:
 On Wed, May 23, 2012 at 05:49:31PM +0300, Avi Kivity wrote:
 On 05/23/2012 05:40 PM, Avi Kivity wrote:
  On 05/23/2012 05:08 PM, Gleb Natapov wrote:
  If decode assists are not available, we still need to emulate, see 15.33.5.
  
 
 Joerg, the 2010 version of the manual says that the effective segment
 (10:12) is only available with decode assists.  The 2012 version says
 it's unconditional. What's correct?
 
 It is still conditional and only available with decode-assists. I will
 talk to the APM authors to clarify this in the documentation.

Thanks.  As it happens it doesn't matter for INS emulation, only OUTS,
unless other bits in that word also depend on decode assists.


-- 
error compiling committee.c: too many arguments to function


Re: [PATCH 2/2] Provide fast path for rep ins emulation if possible.

2012-05-24 Thread Roedel, Joerg
On Thu, May 24, 2012 at 01:36:51PM +0300, Avi Kivity wrote:
 On 05/24/2012 01:34 PM, Roedel, Joerg wrote:
  On Wed, May 23, 2012 at 05:49:31PM +0300, Avi Kivity wrote:
  On 05/23/2012 05:40 PM, Avi Kivity wrote:
   On 05/23/2012 05:08 PM, Gleb Natapov wrote:
   If decode assists are not available, we still need to emulate, see 15.33.5.
   
  
  Joerg, the 2010 version of the manual says that the effective segment
  (10:12) is only available with decode assists.  The 2012 version says
  it's unconditional. What's correct?
  
  It is still conditional and only available with decode-assists. I will
  talk to the APM authors to clarify this in the documentation.
 
 Thanks.  As it happens it doesn't matter for INS emulation, only OUTS,
 unless other bits in that word also depend on decode assists.

Doesn't look like it. All other EXITINFO1 bits for the IOIO intercept
have been documented since the very early days of SVM. So the
segment number is the only addition.
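
For reference, a rough sketch of how the IOIO EXITINFO1 word could be decoded with the segment field treated as conditional. The bit positions below are my reading of the APM layout (type in bit 0, string in bit 2, REP in bit 3, segment in bits 12:10, port in bits 31:16) and should be checked against the manual; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed EXITINFO1 layout for the IOIO intercept. */
#define IOIO_TYPE_IN    (1u << 0)   /* 1 = IN, 0 = OUT */
#define IOIO_STR        (1u << 2)   /* string instruction (INS/OUTS) */
#define IOIO_REP        (1u << 3)   /* REP prefix present */
#define IOIO_SEG_SHIFT  10          /* effective segment, bits 12:10 */
#define IOIO_SEG_MASK   0x7u
#define IOIO_PORT_SHIFT 16          /* port number, bits 31:16 */

struct ioio_info {
    bool in;
    bool string;
    bool rep;
    bool seg_valid;   /* only meaningful with decode assists */
    unsigned seg;
    uint16_t port;
};

static struct ioio_info ioio_decode(uint32_t exitinfo1, bool decode_assists)
{
    struct ioio_info info;

    info.in = (exitinfo1 & IOIO_TYPE_IN) != 0;
    info.string = (exitinfo1 & IOIO_STR) != 0;
    info.rep = (exitinfo1 & IOIO_REP) != 0;
    info.port = (uint16_t)(exitinfo1 >> IOIO_PORT_SHIFT);
    info.seg_valid = decode_assists;
    info.seg = decode_assists ?
               (exitinfo1 >> IOIO_SEG_SHIFT) & IOIO_SEG_MASK : 0;
    return info;
}

/* INS always stores through ES, so only OUTS needs the reported segment. */
static bool ioio_needs_full_emulation(const struct ioio_info *info)
{
    return info->string && !info->in && !info->seg_valid;
}
```

This matches the point made above: without decode assists, a fast path is still possible for INS, while OUTS has to go through the emulator to find the source segment.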


Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632



Re: [PATCH v2 15/15] net: invoke qemu_can_send_packet only before net queue sending function

2012-05-24 Thread Zhi Yong Wu
On Thu, May 24, 2012 at 6:07 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 24/05/2012 06:05, Zhi Yong Wu ha scritto:
 On Thu, May 24, 2012 at 12:00 AM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 23/05/2012 17:14, zwu.ker...@gmail.com ha scritto:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  net/queue.c      |    4 ++--
  net/slirp.c      |    7 ---
  net/tap.c        |    2 +-
  slirp/if.c       |    5 -
  slirp/libslirp.h |    1 -
  5 files changed, 3 insertions(+), 16 deletions(-)

 diff --git a/net/queue.c b/net/queue.c
 index 0afd783..d2e57de 100644
 --- a/net/queue.c
 +++ b/net/queue.c
 @@ -176,7 +176,7 @@ ssize_t qemu_net_queue_send(NetQueue *queue,
  {
      ssize_t ret;

 -    if (queue->delivering) {
 +    if (queue->delivering || !qemu_can_send_packet(sender)) {
          return qemu_net_queue_append(queue, sender, flags, data, size, NULL);
      }

 @@ -200,7 +200,7 @@ ssize_t qemu_net_queue_send_iov(NetQueue *queue,
  {
      ssize_t ret;

 -    if (queue->delivering) {
 +    if (queue->delivering || !qemu_can_send_packet(sender)) {
          return qemu_net_queue_append_iov(queue, sender, flags, iov, iovcnt, NULL);
      }

 diff --git a/net/slirp.c b/net/slirp.c
 index a6ede2b..248f7ff 100644
 --- a/net/slirp.c
 +++ b/net/slirp.c
 @@ -96,13 +96,6 @@ static void slirp_smb_cleanup(SlirpState *s);
  static inline void slirp_smb_cleanup(SlirpState *s) { }
  #endif

 -int slirp_can_output(void *opaque)
 -{
 -    SlirpState *s = opaque;
 -
 -    return qemu_can_send_packet(&s->nc);
 -}
 -
  void slirp_output(void *opaque, const uint8_t *pkt, int pkt_len)
  {
      SlirpState *s = opaque;
 diff --git a/net/tap.c b/net/tap.c
 index 65f45b8..7b1992b 100644
 --- a/net/tap.c
 +++ b/net/tap.c
 @@ -210,7 +210,7 @@ static void tap_send(void *opaque)
          if (size == 0) {
              tap_read_poll(s, 0);
          }
 -    } while (size > 0 && qemu_can_send_packet(&s->nc));
 +    } while (size > 0);

 Can you explain this?  Also, have you benchmarked the change to see what
 Its code execution flow is as follows:
 tap_send --> qemu_send_packet_async
 -> qemu_send_packet_async_with_flags -> qemu_net_queue_send

 So it will eventually invoke qemu_can_send_packet to determine whether it
 can send packets. This code change only delays that check.

 But you will copy packets uselessly.  The code before the patch simply
 left them on the tap file descriptor.  This is better because it
 involves the kernel in flow control.  You are introducing bufferbloat.
You are correct, but can_send_packet will be invoked twice for one
packet delivery.

 Also, can you explain why you didn't implement this?
 The hub can now do its own flow control if it provides its own can_receive.

 But you didn't add can_receive.

 Why do we need to add counters to track in-flight packets?

 To implement can_receive.
Let me try.

 Paolo



-- 
Regards,

Zhi Yong Wu


Re: [PATCH v2 15/15] net: invoke qemu_can_send_packet only before net queue sending function

2012-05-24 Thread Paolo Bonzini
Il 24/05/2012 13:58, Zhi Yong Wu ha scritto:
  But you will copy packets uselessly.  The code before the patch simply
  left them on the tap file descriptor.  This is better because it
  involves the kernel in flow control.  You are introducing bufferbloat.
 You are correct, but can_send_packet will be invoked twice for one
 packet delivery.

Doesn't matter, it's cheap.

Paolo


Re: [RFC PATCH] PCI: Introduce INTx check mask API

2012-05-24 Thread Jan Kiszka
On 2012-05-24 04:44, Alexey Kardashevskiy wrote:
 [Found while debugging VFIO on POWER but it is platform independent]
 
 There is a feature in PCI (>=2.3?) to mask/unmask INTx via PCI_COMMAND and
 PCI_STATUS registers.

Yes, 2.3 introduced this. Masking is done via the command register; checking
whether the interrupt source was the PCI device in question is done via the
status register. The latter is important for supporting IRQ sharing, and
that is why we introduced this masking API to the PCI layer.

 
 And there is some API to support that (commit a2e27787f893621c5a6b865acf6b7766f8671328).
 
 I have a network adapter:
 0001:00:01.0 Ethernet controller: Chelsio Communications Inc T310 10GbE Single Port Adapter
   Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
   Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
 
 pci_intx_mask_supported() reports that the feature is supported for this
 adapter, BUT the adapter does not set PCI_STATUS_INTERRUPT, so
 pci_check_and_set_intx_mask() never changes PCI_COMMAND and INTx does not
 work on it when we use it as a VFIO-PCI device.
 
 If I remove the check of this bit, it works fine, as it is called from an
 interrupt handler and the Status bit check is redundant.
 
 Opened a spec:
 PCI LOCAL BUS SPECIFICATION, REV. 3.0, Table 6-2: Status Register Bits
 ===
 3 This read-only bit reflects the state of the interrupt in the
 device/function. Only when the Interrupt Disable bit in the command
 register is a 0 and this Interrupt Status bit is a 1, will the
 device’s/function’s INTx# signal be asserted. Setting the Interrupt
 Disable bit to a 1 has no effect on the state of this bit.
 ===
 With this adapter, INTx# is asserted but Status bit is still 0.
 
 Is it mandatory for a device to set Status bit if it supports INTx masking?
 
 2 Alex: if it is mandatory, then we need to be able to disable pci_2_3 in
 VFIO-PCI somehow.

Since PCI 2.3, this bit is mandatory, and it should be independent of
the masking bit. The question is whether your device is supposed to support
2.3 and is thus just buggy, or whether our detection algorithm is unreliable.
It basically builds on the assumption that, if we can flip the mask bit,
the feature should be present. I guess that is the best we can do. Maybe
we can augment this with a blacklist of devices that support flipping
without actually providing the feature.
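
To make the check-then-mask sequence and the blacklist idea concrete, here is a small sketch against a simulated device. The two bit values match the PCI command/status register definitions; everything else (struct fake_pci_dev, the table, both function names, and the placeholder IDs) is hypothetical and is not the kernel's PCI API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_COMMAND_INTX_DISABLE 0x400 /* command register, bit 10 */
#define PCI_STATUS_INTERRUPT     0x08  /* status register, bit 3 */

/* Simulated device: stands in for real config-space accesses. */
struct fake_pci_dev {
    uint16_t vendor, device;
    uint16_t command, status;
};

/* Hypothetical blacklist; the IDs are placeholders, not real devices. */
static const struct { uint16_t vendor, device; } intx_status_broken[] = {
    { 0xfffe, 0x0001 },
};

/* A flippable mask bit is not enough if the status bit is known broken. */
static bool intx_mask_usable(const struct fake_pci_dev *dev)
{
    size_t n = sizeof(intx_status_broken) / sizeof(intx_status_broken[0]);

    for (size_t i = 0; i < n; i++) {
        if (dev->vendor == intx_status_broken[i].vendor &&
            dev->device == intx_status_broken[i].device)
            return false;
    }
    return true;
}

/* Mask INTx only if the status bit says this device raised the interrupt. */
static bool check_and_mask_intx(struct fake_pci_dev *dev)
{
    if (!(dev->status & PCI_STATUS_INTERRUPT))
        return false;   /* not ours: leave the shared line alone */
    dev->command |= PCI_COMMAND_INTX_DISABLE;
    return true;
}
```

A device like the one reported above would never pass the status check, which is why it would have to be listed and handled without relying on 2.3-style masking.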

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [PATCH v2] KVM: introduce readonly memory region

2012-05-24 Thread Avi Kivity
On 05/24/2012 12:24 PM, Xiao Guangrong wrote:
 In current code, if we map a readonly memory space from host to guest
 and the page is not currently mapped in the host, we will get a fault-pfn
 and async is not allowed, then the vm will crash
 
 Address Avi's idea, we introduce readonly memory region to map ROM/ROMD
 to the guest
 
 Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
 ---
  Documentation/virtual/kvm/api.txt |9 +--
  include/linux/kvm.h   |5 ++-
  virt/kvm/kvm_main.c   |   43 ++---
  3 files changed, 44 insertions(+), 13 deletions(-)
 
 diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
 index 9301266..e2a82c3 100644
 --- a/Documentation/virtual/kvm/api.txt
 +++ b/Documentation/virtual/kvm/api.txt
 @@ -857,7 +857,8 @@ struct kvm_userspace_memory_region {
  };
 
  /* for kvm_memory_region::flags */
 -#define KVM_MEM_LOG_DIRTY_PAGES  1UL
 +#define KVM_MEM_LOG_DIRTY_PAGES  1UL
 +#define KVM_MEM_READ_ONLY        (1UL << 2)

Bit 1 should be fine too, see below.

 
  This ioctl allows the user to create or modify a guest physical memory
  slot.  When changing an existing slot, it may be moved in the guest
 @@ -873,9 +874,11 @@ It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
  be identical.  This allows large pages in the guest to be backed by large
  pages in the host.
 
 -The flags field supports just one flag, KVM_MEM_LOG_DIRTY_PAGES, which
 +The flags field supports two flags, KVM_MEM_LOG_DIRTY_PAGES, which
  instructs kvm to keep track of writes to memory within the slot.  See
 -the KVM_GET_DIRTY_LOG ioctl.
 +the KVM_GET_DIRTY_LOG ioctl. Another flag is KVM_MEM_READ_ONLY, which
 +indicates the guest memory is read-only, that means, guest is only allowed
 +to read it.

+ Writes will be posted to userspace as KVM_EXIT_MMIO exits.

 
  /* for kvm_memory_region::flags */
 -#define KVM_MEM_LOG_DIRTY_PAGES  1UL
 -#define KVM_MEMSLOT_INVALID      (1UL << 1)
 +#define KVM_MEM_LOG_DIRTY_PAGES  1UL
 +#define KVM_MEMSLOT_INVALID      (1UL << 1)
 +#define KVM_MEM_READ_ONLY        (1UL << 2)

KVM_MEMSLOT_INVALID is actually an internal symbol, not used by
userspace.  Please move it to kvm_host.h.

I see that we don't check flags for validity.  Please add a check that
we don't use undefined flags and return -EINVAL.  Should be a separate
patch since we may want to backport it.
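
A minimal sketch of the kind of flag validation meant here. It assumes KVM_MEMSLOT_INVALID has already moved out of the userspace header so that KVM_MEM_READ_ONLY can take bit 1; KVM_MEM_VALID_FLAGS and check_memory_region_flags are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0)
#define KVM_MEM_READ_ONLY       (1UL << 1)  /* assumes bit 1 is free */

/* Hypothetical mask of every flag userspace is allowed to pass. */
#define KVM_MEM_VALID_FLAGS (KVM_MEM_LOG_DIRTY_PAGES | KVM_MEM_READ_ONLY)

/* Reject any undefined flag so the bits stay available for later use. */
static int check_memory_region_flags(uint32_t flags)
{
    if (flags & ~KVM_MEM_VALID_FLAGS)
        return -EINVAL;
    return 0;
}
```

Rejecting unknown bits now is what lets a later flag be added without old kernels silently ignoring it.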

We need a KVM_CAP_ so userspace knows it can use the feature.  Only x86
should respond to it now, until (or if) other archs are updated.

 
 +static bool vma_is_avalid(struct vm_area_struct *vma, bool write_fault)

s/avalid/valid/.

 +{
 + if (write_fault) {
 + if (unlikely(!(vma->vm_flags & VM_WRITE)))
 + return false;
 +
 + return true;
 + }
 +
 + if (unlikely(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE))))
 + return false;
 +

Strange check.  VM_EXEC doesn't concern us at all.  Maybe we should
check for VM_READ always, and VM_WRITE for write faults.
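
Roughly what that would look like, as a standalone sketch. The VM_* values mirror the kernel's low vm_flags bits, but struct vm_area_struct is reduced to a stand-in here and the function is only an illustration of the suggested check, not the final patch.

```c
#include <assert.h>
#include <stdbool.h>

#define VM_READ  0x1UL
#define VM_WRITE 0x2UL
#define VM_EXEC  0x4UL

/* Stand-in for the kernel structure; only the field used here. */
struct vm_area_struct {
    unsigned long vm_flags;
};

/* Require VM_READ always, and VM_WRITE in addition for write faults. */
static bool vma_is_valid(const struct vm_area_struct *vma, bool write_fault)
{
    if (!(vma->vm_flags & VM_READ))
        return false;

    if (write_fault && !(vma->vm_flags & VM_WRITE))
        return false;

    return true;
}
```

With this form an exec-only or write-only mapping is rejected for read faults, which the VM_READ | VM_EXEC | VM_WRITE test would have accepted.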

 + return true;
 +}
 +
  static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr, bool atomic,
   bool *async, bool write_fault, bool *writable)
  {
 @@ -1076,7 +1103,6 @@ static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr, bool atomic,
 
   if (writable)
   *writable = write_fault;
 -
   if (async) {
   down_read(&current->mm->mmap_sem);
   npages = get_user_page_nowait(current, current->mm,
 @@ -1123,8 +1149,9 @@ static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr, bool atomic,
   vma->vm_pgoff;
   BUG_ON(!kvm_is_mmio_pfn(pfn));
   } else {
 - if (async && (vma->vm_flags & VM_WRITE))
 + if (async && vma_is_avalid(vma, write_fault))
   *async = true;
 +


This checks based on the fault type, not memslot type.  So we have the
risk of the pfn later used for writes?

   pfn = get_fault_pfn();
   }
   up_read(&current->mm->mmap_sem);
 @@ -1148,7 +1175,7 @@ static pfn_t __gfn_to_pfn(struct kvm *kvm, gfn_t gfn, bool atomic, bool *async,
   if (async)
   *async = false;
 
 - addr = gfn_to_hva(kvm, gfn);
 + addr = gfn_to_hva_prot(kvm, gfn, write_fault);
   if (kvm_is_error_hva(addr)) {
   get_page(bad_page);
   return page_to_pfn(bad_page);
 @@ -1293,7 +1320,7 @@ int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
   int r;
   unsigned long addr;
 
 - addr = gfn_to_hva(kvm, gfn);
 + addr = gfn_to_hva_prot(kvm, gfn, false);
   if (kvm_is_error_hva(addr))
   return -EFAULT;
   r = __copy_from_user(data, (void __user *)addr + offset, len);
 @@ -1331,7 

Re: [PATCH v2 13/15] net: Remove obsolete vlan info

2012-05-24 Thread Jan Kiszka
On 2012-05-23 23:42, Zhi Yong Wu wrote:
 On Wed, May 23, 2012 at 11:41 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2012-05-23 12:14, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  net.c |1 -
  1 files changed, 0 insertions(+), 1 deletions(-)

 diff --git a/net.c b/net.c
 index 61dc28d..8c8e703 100644
 --- a/net.c
 +++ b/net.c
 @@ -1079,7 +1079,6 @@ void do_info_network(Monitor *mon)
  NetClientState *nc, *peer;
  net_client_type type;

 -    monitor_printf(mon, "Devices not on any VLAN:\n");
      QTAILQ_FOREACH(nc, &net_clients, next) {
          peer = nc->peer;
          type = nc->info->type;

 This looks suspicious, or the patch description could be improved. Is this
 really just about removing that headline? And what about the indentation
 of the lines printed afterward?
 As you know, the vlan concept has been replaced with hubs, so I think
 it is more reasonable to remove this from the monitor.

That is true. But the output formatting could still be improved.


 It also leads me to the question how hub-based networks will be
 visualized on info network, specifically when there are multiple hubs.
 Can you provide some more complex example of an info network output?
 
 (qemu) info network
   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
\ hub0port0: type=(null),
   virtio-net-pci.1: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:57
\ hub1port0: type=(null),
 hub 1
 port 1 peer user.1
 port 0 peer virtio-net-pci.1
 hub 0
 port 1 peer user.0
 port 0 peer virtio-net-pci.0

What about a layout like this:

hub.0
 \ virtio-net-pci.0: ...
 \ virtio-net-pci.1: ...
 \ user.0: ...
hub.1
 \ e1000.0: ...
e1000.1: ...
 \ user.1: ...

i.e. printing the hubs first and listing all the peers of their ports
underneath them. Also, things like type=(null) should be avoided.
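
A rough sketch of a formatter that produces this hub-first layout and never prints a NULL type. It is only an illustration: struct hub, struct hub_port and format_hub are hypothetical and unrelated to the actual monitor code, and "unknown" is an arbitrary fallback string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical view of one hub port and the peer attached to it. */
struct hub_port {
    const char *peer;   /* peer name, e.g. "virtio-net-pci.0" */
    const char *type;   /* may be NULL if the type is not known */
};

struct hub {
    int id;
    const struct hub_port *ports;
    int nports;
};

/* Write "hub.N" followed by one indented line per peer.
 * Returns the number of characters written, or -1 on error/truncation. */
static int format_hub(char *buf, size_t len, const struct hub *hub)
{
    int n = snprintf(buf, len, "hub.%d\n", hub->id);

    for (int i = 0; i < hub->nports; i++) {
        const struct hub_port *p = &hub->ports[i];
        int r;

        if (n < 0 || (size_t)n >= len)
            return -1;
        r = snprintf(buf + n, len - (size_t)n, " \\ %s: type=%s\n",
                     p->peer, p->type ? p->type : "unknown");
        if (r < 0)
            return -1;
        n += r;
    }
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

Directly connected pairs (such as e1000.1 and user.1 in the layout above) would be printed by a separate pass over clients that are not attached to any hub.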

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [RfC PATCH 2/2] vga: make vram size configurable

2012-05-24 Thread Jan Kiszka
On 2012-05-24 05:44, Gerd Hoffmann wrote:
 Zap the global VGA_RAM_SIZE #define, make the vga ram size configurable
 for standard vga and vmware vga.  cirrus and qxl are left with a fixed
 size (and private VGA_RAM_SIZE #define) for now.
 
 qxl needs some non-trivial adjustments in the mode list handling to deal
 with a runtime-configurable size, which calls for a separate qxl patch.
 
 cirrus emulates cards which have 2 MB (isa) and 4 MB (pci), so I guess
 it would make sense to use these sizes.  That change would break
 migration though, so I left it fixed at 8 MB size.  Making it
 configurabls is pretty pointless for cirrus as we have to match real
 hardware.

We still have the concept of compat machines. So raise to defaults to
more handy sizes should be feasible, provided we keep the old values for
legacy machines.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [PATCH v2 13/15] net: Remove obsolete vlan info

2012-05-24 Thread Zhi Yong Wu
On Thu, May 24, 2012 at 8:09 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2012-05-23 23:42, Zhi Yong Wu wrote:
 On Wed, May 23, 2012 at 11:41 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2012-05-23 12:14, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  net.c |    1 -
  1 files changed, 0 insertions(+), 1 deletions(-)

 diff --git a/net.c b/net.c
 index 61dc28d..8c8e703 100644
 --- a/net.c
 +++ b/net.c
 @@ -1079,7 +1079,6 @@ void do_info_network(Monitor *mon)
      NetClientState *nc, *peer;
      net_client_type type;

 -    monitor_printf(mon, "Devices not on any VLAN:\n");
      QTAILQ_FOREACH(nc, &net_clients, next) {
          peer = nc->peer;
          type = nc->info->type;

 This looks suspicious - or the patch description is improvable. This is
 really just about removing that headline? And what about the indentation
 of the lines printed afterward?
 As you know, the vlan concept has been replaced with hubs. So I think
 it is more reasonable to remove this from the monitor.

 That is true. But the output formatting is still improvable.
Please see below.


 It also leads me to the question how hub-based networks will be
 visualized on info network, specifically when there are multiple hubs.
 Can you provide some more complex example of an info network output?

 (qemu) info network
   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
    \ hub0port0: type=(null),
   virtio-net-pci.1: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:57
    \ hub1port0: type=(null),
 hub 1
     port 1 peer user.1
     port 0 peer virtio-net-pci.1
 hub 0
     port 1 peer user.0
     port 0 peer virtio-net-pci.0

 What about a layout like this:

 hub.0
  \ virtio-net-pci.0: ...
  \ virtio-net-pci.1: ...
  \ user.0: ...
 hub.1
  \ e1000.0: ...
 e1000.1: ...
  \ user.1: ...
It is completely wrong. In one hub, one emulated NIC such as
virtio-net peers with one hub port, while its network backend such as
user peers with another hub port in the same hub. This is how the hub
logic works. Of course, you can add a dump network backend to this hub
via another hub port.


 ie. printing the hubs first, listing all the peers of their ports
 underneath them. Also, things like type=(null) should be avoided.

 Jan

 --
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux



-- 
Regards,

Zhi Yong Wu


Re: [PATCH v2 13/15] net: Remove obsolete vlan info

2012-05-24 Thread Zhi Yong Wu
On Thu, May 24, 2012 at 8:09 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2012-05-23 23:42, Zhi Yong Wu wrote:
 On Wed, May 23, 2012 at 11:41 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2012-05-23 12:14, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  net.c |    1 -
  1 files changed, 0 insertions(+), 1 deletions(-)

 diff --git a/net.c b/net.c
 index 61dc28d..8c8e703 100644
 --- a/net.c
 +++ b/net.c
 @@ -1079,7 +1079,6 @@ void do_info_network(Monitor *mon)
      NetClientState *nc, *peer;
      net_client_type type;

 -    monitor_printf(mon, "Devices not on any VLAN:\n");
      QTAILQ_FOREACH(nc, &net_clients, next) {
          peer = nc->peer;
          type = nc->info->type;

 This looks suspicious - or the patch description is improvable. This is
 really just about removing that headline? And what about the indentation
 of the lines printed afterward?
 As you know, the vlan concept has been replaced with hubs. So I think
 it is more reasonable to remove this from the monitor.

 That is true. But the output formatting is still improvable.


 It also leads me to the question how hub-based networks will be
 visualized on info network, specifically when there are multiple hubs.
 Can you provide some more complex example of an info network output?

 (qemu) info network
   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
    \ hub0port0: type=(null),
   virtio-net-pci.1: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:57
    \ hub1port0: type=(null),
 hub 1
     port 1 peer user.1
     port 0 peer virtio-net-pci.1
 hub 0
     port 1 peer user.0
     port 0 peer virtio-net-pci.0

 What about a layout like this:

 hub.0
  \ virtio-net-pci.0: ...
  \ virtio-net-pci.1: ...
  \ user.0: ...
 hub.1
  \ e1000.0: ...
 e1000.1: ...
  \ user.1: ...
For this output, we can't find which port peers with which emulated
NIC or network backend.


 ie. printing the hubs first, listing all the peers of their ports
 underneath them. Also, things like type=(null) should be avoided.
For this info, we can simply remove it.

 Jan

 --
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux



-- 
Regards,

Zhi Yong Wu


RE: [PATCH v2] intel-iommu: Add device info into list before doing context mapping

2012-05-24 Thread David Woodhouse
On Fri, 2012-03-23 at 02:54 +, Hao, Xudong wrote:
 Any other comments for this patch? Or can you check-in it in your
 iommu tree?

Apologies for the delayed response. I've just forwarded this to Linus.

-- 
dwmw2


Re: [PATCH v2 13/15] net: Remove obsolete vlan info

2012-05-24 Thread Zhi Yong Wu
On Thu, May 24, 2012 at 8:09 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2012-05-23 23:42, Zhi Yong Wu wrote:
 On Wed, May 23, 2012 at 11:41 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2012-05-23 12:14, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  net.c |    1 -
  1 files changed, 0 insertions(+), 1 deletions(-)

 diff --git a/net.c b/net.c
 index 61dc28d..8c8e703 100644
 --- a/net.c
 +++ b/net.c
 @@ -1079,7 +1079,6 @@ void do_info_network(Monitor *mon)
      NetClientState *nc, *peer;
      net_client_type type;

 -    monitor_printf(mon, "Devices not on any VLAN:\n");
      QTAILQ_FOREACH(nc, &net_clients, next) {
          peer = nc->peer;
          type = nc->info->type;

 This looks suspicious - or the patch description is improvable. This is
 really just about removing that headline? And what about the indentation
 of the lines printed afterward?
 As you know, the vlan concept has been replaced with hubs. So I think
 it is more reasonable to remove this from the monitor.

 That is true. But the output formatting is still improvable.


 It also leads me to the question how hub-based networks will be
 visualized on info network, specifically when there are multiple hubs.
 Can you provide some more complex example of an info network output?

 (qemu) info network
   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
    \ hub0port0: type=(null),
   virtio-net-pci.1: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:57
    \ hub1port0: type=(null),
 hub 1
     port 1 peer user.1
     port 0 peer virtio-net-pci.1
 hub 0
     port 1 peer user.0
     port 0 peer virtio-net-pci.0

 What about a layout like this:

 hub.0
  \ virtio-net-pci.0: ...
  \ virtio-net-pci.1: ...
  \ user.0: ...
 hub.1
  \ e1000.0: ...
 e1000.1: ...
  \ user.1: ...

 ie. printing the hubs first, listing all the peers of their ports
 underneath them. Also, things like type=(null) should be avoided.
How about making it look like the following? What do you think?
  virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
   \ hub0port0: type=hubport,
  virtio-net-pci.1: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:57
   \ hub1port0: type=hubport,
  virtio-net-pci.2: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:58
   \ u: type=user,net=10.0.2.0,restrict=off
hub 1
    port 1 peer user.1
    port 0 peer virtio-net-pci.1
hub 0
    port 1 peer user.0
    port 0 peer virtio-net-pci.0

 Jan

 --
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux



-- 
Regards,

Zhi Yong Wu


Re: [PATCH v2 13/15] net: Remove obsolete vlan info

2012-05-24 Thread Jan Kiszka
On 2012-05-24 09:27, Zhi Yong Wu wrote:
 On Thu, May 24, 2012 at 8:09 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2012-05-23 23:42, Zhi Yong Wu wrote:
 On Wed, May 23, 2012 at 11:41 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2012-05-23 12:14, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  net.c |1 -
  1 files changed, 0 insertions(+), 1 deletions(-)

 diff --git a/net.c b/net.c
 index 61dc28d..8c8e703 100644
 --- a/net.c
 +++ b/net.c
 @@ -1079,7 +1079,6 @@ void do_info_network(Monitor *mon)
  NetClientState *nc, *peer;
  net_client_type type;

 -    monitor_printf(mon, "Devices not on any VLAN:\n");
      QTAILQ_FOREACH(nc, &net_clients, next) {
          peer = nc->peer;
          type = nc->info->type;

 This looks suspicious - or the patch description is improvable. This is
 really just about removing that headline? And what about the indentation
 of the lines printed afterward?
 As you know, the vlan concept has been replaced with hubs. So I think
 it is more reasonable to remove this from the monitor.

 That is true. But the output formatting is still improvable.
 Please see below.


 It also leads me to the question how hub-based networks will be
 visualized on info network, specifically when there are multiple hubs.
 Can you provide some more complex example of an info network output?

 (qemu) info network
   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
    \ hub0port0: type=(null),
   virtio-net-pci.1: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:57
    \ hub1port0: type=(null),
 hub 1
     port 1 peer user.1
     port 0 peer virtio-net-pci.1
 hub 0
     port 1 peer user.0
     port 0 peer virtio-net-pci.0

 What about a layout like this:

 hub.0
  \ virtio-net-pci.0: ...
  \ virtio-net-pci.1: ...
  \ user.0: ...
 hub.1
  \ e1000.0: ...
 e1000.1: ...
  \ user.1: ...
 It is completely wrong.

(Note: my example is not a different representation of yours, it's a
different setup).

 In one hub, one emulated NIC such as
 virtio-net peers with one hub port, while its network backend such as
 user peers with another hub port in the same hub. This is how the hub
 logic works. Of course, you can add a dump network backend to this hub
 via another hub port.

The output should reflect the logical connection in an easily
understandable way, not just the internal relations of netdev peers. If
the data structures do not allow direct dumping in the proper form, it
simply takes a bit more effort to do this.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [PATCH v2 13/15] net: Remove obsolete vlan info

2012-05-24 Thread Jan Kiszka
On 2012-05-24 09:34, Zhi Yong Wu wrote:
 On Thu, May 24, 2012 at 8:09 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2012-05-23 23:42, Zhi Yong Wu wrote:
 On Wed, May 23, 2012 at 11:41 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2012-05-23 12:14, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  net.c |1 -
  1 files changed, 0 insertions(+), 1 deletions(-)

 diff --git a/net.c b/net.c
 index 61dc28d..8c8e703 100644
 --- a/net.c
 +++ b/net.c
 @@ -1079,7 +1079,6 @@ void do_info_network(Monitor *mon)
  NetClientState *nc, *peer;
  net_client_type type;

 -    monitor_printf(mon, "Devices not on any VLAN:\n");
      QTAILQ_FOREACH(nc, &net_clients, next) {
          peer = nc->peer;
          type = nc->info->type;

 This looks suspicious - or the patch description is improvable. This is
 really just about removing that headline? And what about the indentation
 of the lines printed afterward?
 As you know, the vlan concept has been replaced with hubs. So I think
 it is more reasonable to remove this from the monitor.

 That is true. But the output formatting is still improvable.


 It also leads me to the question how hub-based networks will be
 visualized on info network, specifically when there are multiple hubs.
 Can you provide some more complex example of an info network output?

 (qemu) info network
   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
    \ hub0port0: type=(null),
   virtio-net-pci.1: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:57
    \ hub1port0: type=(null),
 hub 1
     port 1 peer user.1
     port 0 peer virtio-net-pci.1
 hub 0
     port 1 peer user.0
     port 0 peer virtio-net-pci.0

 What about a layout like this:

 hub.0
  \ virtio-net-pci.0: ...
  \ virtio-net-pci.1: ...
  \ user.0: ...
 hub.1
  \ e1000.0: ...
 e1000.1: ...
  \ user.1: ...
 For this output, we can't find which port peers with which emulated
 NIC or network backend.

Why? This information should be available at least in the hubs. The info
network routine could call into a dumping helper of the hub to make it
available for visualization. It is surely not impossible, just not as
straightforward as it was so far.
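
A minimal sketch of such a dumping helper, assuming a simplified hub model that only stores the names of the peers attached to its ports. The sketch_hub struct, MAX_PORTS and sketch_hub_dump() are hypothetical names for illustration only, not QEMU's actual hub code, and the helper formats into a buffer instead of calling monitor_printf() so that it stays self-contained:

```c
#include <assert.h>
#include <stdio.h>

#define MAX_PORTS 8

/* Hypothetical, simplified model: a hub only knows the names of the
 * peers attached to its ports. Not QEMU's real data structures. */
struct sketch_hub {
    int id;
    int num_ports;
    const char *peer_name[MAX_PORTS];
};

/* Write "hub N" followed by one " \ peer" line per port into buf.
 * Returns the number of characters produced; a result >= len means
 * the output was truncated. */
static int sketch_hub_dump(const struct sketch_hub *hub, char *buf, size_t len)
{
    int used, i;

    assert(hub != NULL && buf != NULL && len > 0);

    used = snprintf(buf, len, "hub %d\n", hub->id);
    for (i = 0; i < hub->num_ports && (size_t)used < len; i++) {
        /* Port numbers are deliberately left out: only hub membership
         * is shown. */
        used += snprintf(buf + used, len - (size_t)used,
                         " \\ %s\n", hub->peer_name[i]);
    }
    return used;
}
```

For a hub 0 with user.0, dump.0 and some_nic.0 attached, the buffer would hold a "hub 0" line followed by one " \ peer" line for each of the three, which the info network routine could then print as is.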

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 13/15] net: Remove obsolete vlan info

2012-05-24 Thread Zhi Yong Wu
On Thu, May 24, 2012 at 9:30 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2012-05-24 09:27, Zhi Yong Wu wrote:
 On Thu, May 24, 2012 at 8:09 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2012-05-23 23:42, Zhi Yong Wu wrote:
 On Wed, May 23, 2012 at 11:41 PM, Jan Kiszka jan.kis...@siemens.com 
 wrote:
 On 2012-05-23 12:14, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  net.c |    1 -
  1 files changed, 0 insertions(+), 1 deletions(-)

 diff --git a/net.c b/net.c
 index 61dc28d..8c8e703 100644
 --- a/net.c
 +++ b/net.c
 @@ -1079,7 +1079,6 @@ void do_info_network(Monitor *mon)
      NetClientState *nc, *peer;
      net_client_type type;

 -    monitor_printf(mon, "Devices not on any VLAN:\n");
      QTAILQ_FOREACH(nc, &net_clients, next) {
          peer = nc->peer;
          type = nc->info->type;

 This looks suspicious - or the patch description is improvable. This is
 really just about removing that headline? And what about the indentation
 of the lines printed afterward?
 As you know, the vlan concept has been replaced with hubs. So I think
 it is more reasonable to remove this from the monitor.

 That is true. But the output formatting is still improvable.
 Please see below.


 It also leads me to the question how hub-based networks will be
 visualized on info network, specifically when there are multiple hubs.
 Can you provide some more complex example of an info network output?

 (qemu) info network
   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
    \ hub0port0: type=(null),
   virtio-net-pci.1: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:57
    \ hub1port0: type=(null),
 hub 1
     port 1 peer user.1
     port 0 peer virtio-net-pci.1
 hub 0
     port 1 peer user.0
     port 0 peer virtio-net-pci.0

 What about a layout like this:

 hub.0
  \ virtio-net-pci.0: ...
  \ virtio-net-pci.1: ...
  \ user.0: ...
 hub.1
  \ e1000.0: ...
 e1000.1: ...
  \ user.1: ...
 It is completely wrong.

 (Note: my example is not a different representation of yours, it's a
 different setup).
Sorry, I don't understand what the benefit of your layout is. And we
cannot see which hub port peers with which NIC driver or network
backend.


 In one hub, one emulated NIC such as
 virtio-net peers with one hub port, while its network backend such as
 user peers with another hub port in the same hub. This is how the hub
 logic works. Of course, you can add a dump network backend to this hub
 via another hub port.

 The output should reflect the logical connection in an easily
Do you think that my output cannot reflect the hub's logical
connections? To be honest, I think it does so better than yours. :)
 understandable way, not just the internal relations of netdev peers. If
 the data structures do not allow direct dumping in the proper form, it
 simply takes a bit more effort to do this.

 Jan

 --
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux



-- 
Regards,

Zhi Yong Wu


Re: [PATCH v2 13/15] net: Remove obsolete vlan info

2012-05-24 Thread Zhi Yong Wu
On Thu, May 24, 2012 at 9:33 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2012-05-24 09:34, Zhi Yong Wu wrote:
 On Thu, May 24, 2012 at 8:09 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2012-05-23 23:42, Zhi Yong Wu wrote:
 On Wed, May 23, 2012 at 11:41 PM, Jan Kiszka jan.kis...@siemens.com 
 wrote:
 On 2012-05-23 12:14, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  net.c |    1 -
  1 files changed, 0 insertions(+), 1 deletions(-)

 diff --git a/net.c b/net.c
 index 61dc28d..8c8e703 100644
 --- a/net.c
 +++ b/net.c
 @@ -1079,7 +1079,6 @@ void do_info_network(Monitor *mon)
      NetClientState *nc, *peer;
      net_client_type type;

 -    monitor_printf(mon, "Devices not on any VLAN:\n");
      QTAILQ_FOREACH(nc, &net_clients, next) {
          peer = nc->peer;
          type = nc->info->type;

 This looks suspicious - or the patch description is improvable. This is
 really just about removing that headline? And what about the indentation
 of the lines printed afterward?
 As you know, the vlan concept has been replaced with hubs. So I think
 it is more reasonable to remove this from the monitor.

 That is true. But the output formatting is still improvable.


 It also leads me to the question how hub-based networks will be
 visualized on info network, specifically when there are multiple hubs.
 Can you provide some more complex example of an info network output?

 (qemu) info network
   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
    \ hub0port0: type=(null),
   virtio-net-pci.1: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:57
    \ hub1port0: type=(null),
 hub 1
     port 1 peer user.1
     port 0 peer virtio-net-pci.1
 hub 0
     port 1 peer user.0
     port 0 peer virtio-net-pci.0

 What about a layout like this:

 hub.0
  \ virtio-net-pci.0: ...
  \ virtio-net-pci.1: ...
  \ user.0: ...
 hub.1
  \ e1000.0: ...
 e1000.1: ...
  \ user.1: ...
 For this output, we can't find which port peers with which emulated
 NIC or network backend.

 Why? This information should be available at least in the hubs. The info
Sorry, More

Re: [PATCH v2 13/15] net: Remove obsolete vlan info

2012-05-24 Thread Jan Kiszka
On 2012-05-24 11:12, Zhi Yong Wu wrote:
 On Thu, May 24, 2012 at 9:30 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2012-05-24 09:27, Zhi Yong Wu wrote:
 On Thu, May 24, 2012 at 8:09 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2012-05-23 23:42, Zhi Yong Wu wrote:
 On Wed, May 23, 2012 at 11:41 PM, Jan Kiszka jan.kis...@siemens.com 
 wrote:
 On 2012-05-23 12:14, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  net.c |1 -
  1 files changed, 0 insertions(+), 1 deletions(-)

 diff --git a/net.c b/net.c
 index 61dc28d..8c8e703 100644
 --- a/net.c
 +++ b/net.c
 @@ -1079,7 +1079,6 @@ void do_info_network(Monitor *mon)
  NetClientState *nc, *peer;
  net_client_type type;

 -    monitor_printf(mon, "Devices not on any VLAN:\n");
      QTAILQ_FOREACH(nc, &net_clients, next) {
          peer = nc->peer;
          type = nc->info->type;

 This looks suspicious - or the patch description is improvable. This is
 really just about removing that headline? And what about the indentation
 of the lines printed afterward?
 As you know, the vlan concept has been replaced with hubs. So I think
 it is more reasonable to remove this from the monitor.

 That is true. But the output formatting is still improvable.
 Please see below.


 It also leads me to the question how hub-based networks will be
 visualized on info network, specifically when there are multiple hubs.
 Can you provide some more complex example of an info network output?

 (qemu) info network
   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
    \ hub0port0: type=(null),
   virtio-net-pci.1: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:57
    \ hub1port0: type=(null),
 hub 1
     port 1 peer user.1
     port 0 peer virtio-net-pci.1
 hub 0
     port 1 peer user.0
     port 0 peer virtio-net-pci.0

 What about a layout like this:

 hub.0
  \ virtio-net-pci.0: ...
  \ virtio-net-pci.1: ...
  \ user.0: ...
 hub.1
  \ e1000.0: ...
 e1000.1: ...
  \ user.1: ...
 It is completely wrong.

 (Note: my example is not a different representation of yours, it's a
 different setup).
 Sorry, I don't understand what the benefit of your layout is. And we

To see at a glance which peers are connected to each other via a hub.

 cannot see which hub port peers with which NIC driver or network
 backend.

What is the benefit of printing the port number? Is it part of the
user-visible interface? Does the port number make any difference for the
attached peer?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 13/15] net: Remove obsolete vlan info

2012-05-24 Thread Jan Kiszka
On 2012-05-24 11:14, Zhi Yong Wu wrote:
 For this output, we can't find which port peers with which emulated
 NIC or network backend.

 Why? This information should be available at least in the hubs. The info
 Sorry, More

[...]

Something mangled your reply and made it unreadable. Please retry.

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [PATCH v2 13/15] net: Remove obsolete vlan info

2012-05-24 Thread Zhi Yong Wu
On Thu, May 24, 2012 at 10:25 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 Something mangled your reply and made it unreadable. Please retry.
Sorry. How about the output below? What do you think of it? Note type=hubport.

(qemu) info network
  virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
   \ hub0port0: type=hubport,
  virtio-net-pci.1: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:57
   \ hub1port0: type=hubport,
  virtio-net-pci.2: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:58
   \ u: type=user,net=10.0.2.0,restrict=off
  e1000.0: type=nic,model=e1000,macaddr=52:54:00:12:34:59
   \ ur: type=user,net=10.0.2.0,restrict=off
hub 1
    port 1 peer user.1
    port 0 peer virtio-net-pci.1
hub 0
    port 1 peer user.0
    port 0 peer virtio-net-pci.0


 Thanks,
 Jan

 --
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux



-- 
Regards,

Zhi Yong Wu


Re: [PATCH v2 13/15] net: Remove obsolete vlan info

2012-05-24 Thread Jan Kiszka
On 2012-05-24 11:27, Zhi Yong Wu wrote:
 On Thu, May 24, 2012 at 10:25 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 Something mangled your reply and made it unreadable. Please retry.
 Sorry. How about the output below? What do you think of it? Note type=hubport.
 
 (qemu) info network
   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
    \ hub0port0: type=hubport,
   virtio-net-pci.1: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:57
    \ hub1port0: type=hubport,
   virtio-net-pci.2: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:58
    \ u: type=user,net=10.0.2.0,restrict=off
   e1000.0: type=nic,model=e1000,macaddr=52:54:00:12:34:59
    \ ur: type=user,net=10.0.2.0,restrict=off
 hub 1
     port 1 peer user.1
     port 0 peer virtio-net-pci.1
 hub 0
     port 1 peer user.0
     port 0 peer virtio-net-pci.0

My question remains: What added value do we get from listing the hubs with
their ports separately from the port connections? Also, how would this be
printed:

-net user -net dump -net nic

The user should only be interested in the fact that user.0, dump.0 and
some_nic.0 are attached to the same hub, not to which port of that hub.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 13/15] net: Remove obsolete vlan info

2012-05-24 Thread Zhi Yong Wu
On Thu, May 24, 2012 at 10:31 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2012-05-24 11:27, Zhi Yong Wu wrote:
 On Thu, May 24, 2012 at 10:25 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 Something mangled your reply and made it unreadable. Please retry.
 Sorry. How about the output below? What do you think of it? Note type=hubport.

 (qemu) info network
   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
    \ hub0port0: type=hubport,
   virtio-net-pci.1: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:57
    \ hub1port0: type=hubport,
   virtio-net-pci.2: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:58
    \ u: type=user,net=10.0.2.0,restrict=off
   e1000.0: type=nic,model=e1000,macaddr=52:54:00:12:34:59
    \ ur: type=user,net=10.0.2.0,restrict=off
 hub 1
     port 1 peer user.1
     port 0 peer virtio-net-pci.1
 hub 0
     port 1 peer user.0
     port 0 peer virtio-net-pci.0

 My question remains: What added value do we get from listing the hubs with
 their ports separately from the port connections? Also, how would this be
 printed:

    -net user -net dump -net nic
(qemu) info network
  virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
   \ hub0port0: type=hubport,
  virtio-net-pci.1: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:57
   \ hub1port0: type=hubport,
hub 1
    port 2 peer dump.0
    port 1 peer user.1
    port 0 peer virtio-net-pci.1
hub 0
    port 1 peer user.0
    port 0 peer virtio-net-pci.0
(qemu)


 The user should only be interested in the fact that user.0, dump.0 and
 some_nic.0 are attached to the same hub, not to which port of that hub.
OK, then it should look like the following, right?

(qemu) info network
  virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
   \ hub0port0: type=hubport,
  virtio-net-pci.1: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:57
   \ hub1port0: type=hubport,
hub 1
  \ dump.0
  \ user.1
  \ virtio-net-pci.1
hub 0
  \ user.0
  \ virtio-net-pci.0
(qemu)


 Jan

 --
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux



-- 
Regards,

Zhi Yong Wu


Re: [RFC PATCH] PCI: Introduce INTx check mask API

2012-05-24 Thread Alex Williamson
On Thu, 2012-05-24 at 09:02 -0300, Jan Kiszka wrote:
 On 2012-05-24 04:44, Alexey Kardashevskiy wrote:
  [Found while debugging VFIO on POWER but it is platform independent]
  
  There is a feature in PCI (>=2.3?) to mask/unmask INTx via PCI_COMMAND and
  PCI_STATUS registers.
 
 Yes, 2.3 introduced this. Masking is done via command register, checking
 if the source was the PCI in question via the status register. The
 latter is important for supporting IRQ sharing - and that's why we
 introduced this masking API to the PCI layer.
 
  
  And there is some API to support that (commit 
  a2e27787f893621c5a6b865acf6b7766f8671328).
  
  I have a network adapter:
  0001:00:01.0 Ethernet controller: Chelsio Communications Inc T310 10GbE 
  Single Port Adapter
  Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ 
  Stepping- SERR+ FastB2B- DisINTx-
  Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast TAbort- TAbort- 
  MAbort- SERR- PERR- INTx-
  
  pci_intx_mask_supported() reports that the feature is supported for this 
  adapter
  BUT the adapter does not set PCI_STATUS_INTERRUPT so 
  pci_check_and_set_intx_mask()
  never changes PCI_COMMAND and INTx does not work on it when we use it as 
  VFIO-PCI device.
  
  If I remove the check of this bit, it works fine as it is called from an 
  interrupt handler and
  Status bit check is redundant.
  
  Opened a spec:
  PCI LOCAL BUS SPECIFICATION, REV. 3.0, Table 6-2: Status Register Bits
  ===
  3   This read-only bit reflects the state of the interrupt in the
  device/function. Only when the Interrupt Disable bit in the command
  register is a 0 and this Interrupt Status bit is a 1, will the
  device’s/function’s INTx# signal be asserted. Setting the Interrupt
 Disable bit to a 1 has no effect on the state of this bit.
  ===
  With this adapter, INTx# is asserted but Status bit is still 0.
  
  Is it mandatory for a device to set Status bit if it supports INTx masking?
  
  2 Alex: if it is mandatory, then we need to be able to disable pci_2_3 in 
  VFIO-PCI
  somehow.
 
 Since PCI 2.3, this bit is mandatory, and it should be independent of
 the masking bit. The question is, if your device is supposed to support
 2.3, thus is just buggy, or if our detection algorithm is unreliable. It
 basically builds on the assumption that, if we can flip the mask bit,
 the feature should be present. I guess that is the best we can do. Maybe
 we can augment this with a blacklist of devices that support flipping
 without actually providing the feature.

Yep, that's what I'd suggest as well, add a blacklist to
pci_intx_mask_supported() so this device returns false and we require an
exclusive interrupt for it.  Thanks,

Alex
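
The blacklist idea above can be sketched as a small user-space C model. This is only an illustration under stated assumptions, not kernel code: the table `intx_mask_blacklist`, the helpers `intx_mask_usable()` and `intx_asserted()`, and the vendor/device pair in the table are hypothetical placeholders. The two bit masks follow the spec text quoted above (Interrupt Disable in the command register, Interrupt Status in the status register).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical identifier pair; real values would come from `lspci -n`. */
struct pci_id {
    uint16_t vendor;
    uint16_t device;
};

/* Devices whose Interrupt Disable bit can be flipped but whose Interrupt
 * Status bit never reports a pending interrupt (placeholder entry only). */
static const struct pci_id intx_mask_blacklist[] = {
    { 0x1425, 0x0030 },
};

/* Decide whether INTx masking may be relied on.  `disable_bit_toggles`
 * models the existing probe: true if the command-register bit flips. */
static bool intx_mask_usable(uint16_t vendor, uint16_t device,
                             bool disable_bit_toggles)
{
    size_t n = sizeof(intx_mask_blacklist) / sizeof(intx_mask_blacklist[0]);
    size_t i;

    if (!disable_bit_toggles)
        return false;
    for (i = 0; i < n; i++) {
        if (intx_mask_blacklist[i].vendor == vendor &&
            intx_mask_blacklist[i].device == device)
            return false;       /* bit flips, but the feature is broken */
    }
    return true;
}

#define CMD_INTX_DISABLE 0x0400u    /* command register, bit 10 */
#define STATUS_INTERRUPT 0x0008u    /* status register, bit 3 */

/* Spec rule: INTx# is asserted only when Interrupt Disable is 0 and
 * Interrupt Status is 1. */
static bool intx_asserted(uint16_t command, uint16_t status)
{
    return !(command & CMD_INTX_DISABLE) && (status & STATUS_INTERRUPT);
}
```

With such a table, a blacklisted device would fall back to requiring an exclusive interrupt, while all other devices keep the existing probe result.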





Re: [PATCH v2 13/15] net: Remove obsolete vlan info

2012-05-24 Thread Jan Kiszka
On 2012-05-24 11:38, Zhi Yong Wu wrote:
 On Thu, May 24, 2012 at 10:31 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2012-05-24 11:27, Zhi Yong Wu wrote:
 On Thu, May 24, 2012 at 10:25 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 Something mangled your reply and made it unreadable. Please retry.
 Sorry. let it look like below. Do you think of it? typ=hubport

 (qemu) info network
   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
\ hub0port0: type=hubport,
   virtio-net-pci.1: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:57
\ hub1port0: type=hubport,
   virtio-net-pci.2: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:58
\ u: type=user,net=10.0.2.0,restrict=off
   e1000.0: type=nic,model=e1000,macaddr=52:54:00:12:34:59
\ ur: type=user,net=10.0.2.0,restrict=off
 hub 1
 port 1 peer user.1
 port 0 peer virtio-net-pci.1
 hub 0
 port 1 peer user.0
 port 0 peer virtio-net-pci.0

 My question remains: What added value we get from listing the hubs with
 its ports separately from the port connections? Also, how would this be
 printed:

-net user -net dump -net nic
 (qemu) info network
   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
\ hub0port0: type=hubport,
   virtio-net-pci.1: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:57
\ hub1port0: type=hubport,
 hub 1
 port 2 peer dump.0
 port 1 peer user.1
 port 0 peer virtio-net-pci.1
 hub 0
 port 1 peer user.0
 port 0 peer virtio-net-pci.0
 (qemu)
 

 The user should only be interested in the fact that user.0, dump.0 and
 some_nic.0 are attached to the same hub, not to which port of that hub.
 OK, then let it seem like below. right?
 
 (qemu) info network
   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
\ hub0port0: type=hubport,
   virtio-net-pci.1: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:57
\ hub1port0: type=hubport,
 hub 1
   \ dump.0
   \ user.1
   \ virtio-net-pci.1
 hub 0
   \ user.0
   \ virtio-net-pci.0
 (qemu)

And, still, what is the added value of this verbose form compared to my
compact proposal? Please don't remark that it's easier to implement. ;)

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [PATCH v2 13/15] net: Remove obsolete vlan info

2012-05-24 Thread Zhi Yong Wu
On Thu, May 24, 2012 at 10:43 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2012-05-24 11:38, Zhi Yong Wu wrote:
 On Thu, May 24, 2012 at 10:31 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2012-05-24 11:27, Zhi Yong Wu wrote:
 On Thu, May 24, 2012 at 10:25 PM, Jan Kiszka jan.kis...@siemens.com 
 wrote:
 Something mangled your reply and made it unreadable. Please retry.
 Sorry. let it look like below. Do you think of it? typ=hubport

 (qemu) info network
   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
    \ hub0port0: type=hubport,
   virtio-net-pci.1: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:57
    \ hub1port0: type=hubport,
   virtio-net-pci.2: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:58
    \ u: type=user,net=10.0.2.0,restrict=off
   e1000.0: type=nic,model=e1000,macaddr=52:54:00:12:34:59
    \ ur: type=user,net=10.0.2.0,restrict=off
 hub 1
     port 1 peer user.1
     port 0 peer virtio-net-pci.1
 hub 0
     port 1 peer user.0
     port 0 peer virtio-net-pci.0

 My question remains: What added value we get from listing the hubs with
 its ports separately from the port connections? Also, how would this be
 printed:

    -net user -net dump -net nic
 (qemu) info network
   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
    \ hub0port0: type=hubport,
   virtio-net-pci.1: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:57
    \ hub1port0: type=hubport,
 hub 1
     port 2 peer dump.0
     port 1 peer user.1
     port 0 peer virtio-net-pci.1
 hub 0
     port 1 peer user.0
     port 0 peer virtio-net-pci.0
 (qemu)


 The user should only be interested in the fact that user.0, dump.0 and
 some_nic.0 are attached to the same hub, not to which port of that hub.
 OK, then let it seem like below. right?

 (qemu) info network
   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
    \ hub0port0: type=hubport,
   virtio-net-pci.1: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:57
    \ hub1port0: type=hubport,
 hub 1
   \ dump.0
   \ user.1
   \ virtio-net-pci.1
 hub 0
   \ user.0
   \ virtio-net-pci.0
 (qemu)

 And, still, what is the added value of this verbose form compared to my
They are the same, I think.
 compact proposal? Please don't remark that it's easier to implement. ;)
The implementation is not difficult once we reach agreement about the
layout.
NICs that are not attached to a hub should keep the compact form that
QEMU already prints.


 Jan

 --
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux



-- 
Regards,

Zhi Yong Wu


Re: [PATCH v2 13/15] net: Remove obsolete vlan info

2012-05-24 Thread Zhi Yong Wu
On Thu, May 24, 2012 at 10:43 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2012-05-24 11:38, Zhi Yong Wu wrote:
 On Thu, May 24, 2012 at 10:31 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2012-05-24 11:27, Zhi Yong Wu wrote:
 On Thu, May 24, 2012 at 10:25 PM, Jan Kiszka jan.kis...@siemens.com 
 wrote:
 Something mangled your reply and made it unreadable. Please retry.
 Sorry. let it look like below. Do you think of it? typ=hubport

 (qemu) info network
   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
    \ hub0port0: type=hubport,
   virtio-net-pci.1: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:57
    \ hub1port0: type=hubport,
   virtio-net-pci.2: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:58
    \ u: type=user,net=10.0.2.0,restrict=off
   e1000.0: type=nic,model=e1000,macaddr=52:54:00:12:34:59
    \ ur: type=user,net=10.0.2.0,restrict=off
 hub 1
     port 1 peer user.1
     port 0 peer virtio-net-pci.1
 hub 0
     port 1 peer user.0
     port 0 peer virtio-net-pci.0

 My question remains: What added value we get from listing the hubs with
 its ports separately from the port connections? Also, how would this be
 printed:

    -net user -net dump -net nic
 (qemu) info network
   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
    \ hub0port0: type=hubport,
   virtio-net-pci.1: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:57
    \ hub1port0: type=hubport,
 hub 1
     port 2 peer dump.0
     port 1 peer user.1
     port 0 peer virtio-net-pci.1
 hub 0
     port 1 peer user.0
     port 0 peer virtio-net-pci.0
 (qemu)


 The user should only be interested in the fact that user.0, dump.0 and
 some_nic.0 are attached to the same hub, not to which port of that hub.
 OK, then let it seem like below. right?

 (qemu) info network
   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
    \ hub0port0: type=hubport,
   virtio-net-pci.1: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:57
    \ hub1port0: type=hubport,
 hub 1
   \ dump.0
   \ user.1
   \ virtio-net-pci.1
 hub 0
   \ user.0
   \ virtio-net-pci.0
 (qemu)

 And, still, what is the added value of this verbose form compared to my
 compact proposal? Please don't remark that it's easier to implement. ;)

By the way, if you agree that the form below is OK, I will send v3.
Could you let me know your opinion?

(qemu) info network
  virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
   \ hub0port0: type=hubport,
  virtio-net-pci.1: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:57
   \ hub1port0: type=hubport,
  e1000.0: type=nic,model=e1000,macaddr=52:54:00:12:34:58
   \ u: type=tap,ifname=tap0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown
hub 1
   \ dump.0
   \ user.1
   \ virtio-net-pci.1
hub 0
   \ user.0
   \ virtio-net-pci.0

 Jan

 --
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux



-- 
Regards,

Zhi Yong Wu


Re: [PATCH v2 13/15] net: Remove obsolete vlan info

2012-05-24 Thread Jan Kiszka
On 2012-05-24 11:51, Zhi Yong Wu wrote:
 On Thu, May 24, 2012 at 10:43 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2012-05-24 11:38, Zhi Yong Wu wrote:
 On Thu, May 24, 2012 at 10:31 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2012-05-24 11:27, Zhi Yong Wu wrote:
 On Thu, May 24, 2012 at 10:25 PM, Jan Kiszka jan.kis...@siemens.com 
 wrote:
 Something mangled your reply and made it unreadable. Please retry.
 Sorry. let it look like below. Do you think of it? typ=hubport

 (qemu) info network
   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
\ hub0port0: type=hubport,
   virtio-net-pci.1: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:57
\ hub1port0: type=hubport,
   virtio-net-pci.2: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:58
\ u: type=user,net=10.0.2.0,restrict=off
   e1000.0: type=nic,model=e1000,macaddr=52:54:00:12:34:59
\ ur: type=user,net=10.0.2.0,restrict=off
 hub 1
 port 1 peer user.1
 port 0 peer virtio-net-pci.1
 hub 0
 port 1 peer user.0
 port 0 peer virtio-net-pci.0

 My question remains: What added value we get from listing the hubs with
 its ports separately from the port connections? Also, how would this be
 printed:

-net user -net dump -net nic
 (qemu) info network
   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
\ hub0port0: type=hubport,
   virtio-net-pci.1: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:57
\ hub1port0: type=hubport,
 hub 1
 port 2 peer dump.0
 port 1 peer user.1
 port 0 peer virtio-net-pci.1
 hub 0
 port 1 peer user.0
 port 0 peer virtio-net-pci.0
 (qemu)


 The user should only be interested in the fact that user.0, dump.0 and
 some_nic.0 are attached to the same hub, not to which port of that hub.
 OK, then let it seem like below. right?

 (qemu) info network
   virtio-net-pci.0: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56
\ hub0port0: type=hubport,
   virtio-net-pci.1: type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:57
\ hub1port0: type=hubport,
 hub 1
   \ dump.0
   \ user.1
   \ virtio-net-pci.1
 hub 0
   \ user.0
   \ virtio-net-pci.0
 (qemu)

 And, still, what is the added value of this verbose form compared to my
 They are same, i think.

Then let's go for the more compact form I proposed.

 compact proposal? Please don't remark that it's easier to implement. ;)
 The implementation is not one difficult thing, if we reach agreement
 about its layout.
 For those NIC which aren't in one hub, they should been kept compact
 with old qemu form.

Yes. The form would be

peer
 \ peer

for the classic couples and

hub
 \ peer
 \ peer
 \ ...

for those that are attached to a hub.
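
The two layouts above can be sketched as a standalone C formatter. This is an illustrative model only, not the QEMU monitor code: `struct client`, `emit_line()` and `format_network()` are hypothetical names, and the sketch assumes each directly paired NIC carries the name of its peer while hub members carry a hub id.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical flat description of one net client. */
struct client {
    const char *name;
    int hub;            /* hub id, or -1 when paired directly */
    const char *peer;   /* peer name for a direct pair, else NULL */
};

/* Append "<prefix><text>\n" at offset `off`; returns the new offset. */
static size_t emit_line(char *buf, size_t size, size_t off,
                        const char *prefix, const char *text)
{
    int n;

    if (off >= size)
        return off;     /* buffer already full: drop the line */
    n = snprintf(buf + off, size - off, "%s%s\n", prefix, text);
    return n < 0 ? off : off + (size_t)n;
}

/* Print classic couples first, then one block per hub. */
static size_t format_network(const struct client *c, int count, int nhubs,
                             char *buf, size_t size)
{
    size_t off = 0;
    char title[32];
    int i, h;

    if (size > 0)
        buf[0] = '\0';
    for (i = 0; i < count; i++) {
        if (c[i].hub < 0 && c[i].peer) {
            off = emit_line(buf, size, off, "", c[i].name);
            off = emit_line(buf, size, off, " \\ ", c[i].peer);
        }
    }
    for (h = 0; h < nhubs; h++) {
        snprintf(title, sizeof(title), "hub %d", h);
        off = emit_line(buf, size, off, "", title);
        for (i = 0; i < count; i++) {
            if (c[i].hub == h)
                off = emit_line(buf, size, off, " \\ ", c[i].name);
        }
    }
    return off;
}
```

Keeping the hub block free of port numbers matches the point made earlier in the thread that users care about which clients share a hub, not which port each one uses.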

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [PATCH] vfio-powerpc: enabled and supported on power

2012-05-24 Thread Alex Williamson
On Thu, 2012-05-24 at 13:10 +1000, Alexey Kardashevskiy wrote:
 The patch introduces support of VFIO on POWER.
 
 The patch consists of:
 
 1. IOMMU driver for VFIO.
 It does not use IOMMU API at all, instead it calls POWER
 IOMMU API directly (ppc_md callbacks).
 
 2. A piece of code (module_init) which creates IOMMU groups.
 TBD: what is a better place for it?
 
 The patch is made on top of
 git://github.com/awilliam/linux-vfio.git iommu-group-vfio-20120523
 (which is iommu-group-vfio-20120521 + some fixes)
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
  arch/powerpc/Kconfig |6 +
  arch/powerpc/include/asm/iommu.h |3 +
  arch/powerpc/kernel/Makefile |1 +
  arch/powerpc/kernel/iommu_vfio.c |  371 
 ++
  4 files changed, 381 insertions(+), 0 deletions(-)
  create mode 100644 arch/powerpc/kernel/iommu_vfio.c
 
 diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
 index feab3ba..13d12ac 100644
 --- a/arch/powerpc/Kconfig
 +++ b/arch/powerpc/Kconfig
 @@ -319,6 +319,12 @@ config 8XX_MINIMAL_FPEMU
  config IOMMU_HELPER
   def_bool PPC64
  
 +config IOMMU_VFIO
 + select IOMMU_API
 + depends on PPC64

 VFIO?

 + tristate "Enable IOMMU chardev to support user-space PCI"
 + default n
 +
  config SWIOTLB
   bool "SWIOTLB support"
   default n
 diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
 index 957a83f..c64bce7 100644
 --- a/arch/powerpc/include/asm/iommu.h
 +++ b/arch/powerpc/include/asm/iommu.h
 @@ -66,6 +66,9 @@ struct iommu_table {
   unsigned long  it_halfpoint; /* Breaking point for small/large allocs */
   spinlock_t it_lock;  /* Protects it_map */
   unsigned long *it_map;   /* A simple allocation bitmap for now */
 +#ifdef CONFIG_IOMMU_API
 + struct iommu_group *it_group;
 +#endif
  };
  
  struct scatterlist;
 diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
 index f5808a3..7cfd68e 100644
 --- a/arch/powerpc/kernel/Makefile
 +++ b/arch/powerpc/kernel/Makefile
 @@ -90,6 +90,7 @@ obj-$(CONFIG_RELOCATABLE_PPC32) += reloc_32.o
  
  obj-$(CONFIG_PPC32)  += entry_32.o setup_32.o
  obj-$(CONFIG_PPC64)  += dma-iommu.o iommu.o
 +obj-$(CONFIG_IOMMU_VFIO) += iommu_vfio.o
  obj-$(CONFIG_KGDB)   += kgdb.o
  obj-$(CONFIG_PPC_OF_BOOT_TRAMPOLINE) += prom_init.o
  obj-$(CONFIG_MODULES)+= ppc_ksyms.o
 diff --git a/arch/powerpc/kernel/iommu_vfio.c b/arch/powerpc/kernel/iommu_vfio.c
 new file mode 100644
 index 000..68a93dd
 --- /dev/null
 +++ b/arch/powerpc/kernel/iommu_vfio.c

Should this be drivers/vfio/vfio_iommu_powerpc.c?

 @@ -0,0 +1,371 @@
 +/*
 + * VFIO: IOMMU DMA mapping support for TCE on POWER
 + *
 + * Copyright (C) 2012 IBM Corp.  All rights reserved.
 + * Author: Alexey Kardashevskiy a...@ozlabs.ru
 + *
 + * This program is free software; you can redistribute it and/or modify
 + * it under the terms of the GNU General Public License version 2 as
 + * published by the Free Software Foundation.
 + *
 + * Derived from original vfio_iommu_x86.c:
 + * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
 + * Author: Alex Williamson alex.william...@redhat.com
 + */
 +
 +#include <linux/module.h>
 +#include <linux/pci.h>
 +#include <linux/slab.h>
 +#include <linux/uaccess.h>
 +#include <linux/vfio.h>
 +#include <linux/err.h>
 +#include <linux/spinlock.h>
 +#include <asm/iommu.h>
 +
 +#define DRIVER_VERSION  "0.1"
 +#define DRIVER_AUTHOR   "a...@ozlabs.ru"
 +#define DRIVER_DESC     "POWER IOMMU chardev for VFIO"
 +
 +#define IOMMU_CHECK_EXTENSION _IO(VFIO_TYPE, VFIO_BASE + 1)
 +
 +/*  API for POWERPC IOMMU  */
 +
 +#define POWERPC_IOMMU 2
 +
 +struct tce_iommu_info {
 + __u32 argsz;
 + __u32 dma32_window_start;
 + __u32 dma32_window_size;
 +};
 +
 +#define POWERPC_IOMMU_GET_INFO   _IO(VFIO_TYPE, VFIO_BASE + 12)
 +
 +struct tce_iommu_dma_map {
 + __u32 argsz;
 + __u64 va;
 + __u64 dmaaddr;
 +};
 +
 +#define POWERPC_IOMMU_MAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 13)
 +#define POWERPC_IOMMU_UNMAP_DMA  _IO(VFIO_TYPE, VFIO_BASE + 14)

We'd probably want to merge this into include/linux/vfio.h too?

 +/* * */
 +
 +struct tce_iommu {
 + struct iommu_table *tbl;
 +};
 +
 +static int tce_iommu_attach_group(void *iommu_data,
 + struct iommu_group *iommu_group)
 +{
 + struct tce_iommu *tceiommu = iommu_data;
 +
 + if (tceiommu->tbl) {
 + printk(KERN_ERR "Only one group per IOMMU instance is allowed\n");
 + return -EFAULT;
 + }
 + tceiommu->tbl = iommu_group_get_iommudata(iommu_group);
 +
 + return 0;
 +}
 +
 +static void tce_iommu_detach_group(void *iommu_data,
 + struct iommu_group *iommu_group)
 +{
 + struct tce_iommu *tceiommu = iommu_data;
 +
 + if (!tceiommu->tbl) {
 + 

[PATCH 1/2] Remove kvm_commit_irq_routes from error messages

2012-05-24 Thread Richard Weinberger
Make my life a bit easier and report the correct function names.
s/kvm_commit_irq_routes/kvm_irqchip_commit_routes

Signed-off-by: Richard Weinberger rich...@nod.at
---
 hw/device-assignment.c |4 ++--
 hw/msi.c   |2 +-
 hw/msix.c  |4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 1daadb9..09726f9 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -958,7 +958,7 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev)
 
 kvm_add_routing_entry(kvm_state, assigned_dev->entry);
 if (kvm_irqchip_commit_routes(kvm_state) < 0) {
-perror("assigned_dev_update_msi: kvm_commit_irq_routes");
+perror("assigned_dev_update_msi: kvm_irqchip_commit_routes");
 assigned_dev->cap.state &= ~ASSIGNED_DEVICE_MSI_ENABLED;
 return;
 }
@@ -1053,7 +1053,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
 }
 
 if (r == 0 && kvm_irqchip_commit_routes(kvm_state) < 0) {
-   perror("assigned_dev_update_msix_mmio: kvm_commit_irq_routes");
+   perror("assigned_dev_update_msix_mmio: kvm_irqchip_commit_routes");
return -EINVAL;
 }
 
diff --git a/hw/msi.c b/hw/msi.c
index 4fcf769..1a20e83 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -180,7 +180,7 @@ static void kvm_msi_update(PCIDevice *dev)
 if (changed) {
 r = kvm_irqchip_commit_routes(kvm_state);
 if (r) {
-fprintf(stderr, "%s: kvm_commit_irq_routes failed: %s\n", __func__,
+fprintf(stderr, "%s: kvm_irqchip_commit_routes failed: %s\n", __func__,
 strerror(-r));
 exit(1);
 }
diff --git a/hw/msix.c b/hw/msix.c
index 5515a32..34b7455 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -91,7 +91,7 @@ static void kvm_msix_update(PCIDevice *dev, int vector,
 *entry = new_entry;
 r = kvm_irqchip_commit_routes(kvm_state);
 if (r) {
-fprintf(stderr, "%s: kvm_commit_irq_routes failed: %s\n", __func__,
+fprintf(stderr, "%s: kvm_irqchip_commit_routes failed: %s\n", __func__,
strerror(-r));
 exit(1);
 }
@@ -112,7 +112,7 @@ static int kvm_msix_vector_add(PCIDevice *dev, unsigned vector)
 
 r = kvm_irqchip_commit_routes(kvm_state);
 if (r < 0) {
-fprintf(stderr, "%s: kvm_commit_irq_routes failed: %s\n", __func__, strerror(-r));
+fprintf(stderr, "%s: kvm_irqchip_commit_routes failed: %s\n", __func__, strerror(-r));
 return r;
 }
 return 0;
-- 
1.7.3.1



[PATCH 2/2] Device assignment: Fix MSI IRQ affinity setting

2012-05-24 Thread Richard Weinberger
MSI interrupt affinity settings in the guest always ended up on vcpu0,
no matter what.
IOW, writes to /proc/irq/IRQ/smp_affinity are ignored.
This patch fixes the MSI IRQ routing and avoids the utter madness of
tearing down and setting up the interrupt completely when this changes.

Signed-off-by: Thomas Gleixner t...@linutronix.de
Signed-off-by: Richard Weinberger rich...@nod.at
---
 hw/device-assignment.c |   73 ++--
 1 files changed, 70 insertions(+), 3 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 09726f9..78d57c8 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -913,6 +913,50 @@ void assigned_dev_update_irqs(void)
 }
 }
 
+static void assigned_dev_update_msi_route(PCIDevice *pci_dev)
+{
+AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev);
+uint8_t ctrl_byte = pci_get_byte(pci_dev->config + pci_dev->msi_cap +
+PCI_MSI_FLAGS);
+struct kvm_irq_routing_entry *old, new;
+KVMMsiMessage msg;
+int r;
+
+if (!(ctrl_byte & PCI_MSI_FLAGS_ENABLE))
+   return;
+
+msg.addr_lo =  pci_get_long(pci_dev->config + pci_dev->msi_cap +
+   PCI_MSI_ADDRESS_LO);
+msg.addr_hi =  pci_get_long(pci_dev->config + pci_dev->msi_cap +
+   PCI_MSI_ADDRESS_HI);
+msg.data =  pci_get_long(pci_dev->config + pci_dev->msi_cap +
+PCI_MSI_DATA_32);
+
+old = adev->entry;
+new = *old;
+new.u.msi.address_lo = msg.addr_lo;
+new.u.msi.address_hi = msg.addr_hi;
+new.u.msi.data = msg.data;
+
+if (memcmp(old, &new, sizeof(new)) == 0)
+return;
+
+r = kvm_update_routing_entry(old, &new);
+if (r < 0) {
+fprintf(stderr, "%s: kvm_update_msi failed: %s\n", __func__,
+strerror(-r));
+exit(1);
+}
+
+*old = new;
+ r = kvm_irqchip_commit_routes(kvm_state);
+ if (r) {
+fprintf(stderr, "%s: kvm_irqchip_commit_routes failed: %s\n", __func__,
+strerror(-r));
+exit(1);
+ }
+}
+
 static void assigned_dev_update_msi(PCIDevice *pci_dev)
 {
 struct kvm_assigned_irq assigned_irq_data;
@@ -1116,6 +1160,14 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice *pci_dev,
 uint32_t virt_val = pci_default_read_config(pci_dev, address, len);
 uint32_t real_val, emulate_mask, full_emulation_mask;
 
+if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSI) {
+uint32_t msi_start = pci_dev->msi_cap;
+uint32_t msi_end = msi_start + PCI_MSI_DATA_64 + 3;
+
+   if (address >= msi_start && (address + len) < msi_end)
+return virt_val;
+}
+
 emulate_mask = 0;
 memcpy(&emulate_mask, assigned_dev->emulate_config_read + address, len);
 emulate_mask = le32_to_cpu(emulate_mask);
@@ -1130,6 +1182,17 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice *pci_dev,
 }
 }
 
+static void handle_cfg_write_msi(PCIDevice *pci_dev, AssignedDevice *adev)
+{
+if (!kvm_enabled() || !kvm_irqchip_in_kernel())
+   return;
+
+if (adev->entry && (adev->irq_requested_type & KVM_DEV_IRQ_GUEST_MSI))
+assigned_dev_update_msi_route(pci_dev);
+else
+assigned_dev_update_msi(pci_dev);
+}
+
 static void assigned_dev_pci_write_config(PCIDevice *pci_dev, uint32_t address,
   uint32_t val, int len)
 {
@@ -1155,9 +1218,13 @@ static void assigned_dev_pci_write_config(PCIDevice *pci_dev, uint32_t address,
 }
 }
 if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSI) {
-if (range_covers_byte(address, len,
-  pci_dev->msi_cap + PCI_MSI_FLAGS)) {
-assigned_dev_update_msi(pci_dev);
+uint32_t msi_start = pci_dev->msi_cap;
+uint32_t msi_end = msi_start + PCI_MSI_DATA_64 + 3;
+
+if (address >= msi_start && (address + len) < msi_end) {
+if (address == msi_start + PCI_MSI_DATA_32)
+handle_cfg_write_msi(pci_dev, assigned_dev);
+return;
 }
 }
 if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX) {
-- 
1.7.3.1
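
The core idea of the patch above, updating an existing MSI route in place and committing only when the guest actually changed it, can be sketched in plain C. This is a simplified model under stated assumptions: `struct msi_route` is a hypothetical stand-in for the KVM routing entry, and `msi_route_update()` is an illustrative helper, not a QEMU or KVM function.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for an MSI routing entry (not the KVM struct). */
struct msi_route {
    uint32_t address_lo;
    uint32_t address_hi;
    uint32_t data;
};

/* Store the values the guest wrote.  Returns true when the route really
 * changed and therefore needs to be committed, false otherwise. */
static bool msi_route_update(struct msi_route *cur, uint32_t addr_lo,
                             uint32_t addr_hi, uint32_t data)
{
    struct msi_route next = *cur;

    next.address_lo = addr_lo;
    next.address_hi = addr_hi;
    next.data = data;

    if (memcmp(cur, &next, sizeof(next)) == 0)
        return false;       /* guest rewrote identical values */

    *cur = next;
    return true;
}
```

A caller would commit the routing table only when the helper returns true, which avoids tearing the interrupt down and setting it up again on every affinity write.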



[PATCH v3 00/16] net: hub-based networking

2012-05-24 Thread zwu . kernel
From: Zhi Yong Wu wu...@linux.vnet.ibm.com

This patchset implements network hubs instead of vlans. Most of the work was
done by Stefan; I rebased it onto the latest QEMU upstream, did some testing,
and am responsible for pushing it to QEMU upstream.

Changelog from v2:
  1.) add the support for hub own flow control [paolo]
  2.) make the monitor output more reasonable hub info [jan kiszka]

v2:
  1.) cleanup some obsolete vlan info
  2.) cleanup deliver/deliver_iov func pointers [paolo]
  3.) support more flexible flow control [paolo]

Stefan Hajnoczi (12):
  net: Add a hub net client
  net: Use hubs for the vlan feature
  net: Look up 'vlan' net clients using hubs
  hub: Check that hubs are configured correctly
  net: Drop vlan argument to qemu_new_net_client()
  net: Remove vlan qdev property
  net: Remove vlan code from net.c
  net: Remove VLANState
  net: Rename non_vlan_clients to net_clients
  net: Rename VLANClientState to NetClientState
  net: Rename vc local variables to nc
  net: Rename qemu_del_vlan_client() to qemu_del_net_client()

Zhi Yong Wu (4):
  net: Make the monitor output more reasonable hub info
  net: cleanup deliver/deliver_iov func pointers
  net: determine if packets can be sent before net queue deliver
packets
  hub: add the support for hub own flow control

 Makefile.objs   |2 +-
 hw/cadence_gem.c|8 +-
 hw/dp8393x.c|6 +-
 hw/e1000.c  |   10 +-
 hw/eepro100.c   |8 +-
 hw/etraxfs_eth.c|8 +-
 hw/lan9118.c|8 +-
 hw/lance.c  |2 +-
 hw/mcf_fec.c|6 +-
 hw/milkymist-minimac2.c |6 +-
 hw/mipsnet.c|6 +-
 hw/musicpal.c   |6 +-
 hw/ne2000-isa.c |2 +-
 hw/ne2000.c |8 +-
 hw/ne2000.h |4 +-
 hw/opencores_eth.c  |8 +-
 hw/pcnet-pci.c  |4 +-
 hw/pcnet.c  |6 +-
 hw/pcnet.h  |6 +-
 hw/qdev-properties.c|   78 +--
 hw/qdev.c   |2 -
 hw/qdev.h   |8 +-
 hw/rtl8139.c|   10 +-
 hw/smc91c111.c  |6 +-
 hw/spapr_llan.c |4 +-
 hw/stellaris_enet.c |6 +-
 hw/usb/dev-network.c|8 +-
 hw/vhost_net.c  |   24 +-
 hw/vhost_net.h  |2 +-
 hw/virtio-net.c |   12 +-
 hw/xen_nic.c|7 +-
 hw/xgmac.c  |6 +-
 hw/xilinx_axienet.c |6 +-
 hw/xilinx_ethlite.c |6 +-
 net.c   |  606 ++-
 net.h   |   85 
 net/dump.c  |   28 ++-
 net/dump.h  |2 +-
 net/hub.c   |  298 +++
 net/hub.h   |   28 +++
 net/queue.c |   42 ++--
 net/queue.h |   25 +--
 net/slirp.c |   32 +--
 net/slirp.h |2 +-
 net/socket.c|   66 +++---
 net/socket.h|2 +-
 net/tap-win32.c |   27 +-
 net/tap.c   |   45 ++--
 net/tap.h   |   21 +-
 net/vde.c   |   17 +-
 net/vde.h   |3 +-
 qemu-common.h   |3 +-
 slirp/if.c  |5 -
 slirp/libslirp.h|1 -
 54 files changed, 811 insertions(+), 826 deletions(-)
 create mode 100644 net/hub.c
 create mode 100644 net/hub.h

-- 
1.7.6



[PATCH v3 02/16] net: Use hubs for the vlan feature

2012-05-24 Thread zwu . kernel
From: Stefan Hajnoczi stefa...@linux.vnet.ibm.com

Stop using the special-case vlan code in net.c.  Instead use the hub net
client to implement the vlan feature.  The next patch will remove vlan
code from net.c completely.

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
---
 net.c   |   46 +-
 net/dump.c  |   20 ++--
 net/dump.h  |2 +-
 net/slirp.c |8 
 net/slirp.h |2 +-
 net/socket.c|   48 ++--
 net/socket.h|2 +-
 net/tap-win32.c |9 +
 net/tap.c   |7 ---
 net/tap.h   |3 ++-
 net/vde.c   |9 +
 net/vde.h   |3 ++-
 12 files changed, 90 insertions(+), 69 deletions(-)

diff --git a/net.c b/net.c
index 1922d8a..8c02559 100644
--- a/net.c
+++ b/net.c
@@ -25,6 +25,7 @@
 
 #include "config-host.h"
 
+#include "net/hub.h"
 #include "net/tap.h"
 #include "net/socket.h"
 #include "net/dump.h"
@@ -153,23 +154,25 @@ void qemu_macaddr_default_if_unset(MACAddr *macaddr)
 macaddr->a[5] = 0x56 + index++;
 }
 
+/**
+ * Generate a name for net client
+ *
+ * Only net clients created with the legacy -net option need this.  Naming is
+ * mandatory for net clients created with -netdev.
+ */
 static char *assign_name(VLANClientState *vc1, const char *model)
 {
-VLANState *vlan;
 VLANClientState *vc;
 char buf[256];
 int id = 0;
 
-QTAILQ_FOREACH(vlan, &vlans, next) {
-QTAILQ_FOREACH(vc, &vlan->clients, next) {
-if (vc != vc1 && strcmp(vc->model, model) == 0) {
-id++;
-}
-}
-}
-
 QTAILQ_FOREACH(vc, &non_vlan_clients, next) {
-if (vc != vc1 && strcmp(vc->model, model) == 0) {
+if (vc == vc1) {
+continue;
+}
+/* For compatibility only bump id for net clients on a vlan */
+if (strcmp(vc->model, model) == 0 &&
+net_hub_id_for_client(vc, NULL) == 0) {
 id++;
 }
 }
@@ -748,7 +751,7 @@ int net_handle_fd_param(Monitor *mon, const char *param)
 static int net_init_nic(QemuOpts *opts,
 Monitor *mon,
 const char *name,
-VLANState *vlan)
+VLANClientState *peer)
 {
 int idx;
 NICInfo *nd;
@@ -771,8 +774,8 @@ static int net_init_nic(QemuOpts *opts,
 return -1;
 }
 } else {
-assert(vlan);
-nd->vlan = vlan;
+assert(peer);
+nd->netdev = peer;
 }
 if (name) {
 nd->name = g_strdup(name);
@@ -820,17 +823,17 @@ static int net_init_nic(QemuOpts *opts,
 .help = "identifier for monitor commands", \
  }
 
-typedef int (*net_client_init_func)(QemuOpts *opts,
-Monitor *mon,
-const char *name,
-VLANState *vlan);
+typedef int NetClientInitFunc(QemuOpts *opts,
+  Monitor *mon,
+  const char *name,
+  VLANClientState *peer);
 
 /* magic number, but compiler will warn if too small */
 #define NET_MAX_DESC 20
 
 static const struct {
 const char *type;
-net_client_init_func init;
+NetClientInitFunc *init;
 QemuOptDesc desc[NET_MAX_DESC];
 } net_client_types[NET_CLIENT_TYPE_MAX] = {
 [NET_CLIENT_TYPE_NONE] = {
@@ -1136,7 +1139,7 @@ int net_client_init(Monitor *mon, QemuOpts *opts, int is_netdev)
 for (i = 0; i < NET_CLIENT_TYPE_MAX; i++) {
 if (net_client_types[i].type != NULL &&
 !strcmp(net_client_types[i].type, type)) {
-VLANState *vlan = NULL;
+VLANClientState *peer = NULL;
 int ret;
 
 if (qemu_opts_validate(opts, &net_client_types[i].desc[0]) == -1) {
@@ -1147,12 +1150,12 @@ int net_client_init(Monitor *mon, QemuOpts *opts, int is_netdev)
  * netdev= parameter. */
 if (!(is_netdev ||
   (strcmp(type, nic) == 0  qemu_opt_get(opts, netdev 
{
-vlan = qemu_find_vlan(qemu_opt_get_number(opts, vlan, 0), 1);
+peer = net_hub_add_port(qemu_opt_get_number(opts, vlan, 0));
 }
 
 ret = 0;
 if (net_client_types[i].init) {
-ret = net_client_types[i].init(opts, mon, name, vlan);
+ret = net_client_types[i].init(opts, mon, name, peer);
 if (ret  0) {
 /* TODO push error reporting into init() methods */
 qerror_report(QERR_DEVICE_INIT_FAILED, type);
@@ -1297,6 +1300,7 @@ void do_info_network(Monitor *mon)
 print_net_client(mon, peer);
 }
 }
+net_hub_info(mon);
 }
 
 void qmp_set_link(const char *name, bool up, Error **errp)
diff --git a/net/dump.c b/net/dump.c
index 

[PATCH v3 01/16] net: Add a hub net client

2012-05-24 Thread zwu . kernel
From: Stefan Hajnoczi stefa...@linux.vnet.ibm.com

The vlan feature can be implemented in terms of hubs.  By introducing a
hub net client it becomes possible to remove the special case vlan code
from net.c and push the vlan feature out of generic networking code.
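
As a rough illustration of the idea (not the code from this patch), a hub can be modelled as a list of ports where a packet arriving on one port is forwarded to every other port. The `ToyHub` and `ToyPort` types and the `toy_hub_*` helpers below are hypothetical stand-ins for this sketch; the real implementation works on net clients and the packet queue API.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PORTS 8

/* Hypothetical stand-in for a hub port; it only counts what it receives. */
typedef struct ToyPort {
    int id;
    int rx_packets;   /* packets forwarded to this port */
    size_t rx_bytes;  /* bytes forwarded to this port */
} ToyPort;

/* Hypothetical stand-in for a hub: a fixed-size array of ports. */
typedef struct ToyHub {
    ToyPort ports[TOY_MAX_PORTS];
    int num_ports;
} ToyHub;

/* Add a port to the hub; returns NULL when the hub is full. */
static ToyPort *toy_hub_add_port(ToyHub *hub)
{
    ToyPort *port;

    assert(hub != NULL);
    if (hub->num_ports >= TOY_MAX_PORTS) {
        return NULL;
    }
    port = &hub->ports[hub->num_ports];
    port->id = hub->num_ports++;
    port->rx_packets = 0;
    port->rx_bytes = 0;
    return port;
}

/* Broadcast a packet to every port except the one it arrived on. */
static size_t toy_hub_receive(ToyHub *hub, ToyPort *source,
                              const unsigned char *buf, size_t len)
{
    int i;

    assert(hub != NULL && buf != NULL);
    for (i = 0; i < hub->num_ports; i++) {
        ToyPort *port = &hub->ports[i];

        if (port == source) {
            continue; /* never echo a packet back to its sender */
        }
        port->rx_packets++;
        port->rx_bytes += len;
    }
    return len;
}
```

With three ports on one toy hub, a packet received on the first port is counted once on each of the other two and never on the source port, which is the behaviour the old vlan code provided.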

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
---
 Makefile.objs |2 +-
 net.h |1 +
 net/hub.c |  200 +
 net/hub.h |   23 +++
 4 files changed, 225 insertions(+), 1 deletions(-)
 create mode 100644 net/hub.c
 create mode 100644 net/hub.h

diff --git a/Makefile.objs b/Makefile.objs
index 70c5c79..a3a3a8a 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -63,7 +63,7 @@ block-nested-$(CONFIG_RBD) += rbd.o
 block-obj-y +=  $(addprefix block/, $(block-nested-y))
 
 net-obj-y = net.o
-net-nested-y = queue.o checksum.o util.o
+net-nested-y = queue.o checksum.o util.o hub.o
 net-nested-y += socket.o
 net-nested-y += dump.o
 net-nested-$(CONFIG_POSIX) += tap.o
diff --git a/net.h b/net.h
index 64993b4..50c55ad 100644
--- a/net.h
+++ b/net.h
@@ -38,6 +38,7 @@ typedef enum {
 NET_CLIENT_TYPE_VDE,
 NET_CLIENT_TYPE_DUMP,
 NET_CLIENT_TYPE_BRIDGE,
+NET_CLIENT_TYPE_HUB,
 
 NET_CLIENT_TYPE_MAX
 } net_client_type;
diff --git a/net/hub.c b/net/hub.c
new file mode 100644
index 000..0d3df7f
--- /dev/null
+++ b/net/hub.c
@@ -0,0 +1,200 @@
+/*
+ * Hub net client
+ *
+ * Copyright IBM, Corp. 2011
+ *
+ * Authors:
+ *  Stefan Hajnoczi   stefa...@linux.vnet.ibm.com
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "monitor.h"
+#include "net.h"
+#include "hub.h"
+
+/*
+ * A hub broadcasts incoming packets to all its ports except the source port.
+ * Hubs can be used to provide independent network segments, also confusingly
+ * named the QEMU 'vlan' feature.
+ */
+
+typedef struct NetHub NetHub;
+
+typedef struct NetHubPort {
+    VLANClientState nc;
+    QLIST_ENTRY(NetHubPort) next;
+    NetHub *hub;
+    unsigned int id;
+} NetHubPort;
+
+struct NetHub {
+    unsigned int id;
+    QLIST_ENTRY(NetHub) next;
+    unsigned int num_ports;
+    QLIST_HEAD(, NetHubPort) ports;
+};
+
+static QLIST_HEAD(, NetHub) hubs = QLIST_HEAD_INITIALIZER(&hubs);
+
+static ssize_t net_hub_receive(NetHub *hub, NetHubPort *source_port,
+                               const uint8_t *buf, size_t len)
+{
+    NetHubPort *port;
+
+    QLIST_FOREACH(port, &hub->ports, next) {
+        if (port == source_port) {
+            continue;
+        }
+
+        qemu_send_packet(&port->nc, buf, len);
+    }
+    return len;
+}
+
+static ssize_t net_hub_receive_iov(NetHub *hub, NetHubPort *source_port,
+                                   const struct iovec *iov, int iovcnt)
+{
+    NetHubPort *port;
+    ssize_t ret = 0;
+
+    QLIST_FOREACH(port, &hub->ports, next) {
+        if (port == source_port) {
+            continue;
+        }
+
+        ret = qemu_sendv_packet(&port->nc, iov, iovcnt);
+    }
+    return ret;
+}
+
+static NetHub *net_hub_new(unsigned int id)
+{
+    NetHub *hub;
+
+    hub = g_malloc(sizeof(*hub));
+    hub->id = id;
+    hub->num_ports = 0;
+    QLIST_INIT(&hub->ports);
+
+    QLIST_INSERT_HEAD(&hubs, hub, next);
+
+    return hub;
+}
+
+static ssize_t net_hub_port_receive(VLANClientState *nc,
+                                    const uint8_t *buf, size_t len)
+{
+    NetHubPort *port = DO_UPCAST(NetHubPort, nc, nc);
+
+    return net_hub_receive(port->hub, port, buf, len);
+}
+
+static ssize_t net_hub_port_receive_iov(VLANClientState *nc,
+                                        const struct iovec *iov, int iovcnt)
+{
+    NetHubPort *port = DO_UPCAST(NetHubPort, nc, nc);
+
+    return net_hub_receive_iov(port->hub, port, iov, iovcnt);
+}
+
+static void net_hub_port_cleanup(VLANClientState *nc)
+{
+    NetHubPort *port = DO_UPCAST(NetHubPort, nc, nc);
+
+    QLIST_REMOVE(port, next);
+}
+
+static NetClientInfo net_hub_port_info = {
+    .type = NET_CLIENT_TYPE_HUB,
+    .size = sizeof(NetHubPort),
+    .receive = net_hub_port_receive,
+    .receive_iov = net_hub_port_receive_iov,
+    .cleanup = net_hub_port_cleanup,
+};
+
+static NetHubPort *net_hub_port_new(NetHub *hub)
+{
+    VLANClientState *nc;
+    NetHubPort *port;
+    unsigned int id = hub->num_ports++;
+    char name[128];
+
+    snprintf(name, sizeof name, "hub%uport%u", hub->id, id);
+
+    nc = qemu_new_net_client(&net_hub_port_info, NULL, NULL, "hub", name);
+    port = DO_UPCAST(NetHubPort, nc, nc);
+    port->id = id;
+    port->hub = hub;
+
+    QLIST_INSERT_HEAD(&hub->ports, port, next);
+
+    return port;
+}
+
+/**
+ * Create a port on a given hub
+ *
+ * If there is no existing hub with the given id then a new hub is created.
+ */
+VLANClientState *net_hub_add_port(unsigned int hub_id)
+{
+    NetHub *hub;
+    NetHubPort *port;
+
+    QLIST_FOREACH(hub, &hubs, 

[PATCH v3 06/16] net: Remove vlan qdev property

2012-05-24 Thread zwu . kernel
From: Stefan Hajnoczi stefa...@linux.vnet.ibm.com

The vlan feature is implemented using hubs and no longer uses
special-purpose VLANState structs that are accessible as qdev
properties.

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
---
 hw/qdev-properties.c |   72 --
 hw/qdev.c|2 -
 hw/qdev.h|4 ---
 net.h|3 --
 4 files changed, 0 insertions(+), 81 deletions(-)

diff --git a/hw/qdev-properties.c b/hw/qdev-properties.c
index b7b5597..d2e2afb 100644
--- a/hw/qdev-properties.c
+++ b/hw/qdev-properties.c
@@ -623,71 +623,6 @@ PropertyInfo qdev_prop_netdev = {
     .set   = set_netdev,
 };
 
-/* --- vlan --- */
-
-static int print_vlan(DeviceState *dev, Property *prop, char *dest, size_t len)
-{
-    VLANState **ptr = qdev_get_prop_ptr(dev, prop);
-
-    if (*ptr) {
-        return snprintf(dest, len, "%d", (*ptr)->id);
-    } else {
-        return snprintf(dest, len, "<null>");
-    }
-}
-
-static void get_vlan(Object *obj, Visitor *v, void *opaque,
-                     const char *name, Error **errp)
-{
-    DeviceState *dev = DEVICE(obj);
-    Property *prop = opaque;
-    VLANState **ptr = qdev_get_prop_ptr(dev, prop);
-    int64_t id;
-
-    id = *ptr ? (*ptr)->id : -1;
-    visit_type_int(v, &id, name, errp);
-}
-
-static void set_vlan(Object *obj, Visitor *v, void *opaque,
-                     const char *name, Error **errp)
-{
-    DeviceState *dev = DEVICE(obj);
-    Property *prop = opaque;
-    VLANState **ptr = qdev_get_prop_ptr(dev, prop);
-    Error *local_err = NULL;
-    int64_t id;
-    VLANState *vlan;
-
-    if (dev->state != DEV_STATE_CREATED) {
-        error_set(errp, QERR_PERMISSION_DENIED);
-        return;
-    }
-
-    visit_type_int(v, &id, name, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
-        return;
-    }
-    if (id == -1) {
-        *ptr = NULL;
-        return;
-    }
-    vlan = qemu_find_vlan(id, 1);
-    if (!vlan) {
-        error_set(errp, QERR_INVALID_PARAMETER_VALUE,
-                  name, prop->info->name);
-        return;
-    }
-    *ptr = vlan;
-}
-
-PropertyInfo qdev_prop_vlan = {
-    .name  = "vlan",
-    .print = print_vlan,
-    .get   = get_vlan,
-    .set   = set_vlan,
-};
-
 /* --- pointer --- */
 
 /* Not a proper property, just for dirty hacks.  TODO Remove it!  */
@@ -1094,13 +1029,6 @@ void qdev_prop_set_netdev(DeviceState *dev, const char *name, VLANClientState *v
     assert_no_error(errp);
 }
 
-void qdev_prop_set_vlan(DeviceState *dev, const char *name, VLANState *value)
-{
-    Error *errp = NULL;
-    object_property_set_int(OBJECT(dev), value ? value->id : -1, name, &errp);
-    assert_no_error(errp);
-}
-
 void qdev_prop_set_macaddr(DeviceState *dev, const char *name, uint8_t *value)
 {
     Error *errp = NULL;
diff --git a/hw/qdev.c b/hw/qdev.c
index 6a8f6bd..49dd303 100644
--- a/hw/qdev.c
+++ b/hw/qdev.c
@@ -316,8 +316,6 @@ void qdev_connect_gpio_out(DeviceState * dev, int n, qemu_irq pin)
 void qdev_set_nic_properties(DeviceState *dev, NICInfo *nd)
 {
     qdev_prop_set_macaddr(dev, "mac", nd->macaddr.a);
-    if (nd->vlan)
-        qdev_prop_set_vlan(dev, "vlan", nd->vlan);
     if (nd->netdev)
         qdev_prop_set_netdev(dev, "netdev", nd->netdev);
     if (nd->nvectors != DEV_NVECTORS_UNSPECIFIED &&
diff --git a/hw/qdev.h b/hw/qdev.h
index 4e90119..0a50a40 100644
--- a/hw/qdev.h
+++ b/hw/qdev.h
@@ -222,7 +222,6 @@ extern PropertyInfo qdev_prop_macaddr;
 extern PropertyInfo qdev_prop_losttickpolicy;
 extern PropertyInfo qdev_prop_drive;
 extern PropertyInfo qdev_prop_netdev;
-extern PropertyInfo qdev_prop_vlan;
 extern PropertyInfo qdev_prop_pci_devfn;
 extern PropertyInfo qdev_prop_blocksize;
 
@@ -277,8 +276,6 @@ extern PropertyInfo qdev_prop_blocksize;
     DEFINE_PROP(_n, _s, _f, qdev_prop_string, char*)
 #define DEFINE_PROP_NETDEV(_n, _s, _f) \
     DEFINE_PROP(_n, _s, _f, qdev_prop_netdev, VLANClientState*)
-#define DEFINE_PROP_VLAN(_n, _s, _f) \
-    DEFINE_PROP(_n, _s, _f, qdev_prop_vlan, VLANState*)
 #define DEFINE_PROP_DRIVE(_n, _s, _f) \
     DEFINE_PROP(_n, _s, _f, qdev_prop_drive, BlockDriverState *)
 #define DEFINE_PROP_MACADDR(_n, _s, _f) \
@@ -305,7 +302,6 @@ void qdev_prop_set_uint64(DeviceState *dev, const char *name, uint64_t value);
 void qdev_prop_set_string(DeviceState *dev, const char *name, char *value);
 void qdev_prop_set_chr(DeviceState *dev, const char *name, CharDriverState *value);
 void qdev_prop_set_netdev(DeviceState *dev, const char *name, VLANClientState *value);
-void qdev_prop_set_vlan(DeviceState *dev, const char *name, VLANState *value);
 int qdev_prop_set_drive(DeviceState *dev, const char *name, BlockDriverState *value) QEMU_WARN_UNUSED_RESULT;
 void qdev_prop_set_drive_nofail(DeviceState *dev, const char *name, BlockDriverState *value);
 void qdev_prop_set_macaddr(DeviceState *dev, 

[PATCH v3 05/16] net: Drop vlan argument to qemu_new_net_client()

2012-05-24 Thread zwu . kernel
From: Stefan Hajnoczi stefa...@linux.vnet.ibm.com

Since hubs are now used to implement the 'vlan' feature and the vlan
argument is always NULL, remove the argument entirely and update all net
clients that use qemu_new_net_client().

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
---
 net.c   |   27 ++-
 net.h   |1 -
 net/dump.c  |2 +-
 net/hub.c   |2 +-
 net/slirp.c |2 +-
 net/socket.c|4 ++--
 net/tap-win32.c |2 +-
 net/tap.c   |2 +-
 net/vde.c   |2 +-
 9 files changed, 18 insertions(+), 26 deletions(-)

diff --git a/net.c b/net.c
index 88b9e1f..96252f9 100644
--- a/net.c
+++ b/net.c
@@ -194,7 +194,6 @@ static ssize_t qemu_deliver_packet_iov(VLANClientState *sender,
                                        void *opaque);
 
 VLANClientState *qemu_new_net_client(NetClientInfo *info,
-                                     VLANState *vlan,
                                      VLANClientState *peer,
                                      const char *model,
                                      const char *name)
@@ -213,22 +212,16 @@ VLANClientState *qemu_new_net_client(NetClientInfo *info,
         vc->name = assign_name(vc, model);
     }
 
-    if (vlan) {
-        assert(!peer);
-        vc->vlan = vlan;
-        QTAILQ_INSERT_TAIL(&vc->vlan->clients, vc, next);
-    } else {
-        if (peer) {
-            assert(!peer->peer);
-            vc->peer = peer;
-            peer->peer = vc;
-        }
-        QTAILQ_INSERT_TAIL(&non_vlan_clients, vc, next);
-
-        vc->send_queue = qemu_new_net_queue(qemu_deliver_packet,
-                                            qemu_deliver_packet_iov,
-                                            vc);
+    if (peer) {
+        assert(!peer->peer);
+        vc->peer = peer;
+        peer->peer = vc;
     }
+    QTAILQ_INSERT_TAIL(&non_vlan_clients, vc, next);
+
+    vc->send_queue = qemu_new_net_queue(qemu_deliver_packet,
+                                        qemu_deliver_packet_iov,
+                                        vc);
 
     return vc;
 }
@@ -245,7 +238,7 @@ NICState *qemu_new_nic(NetClientInfo *info,
     assert(info->type == NET_CLIENT_TYPE_NIC);
     assert(info->size >= sizeof(NICState));
 
-    nc = qemu_new_net_client(info, conf->vlan, conf->peer, model, name);
+    nc = qemu_new_net_client(info, conf->peer, model, name);
 
     nic = DO_UPCAST(NICState, nc, nc);
     nic->conf = conf;
diff --git a/net.h b/net.h
index 50c55ad..d3d6e4c 100644
--- a/net.h
+++ b/net.h
@@ -92,7 +92,6 @@ struct VLANState {
 VLANState *qemu_find_vlan(int id, int allocate);
 VLANClientState *qemu_find_netdev(const char *id);
 VLANClientState *qemu_new_net_client(NetClientInfo *info,
-                                     VLANState *vlan,
                                      VLANClientState *peer,
                                      const char *model,
                                      const char *name);
diff --git a/net/dump.c b/net/dump.c
index 37cec3c..621f4e7 100644
--- a/net/dump.c
+++ b/net/dump.c
@@ -129,7 +129,7 @@ static int net_dump_init(VLANClientState *peer, const char *device,
         return -1;
     }
 
-    nc = qemu_new_net_client(&net_dump_info, NULL, peer, device, name);
+    nc = qemu_new_net_client(&net_dump_info, peer, device, name);
 
     snprintf(nc->info_str, sizeof(nc->info_str),
              "dump to %s (len=%d)", filename, len);
diff --git a/net/hub.c b/net/hub.c
index 19c1169..af3de9c 100644
--- a/net/hub.c
+++ b/net/hub.c
@@ -124,7 +124,7 @@ static NetHubPort *net_hub_port_new(NetHub *hub)
 
     snprintf(name, sizeof name, "hub%uport%u", hub->id, id);
 
-    nc = qemu_new_net_client(&net_hub_port_info, NULL, NULL, "hub", name);
+    nc = qemu_new_net_client(&net_hub_port_info, NULL, "hub", name);
     port = DO_UPCAST(NetHubPort, nc, nc);
     port->id = id;
     port->hub = hub;
diff --git a/net/slirp.c b/net/slirp.c
index edb4621..5ed7036 100644
--- a/net/slirp.c
+++ b/net/slirp.c
@@ -238,7 +238,7 @@ static int net_slirp_init(VLANClientState *peer, const char *model,
     }
 #endif
 
-    nc = qemu_new_net_client(&net_slirp_info, NULL, peer, model, name);
+    nc = qemu_new_net_client(&net_slirp_info, peer, model, name);
 
     snprintf(nc->info_str, sizeof(nc->info_str),
              "net=%s,restrict=%s", inet_ntoa(net),
diff --git a/net/socket.c b/net/socket.c
index ed28cbd..bf7a793 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -286,7 +286,7 @@ static NetSocketState *net_socket_fd_init_dgram(VLANClientState *peer,
         }
     }
 
-    nc = qemu_new_net_client(&net_dgram_socket_info, NULL, peer, model, name);
+    nc = qemu_new_net_client(&net_dgram_socket_info, peer, model, name);
 
     snprintf(nc->info_str, sizeof(nc->info_str),
             "socket: fd=%d (%s mcast=%s:%d)",
@@ -330,7 +330,7 @@ static NetSocketState *net_socket_fd_init_stream(VLANClientState *peer,
     VLANClientState *nc;
 NetSocketState 

[PATCH v3 03/16] net: Look up 'vlan' net clients using hubs

2012-05-24 Thread zwu . kernel
From: Stefan Hajnoczi stefa...@linux.vnet.ibm.com

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
---
 net.c   |   28 +---
 net/hub.c   |   24 
 net/hub.h   |2 ++
 net/slirp.c |5 +++--
 4 files changed, 30 insertions(+), 29 deletions(-)

diff --git a/net.c b/net.c
index 8c02559..d9c7eac 100644
--- a/net.c
+++ b/net.c
@@ -312,32 +312,6 @@ void qemu_del_vlan_client(VLANClientState *vc)
     qemu_free_vlan_client(vc);
 }
 
-VLANClientState *
-qemu_find_vlan_client_by_name(Monitor *mon, int vlan_id,
-                              const char *client_str)
-{
-    VLANState *vlan;
-    VLANClientState *vc;
-
-    vlan = qemu_find_vlan(vlan_id, 0);
-    if (!vlan) {
-        monitor_printf(mon, "unknown VLAN %d\n", vlan_id);
-        return NULL;
-    }
-
-    QTAILQ_FOREACH(vc, &vlan->clients, next) {
-        if (!strcmp(vc->name, client_str)) {
-            break;
-        }
-    }
-    if (!vc) {
-        monitor_printf(mon, "can't find device %s on VLAN %d\n",
-                       client_str, vlan_id);
-    }
-
-    return vc;
-}
-
 void qemu_foreach_nic(qemu_nic_foreach func, void *opaque)
 {
     VLANClientState *nc;
@@ -1223,7 +1197,7 @@ void net_host_device_remove(Monitor *mon, const QDict *qdict)
     int vlan_id = qdict_get_int(qdict, "vlan_id");
     const char *device = qdict_get_str(qdict, "device");
 
-    vc = qemu_find_vlan_client_by_name(mon, vlan_id, device);
+    vc = net_hub_find_client_by_name(vlan_id, device);
     if (!vc) {
         return;
     }
diff --git a/net/hub.c b/net/hub.c
index 0d3df7f..3cd1249 100644
--- a/net/hub.c
+++ b/net/hub.c
@@ -159,6 +159,30 @@ VLANClientState *net_hub_add_port(unsigned int hub_id)
 }
 
 /**
+ * Find a specific client on a hub
+ */
+VLANClientState *net_hub_find_client_by_name(unsigned int hub_id,
+                                             const char *name)
+{
+    NetHub *hub;
+    NetHubPort *port;
+    VLANClientState *peer;
+
+    QLIST_FOREACH(hub, &hubs, next) {
+        if (hub->id == hub_id) {
+            QLIST_FOREACH(port, &hub->ports, next) {
+                peer = port->nc.peer;
+
+                if (peer && strcmp(peer->name, name) == 0) {
+                    return peer;
+                }
+            }
+        }
+    }
+    return NULL;
+}
+
+/**
  * Print hub configuration
  */
 void net_hub_info(Monitor *mon)
diff --git a/net/hub.h b/net/hub.h
index 3ca05dc..60d4cae 100644
--- a/net/hub.h
+++ b/net/hub.h
@@ -17,6 +17,8 @@
 #include "qemu-common.h"
 
 VLANClientState *net_hub_add_port(unsigned int hub_id);
+VLANClientState *net_hub_find_client_by_name(unsigned int hub_id,
+                                             const char *name);
 void net_hub_info(Monitor *mon);
 int net_hub_id_for_client(VLANClientState *nc, unsigned int *id);
 
diff --git a/net/slirp.c b/net/slirp.c
index fa7c7fc..edb4621 100644
--- a/net/slirp.c
+++ b/net/slirp.c
@@ -29,6 +29,7 @@
 #include <sys/wait.h>
 #endif
 #include "net.h"
+#include "net/hub.h"
 #include "monitor.h"
 #include "qemu_socket.h"
 #include "slirp/libslirp.h"
@@ -283,7 +284,7 @@ static SlirpState *slirp_lookup(Monitor *mon, const char *vlan,
 
     if (vlan) {
         VLANClientState *nc;
-        nc = qemu_find_vlan_client_by_name(mon, strtol(vlan, NULL, 0), stack);
+        nc = net_hub_find_client_by_name(strtol(vlan, NULL, 0), stack);
         if (!nc) {
             return NULL;
         }
@@ -648,7 +649,7 @@ void do_info_usernet(Monitor *mon)
 
     QTAILQ_FOREACH(s, &slirp_stacks, entry) {
         monitor_printf(mon, "VLAN %d (%s):\n",
-                       s->nc.vlan ? s->nc.vlan->id : -1,
+                       -1, /* TODO */
                        s->nc.name);
         slirp_connection_info(s->slirp, mon);
     }
-- 
1.7.6

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 07/16] net: Remove vlan code from net.c

2012-05-24 Thread zwu . kernel
From: Stefan Hajnoczi stefa...@linux.vnet.ibm.com

The vlan implementation in net.c has been replaced by hubs so we can
remove the code.

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
---
 hw/xen_nic.c |1 -
 net.c|  108 --
 net.h|1 -
 3 files changed, 0 insertions(+), 110 deletions(-)

diff --git a/hw/xen_nic.c b/hw/xen_nic.c
index 9a59bda..85526fe 100644
--- a/hw/xen_nic.c
+++ b/hw/xen_nic.c
@@ -328,7 +328,6 @@ static int net_init(struct XenDevice *xendev)
         return -1;
     }
 
-    netdev->conf.vlan = qemu_find_vlan(netdev->xendev.dev, 1);
     netdev->conf.peer = NULL;
 
     netdev->nic = qemu_new_nic(&net_xen_info, &netdev->conf,
diff --git a/net.c b/net.c
index 96252f9..abf5a3d 100644
--- a/net.c
+++ b/net.c
@@ -388,50 +388,6 @@ static ssize_t qemu_deliver_packet(VLANClientState *sender,
     return ret;
 }
 
-static ssize_t qemu_vlan_deliver_packet(VLANClientState *sender,
-                                        unsigned flags,
-                                        const uint8_t *buf,
-                                        size_t size,
-                                        void *opaque)
-{
-    VLANState *vlan = opaque;
-    VLANClientState *vc;
-    ssize_t ret = -1;
-
-    QTAILQ_FOREACH(vc, &vlan->clients, next) {
-        ssize_t len;
-
-        if (vc == sender) {
-            continue;
-        }
-
-        if (vc->link_down) {
-            ret = size;
-            continue;
-        }
-
-        if (vc->receive_disabled) {
-            ret = 0;
-            continue;
-        }
-
-        if (flags & QEMU_NET_PACKET_FLAG_RAW && vc->info->receive_raw) {
-            len = vc->info->receive_raw(vc, buf, size);
-        } else {
-            len = vc->info->receive(vc, buf, size);
-        }
-
-        if (len == 0) {
-            vc->receive_disabled = 1;
-        }
-
-        ret = (ret >= 0) ? ret : len;
-
-    }
-
-    return ret;
-}
-
 void qemu_purge_queued_packets(VLANClientState *vc)
 {
     NetQueue *queue;
@@ -538,42 +494,6 @@ static ssize_t qemu_deliver_packet_iov(VLANClientState *sender,
     }
 }
 
-static ssize_t qemu_vlan_deliver_packet_iov(VLANClientState *sender,
-                                            unsigned flags,
-                                            const struct iovec *iov,
-                                            int iovcnt,
-                                            void *opaque)
-{
-    VLANState *vlan = opaque;
-    VLANClientState *vc;
-    ssize_t ret = -1;
-
-    QTAILQ_FOREACH(vc, &vlan->clients, next) {
-        ssize_t len;
-
-        if (vc == sender) {
-            continue;
-        }
-
-        if (vc->link_down) {
-            ret = iov_size(iov, iovcnt);
-            continue;
-        }
-
-        assert(!(flags & QEMU_NET_PACKET_FLAG_RAW));
-
-        if (vc->info->receive_iov) {
-            len = vc->info->receive_iov(vc, iov, iovcnt);
-        } else {
-            len = vc_sendv_compat(vc, iov, iovcnt);
-        }
-
-        ret = (ret >= 0) ? ret : len;
-    }
-
-    return ret;
-}
-
 ssize_t qemu_sendv_packet_async(VLANClientState *sender,
                                 const struct iovec *iov, int iovcnt,
                                 NetPacketSent *sent_cb)
@@ -601,34 +521,6 @@ qemu_sendv_packet(VLANClientState *vc, const struct iovec *iov, int iovcnt)
     return qemu_sendv_packet_async(vc, iov, iovcnt, NULL);
 }
 
-/* find or alloc a new VLAN */
-VLANState *qemu_find_vlan(int id, int allocate)
-{
-    VLANState *vlan;
-
-    QTAILQ_FOREACH(vlan, &vlans, next) {
-        if (vlan->id == id) {
-            return vlan;
-        }
-    }
-
-    if (!allocate) {
-        return NULL;
-    }
-
-    vlan = g_malloc0(sizeof(VLANState));
-    vlan->id = id;
-    QTAILQ_INIT(&vlan->clients);
-
-    vlan->send_queue = qemu_new_net_queue(qemu_vlan_deliver_packet,
-                                          qemu_vlan_deliver_packet_iov,
-                                          vlan);
-
-    QTAILQ_INSERT_TAIL(&vlans, vlan, next);
-
-    return vlan;
-}
-
 VLANClientState *qemu_find_netdev(const char *id)
 {
     VLANClientState *vc;
diff --git a/net.h b/net.h
index 7d18b10..a4ac48d 100644
--- a/net.h
+++ b/net.h
@@ -87,7 +87,6 @@ struct VLANState {
     NetQueue *send_queue;
 };
 
-VLANState *qemu_find_vlan(int id, int allocate);
 VLANClientState *qemu_find_netdev(const char *id);
 VLANClientState *qemu_new_net_client(NetClientInfo *info,
                                      VLANClientState *peer,
-- 
1.7.6


[PATCH v3 08/16] net: Remove VLANState

2012-05-24 Thread zwu . kernel
From: Stefan Hajnoczi stefa...@linux.vnet.ibm.com

VLANState is no longer used and can be removed.

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
---
 net.c |  127 ++---
 net.h |8 
 net/socket.c  |6 +-
 net/tap.c |6 +-
 net/tap.h |2 +-
 qemu-common.h |1 -
 6 files changed, 29 insertions(+), 121 deletions(-)

diff --git a/net.c b/net.c
index abf5a3d..eb2ad06 100644
--- a/net.c
+++ b/net.c
@@ -44,7 +44,6 @@
 # define CONFIG_NET_BRIDGE
 #endif
 
-static QTAILQ_HEAD(, VLANState) vlans;
 static QTAILQ_HEAD(, VLANClientState) non_vlan_clients;
 
 int default_net = 1;
@@ -249,11 +248,7 @@ NICState *qemu_new_nic(NetClientInfo *info,
 
 static void qemu_cleanup_vlan_client(VLANClientState *vc)
 {
-    if (vc->vlan) {
-        QTAILQ_REMOVE(&vc->vlan->clients, vc, next);
-    } else {
-        QTAILQ_REMOVE(&non_vlan_clients, vc, next);
-    }
+    QTAILQ_REMOVE(&non_vlan_clients, vc, next);
 
     if (vc->info->cleanup) {
         vc->info->cleanup(vc);
@@ -262,13 +257,11 @@ static void qemu_cleanup_vlan_client(VLANClientState *vc)
 
 static void qemu_free_vlan_client(VLANClientState *vc)
 {
-    if (!vc->vlan) {
-        if (vc->send_queue) {
-            qemu_del_net_queue(vc->send_queue);
-        }
-        if (vc->peer) {
-            vc->peer->peer = NULL;
-        }
+    if (vc->send_queue) {
+        qemu_del_net_queue(vc->send_queue);
+    }
+    if (vc->peer) {
+        vc->peer->peer = NULL;
     }
     g_free(vc->name);
     g_free(vc->model);
@@ -278,7 +271,7 @@ static void qemu_free_vlan_client(VLANClientState *vc)
 void qemu_del_vlan_client(VLANClientState *vc)
 {
     /* If there is a peer NIC, delete and cleanup client, but do not free. */
-    if (!vc->vlan && vc->peer && vc->peer->info->type == NET_CLIENT_TYPE_NIC) {
+    if (vc->peer && vc->peer->info->type == NET_CLIENT_TYPE_NIC) {
         NICState *nic = DO_UPCAST(NICState, nc, vc->peer);
         if (nic->peer_deleted) {
             return;
@@ -294,7 +287,7 @@ void qemu_del_vlan_client(VLANClientState *vc)
     }
 
     /* If this is a peer NIC and peer has already been deleted, free it now. */
-    if (!vc->vlan && vc->peer && vc->info->type == NET_CLIENT_TYPE_NIC) {
+    if (vc->peer && vc->info->type == NET_CLIENT_TYPE_NIC) {
         NICState *nic = DO_UPCAST(NICState, nc, vc);
         if (nic->peer_deleted) {
             qemu_free_vlan_client(vc->peer);
@@ -308,52 +301,25 @@ void qemu_del_vlan_client(VLANClientState *vc)
 void qemu_foreach_nic(qemu_nic_foreach func, void *opaque)
 {
     VLANClientState *nc;
-    VLANState *vlan;
 
     QTAILQ_FOREACH(nc, &non_vlan_clients, next) {
         if (nc->info->type == NET_CLIENT_TYPE_NIC) {
             func(DO_UPCAST(NICState, nc, nc), opaque);
         }
     }
-
-    QTAILQ_FOREACH(vlan, &vlans, next) {
-        QTAILQ_FOREACH(nc, &vlan->clients, next) {
-            if (nc->info->type == NET_CLIENT_TYPE_NIC) {
-                func(DO_UPCAST(NICState, nc, nc), opaque);
-            }
-        }
-    }
 }
 
 int qemu_can_send_packet(VLANClientState *sender)
 {
-    VLANState *vlan = sender->vlan;
-    VLANClientState *vc;
-
-    if (sender->peer) {
-        if (sender->peer->receive_disabled) {
-            return 0;
-        } else if (sender->peer->info->can_receive &&
-                   !sender->peer->info->can_receive(sender->peer)) {
-            return 0;
-        } else {
-            return 1;
-        }
-    }
-
-    if (!sender->vlan) {
+    if (!sender->peer) {
         return 1;
     }
 
-    QTAILQ_FOREACH(vc, &vlan->clients, next) {
-        if (vc == sender) {
-            continue;
-        }
-
-        /* no can_receive() handler, they can always receive */
-        if (vc->info->can_receive && !vc->info->can_receive(vc)) {
-            return 0;
-        }
+    if (sender->peer->receive_disabled) {
+        return 0;
+    } else if (sender->peer->info->can_receive &&
+               !sender->peer->info->can_receive(sender->peer)) {
+        return 0;
     }
     return 1;
 }
@@ -390,34 +356,18 @@ static ssize_t qemu_deliver_packet(VLANClientState *sender,
 
 void qemu_purge_queued_packets(VLANClientState *vc)
 {
-    NetQueue *queue;
-
-    if (!vc->peer && !vc->vlan) {
+    if (!vc->peer) {
         return;
     }
 
-    if (vc->peer) {
-        queue = vc->peer->send_queue;
-    } else {
-        queue = vc->vlan->send_queue;
-    }
-
-    qemu_net_queue_purge(queue, vc);
+    qemu_net_queue_purge(vc->peer->send_queue, vc);
 }
 
 void qemu_flush_queued_packets(VLANClientState *vc)
 {
-    NetQueue *queue;
-
     vc->receive_disabled = 0;
 
-    if (vc->vlan) {
-        queue = vc->vlan->send_queue;
-    } else {
-        queue = vc->send_queue;
-    }
-
-    qemu_net_queue_flush(queue);
+    qemu_net_queue_flush(vc->send_queue);
 }
 
 static ssize_t qemu_send_packet_async_with_flags(VLANClientState *sender,
@@ -432,15 +382,11 @@ static ssize_t qemu_send_packet_async_with_flags(VLANClientState *sender,
 

[PATCH v3 09/16] net: Rename non_vlan_clients to net_clients

2012-05-24 Thread zwu . kernel
From: Stefan Hajnoczi stefa...@linux.vnet.ibm.com

There is no longer a distinction between vlan clients and non-vlan
clients in the net core.  The net core only knows about point-to-point
clients which are connected to a peer.  It's time to rename the global
list of net clients since it no longer refers to vlans at all.
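
To make the point-to-point model concrete, here is a minimal sketch of two clients linked through mutual `peer` pointers, with a send check that mirrors the shape of `qemu_can_send_packet()` after this series (no peer means the sender is never blocked; a peer with `receive_disabled` set blocks it). The `ToyClient` type and `toy_*` functions are hypothetical names for this illustration only and leave out queues and the `can_receive` callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a net client with exactly one optional peer. */
typedef struct ToyClient ToyClient;
struct ToyClient {
    ToyClient *peer;       /* the single client this one talks to, or NULL */
    int receive_disabled;  /* set when the client cannot take more packets */
    int rx_packets;        /* packets delivered to this client */
};

/* Link two unconnected clients to each other. */
static void toy_connect(ToyClient *a, ToyClient *b)
{
    assert(a != NULL && b != NULL && a != b);
    assert(a->peer == NULL && b->peer == NULL);
    a->peer = b;
    b->peer = a;
}

/* A sender is only blocked by its own peer, never by unrelated clients. */
static int toy_can_send(const ToyClient *sender)
{
    if (!sender->peer) {
        return 1;
    }
    return !sender->peer->receive_disabled;
}

/* Deliver one packet to the peer; returns 1 if delivered, 0 otherwise. */
static int toy_send(ToyClient *sender)
{
    if (!sender->peer || !toy_can_send(sender)) {
        return 0; /* this sketch simply drops; real code may queue instead */
    }
    sender->peer->rx_packets++;
    return 1;
}
```

In this model a "vlan" is just several clients whose peers happen to be ports of the same hub, so the core never needs a separate list for them.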

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
---
 net.c |   20 ++--
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/net.c b/net.c
index eb2ad06..2ca4285 100644
--- a/net.c
+++ b/net.c
@@ -44,7 +44,7 @@
 # define CONFIG_NET_BRIDGE
 #endif
 
-static QTAILQ_HEAD(, VLANClientState) non_vlan_clients;
+static QTAILQ_HEAD(, VLANClientState) net_clients;
 
 int default_net = 1;
 
@@ -165,7 +165,7 @@ static char *assign_name(VLANClientState *vc1, const char *model)
     char buf[256];
     int id = 0;
 
-    QTAILQ_FOREACH(vc, &non_vlan_clients, next) {
+    QTAILQ_FOREACH(vc, &net_clients, next) {
         if (vc == vc1) {
             continue;
         }
@@ -216,7 +216,7 @@ VLANClientState *qemu_new_net_client(NetClientInfo *info,
         vc->peer = peer;
         peer->peer = vc;
     }
-    QTAILQ_INSERT_TAIL(&non_vlan_clients, vc, next);
+    QTAILQ_INSERT_TAIL(&net_clients, vc, next);
 
     vc->send_queue = qemu_new_net_queue(qemu_deliver_packet,
                                         qemu_deliver_packet_iov,
@@ -248,7 +248,7 @@ NICState *qemu_new_nic(NetClientInfo *info,
 
 static void qemu_cleanup_vlan_client(VLANClientState *vc)
 {
-    QTAILQ_REMOVE(&non_vlan_clients, vc, next);
+    QTAILQ_REMOVE(&net_clients, vc, next);
 
     if (vc->info->cleanup) {
         vc->info->cleanup(vc);
@@ -302,7 +302,7 @@ void qemu_foreach_nic(qemu_nic_foreach func, void *opaque)
 {
     VLANClientState *nc;
 
-    QTAILQ_FOREACH(nc, &non_vlan_clients, next) {
+    QTAILQ_FOREACH(nc, &net_clients, next) {
         if (nc->info->type == NET_CLIENT_TYPE_NIC) {
             func(DO_UPCAST(NICState, nc, nc), opaque);
         }
@@ -467,7 +467,7 @@ VLANClientState *qemu_find_netdev(const char *id)
 {
     VLANClientState *vc;
 
-    QTAILQ_FOREACH(vc, &non_vlan_clients, next) {
+    QTAILQ_FOREACH(vc, &net_clients, next) {
         if (vc->info->type == NET_CLIENT_TYPE_NIC)
             continue;
         if (!strcmp(vc->name, id)) {
@@ -1080,7 +1080,7 @@ void do_info_network(Monitor *mon)
     net_client_type type;
 
     monitor_printf(mon, "Devices not on any VLAN:\n");
-    QTAILQ_FOREACH(vc, &non_vlan_clients, next) {
+    QTAILQ_FOREACH(vc, &net_clients, next) {
         peer = vc->peer;
         type = vc->info->type;
         if (!peer || type == NET_CLIENT_TYPE_NIC) {
@@ -1133,7 +1133,7 @@ void net_cleanup(void)
 {
     VLANClientState *vc, *next_vc;
 
-    QTAILQ_FOREACH_SAFE(vc, &non_vlan_clients, next, next_vc) {
+    QTAILQ_FOREACH_SAFE(vc, &net_clients, next, next_vc) {
         qemu_del_vlan_client(vc);
     }
 }
@@ -1157,7 +1157,7 @@ void net_check_clients(void)
 
     net_hub_check_clients();
 
-    QTAILQ_FOREACH(vc, &non_vlan_clients, next) {
+    QTAILQ_FOREACH(vc, &net_clients, next) {
         if (!vc->peer) {
             fprintf(stderr, "Warning: %s %s has no peer\n",
                     vc->info->type == NET_CLIENT_TYPE_NIC ? "nic" : "netdev",
@@ -1204,7 +1204,7 @@ int net_init_clients(void)
 #endif
     }
 
-    QTAILQ_INIT(&non_vlan_clients);
+    QTAILQ_INIT(&net_clients);
 
     if (qemu_opts_foreach(qemu_find_opts("netdev"), net_init_netdev, NULL, 1) == -1)
         return -1;
-- 
1.7.6


[PATCH v3 16/16] hub: add the support for hub own flow control

2012-05-24 Thread zwu . kernel
From: Zhi Yong Wu wu...@linux.vnet.ibm.com

Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
---
 net/hub.c   |   35 ---
 net/hub.h   |2 ++
 net/queue.c |5 +
 3 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/net/hub.c b/net/hub.c
index 8a583ab..d27c52a 100644
--- a/net/hub.c
+++ b/net/hub.c
@@ -28,6 +28,7 @@ typedef struct NetHubPort {
 QLIST_ENTRY(NetHubPort) next;
 NetHub *hub;
 unsigned int id;
+uint64_t nr_packets;
 } NetHubPort;
 
 struct NetHub {
@@ -39,19 +40,37 @@ struct NetHub {
 
 static QLIST_HEAD(, NetHub) hubs = QLIST_HEAD_INITIALIZER(hubs);
 
+static void net_hub_receive_completed(NetClientState *nc, ssize_t len)
+{
+NetHubPort *port = DO_UPCAST(NetHubPort, nc, nc);
+port-nr_packets--;
+if (!port-nr_packets) {
+qemu_net_queue_flush(nc-peer-send_queue);
+}
+}
+
+void net_hub_port_packet_stats(NetClientState *nc)
+{
+NetHubPort *port = DO_UPCAST(NetHubPort, nc, nc);
+
+port-nr_packets++;
+}
+
 static ssize_t net_hub_receive(NetHub *hub, NetHubPort *source_port,
const uint8_t *buf, size_t len)
 {
 NetHubPort *port;
+ssize_t ret = 0;
 
 QLIST_FOREACH(port, hub-ports, next) {
 if (port == source_port) {
 continue;
 }
 
-qemu_send_packet(port-nc, buf, len);
+   ret = qemu_send_packet_async(port-nc, buf, len,
+net_hub_receive_completed);
 }
-return len;
+return ret;
 }
 
 static ssize_t net_hub_receive_iov(NetHub *hub, NetHubPort *source_port,
@@ -65,7 +84,8 @@ static ssize_t net_hub_receive_iov(NetHub *hub, NetHubPort 
*source_port,
 continue;
 }
 
-ret = qemu_sendv_packet(port-nc, iov, iovcnt);
+ret = qemu_sendv_packet_async(port-nc, iov, iovcnt,
+  net_hub_receive_completed);
 }
 return ret;
 }
@@ -84,6 +104,13 @@ static NetHub *net_hub_new(unsigned int id)
 return hub;
 }
 
+static int net_hub_port_can_receive(NetClientState *nc)
+{
+NetHubPort *port = DO_UPCAST(NetHubPort, nc, nc);
+
+return port-nr_packets ? 0 : 1;
+}
+
 static ssize_t net_hub_port_receive(NetClientState *nc,
 const uint8_t *buf, size_t len)
 {
@@ -110,6 +137,7 @@ static void net_hub_port_cleanup(NetClientState *nc)
 static NetClientInfo net_hub_port_info = {
 .type = NET_CLIENT_TYPE_HUB,
 .size = sizeof(NetHubPort),
+.can_receive = net_hub_port_can_receive,
 .receive = net_hub_port_receive,
 .receive_iov = net_hub_port_receive_iov,
 .cleanup = net_hub_port_cleanup,
@@ -128,6 +156,7 @@ static NetHubPort *net_hub_port_new(NetHub *hub)
 port = DO_UPCAST(NetHubPort, nc, nc);
 port-id = id;
 port-hub = hub;
+port-nr_packets = 0;
 
 QLIST_INSERT_HEAD(hub-ports, port, next);
 
diff --git a/net/hub.h b/net/hub.h
index d04f1b1..542e657 100644
--- a/net/hub.h
+++ b/net/hub.h
@@ -23,4 +23,6 @@ void net_hub_info(Monitor *mon);
 int net_hub_id_for_client(NetClientState *nc, unsigned int *id);
 void net_hub_check_clients(void);
 
+void net_hub_port_packet_stats(NetClientState *nc);
+
 #endif /* NET_HUB_H */
diff --git a/net/queue.c b/net/queue.c
index 7484d2a..ebf18aa 100644
--- a/net/queue.c
+++ b/net/queue.c
@@ -22,6 +22,7 @@
  */
 
 #include "net/queue.h"
+#include "net/hub.h"
 #include "qemu-queue.h"
 #include "net.h"
 
@@ -101,6 +102,8 @@ static ssize_t qemu_net_queue_append(NetQueue *queue,
 
 QTAILQ_INSERT_TAIL(&queue->packets, packet, entry);
 
+net_hub_port_packet_stats(sender);
+
 return size;
 }
 
@@ -134,6 +137,8 @@ static ssize_t qemu_net_queue_append_iov(NetQueue *queue,
 
 QTAILQ_INSERT_TAIL(&queue->packets, packet, entry);
 
+net_hub_port_packet_stats(sender);
+
 return packet->size;
 }
 
-- 
1.7.6

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 11/16] net: Rename vc local variables to nc

2012-05-24 Thread zwu . kernel
From: Stefan Hajnoczi stefa...@linux.vnet.ibm.com

Now that VLANClientState has been renamed to NetClientState all 'vc'
local variables should be 'nc'.  Much of the code already used 'nc' but
there are places where 'vc' needs to be renamed.

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
---
 hw/ne2000.h |4 +-
 hw/vhost_net.c  |   18 +++---
 net.c   |  214 +++---
 net.h   |   20 +++---
 net/tap-win32.c |8 +-
 net/tap.h   |   16 ++--
 6 files changed, 140 insertions(+), 140 deletions(-)

diff --git a/hw/ne2000.h b/hw/ne2000.h
index 6c196a2..1e7ab07 100644
--- a/hw/ne2000.h
+++ b/hw/ne2000.h
@@ -31,5 +31,5 @@ typedef struct NE2000State {
 void ne2000_setup_io(NE2000State *s, unsigned size);
 extern const VMStateDescription vmstate_ne2000;
 void ne2000_reset(NE2000State *s);
-int ne2000_can_receive(NetClientState *vc);
-ssize_t ne2000_receive(NetClientState *vc, const uint8_t *buf, size_t size_);
+int ne2000_can_receive(NetClientState *nc);
+ssize_t ne2000_receive(NetClientState *nc, const uint8_t *buf, size_t size_);
diff --git a/hw/vhost_net.c b/hw/vhost_net.c
index c3e6546..c2d90df 100644
--- a/hw/vhost_net.c
+++ b/hw/vhost_net.c
@@ -42,7 +42,7 @@ struct vhost_net {
 struct vhost_dev dev;
 struct vhost_virtqueue vqs[2];
 int backend;
-NetClientState *vc;
+NetClientState *nc;
 };
 
 unsigned vhost_net_get_features(struct vhost_net *net, unsigned features)
@@ -104,7 +104,7 @@ struct vhost_net *vhost_net_init(NetClientState *backend, 
int devfd,
 if (r < 0) {
 goto fail;
 }
-net->vc = backend;
+net->nc = backend;
 net->dev.backend_features = tap_has_vnet_hdr(backend) ? 0 :
 (1 << VHOST_NET_F_VIRTIO_NET_HDR);
 net->backend = r;
@@ -151,7 +151,7 @@ int vhost_net_start(struct vhost_net *net,
 goto fail_notifiers;
 }
 if (net->dev.acked_features & (1 << VIRTIO_NET_F_MRG_RXBUF)) {
-tap_set_vnet_hdr_len(net->vc,
+tap_set_vnet_hdr_len(net->nc,
  sizeof(struct virtio_net_hdr_mrg_rxbuf));
 }
 
@@ -160,7 +160,7 @@ int vhost_net_start(struct vhost_net *net,
 goto fail_start;
 }
 
-net->vc->info->poll(net->vc, false);
+net->nc->info->poll(net->nc, false);
 qemu_set_fd_handler(net->backend, NULL, NULL, NULL);
 file.fd = net->backend;
 for (file.index = 0; file.index < net->dev.nvqs; ++file.index) {
@@ -177,10 +177,10 @@ fail:
 int r = ioctl(net->dev.control, VHOST_NET_SET_BACKEND, &file);
 assert(r >= 0);
 }
-net->vc->info->poll(net->vc, true);
+net->nc->info->poll(net->nc, true);
 vhost_dev_stop(&net->dev, dev);
 if (net->dev.acked_features & (1 << VIRTIO_NET_F_MRG_RXBUF)) {
-tap_set_vnet_hdr_len(net->vc, sizeof(struct virtio_net_hdr));
+tap_set_vnet_hdr_len(net->nc, sizeof(struct virtio_net_hdr));
 }
 fail_start:
 vhost_dev_disable_notifiers(&net->dev, dev);
@@ -197,10 +197,10 @@ void vhost_net_stop(struct vhost_net *net,
 int r = ioctl(net->dev.control, VHOST_NET_SET_BACKEND, &file);
 assert(r >= 0);
 }
-net->vc->info->poll(net->vc, true);
+net->nc->info->poll(net->nc, true);
 vhost_dev_stop(&net->dev, dev);
 if (net->dev.acked_features & (1 << VIRTIO_NET_F_MRG_RXBUF)) {
-tap_set_vnet_hdr_len(net->vc, sizeof(struct virtio_net_hdr));
+tap_set_vnet_hdr_len(net->nc, sizeof(struct virtio_net_hdr));
 }
 vhost_dev_disable_notifiers(&net->dev, dev);
 }
@@ -209,7 +209,7 @@ void vhost_net_cleanup(struct vhost_net *net)
 {
 vhost_dev_cleanup(&net->dev);
 if (net->dev.acked_features & (1 << VIRTIO_NET_F_MRG_RXBUF)) {
-tap_set_vnet_hdr_len(net->vc, sizeof(struct virtio_net_hdr));
+tap_set_vnet_hdr_len(net->nc, sizeof(struct virtio_net_hdr));
 }
 g_free(net);
 }
diff --git a/net.c b/net.c
index de18c76..10fb601 100644
--- a/net.c
+++ b/net.c
@@ -129,11 +129,11 @@ int parse_host_port(struct sockaddr_in *saddr, const char 
*str)
 return 0;
 }
 
-void qemu_format_nic_info_str(NetClientState *vc, uint8_t macaddr[6])
+void qemu_format_nic_info_str(NetClientState *nc, uint8_t macaddr[6])
 {
-snprintf(vc->info_str, sizeof(vc->info_str),
+snprintf(nc->info_str, sizeof(nc->info_str),
  "model=%s,macaddr=%02x:%02x:%02x:%02x:%02x:%02x",
- vc->model,
+ nc->model,
  macaddr[0], macaddr[1], macaddr[2],
  macaddr[3], macaddr[4], macaddr[5]);
 }
@@ -159,19 +159,19 @@ void qemu_macaddr_default_if_unset(MACAddr *macaddr)
  * Only net clients created with the legacy -net option need this.  Naming is
  * mandatory for net clients created with -netdev.
  */
-static char *assign_name(NetClientState *vc1, const char *model)
+static char *assign_name(NetClientState *nc1, const char *model)
 {
-NetClientState *vc;
+NetClientState *nc;
 char buf[256];
 int id = 0;
 
-QTAILQ_FOREACH(vc, &net_clients, next)

[PATCH v3 13/16] net: Make the monitor output more reasonable hub info

2012-05-24 Thread zwu . kernel
From: Zhi Yong Wu wu...@linux.vnet.ibm.com

Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
---
 net.c |7 ++-
 net/hub.c |2 +-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/net.c b/net.c
index 61dc28d..79ac51f 100644
--- a/net.c
+++ b/net.c
@@ -887,6 +887,12 @@ static const struct {
 },
 },
 #endif /* CONFIG_NET_BRIDGE */
+[NET_CLIENT_TYPE_HUB] = {
+.type = "hubport",
+.desc = {
+{ /* end of list */ }
+},
+},
 };
 
 int net_client_init(Monitor *mon, QemuOpts *opts, int is_netdev)
@@ -1079,7 +1085,6 @@ void do_info_network(Monitor *mon)
 NetClientState *nc, *peer;
 net_client_type type;
 
-monitor_printf(mon, "Devices not on any VLAN:\n");
 QTAILQ_FOREACH(nc, &net_clients, next) {
 peer = nc->peer;
 type = nc->info->type;
diff --git a/net/hub.c b/net/hub.c
index 0cc385e..8a583ab 100644
--- a/net/hub.c
+++ b/net/hub.c
@@ -193,7 +193,7 @@ void net_hub_info(Monitor *mon)
 QLIST_FOREACH(hub, &hubs, next) {
 monitor_printf(mon, "hub %u\n", hub->id);
 QLIST_FOREACH(port, &hub->ports, next) {
-monitor_printf(mon, "port %u peer %s\n", port->id,
+monitor_printf(mon, "\\ %s\n",
 port->nc.peer ? port->nc.peer->name : "none");
 }
 }
-- 
1.7.6



[PATCH v3 14/16] net: cleanup deliver/deliver_iov func pointers

2012-05-24 Thread zwu . kernel
From: Zhi Yong Wu wu...@linux.vnet.ibm.com

Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
---
 net.c   |   35 +++
 net.h   |   11 +++
 net/queue.c |   13 -
 net/queue.h |   17 ++---
 4 files changed, 28 insertions(+), 48 deletions(-)

diff --git a/net.c b/net.c
index 79ac51f..2d7f445 100644
--- a/net.c
+++ b/net.c
@@ -181,17 +181,6 @@ static char *assign_name(NetClientState *nc1, const char 
*model)
 return g_strdup(buf);
 }
 
-static ssize_t qemu_deliver_packet(NetClientState *sender,
-   unsigned flags,
-   const uint8_t *data,
-   size_t size,
-   void *opaque);
-static ssize_t qemu_deliver_packet_iov(NetClientState *sender,
-   unsigned flags,
-   const struct iovec *iov,
-   int iovcnt,
-   void *opaque);
-
 NetClientState *qemu_new_net_client(NetClientInfo *info,
 NetClientState *peer,
 const char *model,
@@ -218,9 +207,7 @@ NetClientState *qemu_new_net_client(NetClientInfo *info,
 }
 QTAILQ_INSERT_TAIL(&net_clients, nc, next);
 
-nc->send_queue = qemu_new_net_queue(qemu_deliver_packet,
-qemu_deliver_packet_iov,
-nc);
+nc->send_queue = qemu_new_net_queue(nc);
 
 return nc;
 }
@@ -324,11 +311,11 @@ int qemu_can_send_packet(NetClientState *sender)
 return 1;
 }
 
-static ssize_t qemu_deliver_packet(NetClientState *sender,
-   unsigned flags,
-   const uint8_t *data,
-   size_t size,
-   void *opaque)
+ssize_t qemu_deliver_packet(NetClientState *sender,
+unsigned flags,
+const uint8_t *data,
+size_t size,
+void *opaque)
 {
 NetClientState *nc = opaque;
 ssize_t ret;
@@ -421,11 +408,11 @@ static ssize_t nc_sendv_compat(NetClientState *nc, const 
struct iovec *iov,
 return nc->info->receive(nc, buffer, offset);
 }
 
-static ssize_t qemu_deliver_packet_iov(NetClientState *sender,
-   unsigned flags,
-   const struct iovec *iov,
-   int iovcnt,
-   void *opaque)
+ssize_t qemu_deliver_packet_iov(NetClientState *sender,
+unsigned flags,
+const struct iovec *iov,
+int iovcnt,
+void *opaque)
 {
 NetClientState *nc = opaque;
 
diff --git a/net.h b/net.h
index 250669a..7779b6a 100644
--- a/net.h
+++ b/net.h
@@ -112,6 +112,17 @@ void qemu_check_nic_model(NICInfo *nd, const char *model);
 int qemu_find_nic_model(NICInfo *nd, const char * const *models,
 const char *default_model);
 
+ssize_t qemu_deliver_packet(NetClientState *sender,
+unsigned flags,
+const uint8_t *data,
+size_t size,
+void *opaque);
+ssize_t qemu_deliver_packet_iov(NetClientState *sender,
+unsigned flags,
+const struct iovec *iov,
+int iovcnt,
+void *opaque);
+
 void do_info_network(Monitor *mon);
 
 /* NIC info */
diff --git a/net/queue.c b/net/queue.c
index 35c3463..0afd783 100644
--- a/net/queue.c
+++ b/net/queue.c
@@ -23,6 +23,7 @@
 
 #include "net/queue.h"
 #include "qemu-queue.h"
+#include "net.h"
 
 /* The delivery handler may only return zero if it will call
  * qemu_net_queue_flush() when it determines that it is once again able
@@ -48,8 +49,6 @@ struct NetPacket {
 };
 
 struct NetQueue {
-NetPacketDeliver *deliver;
-NetPacketDeliverIOV *deliver_iov;
 void *opaque;
 
 QTAILQ_HEAD(packets, NetPacket) packets;
@@ -57,16 +56,12 @@ struct NetQueue {
 unsigned delivering : 1;
 };
 
-NetQueue *qemu_new_net_queue(NetPacketDeliver *deliver,
- NetPacketDeliverIOV *deliver_iov,
- void *opaque)
+NetQueue *qemu_new_net_queue(void *opaque)
 {
 NetQueue *queue;
 
 queue = g_malloc0(sizeof(NetQueue));
 
-queue->deliver = deliver;
-queue->deliver_iov = deliver_iov;
 queue->opaque = opaque;
 
 QTAILQ_INIT(&queue->packets);
@@ -151,7 +146,7 @@ static ssize_t qemu_net_queue_deliver(NetQueue *queue,
 ssize_t ret = -1;
 
 queue->delivering = 1;
-ret = queue->deliver(sender, flags,

[PATCH v3 12/16] net: Rename qemu_del_vlan_client() to qemu_del_net_client()

2012-05-24 Thread zwu . kernel
From: Stefan Hajnoczi stefa...@linux.vnet.ibm.com

Another step in moving the vlan feature out of net core.  Users only
deal with NetClientState and therefore qemu_del_vlan_client() should be
named qemu_del_net_client().

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
---
 hw/e1000.c   |2 +-
 hw/eepro100.c|2 +-
 hw/ne2000.c  |2 +-
 hw/pcnet-pci.c   |2 +-
 hw/rtl8139.c |2 +-
 hw/usb/dev-network.c |2 +-
 hw/virtio-net.c  |2 +-
 hw/xen_nic.c |2 +-
 net.c|   20 ++--
 net.h|2 +-
 net/slirp.c  |2 +-
 11 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/hw/e1000.c b/hw/e1000.c
index 8c7fd3b..cf1e124 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -1201,7 +1201,7 @@ pci_e1000_uninit(PCIDevice *dev)
 qemu_free_timer(d->autoneg_timer);
 memory_region_destroy(&d->mmio);
 memory_region_destroy(&d->io);
-qemu_del_vlan_client(&d->nic->nc);
+qemu_del_net_client(&d->nic->nc);
 return 0;
 }
 
diff --git a/hw/eepro100.c b/hw/eepro100.c
index 5725ccf..0217795 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -1840,7 +1840,7 @@ static int pci_nic_uninit(PCIDevice *pci_dev)
 memory_region_destroy(&s->flash_bar);
 vmstate_unregister(&pci_dev->qdev, s->vmstate, s);
 eeprom93xx_free(&pci_dev->qdev, s->eeprom);
-qemu_del_vlan_client(&s->nic->nc);
+qemu_del_net_client(&s->nic->nc);
 return 0;
 }
 
diff --git a/hw/ne2000.c b/hw/ne2000.c
index 2339725..e8b1d68 100644
--- a/hw/ne2000.c
+++ b/hw/ne2000.c
@@ -750,7 +750,7 @@ static int pci_ne2000_exit(PCIDevice *pci_dev)
 NE2000State *s = &d->ne2000;
 
 memory_region_destroy(&s->io);
-qemu_del_vlan_client(&s->nic->nc);
+qemu_del_net_client(&s->nic->nc);
 return 0;
 }
 
diff --git a/hw/pcnet-pci.c b/hw/pcnet-pci.c
index 8c82667..8bbad47 100644
--- a/hw/pcnet-pci.c
+++ b/hw/pcnet-pci.c
@@ -279,7 +279,7 @@ static int pci_pcnet_uninit(PCIDevice *dev)
 memory_region_destroy(&d->io_bar);
 qemu_del_timer(d->state.poll_timer);
 qemu_free_timer(d->state.poll_timer);
-qemu_del_vlan_client(&d->state.nic->nc);
+qemu_del_net_client(&d->state.nic->nc);
 return 0;
 }
 
diff --git a/hw/rtl8139.c b/hw/rtl8139.c
index 1e4f4eb..3642fcb 100644
--- a/hw/rtl8139.c
+++ b/hw/rtl8139.c
@@ -3448,7 +3448,7 @@ static int pci_rtl8139_uninit(PCIDevice *dev)
 }
 qemu_del_timer(s->timer);
 qemu_free_timer(s->timer);
-qemu_del_vlan_client(&s->nic->nc);
+qemu_del_net_client(&s->nic->nc);
 return 0;
 }
 
diff --git a/hw/usb/dev-network.c b/hw/usb/dev-network.c
index 21e0069..4bd4243 100644
--- a/hw/usb/dev-network.c
+++ b/hw/usb/dev-network.c
@@ -1309,7 +1309,7 @@ static void usb_net_handle_destroy(USBDevice *dev)
 
 /* TODO: remove the nd_table[] entry */
 rndis_clear_responsequeue(s);
-qemu_del_vlan_client(&s->nic->nc);
+qemu_del_net_client(&s->nic->nc);
 }
 
 static NetClientInfo net_usbnet_info = {
diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index a73c523..d5527d4 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -1077,6 +1077,6 @@ void virtio_net_exit(VirtIODevice *vdev)
 qemu_bh_delete(n->tx_bh);
 }
 
-qemu_del_vlan_client(&n->nic->nc);
+qemu_del_net_client(&n->nic->nc);
 virtio_cleanup(&n->vdev);
 }
diff --git a/hw/xen_nic.c b/hw/xen_nic.c
index 6391a04..ba4a45c 100644
--- a/hw/xen_nic.c
+++ b/hw/xen_nic.c
@@ -409,7 +409,7 @@ static void net_disconnect(struct XenDevice *xendev)
 netdev->rxs = NULL;
 }
 if (netdev->nic) {
-qemu_del_vlan_client(&netdev->nic->nc);
+qemu_del_net_client(&netdev->nic->nc);
 netdev->nic = NULL;
 }
 }
diff --git a/net.c b/net.c
index 10fb601..61dc28d 100644
--- a/net.c
+++ b/net.c
@@ -246,7 +246,7 @@ NICState *qemu_new_nic(NetClientInfo *info,
 return nic;
 }
 
-static void qemu_cleanup_vlan_client(NetClientState *nc)
+static void qemu_cleanup_net_client(NetClientState *nc)
 {
 QTAILQ_REMOVE(&net_clients, nc, next);
 
@@ -255,7 +255,7 @@ static void qemu_cleanup_vlan_client(NetClientState *nc)
 }
 }
 
-static void qemu_free_vlan_client(NetClientState *nc)
+static void qemu_free_net_client(NetClientState *nc)
 {
 if (nc->send_queue) {
 qemu_del_net_queue(nc->send_queue);
@@ -268,7 +268,7 @@ static void qemu_free_vlan_client(NetClientState *nc)
 g_free(nc);
 }
 
-void qemu_del_vlan_client(NetClientState *nc)
+void qemu_del_net_client(NetClientState *nc)
 {
 /* If there is a peer NIC, delete and cleanup client, but do not free. */
 if (nc->peer && nc->peer->info->type == NET_CLIENT_TYPE_NIC) {
@@ -282,7 +282,7 @@ void qemu_del_vlan_client(NetClientState *nc)
 if (nc->peer->info->link_status_changed) {
 nc->peer->info->link_status_changed(nc->peer);
 }
-qemu_cleanup_vlan_client(nc);
+qemu_cleanup_net_client(nc);
 return;
 }
 
@@ -290,12 +290,12 @@ void 

[PATCH v3 15/16] net: determine if packets can be sent before net queue deliver packets

2012-05-24 Thread zwu . kernel
From: Zhi Yong Wu wu...@linux.vnet.ibm.com

Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
---
 net/queue.c  |8 
 net/slirp.c  |7 ---
 slirp/if.c   |5 -
 slirp/libslirp.h |1 -
 4 files changed, 4 insertions(+), 17 deletions(-)

diff --git a/net/queue.c b/net/queue.c
index 0afd783..7484d2a 100644
--- a/net/queue.c
+++ b/net/queue.c
@@ -176,8 +176,8 @@ ssize_t qemu_net_queue_send(NetQueue *queue,
 {
 ssize_t ret;
 
-if (queue->delivering) {
-return qemu_net_queue_append(queue, sender, flags, data, size, NULL);
+if (queue->delivering || !qemu_can_send_packet(sender)) {
+return qemu_net_queue_append(queue, sender, flags, data, size, 
sent_cb);
 }
 
 ret = qemu_net_queue_deliver(queue, sender, flags, data, size);
@@ -200,8 +200,8 @@ ssize_t qemu_net_queue_send_iov(NetQueue *queue,
 {
 ssize_t ret;
 
-if (queue->delivering) {
-return qemu_net_queue_append_iov(queue, sender, flags, iov, iovcnt, 
NULL);
+if (queue->delivering || !qemu_can_send_packet(sender)) {
+return qemu_net_queue_append_iov(queue, sender, flags, iov, iovcnt, 
sent_cb);
 }
 
 ret = qemu_net_queue_deliver_iov(queue, sender, flags, iov, iovcnt);
diff --git a/net/slirp.c b/net/slirp.c
index a6ede2b..248f7ff 100644
--- a/net/slirp.c
+++ b/net/slirp.c
@@ -96,13 +96,6 @@ static void slirp_smb_cleanup(SlirpState *s);
 static inline void slirp_smb_cleanup(SlirpState *s) { }
 #endif
 
-int slirp_can_output(void *opaque)
-{
-SlirpState *s = opaque;
-
-return qemu_can_send_packet(&s->nc);
-}
-
 void slirp_output(void *opaque, const uint8_t *pkt, int pkt_len)
 {
 SlirpState *s = opaque;
diff --git a/slirp/if.c b/slirp/if.c
index 096cf6f..533295d 100644
--- a/slirp/if.c
+++ b/slirp/if.c
@@ -177,11 +177,6 @@ void if_start(Slirp *slirp)
 }
 
 while (ifm_next) {
-/* check if we can really output */
-if (!slirp_can_output(slirp->opaque)) {
-break;
-}
-
 ifm = ifm_next;
 from_batchq = next_from_batchq;
 
diff --git a/slirp/libslirp.h b/slirp/libslirp.h
index 77527ad..9b471b5 100644
--- a/slirp/libslirp.h
+++ b/slirp/libslirp.h
@@ -25,7 +25,6 @@ void slirp_select_poll(fd_set *readfds, fd_set *writefds, 
fd_set *xfds,
 void slirp_input(Slirp *slirp, const uint8_t *pkt, int pkt_len);
 
 /* you must provide the following functions: */
-int slirp_can_output(void *opaque);
 void slirp_output(void *opaque, const uint8_t *pkt, int pkt_len);
 
 int slirp_add_hostfwd(Slirp *slirp, int is_udp,
-- 
1.7.6



Re: [PATCH 2/2] Device assignment: Fix MSI IRQ affinity setting

2012-05-24 Thread Alex Williamson
On Thu, 2012-05-24 at 18:02 +0100, Richard Weinberger wrote:
 MSI interrupt affinity setting on the guest always ended up on vcpu0,
 no matter what.
 IOW writes to /proc/irq/IRQ/smp_affinity are ignored.
 This patch fixes the MSI IRQ routing and avoids the utter madness of
 tearing down and setting up the interrupt completely when this changes.
 
 Signed-off-by: Thomas Gleixner t...@linutronix.de
 Signed-off-by: Richard Weinberger rich...@nod.at
 ---
  hw/device-assignment.c |   73 
 ++--
  1 files changed, 70 insertions(+), 3 deletions(-)
 
 diff --git a/hw/device-assignment.c b/hw/device-assignment.c
 index 09726f9..78d57c8 100644
 --- a/hw/device-assignment.c
 +++ b/hw/device-assignment.c
 @@ -913,6 +913,50 @@ void assigned_dev_update_irqs(void)
  }
  }
  
 +static void assigned_dev_update_msi_route(PCIDevice *pci_dev)
 +{
 +AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev);
 +uint8_t ctrl_byte = pci_get_byte(pci_dev->config + pci_dev->msi_cap +
 +  PCI_MSI_FLAGS);
 +struct kvm_irq_routing_entry *old, new;
 +KVMMsiMessage msg;
 +int r;

Please follow qemu coding style for braces throughout.

 +
 +if (!(ctrl_byte & PCI_MSI_FLAGS_ENABLE))
 + return;
 +
 +msg.addr_lo =  pci_get_long(pci_dev->config + pci_dev->msi_cap +
 + PCI_MSI_ADDRESS_LO);
 +msg.addr_hi =  pci_get_long(pci_dev->config + pci_dev->msi_cap +
 + PCI_MSI_ADDRESS_HI);

Odd, since we only expose a 32bit MSI capability to the guest...

 +msg.data =  pci_get_long(pci_dev->config + pci_dev->msi_cap +
 +  PCI_MSI_DATA_32);

Should be pci_get_word()

 +
 +old = adev->entry;
 +new = *old;
 +new.u.msi.address_lo = msg.addr_lo;
 +new.u.msi.address_hi = msg.addr_hi;
 +new.u.msi.data = msg.data;
 +
 +if (memcmp(old, &new, sizeof(new)) == 0)
 +return;
 +
 +r = kvm_update_routing_entry(old, &new);

How does this work?  old is now new, so kvm_update_routing_entry() is
never going to match to the existing entry if address_lo or data
actually change.

 +if (r < 0) {
 +fprintf(stderr, "%s: kvm_update_msi failed: %s\n", __func__,
 +strerror(-r));
 +exit(1);
 +}
 +
 +*old = new;

huh?

 + r = kvm_irqchip_commit_routes(kvm_state);
 + if (r) {
 +fprintf(stderr, "%s: kvm_irqchip_commit_routes failed: %s\n", 
 __func__,
 +strerror(-r));
 +exit(1);
 + }
 +}
 +
  static void assigned_dev_update_msi(PCIDevice *pci_dev)
  {
  struct kvm_assigned_irq assigned_irq_data;
 @@ -1116,6 +1160,14 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice 
 *pci_dev,
  uint32_t virt_val = pci_default_read_config(pci_dev, address, len);
  uint32_t real_val, emulate_mask, full_emulation_mask;
  
 +if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSI) {
 +uint32_t msi_start = pci_dev->msi_cap;
 +uint32_t msi_end = msi_start + PCI_MSI_DATA_64 + 3;
 +
 + if (address >= msi_start && (address + len) < msi_end)

ranges_overlap() is meant for this.  We only expose a 32bit MSI cap, so
msi_end is wrong.
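
A minimal sketch of the overlap test suggested here, assuming a helper with the same first/len shape as QEMU's ranges_overlap() and a 32-bit MSI capability that is 10 bytes long (ID, next pointer, 16-bit flags, 32-bit address, 16-bit data). The helper below is a local stand-in so the example compiles on its own; MSI_CAP_SIZE_32 and msi_cap_touched() are names invented for illustration, not existing QEMU symbols:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Register offsets inside the MSI capability. */
#define PCI_MSI_FLAGS       2
#define PCI_MSI_ADDRESS_LO  4
#define PCI_MSI_DATA_32     8

/* Hypothetical name: byte size of a 32-bit MSI capability. */
#define MSI_CAP_SIZE_32     10

/* Local stand-in for a first/len overlap helper. */
static bool ranges_overlap(uint64_t first1, uint64_t len1,
                           uint64_t first2, uint64_t len2)
{
    uint64_t last1 = first1 + len1 - 1;
    uint64_t last2 = first2 + len2 - 1;

    return !(last2 < first1 || last1 < first2);
}

/* Hypothetical helper: does a config access touch the 32-bit MSI cap? */
static bool msi_cap_touched(uint32_t address, int len, uint32_t msi_cap)
{
    return ranges_overlap(address, (uint64_t)len, msi_cap, MSI_CAP_SIZE_32);
}
```

Unlike the containment test in the quoted hunk, an overlap test also catches a dword access that straddles the start or end of the capability.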

 +return virt_val;
 +}
 +
  emulate_mask = 0;
  memcpy(&emulate_mask, assigned_dev->emulate_config_read + address, len);
  emulate_mask = le32_to_cpu(emulate_mask);
 @@ -1130,6 +1182,17 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice 
 *pci_dev,
  }
  }
  
 +static void handle_cfg_write_msi(PCIDevice *pci_dev, AssignedDevice *adev)
 +{
 +if (!kvm_enabled() || !kvm_irqchip_in_kernel())
 + return;

Unnecessary, device assignment doesn't work otherwise.

 +
 +if (adev->entry && (adev->irq_requested_type & KVM_DEV_IRQ_GUEST_MSI))

Should just be able to test irq_requested_type.

 +assigned_dev_update_msi_route(pci_dev);
 +else
 +assigned_dev_update_msi(pci_dev);
 +}
 +
  static void assigned_dev_pci_write_config(PCIDevice *pci_dev, uint32_t 
 address,
uint32_t val, int len)
  {
 @@ -1155,9 +1218,13 @@ static void assigned_dev_pci_write_config(PCIDevice 
 *pci_dev, uint32_t address,
  }
  }
  if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSI) {
 -if (range_covers_byte(address, len,
 -  pci_dev->msi_cap + PCI_MSI_FLAGS)) {
 -assigned_dev_update_msi(pci_dev);
 +uint32_t msi_start = pci_dev->msi_cap;
 +uint32_t msi_end = msi_start + PCI_MSI_DATA_64 + 3;
 +
 +if (address >= msi_start && (address + len) < msi_end) {

Use ranges_overlap() please, msi_end is wrong.

 +if (address == msi_start + PCI_MSI_DATA_32)
 +handle_cfg_write_msi(pci_dev, assigned_dev);

Why didn't we just use range_covers_byte(address, len, pci_dev->msi_cap
+ PCI_MSI_DATA_32) to start with?  But how does this handle the enable
bit?

 +return;
 

Re: [PATCH 2/2] Device assignment: Fix MSI IRQ affinity setting

2012-05-24 Thread Richard Weinberger
On Thu, 24 May 2012 13:00:51 -0600, Alex Williamson
alex.william...@redhat.com wrote:
 How does this work?  old is now new, so kvm_update_routing_entry() is
 never going to match to the existing entry if address_lo or data
 actually change.
 
 Apologies, I read memcpy above

No problem. :)
I'll address your comments and send v2 tomorrow.

Thanks,
//richard



Re: [PATCH v3 13/16] net: Make the monitor output more reasonable hub info

2012-05-24 Thread Jan Kiszka
On 2012-05-24 14:59, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com
 
 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  net.c |7 ++-
  net/hub.c |2 +-
  2 files changed, 7 insertions(+), 2 deletions(-)
 
 diff --git a/net.c b/net.c
 index 61dc28d..79ac51f 100644
 --- a/net.c
 +++ b/net.c
 @@ -887,6 +887,12 @@ static const struct {
  },
  },
  #endif /* CONFIG_NET_BRIDGE */
 +[NET_CLIENT_TYPE_HUB] = {
 +.type = "hubport",
 +.desc = {
 +{ /* end of list */ }
 +},
 +},
  };
  
  int net_client_init(Monitor *mon, QemuOpts *opts, int is_netdev)
 @@ -1079,7 +1085,6 @@ void do_info_network(Monitor *mon)
  NetClientState *nc, *peer;
  net_client_type type;
  
 -monitor_printf(mon, "Devices not on any VLAN:\n");
  QTAILQ_FOREACH(nc, &net_clients, next) {
  peer = nc->peer;
  type = nc->info->type;
 diff --git a/net/hub.c b/net/hub.c
 index 0cc385e..8a583ab 100644
 --- a/net/hub.c
 +++ b/net/hub.c
 @@ -193,7 +193,7 @@ void net_hub_info(Monitor *mon)
  QLIST_FOREACH(hub, &hubs, next) {
  monitor_printf(mon, "hub %u\n", hub->id);
  QLIST_FOREACH(port, &hub->ports, next) {
 -monitor_printf(mon, "port %u peer %s\n", port->id,
 +monitor_printf(mon, "\\ %s\n",
 port->nc.peer ? port->nc.peer->name : "none");
  }
  }

I still do not agree with this formatting (peer -> hubport + hub ->
abbreviated peers instead of just hub -> peers). But the series has a
higher value than this, and we can fix on top - unless there is a need
for another round anyway.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [PATCH 2/2] Device assignment: Fix MSI IRQ affinity setting

2012-05-24 Thread Jan Kiszka
On 2012-05-24 14:02, Richard Weinberger wrote:
 MSI interrupt affinity setting on the guest always ended up on vcpu0,
 no matter what.
 IOW writes to /proc/irq/IRQ/smp_affinity are ignored.
 This patch fixes the MSI IRQ routing and avoids the utter madness of
 tearing down and setting up the interrupt completely when this changes.

The device assignment code will soon be significantly refactored in this
regard (MSI/MSI-X handling will use generic QEMU services instead of
open-coding their own bugs). Also for this reason, it would be very good
to explain in the commit log what was broken or missing so that
affinities were not respected.
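
As background for such an explanation: on x86 the MSI address carries the destination APIC ID and the MSI data carries the vector, so an affinity change typically shows up as a rewrite of those registers while MSI stays enabled. A minimal decoding sketch, assuming the standard x86 MSI layout (struct msi_target and msi_decode() are names invented for illustration, not QEMU or KVM symbols):

```c
#include <assert.h>
#include <stdint.h>

/* Illustration-only view of the routing information in an MSI message. */
struct msi_target {
    uint8_t dest_id;   /* destination APIC ID, address bits 19:12 */
    uint8_t vector;    /* interrupt vector, data bits 7:0 */
};

static struct msi_target msi_decode(uint32_t addr_lo, uint16_t data)
{
    struct msi_target t;

    t.dest_id = (addr_lo >> 12) & 0xff;
    t.vector = data & 0xff;
    return t;
}
```

Two messages that differ only in dest_id describe the same interrupt retargeted to another vcpu, which is the case a routing update has to handle without a full teardown.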

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [Qemu-devel] [PATCH v3 00/16] net: hub-based networking

2012-05-24 Thread Luiz Capitulino
On Fri, 25 May 2012 01:59:06 +0800
zwu.ker...@gmail.com wrote:

 From: Zhi Yong Wu wu...@linux.vnet.ibm.com
 
 The patchset implements a network hub instead of vlan. The main work was done by 
 Stefan, and I rebased it to latest QEMU upstream, did some testing and am 
 responsible for pushing it to QEMU upstream.

Honest question: does it really pay off to have this in qemu vs. using one of
the externally available solutions?


Re: [PATCH v2 03/13] iommu: IOMMU groups for VT-d and AMD-Vi

2012-05-24 Thread Don Dutile

On 05/22/2012 01:04 AM, Alex Williamson wrote:

Add back group support for AMD & Intel.  amd_iommu already tracks
devices and has init and uninit routines to manage groups.
intel-iommu does this on the fly, so we make use of the notifier
support built into iommu groups to create and remove groups.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

  drivers/iommu/amd_iommu.c   |   28 +-
  drivers/iommu/intel-iommu.c |   46 +++
  2 files changed, 73 insertions(+), 1 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 32c00cd..b7e5ddf 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -256,9 +256,11 @@ static bool check_device(struct device *dev)

  static int iommu_init_device(struct device *dev)
  {
-   struct pci_dev *pdev = to_pci_dev(dev);
+   struct pci_dev *dma_pdev, *pdev = to_pci_dev(dev);
struct iommu_dev_data *dev_data;
+   struct iommu_group *group;
u16 alias;
+   int ret;

if (dev->archdata.iommu)
return 0;
@@ -279,8 +281,30 @@ static int iommu_init_device(struct device *dev)
return -ENOTSUPP;
}
dev_data->alias_data = alias_data;
+
+   dma_pdev = pci_get_bus_and_slot(alias >> 8, alias & 0xff);
+   } else
+   dma_pdev = pdev;
+
+   if (!pdev->is_virtfn && PCI_FUNC(pdev->devfn) && iommu_group_mf &&
+   pdev->hdr_type == PCI_HEADER_TYPE_NORMAL)
+   dma_pdev = pci_get_slot(pdev->bus,
+   PCI_DEVFN(PCI_SLOT(pdev->devfn), 0));
+
+   group = iommu_group_get(&dma_pdev->dev);
+   if (!group) {
+   group = iommu_group_alloc();
+   if (IS_ERR(group))
+   return PTR_ERR(group);
}

+   ret = iommu_group_add_device(group, dev);
+
+   iommu_group_put(group);
+

do you want to do a put if there is a failure in the iommu_group_add_device()?
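
To make the question concrete, here is a toy model of the alloc/add/put sequence, assuming the add step takes its own reference on success so that the caller's put is balanced on both the success and the failure path. Everything named toy_* is a stand-in written for this example; it is not the kernel's iommu_group API and says nothing definitive about its behavior:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a reference-counted group (illustration only). */
struct toy_group {
    int refs;    /* outstanding references */
    int ndevs;   /* devices attached */
};

static int toy_groups_freed;   /* counts released groups for the example */

static struct toy_group *toy_group_alloc(void)
{
    struct toy_group *g = calloc(1, sizeof(*g));

    if (g)
        g->refs = 1;           /* caller owns the initial reference */
    return g;
}

static void toy_group_put(struct toy_group *g)
{
    if (--g->refs == 0) {
        toy_groups_freed++;
        free(g);
    }
}

/* Assumption: on success the group keeps a reference for the device. */
static int toy_group_add_device(struct toy_group *g, int fail)
{
    if (fail)
        return -1;
    g->refs++;
    g->ndevs++;
    return 0;
}

/* Mirrors the alloc/add/put order used in the hunk above. */
static int toy_init_device(int fail, struct toy_group **out)
{
    struct toy_group *g = toy_group_alloc();
    int ret;

    if (!g)
        return -1;
    ret = toy_group_add_device(g, fail);
    toy_group_put(g);          /* drop the caller's reference either way */
    *out = ret ? NULL : g;
    return ret;
}
```

Under that assumption the unconditional put is what frees a freshly allocated group when the add fails, and on success it leaves exactly the device's reference behind.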

+   if (ret)
+   return ret;
+
if (pci_iommuv2_capable(pdev)) {
struct amd_iommu *iommu;

@@ -309,6 +333,8 @@ static void iommu_ignore_device(struct device *dev)

  static void iommu_uninit_device(struct device *dev)
  {
+   iommu_group_remove_device(dev);
+
/*
 * Nothing to do here - we keep dev_data around for unplugged devices
 * and reuse it when the device is re-plugged - not doing so would
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index d4a0ff7..e63b33b 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4087,6 +4087,50 @@ static int intel_iommu_domain_has_cap(struct 
iommu_domain *domain,
return 0;
  }

+static int intel_iommu_add_device(struct device *dev)
+{
+   struct pci_dev *pdev = to_pci_dev(dev);
+   struct pci_dev *bridge, *dma_pdev = pdev;
+   struct iommu_group *group;
+   int ret;
+
+   if (!device_to_iommu(pci_domain_nr(pdev->bus),
+pdev->bus->number, pdev->devfn))
+   return -ENODEV;
+
+   bridge = pci_find_upstream_pcie_bridge(pdev);
+   if (bridge) {
+   if (pci_is_pcie(bridge))
+   dma_pdev = pci_get_domain_bus_and_slot(
+   pci_domain_nr(pdev->bus),
+   bridge->subordinate->number, 0);
+   else
+   dma_pdev = bridge;
+   }
+
+   if (!pdev->is_virtfn && PCI_FUNC(pdev->devfn) && iommu_group_mf &&
+   pdev->hdr_type == PCI_HEADER_TYPE_NORMAL)
+   dma_pdev = pci_get_slot(pdev->bus,
+   PCI_DEVFN(PCI_SLOT(pdev->devfn), 0));
+
+   group = iommu_group_get(&dma_pdev->dev);
+   if (!group) {
+   group = iommu_group_alloc();
+   if (IS_ERR(group))
+   return PTR_ERR(group);
+   }
+
+   ret = iommu_group_add_device(group, dev);
+

ditto.

+   iommu_group_put(group);
+   return ret;
+}
+
+static void intel_iommu_remove_device(struct device *dev)
+{
+   iommu_group_remove_device(dev);
+}
+
  static struct iommu_ops intel_iommu_ops = {
.domain_init= intel_iommu_domain_init,
.domain_destroy = intel_iommu_domain_destroy,
@@ -4096,6 +4140,8 @@ static struct iommu_ops intel_iommu_ops = {
.unmap  = intel_iommu_unmap,
.iova_to_phys   = intel_iommu_iova_to_phys,
.domain_has_cap = intel_iommu_domain_has_cap,
+   .add_device = intel_iommu_add_device,
+   .remove_device  = intel_iommu_remove_device,
.pgsize_bitmap  = INTEL_IOMMU_PGSIZES,
  };






Re: [PATCH v2 05/13] pci: Add ACS validation utility

2012-05-24 Thread Don Dutile

On 05/22/2012 01:05 AM, Alex Williamson wrote:

In a PCI environment, transactions aren't always required to reach
the root bus before being re-routed.  Intermediate switches between
an endpoint and the root bus can redirect DMA back downstream before
things like IOMMUs have a chance to intervene.  Legacy PCI is always
susceptible to this as it operates on a shared bus.  PCIe added a
new capability to describe and control this behavior, Access Control
Services, or ACS.  The utility function pci_acs_enabled() allows us
to test the ACS capabilities of an individual device against a set
of flags while pci_acs_path_enabled() tests a complete path from
a given downstream device up to the specified upstream device.  We
also include the ability to add device specific tests as it's
likely we'll see devices that do not implement ACS, but want to
indicate support for various capabilities in this space.
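
The flag test at the core of this description can be sketched in isolation. This is a minimal illustration, assuming the ACS control bit values from the PCIe ACS capability; acs_flags_enabled() and its multifunction parameter (meaning a function of a multifunction device that is not a downstream or root port) are invented for the example and are not part of the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ACS control register bits. */
#define PCI_ACS_SV  0x01  /* Source Validation */
#define PCI_ACS_TB  0x02  /* Translation Blocking */
#define PCI_ACS_RR  0x04  /* P2P Request Redirect */
#define PCI_ACS_CR  0x08  /* P2P Completion Redirect */
#define PCI_ACS_UF  0x10  /* Upstream Forwarding */
#define PCI_ACS_EC  0x20  /* P2P Egress Control */
#define PCI_ACS_DT  0x40  /* Direct Translated P2P */

/* Hypothetical helper: true if every required flag is set in ctrl. */
static bool acs_flags_enabled(uint16_t ctrl, uint16_t acs_flags,
                              bool multifunction)
{
    if (multifunction) {
        /* Keep only the controls that apply to multifunction devices. */
        acs_flags &= (PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC | PCI_ACS_DT);
    }
    return (ctrl & acs_flags) == acs_flags;
}
```

The comparison against acs_flags rather than against zero is what makes this an "all required bits" test: a device that enables only some of the requested controls fails it.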

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

  drivers/pci/pci.c|   76 ++
  drivers/pci/quirks.c |   29 +++
  include/linux/pci.h  |   10 ++-
  3 files changed, 114 insertions(+), 1 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 111569c..ab6c2a6 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2359,6 +2359,82 @@ void pci_enable_acs(struct pci_dev *dev)
  }

  /**
+ * pci_acs_enable - test ACS against required flags for a given device

typo:   ^^^ missing 'd'


+ * @pdev: device to test
+ * @acs_flags: required PCI ACS flags
+ *
+ * Return true if the device supports the provided flags.  Automatically
+ * filters out flags that are not implemented on multifunction devices.
+ */
+bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags)
+{
+	int pos;
+	u16 ctrl;
+
+	if (pci_dev_specific_acs_enabled(pdev, acs_flags))
+		return true;
+
+	if (!pci_is_pcie(pdev))
+		return false;
+
+	if (pdev->pcie_type == PCI_EXP_TYPE_DOWNSTREAM ||
+	    pdev->pcie_type == PCI_EXP_TYPE_ROOT_PORT) {
+		pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ACS);
+		if (!pos)
+			return false;
+
+		pci_read_config_word(pdev, pos + PCI_ACS_CTRL, &ctrl);
+		if ((ctrl & acs_flags) != acs_flags)
+			return false;
+	} else if (pdev->multifunction) {
+		/* Filter out flags not applicable to multifunction */
+		acs_flags &= (PCI_ACS_RR | PCI_ACS_CR |
+			      PCI_ACS_EC | PCI_ACS_DT);
+
+		pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ACS);
+		if (!pos)
+			return false;
+
+		pci_read_config_word(pdev, pos + PCI_ACS_CTRL, &ctrl);
+		if ((ctrl & acs_flags) != acs_flags)
+			return false;
+	}
+
+	return true;

or, to reduce duplicated code (which compiler may do?):

	/* Filter out flags not applicable to multifunction */
	if (pdev->multifunction)
		acs_flags &= (PCI_ACS_RR | PCI_ACS_CR |
			      PCI_ACS_EC | PCI_ACS_DT);

	if (pdev->pcie_type == PCI_EXP_TYPE_DOWNSTREAM ||
	    pdev->pcie_type == PCI_EXP_TYPE_ROOT_PORT ||
	    pdev->multifunction) {
		pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ACS);
		if (!pos)
			return false;
		pci_read_config_word(pdev, pos + PCI_ACS_CTRL, &ctrl);
		if ((ctrl & acs_flags) != acs_flags)
			return false;
	}

	return true;

+}
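
For illustration only, here is a minimal userspace sketch of the consolidated check. The struct fake_pdev type and the acs_enabled() name are hypothetical stand-ins, and the ACS_* values are assumed to mirror include/linux/pci_regs.h. One detail worth noting: filtering the flags for every multifunction device, as in the suggestion above, appears to also relax the test for a multifunction root or downstream port, which the patch does not do. The sketch therefore keeps the filter restricted to non-port devices.

```c
#include <stdbool.h>
#include <stdint.h>

/* ACS control bits; values assumed to mirror include/linux/pci_regs.h. */
#define ACS_SV 0x01 /* Source Validation */
#define ACS_TB 0x02 /* Translation Blocking */
#define ACS_RR 0x04 /* P2P Request Redirect */
#define ACS_CR 0x08 /* P2P Completion Redirect */
#define ACS_UF 0x10 /* Upstream Forwarding */
#define ACS_EC 0x20 /* P2P Egress Control */
#define ACS_DT 0x40 /* Direct Translated P2P */

/* Hypothetical stand-in for the few struct pci_dev fields the check reads. */
struct fake_pdev {
	bool is_pcie;
	bool is_port;       /* downstream port or root port */
	bool multifunction;
	bool has_acs_cap;   /* ACS extended capability present */
	uint16_t acs_ctrl;  /* contents of the ACS control register */
};

/* Consolidated form of the check: one capability lookup, one compare. */
bool acs_enabled(const struct fake_pdev *pdev, uint16_t acs_flags)
{
	if (!pdev->is_pcie)
		return false;

	/* Filter out flags not applicable to multifunction endpoints. */
	if (!pdev->is_port && pdev->multifunction)
		acs_flags &= (ACS_RR | ACS_CR | ACS_EC | ACS_DT);

	if (pdev->is_port || pdev->multifunction) {
		if (!pdev->has_acs_cap)
			return false;
		if ((pdev->acs_ctrl & acs_flags) != acs_flags)
			return false;
	}

	/* Single-function endpoints fall through, as in the patch. */
	return true;
}
```

A multifunction endpoint with only RR and CR enabled passes a request for SV | RR | CR | UF in this model, while a port with the same control register does not.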


But the above doesn't handle the case where the RC does not do
peer-to-peer between root ports. Per the ACS spec, such an RC's root ports
don't need to provide an ACS cap, since peer-to-peer port transfers aren't
allowed/enabled/supported, so by design, the root port is ACS compliant.
ATM, an IOMMU-capable system is a pre-req for VFIO,
and all such systems have an ACS cap, but that may not always be true.


+EXPORT_SYMBOL_GPL(pci_acs_enabled);
+
+/**
+ * pci_acs_path_enable - test ACS flags from start to end in a hierarchy
+ * @start: starting downstream device
+ * @end: ending upstream device or NULL to search to the root bus
+ * @acs_flags: required flags
+ *
+ * Walk up a device tree from start to end testing PCI ACS support.  If
+ * any step along the way does not support the required flags, return false.
+ */
+bool pci_acs_path_enabled(struct pci_dev *start,
+			  struct pci_dev *end, u16 acs_flags)
+{
+	struct pci_dev *pdev, *parent = start;
+
+	do {
+		pdev = parent;
+
+		if (!pci_acs_enabled(pdev, acs_flags))
+			return false;
+
+		if (pci_is_root_bus(pdev->bus))
+			return (end == NULL);

doesn't this mean that a caller can't pass the pdev of the root port?
I would think that 

Re: [PATCH 2/2] Device assignment: Fix MSI IRQ affinity setting

2012-05-24 Thread Thomas Gleixner
On Thu, 24 May 2012, Alex Williamson wrote:
> On Thu, 2012-05-24 at 18:02 +0100, Richard Weinberger wrote:
> > +if (address == msi_start + PCI_MSI_DATA_32)
> > +    handle_cfg_write_msi(pci_dev, assigned_dev);
>
> Why didn't we just use range_covers_byte(address, len, pci_dev->msi_cap
> + PCI_MSI_DATA_32) to start with?  But how does this handle the enable
> bit?

The problem with the current implementation is that it only changes
the routing if the msi entry goes from masked to unmasked state.

Linux does not mask the entries on affinity changes and never did,
neither for MSI nor for MSI-X.

I know it's probably not according to the spec, but we can't fix that
retroactively.

Thanks,

tglx


Re: [PATCH v2 12/13] pci: Misc pci_reg additions

2012-05-24 Thread Don Dutile

On 05/22/2012 01:05 AM, Alex Williamson wrote:

Fill in many missing definitions and add sizeof fields for many
sections allowing for more extensive config parsing.

Signed-off-by: Alex Williamson <alex.william...@redhat.com>
---


overall, i'm very glad to see defines instead of hardcoded numbers in the code, but


  include/linux/pci_regs.h |  112 +-
  1 files changed, 100 insertions(+), 12 deletions(-)

diff --git a/include/linux/pci_regs.h b/include/linux/pci_regs.h
index 4b608f5..379be84 100644
--- a/include/linux/pci_regs.h
+++ b/include/linux/pci_regs.h
@@ -26,6 +26,7 @@
   * Under PCI, each device has 256 bytes of configuration address space,
   * of which the first 64 bytes are standardized as follows:
   */
+#define PCI_STD_HEADER_SIZEOF  64
  #define PCI_VENDOR_ID 0x00/* 16 bits */
  #define PCI_DEVICE_ID 0x02/* 16 bits */
  #define PCI_COMMAND   0x04/* 16 bits */
@@ -209,9 +210,12 @@
  #define  PCI_CAP_ID_SHPC  0x0C/* PCI Standard Hot-Plug Controller */
  #define  PCI_CAP_ID_SSVID 0x0D/* Bridge subsystem vendor/device ID */
  #define  PCI_CAP_ID_AGP3  0x0E/* AGP Target PCI-PCI bridge */
+#define  PCI_CAP_ID_SECDEV 0x0F/* Secure Device */
  #define  PCI_CAP_ID_EXP   0x10/* PCI Express */
  #define  PCI_CAP_ID_MSIX  0x11/* MSI-X */
+#define  PCI_CAP_ID_SATA   0x12/* SATA Data/Index Conf. */
  #define  PCI_CAP_ID_AF    0x13    /* PCI Advanced Features */
+#define  PCI_CAP_ID_MAX    PCI_CAP_ID_AF
  #define PCI_CAP_LIST_NEXT 1   /* Next capability in the list */
  #define PCI_CAP_FLAGS 2   /* Capability defined flags (16 bits) */
  #define PCI_CAP_SIZEOF    4
@@ -276,6 +280,7 @@
  #define  PCI_VPD_ADDR_MASK0x7fff  /* Address mask */
  #define  PCI_VPD_ADDR_F   0x8000  /* Write 0, 1 indicates completion */
  #define PCI_VPD_DATA  4   /* 32-bits of data returned here */
+#define PCI_CAP_VPD_SIZEOF 8

  /* Slot Identification */

@@ -297,8 +302,10 @@
  #define PCI_MSI_ADDRESS_HI    8   /* Upper 32 bits (if PCI_MSI_FLAGS_64BIT set) */
  #define PCI_MSI_DATA_32   8   /* 16 bits of data for 32-bit devices */
  #define PCI_MSI_MASK_32   12  /* Mask bits register for 32-bit devices */
+#define PCI_MSI_PENDING_32 16  /* Pending intrs for 32-bit devices */
  #define PCI_MSI_DATA_64   12  /* 16 bits of data for 64-bit devices */
  #define PCI_MSI_MASK_64   16  /* Mask bits register for 64-bit devices */
+#define PCI_MSI_PENDING_64 20  /* Pending intrs for 64-bit devices */

  /* MSI-X registers */
  #define PCI_MSIX_FLAGS    2
@@ -308,6 +315,7 @@
  #define PCI_MSIX_TABLE    4
  #define PCI_MSIX_PBA  8
  #define  PCI_MSIX_FLAGS_BIRMASK   (7 << 0)
+#define PCI_CAP_MSIX_SIZEOF    12  /* size of MSIX registers */

  /* MSI-X entry's format */
  #define PCI_MSIX_ENTRY_SIZE   16
@@ -338,6 +346,7 @@
  #define  PCI_AF_CTRL_FLR  0x01
  #define PCI_AF_STATUS 5
  #define  PCI_AF_STATUS_TP 0x01
+#define PCI_CAP_AF_SIZEOF  6   /* size of AF registers */

  /* PCI-X registers */

@@ -374,6 +383,9 @@
  #define  PCI_X_STATUS_SPL_ERR 0x2000  /* Rcvd Split Completion Error Msg */
  #define  PCI_X_STATUS_266MHZ  0x4000  /* 266 MHz capable */
  #define  PCI_X_STATUS_533MHZ  0x8000  /* 533 MHz capable */
+#define PCI_X_ECC_CSR  8   /* ECC control and status */
+#define PCI_CAP_PCIX_SIZEOF_V0 8   /* size of registers for Version 0 */
+#define PCI_CAP_PCIX_SIZEOF_V12    24  /* size for Version 1 & 2 */

ew!
unlikely that version 12 will ever exist, but why not:
#define PCI_CAP_PCIX_SIZEOF_V1  24
#define PCI_CAP_PCIX_SIZEOF_V2  PCI_CAP_PCIX_SIZEOF_V1
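
A short sketch of how a config-space parser might consume the split defines. The pcix_cap_sizeof() helper is hypothetical and exists only for illustration; the sizes are the ones given in the patch.

```c
/* Sizes taken from the patch under review, split per version as suggested. */
#define PCI_CAP_PCIX_SIZEOF_V0 8
#define PCI_CAP_PCIX_SIZEOF_V1 24
#define PCI_CAP_PCIX_SIZEOF_V2 PCI_CAP_PCIX_SIZEOF_V1

/* Hypothetical helper: capability length for a given PCI-X version. */
int pcix_cap_sizeof(unsigned int version)
{
	switch (version) {
	case 0:
		return PCI_CAP_PCIX_SIZEOF_V0;
	case 1:
		return PCI_CAP_PCIX_SIZEOF_V1;
	case 2:
		return PCI_CAP_PCIX_SIZEOF_V2;
	default:
		return -1; /* unknown version, caller must not guess a size */
	}
}
```

With one name per version the switch reads naturally, and a later version with a different size only needs a new define.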




  /* PCI Bridge Subsystem ID registers */

@@ -462,6 +474,7 @@
  #define  PCI_EXP_LNKSTA_DLLLA 0x2000  /* Data Link Layer Link Active */
  #define  PCI_EXP_LNKSTA_LBMS  0x4000  /* Link Bandwidth Management Status */
  #define  PCI_EXP_LNKSTA_LABS  0x8000  /* Link Autonomous Bandwidth Status */
+#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V1 20  /* v1 endpoints end here */
  #define PCI_EXP_SLTCAP    20  /* Slot Capabilities */
  #define  PCI_EXP_SLTCAP_ABP   0x0001 /* Attention Button Present */
  #define  PCI_EXP_SLTCAP_PCP   0x0002 /* Power Controller Present */
@@ -521,6 +534,7 @@
  #define  PCI_EXP_OBFF_MSGA_EN 0x2000  /* OBFF enable with Message type A */
  #define  PCI_EXP_OBFF_MSGB_EN 0x4000  /* OBFF enable with Message type B */
  #define  PCI_EXP_OBFF_WAKE_EN 0x6000  /* OBFF using WAKE# signaling */
+#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 44  /* v2 endpoints end here */
  #define PCI_EXP_LNKCTL2   48  /* Link Control 2 */
  #define PCI_EXP_SLTCTL2   56  /* Slot Control 2 */

@@ -529,23 

Re: [PATCH v2 03/13] iommu: IOMMU groups for VT-d and AMD-Vi

2012-05-24 Thread Alex Williamson
On Thu, 2012-05-24 at 17:01 -0400, Don Dutile wrote:
 On 05/22/2012 01:04 AM, Alex Williamson wrote:
  Add back group support for AMD & Intel.  amd_iommu already tracks
  devices and has init and uninit routines to manage groups.
  intel-iommu does this on the fly, so we make use of the notifier
  support built into iommu groups to create and remove groups.
 
  Signed-off-by: Alex Williamson <alex.william...@redhat.com>
  ---
 
drivers/iommu/amd_iommu.c   |   28 +-
drivers/iommu/intel-iommu.c |   46 
  +++
2 files changed, 73 insertions(+), 1 deletions(-)
 
  diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
  index 32c00cd..b7e5ddf 100644
  --- a/drivers/iommu/amd_iommu.c
  +++ b/drivers/iommu/amd_iommu.c
  @@ -256,9 +256,11 @@ static bool check_device(struct device *dev)
 
static int iommu_init_device(struct device *dev)
{
  -   struct pci_dev *pdev = to_pci_dev(dev);
  +   struct pci_dev *dma_pdev, *pdev = to_pci_dev(dev);
  struct iommu_dev_data *dev_data;
  +   struct iommu_group *group;
  u16 alias;
  +   int ret;
 
  if (dev->archdata.iommu)
  return 0;
  @@ -279,8 +281,30 @@ static int iommu_init_device(struct device *dev)
  return -ENOTSUPP;
  }
  dev_data->alias_data = alias_data;
  +
  +   dma_pdev = pci_get_bus_and_slot(alias >> 8, alias & 0xff);
  +   } else
  +   dma_pdev = pdev;
  +
  +   if (!pdev->is_virtfn && PCI_FUNC(pdev->devfn) && iommu_group_mf &&
  +   pdev->hdr_type == PCI_HEADER_TYPE_NORMAL)
  +   dma_pdev = pci_get_slot(pdev->bus,
  +   PCI_DEVFN(PCI_SLOT(pdev->devfn), 0));
  +
  +   group = iommu_group_get(&dma_pdev->dev);
  +   if (!group) {
  +   group = iommu_group_alloc();
  +   if (IS_ERR(group))
  +   return PTR_ERR(group);
  }
 
  +   ret = iommu_group_add_device(group, dev);
  +
  +   iommu_group_put(group);
  +
 do you want to do a put if there is a failure in the iommu_group_add_device()?

Yes, this was intentional.  iommu_group_alloc() adds a reference to the
group it returns so that it doesn't disappear while we're working on it
(documented in iommu.c).  iommu_group_get() also gets a reference.  So
this put will free the group if it was new or just drop the reference if
existing.  Thanks,

Alex
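
To make the reference counting concrete, here is a small userspace model of the pattern. Everything in it is a hypothetical stand-in (struct group, the model_* helpers, init_device()), and it assumes that a successfully added device pins the group with its own reference. It is not the iommu.c implementation.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct iommu_group. */
struct group {
	int refs;     /* outstanding references */
	int devices;  /* attached devices, each assumed to hold one reference */
};

int groups_freed; /* counts frees, for demonstration only */

/* Like the allocator described above: returns with one reference held. */
struct group *model_group_alloc(void)
{
	struct group *g = calloc(1, sizeof(*g));

	if (g)
		g->refs = 1;
	return g;
}

void model_group_get(struct group *g)
{
	g->refs++;
}

/* Returns 1 if the last reference was dropped and the group was freed. */
int model_group_put(struct group *g)
{
	if (--g->refs > 0)
		return 0;
	free(g);
	groups_freed++;
	return 1;
}

/* 'fail' forces the error path so both outcomes can be exercised. */
int model_group_add_device(struct group *g, int fail)
{
	if (fail)
		return -1;
	g->devices++;
	g->refs++;
	return 0;
}

/* Mirrors the get-or-alloc, add, unconditional put sequence. */
int init_device(struct group **slot, int fail_add)
{
	struct group *g = *slot;
	int ret;

	if (g) {
		model_group_get(g);
	} else {
		g = model_group_alloc();
		if (!g)
			return -1;
	}

	ret = model_group_add_device(g, fail_add);

	/* Frees a new group on failure, else just drops our reference. */
	if (model_group_put(g))
		g = NULL;

	*slot = g;
	return ret;
}
```

In this model a failed add on a freshly allocated group frees it through the put, while a failed add on an existing group leaves its count unchanged.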

  +   if (ret)
  +   return ret;
  +
  if (pci_iommuv2_capable(pdev)) {
  struct amd_iommu *iommu;
 
  @@ -309,6 +333,8 @@ static void iommu_ignore_device(struct device *dev)
 
static void iommu_uninit_device(struct device *dev)
{
  +   iommu_group_remove_device(dev);
  +
  /*
   * Nothing to do here - we keep dev_data around for unplugged devices
   * and reuse it when the device is re-plugged - not doing so would
  diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
  index d4a0ff7..e63b33b 100644
  --- a/drivers/iommu/intel-iommu.c
  +++ b/drivers/iommu/intel-iommu.c
  @@ -4087,6 +4087,50 @@ static int intel_iommu_domain_has_cap(struct 
  iommu_domain *domain,
  return 0;
}
 
  +static int intel_iommu_add_device(struct device *dev)
  +{
  +   struct pci_dev *pdev = to_pci_dev(dev);
  +   struct pci_dev *bridge, *dma_pdev = pdev;
  +   struct iommu_group *group;
  +   int ret;
  +
  +   if (!device_to_iommu(pci_domain_nr(pdev->bus),
  +pdev->bus->number, pdev->devfn))
  +   return -ENODEV;
  +
  +   bridge = pci_find_upstream_pcie_bridge(pdev);
  +   if (bridge) {
  +   if (pci_is_pcie(bridge))
  +   dma_pdev = pci_get_domain_bus_and_slot(
  +   pci_domain_nr(pdev->bus),
  +   bridge->subordinate->number, 0);
  +   else
  +   dma_pdev = bridge;
  +   }
  +
  +   if (!pdev->is_virtfn && PCI_FUNC(pdev->devfn) && iommu_group_mf &&
  +   pdev->hdr_type == PCI_HEADER_TYPE_NORMAL)
  +   dma_pdev = pci_get_slot(pdev->bus,
  +   PCI_DEVFN(PCI_SLOT(pdev->devfn), 0));
  +
  +   group = iommu_group_get(&dma_pdev->dev);
  +   if (!group) {
  +   group = iommu_group_alloc();
  +   if (IS_ERR(group))
  +   return PTR_ERR(group);
  +   }
  +
  +   ret = iommu_group_add_device(group, dev);
  +
 ditto.
  +   iommu_group_put(group);
  +   return ret;
  +}
  +
  +static void intel_iommu_remove_device(struct device *dev)
  +{
  +   iommu_group_remove_device(dev);
  +}
  +
static struct iommu_ops intel_iommu_ops = {
  .domain_init= intel_iommu_domain_init,
  .domain_destroy = intel_iommu_domain_destroy,
  @@ -4096,6 +4140,8 @@ static struct iommu_ops intel_iommu_ops = {
  .unmap  = intel_iommu_unmap,
  .iova_to_phys   = intel_iommu_iova_to_phys,
  .domain_has_cap = intel_iommu_domain_has_cap,
  +   .add_device = intel_iommu_add_device,
  

Re: [PATCH 2/2] Device assignment: Fix MSI IRQ affinity setting

2012-05-24 Thread Jan Kiszka
On 2012-05-24 18:39, Thomas Gleixner wrote:
> On Thu, 24 May 2012, Alex Williamson wrote:
>> On Thu, 2012-05-24 at 18:02 +0100, Richard Weinberger wrote:
>>> +if (address == msi_start + PCI_MSI_DATA_32)
>>> +    handle_cfg_write_msi(pci_dev, assigned_dev);
>>
>> Why didn't we just use range_covers_byte(address, len, pci_dev->msi_cap
>> + PCI_MSI_DATA_32) to start with?  But how does this handle the enable
>> bit?
>
> The problem with the current implementation is that it only changes
> the routing if the msi entry goes from masked to unmasked state.
>
> Linux does not mask the entries on affinity changes and never did,
> neither for MSI nor for MSI-X.
>
> I know it's probably not according to the spec, but we can't fix that
> retroactively.

For MSI, this is allowed. For MSI-X, this would clearly be a Linux bug,
waiting for hardware to dislike this spec violation.

However, if this is the current behavior of such a prominent guest, I
guess we have to stop optimizing the QEMU MSI-X code that it only
updates routings on mask changes. Possibly other OSes get this wrong too...

Thanks, for the clarification. Should go into the changelog.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [PATCH v2 00/13] IOMMU Groups + VFIO

2012-05-24 Thread Don Dutile

On 05/22/2012 01:04 AM, Alex Williamson wrote:

Version 2 incorporating acks and feedback from v1.  The PCI DMA quirk
and ACS check are reworked, sysfs iommu groups ABI Documentation
added as well as numerous other fixes, including patches from Alexey
Kardashevskiy towards supporting POWER usage of VFIO and IOMMU groups.

This series can be found here on top of 3.4:

git://github.com/awilliam/linux-vfio.git iommu-group-vfio-20120521

The Qemu tree has also been updated to Qemu 1.1 and can be found here:

git://github.com/awilliam/qemu-vfio.git iommu-group-vfio

I'd really like to make a push to get this in for 3.5, so let's talk
about how to do that across iommu, pci, and new driver.  Joerg, are
you sufficiently happy with the IOMMU group concept and code?  We'll
also need David Woodhouse buyin on the intel-iommu changes in patches
3 & 6.  Who needs to approve VFIO as a new driver, GregKH?  Bjorn,
I'd be happy to send the PCI changes as a series for you, but I
wonder if it makes sense to collect acks for them if you approve and
bundle them in with the associated code that needs them so you're
not left with unused code.  Let me know which you prefer.  If there
are better ways to do it, please let me know.  Thanks,

Alex

---

ack to 1,2,4,6,8,10 & 11.
provided some minor feedback on 3,9,12.
have to do final review of the big stuff, 7 & 13.


Alex Williamson (13):
   vfio: Add PCI device driver
   pci: Misc pci_reg additions
   pci: Create common pcibios_err_to_errno
   pci: export pci_user functions for use by other drivers
   vfio: x86 IOMMU implementation
   vfio: Add documentation
   vfio: VFIO core
   iommu: Make use of DMA quirking and ACS enabled check for groups
   pci: Add ACS validation utility
   pci: Add PCI DMA source ID quirk
   iommu: IOMMU groups for VT-d and AMD-Vi
   iommu: IOMMU Groups
   driver core: Add iommu_group tracking to struct device


  .../ABI/testing/sysfs-kernel-iommu_groups  |   14
  Documentation/ioctl/ioctl-number.txt   |1
  Documentation/vfio.txt |  315 
  MAINTAINERS|8
  drivers/Kconfig|2
  drivers/Makefile   |1
  drivers/iommu/amd_iommu.c  |   67 +
  drivers/iommu/intel-iommu.c|   87 +
  drivers/iommu/iommu.c  |  578 +++-
  drivers/pci/access.c   |6
  drivers/pci/pci.c  |   76 +
  drivers/pci/pci.h  |7
  drivers/pci/quirks.c   |   69 +
  drivers/vfio/Kconfig   |   16
  drivers/vfio/Makefile  |3
  drivers/vfio/pci/Kconfig   |8
  drivers/vfio/pci/Makefile  |4
  drivers/vfio/pci/vfio_pci.c|  557 +++
  drivers/vfio/pci/vfio_pci_config.c | 1522 
  drivers/vfio/pci/vfio_pci_intrs.c  |  724 ++
  drivers/vfio/pci/vfio_pci_private.h|   91 +
  drivers/vfio/pci/vfio_pci_rdwr.c   |  269 
  drivers/vfio/vfio.c| 1413 +++
  drivers/vfio/vfio_iommu_x86.c  |  743 ++
  drivers/xen/xen-pciback/conf_space.c   |6
  include/linux/device.h |2
  include/linux/iommu.h  |  104 +
  include/linux/pci.h|   49 +
  include/linux/pci_regs.h   |  112 +
  include/linux/vfio.h   |  444 ++
  30 files changed, 7182 insertions(+), 116 deletions(-)
  create mode 100644 Documentation/ABI/testing/sysfs-kernel-iommu_groups
  create mode 100644 Documentation/vfio.txt
  create mode 100644 drivers/vfio/Kconfig
  create mode 100644 drivers/vfio/Makefile
  create mode 100644 drivers/vfio/pci/Kconfig
  create mode 100644 drivers/vfio/pci/Makefile
  create mode 100644 drivers/vfio/pci/vfio_pci.c
  create mode 100644 drivers/vfio/pci/vfio_pci_config.c
  create mode 100644 drivers/vfio/pci/vfio_pci_intrs.c
  create mode 100644 drivers/vfio/pci/vfio_pci_private.h
  create mode 100644 drivers/vfio/pci/vfio_pci_rdwr.c
  create mode 100644 drivers/vfio/vfio.c
  create mode 100644 drivers/vfio/vfio_iommu_x86.c
  create mode 100644 include/linux/vfio.h




Re: [PATCH] vfio-powerpc: enabled and supported on power

2012-05-24 Thread Benjamin Herrenschmidt
On Thu, 2012-05-24 at 09:12 -0600, Alex Williamson wrote:

  --- /dev/null
  +++ b/arch/powerpc/kernel/iommu_vfio.c
 
 Should this be drivers/vfio/vfio_iommu_powerpc.c?

Very minor bike shed painting... too long file names suck, in
this case what's the point of the vfio prefix for files already
in the vfio directory ?

Cheers,
Ben.




Re: [PATCH 2/2] Device assignment: Fix MSI IRQ affinity setting

2012-05-24 Thread Alex Williamson
On Thu, 2012-05-24 at 23:39 +0200, Thomas Gleixner wrote:
> On Thu, 24 May 2012, Alex Williamson wrote:
> > On Thu, 2012-05-24 at 18:02 +0100, Richard Weinberger wrote:
> > > +if (address == msi_start + PCI_MSI_DATA_32)
> > > +    handle_cfg_write_msi(pci_dev, assigned_dev);
> >
> > Why didn't we just use range_covers_byte(address, len, pci_dev->msi_cap
> > + PCI_MSI_DATA_32) to start with?  But how does this handle the enable
> > bit?
>
> The problem with the current implementation is that it only changes
> the routing if the msi entry goes from masked to unmasked state.

We don't expose a maskable MSI capability to the guest, so I think you
mean enable/disable.

> Linux does not mask the entries on affinity changes and never did,
> neither for MSI nor for MSI-X.
>
> I know it's probably not according to the spec, but we can't fix that
> retroactively.

We need to do both then, enable MSI based on the enable bit and update
routing based on address updates.  It seems like this code is counting
on data being written after the enable bit is set, which is not
guaranteed to happen.  Thanks,

Alex
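
A rough sketch of what handling both cases could look like, assuming the 32-bit, non-maskable MSI layout. The msi_cfg_write_action() function, the enum and the helpers are hypothetical names used only here; covers_byte() is a local stand-in modeled on range_covers_byte(), and the offsets are assumed to match pci_regs.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offsets within the MSI capability; assumed to match pci_regs.h. */
#define MSI_FLAGS       2  /* message control, 16 bits */
#define MSI_ADDRESS_LO  4  /* lower 32 bits of the address */
#define MSI_DATA_32     8  /* 16 bits of data, 32-bit layout */

enum msi_action { MSI_NONE, MSI_ENABLE, MSI_DISABLE, MSI_UPDATE_ROUTE };

/* Local stand-in modeled on range_covers_byte(). */
static bool covers_byte(uint32_t offset, uint32_t len, uint32_t byte)
{
	return offset <= byte && byte < offset + len;
}

/* True if [offset, offset + len) intersects [start, start + size). */
static bool overlaps(uint32_t offset, uint32_t len,
		     uint32_t start, uint32_t size)
{
	return offset < start + size && start < offset + len;
}

/* Decide what a config write of 'len' bytes at 'address' requires. */
enum msi_action msi_cfg_write_action(uint32_t msi_cap, uint32_t address,
				     uint32_t len, bool was_enabled,
				     bool now_enabled)
{
	/* The enable bit lives in the low byte of message control. */
	if (covers_byte(address, len, msi_cap + MSI_FLAGS) &&
	    was_enabled != now_enabled)
		return now_enabled ? MSI_ENABLE : MSI_DISABLE;

	/* Address (4 bytes) and data (2 bytes) are contiguous here. */
	if (now_enabled &&
	    overlaps(address, len, msi_cap + MSI_ADDRESS_LO,
		     MSI_DATA_32 + 2 - MSI_ADDRESS_LO))
		return MSI_UPDATE_ROUTE;

	return MSI_NONE;
}
```

Here an address or data write while MSI is enabled triggers a routing update regardless of write order, and a write that only flips the enable bit is handled separately.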



Re: [PATCH 2/2] Device assignment: Fix MSI IRQ affinity setting

2012-05-24 Thread Alex Williamson
On Thu, 2012-05-24 at 18:53 -0300, Jan Kiszka wrote:
> On 2012-05-24 18:39, Thomas Gleixner wrote:
> > On Thu, 24 May 2012, Alex Williamson wrote:
> >> On Thu, 2012-05-24 at 18:02 +0100, Richard Weinberger wrote:
> >>> +if (address == msi_start + PCI_MSI_DATA_32)
> >>> +    handle_cfg_write_msi(pci_dev, assigned_dev);
> >>
> >> Why didn't we just use range_covers_byte(address, len, pci_dev->msi_cap
> >> + PCI_MSI_DATA_32) to start with?  But how does this handle the enable
> >> bit?
> >
> > The problem with the current implementation is that it only changes
> > the routing if the msi entry goes from masked to unmasked state.
> >
> > Linux does not mask the entries on affinity changes and never did,
> > neither for MSI nor for MSI-X.
> >
> > I know it's probably not according to the spec, but we can't fix that
> > retroactively.
>
> For MSI, this is allowed. For MSI-X, this would clearly be a Linux bug,
> waiting for hardware to dislike this spec violation.
>
> However, if this is the current behavior of such a prominent guest, I
> guess we have to stop optimizing the QEMU MSI-X code that it only
> updates routings on mask changes. Possibly other OSes get this wrong too...
>
> Thanks, for the clarification. Should go into the changelog.

Hmm, if Linux didn't mask MSIX before updating vectors it'd not only be
a spec violation, but my testing of the recent changes to fix MSIX
vector updates for exactly this would have failed...

} else if (msix_masked(orig) && !msix_masked(entry)) {
    ... update vector...

So I'm not entirely sure I believe that.  Thanks,

Alex
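
To spell out the two policies being discussed, here is a minimal model of an MSI-X table entry comparison. The struct, entry_masked() and needs_route_update() are hypothetical names for illustration; the only assumption about the hardware layout is that bit 0 of the vector control word is the per-vector mask bit.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSIX_CTRL_MASKBIT 0x1 /* per-vector mask bit in vector control */

/* Hypothetical in-memory copy of one MSI-X table entry. */
struct msix_entry_model {
	uint32_t addr_lo;
	uint32_t addr_hi;
	uint32_t data;
	uint32_t ctrl;
};

static bool entry_masked(const struct msix_entry_model *e)
{
	return (e->ctrl & MSIX_CTRL_MASKBIT) != 0;
}

/*
 * only_on_unmask = true models the optimization that reacts solely to a
 * masked -> unmasked transition; false also reacts to address or data
 * changes made while the vector stays unmasked.
 */
bool needs_route_update(const struct msix_entry_model *orig,
			const struct msix_entry_model *entry,
			bool only_on_unmask)
{
	if (entry_masked(entry))
		return false; /* nothing is delivered while masked */
	if (entry_masked(orig))
		return true;  /* masked -> unmasked transition */
	if (only_on_unmask)
		return false;
	return orig->addr_lo != entry->addr_lo ||
	       orig->addr_hi != entry->addr_hi ||
	       orig->data != entry->data;
}
```

The two policies differ only for a guest that rewrites address or data without masking first, which is exactly the behavior in question.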




Re: [PATCH 2/2] Device assignment: Fix MSI IRQ affinity setting

2012-05-24 Thread Michael S. Tsirkin
On Thu, May 24, 2012 at 06:53:15PM -0300, Jan Kiszka wrote:
> On 2012-05-24 18:39, Thomas Gleixner wrote:
> > On Thu, 24 May 2012, Alex Williamson wrote:
> >> On Thu, 2012-05-24 at 18:02 +0100, Richard Weinberger wrote:
> >>> +if (address == msi_start + PCI_MSI_DATA_32)
> >>> +    handle_cfg_write_msi(pci_dev, assigned_dev);
> >>
> >> Why didn't we just use range_covers_byte(address, len, pci_dev->msi_cap
> >> + PCI_MSI_DATA_32) to start with?  But how does this handle the enable
> >> bit?
> >
> > The problem with the current implementation is that it only changes
> > the routing if the msi entry goes from masked to unmasked state.
> >
> > Linux does not mask the entries on affinity changes and never did,
> > neither for MSI nor for MSI-X.
> >
> > I know it's probably not according to the spec, but we can't fix that
> > retroactively.
>
> For MSI, this is allowed. For MSI-X, this would clearly be a Linux bug,
> waiting for hardware to dislike this spec violation.
>
> However, if this is the current behavior of such a prominent guest, I
> guess we have to stop optimizing the QEMU MSI-X code that it only
> updates routings on mask changes. Possibly other OSes get this wrong too...

Very strange, a clear spec violation. I'll have to dig in the source to
verify this.


> Thanks, for the clarification. Should go into the changelog.
>
> Jan
>
> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux


Re: [PATCH v2 12/13] pci: Misc pci_reg additions

2012-05-24 Thread Alex Williamson
On Thu, 2012-05-24 at 17:49 -0400, Don Dutile wrote:
 On 05/22/2012 01:05 AM, Alex Williamson wrote:
  Fill in many missing definitions and add sizeof fields for many
  sections allowing for more extensive config parsing.
 
  Signed-off-by: Alex Williamson <alex.william...@redhat.com>
  ---
 
 overall, i'm very glad to see defines instead of hardcoded numbers in the 
 code, but
 
include/linux/pci_regs.h |  112 
  +-
1 files changed, 100 insertions(+), 12 deletions(-)
 
  diff --git a/include/linux/pci_regs.h b/include/linux/pci_regs.h
  index 4b608f5..379be84 100644
  --- a/include/linux/pci_regs.h
  +++ b/include/linux/pci_regs.h
  @@ -26,6 +26,7 @@
 * Under PCI, each device has 256 bytes of configuration address space,
 * of which the first 64 bytes are standardized as follows:
 */
  +#define PCI_STD_HEADER_SIZEOF  64
#define PCI_VENDOR_ID 0x00/* 16 bits */
#define PCI_DEVICE_ID 0x02/* 16 bits */
#define PCI_COMMAND   0x04/* 16 bits */
  @@ -209,9 +210,12 @@
#define  PCI_CAP_ID_SHPC  0x0C/* PCI Standard Hot-Plug Controller */
#define  PCI_CAP_ID_SSVID 0x0D/* Bridge subsystem vendor/device ID */
#define  PCI_CAP_ID_AGP3  0x0E/* AGP Target PCI-PCI bridge */
  +#define  PCI_CAP_ID_SECDEV 0x0F/* Secure Device */
#define  PCI_CAP_ID_EXP   0x10/* PCI Express */
#define  PCI_CAP_ID_MSIX  0x11/* MSI-X */
  +#define  PCI_CAP_ID_SATA   0x12/* SATA Data/Index Conf. */
#define  PCI_CAP_ID_AF    0x13    /* PCI Advanced Features */
  +#define  PCI_CAP_ID_MAX    PCI_CAP_ID_AF
#define PCI_CAP_LIST_NEXT 1   /* Next capability in the list */
#define PCI_CAP_FLAGS 2   /* Capability defined flags (16 bits) */
#define PCI_CAP_SIZEOF    4
  @@ -276,6 +280,7 @@
#define  PCI_VPD_ADDR_MASK0x7fff  /* Address mask */
#define  PCI_VPD_ADDR_F   0x8000  /* Write 0, 1 indicates 
  completion */
#define PCI_VPD_DATA  4   /* 32-bits of data returned 
  here */
  +#define PCI_CAP_VPD_SIZEOF 8
 
/* Slot Identification */
 
  @@ -297,8 +302,10 @@
#define PCI_MSI_ADDRESS_HI8   /* Upper 32 bits (if 
  PCI_MSI_FLAGS_64BIT set) */
#define PCI_MSI_DATA_32   8   /* 16 bits of data for 32-bit 
  devices */
#define PCI_MSI_MASK_32   12  /* Mask bits register for 
  32-bit devices */
  +#define PCI_MSI_PENDING_32 16  /* Pending intrs for 32-bit devices */
#define PCI_MSI_DATA_64   12  /* 16 bits of data for 64-bit 
  devices */
#define PCI_MSI_MASK_64   16  /* Mask bits register for 
  64-bit devices */
  +#define PCI_MSI_PENDING_64 20  /* Pending intrs for 64-bit devices */
 
/* MSI-X registers */
#define PCI_MSIX_FLAGS    2
  @@ -308,6 +315,7 @@
#define PCI_MSIX_TABLE    4
#define PCI_MSIX_PBA  8
#define  PCI_MSIX_FLAGS_BIRMASK   (7 << 0)
  +#define PCI_CAP_MSIX_SIZEOF    12  /* size of MSIX registers */
 
/* MSI-X entry's format */
#define PCI_MSIX_ENTRY_SIZE   16
  @@ -338,6 +346,7 @@
#define  PCI_AF_CTRL_FLR  0x01
#define PCI_AF_STATUS 5
#define  PCI_AF_STATUS_TP 0x01
  +#define PCI_CAP_AF_SIZEOF  6   /* size of AF registers */
 
/* PCI-X registers */
 
  @@ -374,6 +383,9 @@
#define  PCI_X_STATUS_SPL_ERR 0x2000  /* Rcvd Split 
  Completion Error Msg */
#define  PCI_X_STATUS_266MHZ  0x4000  /* 266 MHz capable */
#define  PCI_X_STATUS_533MHZ  0x8000  /* 533 MHz capable */
  +#define PCI_X_ECC_CSR  8   /* ECC control and status */
  +#define PCI_CAP_PCIX_SIZEOF_V0 8   /* size of registers for Version 0 */
  +#define PCI_CAP_PCIX_SIZEOF_V12    24  /* size for Version 1 & 2 */
 ew!
 unlikely that version 12 will ever exist, but why not:
 #define PCI_CAP_PCIX_SIZEOF_V1    24
 #define PCI_CAP_PCIX_SIZEOF_V2    PCI_CAP_PCIX_SIZEOF_V1

Works for me, will fix.  Thanks,

Alex



Re: [PATCH v2 09/13] vfio: x86 IOMMU implementation

2012-05-24 Thread Alex Williamson
On Thu, 2012-05-24 at 17:38 -0400, Don Dutile wrote:
 On 05/22/2012 01:05 AM, Alex Williamson wrote:
  x86 is probably the wrong name for this VFIO IOMMU driver, but x86
  is the primary target for it.  This driver supports a very simple
  usage model using the existing IOMMU API.  The IOMMU is expected to
  support the full host address space with no special IOVA windows,
  number of mappings restrictions, or unique processor target options.
 
  Signed-off-by: Alex Williamson <alex.william...@redhat.com>
  ---
 
Documentation/ioctl/ioctl-number.txt |2
drivers/vfio/Kconfig |6
drivers/vfio/Makefile|2
drivers/vfio/vfio.c  |7
drivers/vfio/vfio_iommu_x86.c|  743 
  ++
include/linux/vfio.h |   52 ++
6 files changed, 811 insertions(+), 1 deletions(-)
create mode 100644 drivers/vfio/vfio_iommu_x86.c
 
  diff --git a/Documentation/ioctl/ioctl-number.txt 
  b/Documentation/ioctl/ioctl-number.txt
  index 111e30a..9d1694e 100644
  --- a/Documentation/ioctl/ioctl-number.txt
  +++ b/Documentation/ioctl/ioctl-number.txt
  @@ -88,7 +88,7 @@ Code  Seq#(hex)   Include FileComments
  and kernel/power/user.c
'8'   all SNP8023 advanced NIC card
  mailto:m...@solidum.com
  -';'64-6F   linux/vfio.h
  +';'64-72   linux/vfio.h
'@'   00-0F   linux/radeonfb.hconflict!
'@'   00-0F   drivers/video/aty/aty128fb.cconflict!
'A'   00-1F   linux/apm_bios.hconflict!
  diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
  index 9acb1e7..bd88a30 100644
  --- a/drivers/vfio/Kconfig
  +++ b/drivers/vfio/Kconfig
  @@ -1,6 +1,12 @@
  +config VFIO_IOMMU_X86
  +   tristate
  +   depends on VFIO && X86
  +   default n
  +
menuconfig VFIO
  tristate "VFIO Non-Privileged userspace driver framework"
  depends on IOMMU_API
  +   select VFIO_IOMMU_X86 if X86
  help
VFIO provides a framework for secure userspace device drivers.
See Documentation/vfio.txt for more details.
 
 So a future refactoring that uses some chunk of this support
 on a non-x86 machine could be a lot of useless renaming.
 
 Why not rename vfio_iommu_x86 to something like vfio_iommu_no_iova
 and just make it conditionally compiled on X86 (as you've done above in 
 Kconfig's)?
 Then if another arch can use it, or refactors the file to use
 some of it, and split x86 vs other-arch into separate per-arch files,
 or per-iova schemes, it's more descriptive and less disruptive?

Yep, the problem is how to concisely describe what we expect to support
here.  This file supports IOMMU API based usage of an IOMMU with
effectively no DMA window or mapping constraints, optimized for static
mapping of an address space.  What's a good name for that?  Maybe I
should follow the example of others and just call it a Type 1 IOMMU
implementation so the marketing material looks better!  ;-P  That may
honestly be better than calling it x86.  Thoughts?  Thanks,

Alex



Re: [PATCH 2/2] Device assignment: Fix MSI IRQ affinity setting

2012-05-24 Thread Thomas Gleixner
On Thu, 24 May 2012, Alex Williamson wrote:
 On Thu, 2012-05-24 at 18:53 -0300, Jan Kiszka wrote:
  On 2012-05-24 18:39, Thomas Gleixner wrote:
   On Thu, 24 May 2012, Alex Williamson wrote:
   On Thu, 2012-05-24 at 18:02 +0100, Richard Weinberger wrote:
   +if (address == msi_start + PCI_MSI_DATA_32)
   +handle_cfg_write_msi(pci_dev, assigned_dev);
  
   Why didn't we just use range_covers_byte(address, len, pci_dev-msi_cap
   + PCI_MSI_DATA_32) to start with?  But how does this handle the enable
   bit?
   
   The problem with the current implementation is that it only changes
   the routing if the msi entry goes from masked to unmasked state.
   
   Linux does not mask the entries on affinity changes and never did,
   neither for MSI nor for MSI-X.
   
   I know it's probably not according to the spec, but we can't fix that
   retroactively.
  
  For MSI, this is allowed. For MSI-X, this would clearly be a Linux bug,
  waiting for hardware to dislike this spec violation.
  
  However, if this is the current behavior of such a prominent guest, I
  guess we have to stop optimizing the QEMU MSI-X code that it only
  updates routings on mask changes. Possibly other OSes get this wrong too...
  
  Thanks, for the clarification. Should go into the changelog.
 
 Hmm, if Linux didn't mask MSIX before updating vectors it'd not only be
 a spec violation, but my testing of the recent changes to fix MSIX
 vector updates for exactly this would have failed...
 
 } else if (msix_masked(orig) && !msix_masked(entry)) {
 ... update vector...
 
 So I'm not entirely sure I believe that.  Thanks,

What happens is:

A write to /proc/irq/$N/smp_affinity calls into irq_set_affinity()
which does:

	if (irq_can_move_pcntxt(data)) {
		ret = chip->irq_set_affinity(data, mask, false);
	} else {
		irqd_set_move_pending(data);
		irq_copy_pending(desc, mask);
	}

MSI and MSI-X fall into the !irq_can_move_pcntxt() code path unless
the irq is remapped, which is not the case in a guest. That means that
we merely copy the new mask and set the move pending bit.

MSI/MSI-X use the edge handler so on the next incoming interrupt, we
do

  irq_desc->chip->irq_ack()

which ends up in ack_apic_edge() which does:

static void ack_apic_edge(struct irq_data *data)
{
	irq_complete_move(data->chip_data);
	irq_move_irq(data);
	ack_APIC_irq();
}

irq_move_irq() is the interesting function. And that does

  irq_desc->chip->irq_mask()

before calling the irq_set_affinity() function which actually changes
the masks.

->irq_mask() ends up in mask_msi_irq().

Now that calls msi_set_mask_bit() and for MSI-X that actually masks
the irq. So ignore my MSI-X comment.

But for MSI this ends up in msi_mask_irq() which does:

	if (!desc->msi_attrib.maskbit)
		return 0;

So in case desc->msi_attrib.maskbit is 0 we do not write anything out
and then the masked/unmasked logic in qemu fails.
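
The sequence above can be modeled in a few lines of C. This is only a
minimal sketch, not kernel code: struct msi_model and the model_*()
helpers are hypothetical names, and the counters merely stand in for
the config-space writes a hypervisor could observe. It illustrates that
without a per-vector mask bit the only write during the move is the new
message itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one guest MSI vector; all names here are hypothetical. */
struct msi_model {
	bool has_maskbit;          /* stands in for desc->msi_attrib.maskbit */
	bool masked;               /* mask state as seen by the device */
	bool move_pending;         /* affinity change deferred to next irq */
	unsigned int dest;         /* current destination CPU */
	unsigned int pending_dest; /* destination requested by the user */
	unsigned int mask_writes;  /* mask-bit writes the hypervisor sees */
	unsigned int msg_writes;   /* address/data writes the hypervisor sees */
};

/* Like msi_mask_irq() as described above: a no-op without a mask bit. */
static void model_set_mask(struct msi_model *m, bool mask)
{
	assert(m != NULL);
	if (!m->has_maskbit)
		return;
	m->masked = mask;
	m->mask_writes++;
}

/* Write to /proc/irq/$N/smp_affinity: only records the request. */
static void model_request_affinity(struct msi_model *m, unsigned int cpu)
{
	assert(m != NULL);
	m->pending_dest = cpu;
	m->move_pending = true;
}

/* Ack path on the next interrupt: mask, retarget, unmask. */
static void model_ack_edge(struct msi_model *m)
{
	assert(m != NULL);
	if (!m->move_pending)
		return;
	model_set_mask(m, true);
	m->dest = m->pending_dest; /* new MSI message is written out */
	m->msg_writes++;
	model_set_mask(m, false);
	m->move_pending = false;
}
```

With has_maskbit set to false, a requested move followed by one
interrupt leaves mask_writes at 0 and msg_writes at 1, so logic that
waits for a masked-to-unmasked transition never sees the change.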

Sorry, that I did not decode that down to this level before, but I was
in a hurry and assumed correctly that qemu is doing something
wrong. Not being familiar with that code did not help either :)

So the proper fix is that qemu tells the guest that mask bit is
supported and catches the mask bit toggling before writing it out to
the hardware for those devices which do not support it.

We'll have another look.

Thanks,

tglx


Re: [PATCH 2/2] Device assignment: Fix MSI IRQ affinity setting

2012-05-24 Thread Thomas Gleixner
On Fri, 25 May 2012, Michael S. Tsirkin wrote:
 On Thu, May 24, 2012 at 06:53:15PM -0300, Jan Kiszka wrote:
  On 2012-05-24 18:39, Thomas Gleixner wrote:
   On Thu, 24 May 2012, Alex Williamson wrote:
   On Thu, 2012-05-24 at 18:02 +0100, Richard Weinberger wrote:
   +if (address == msi_start + PCI_MSI_DATA_32)
   +handle_cfg_write_msi(pci_dev, assigned_dev);
  
   Why didn't we just use range_covers_byte(address, len, pci_dev->msi_cap
   + PCI_MSI_DATA_32) to start with?  But how does this handle the enable
   bit?
   
   The problem with the current implementation is that it only changes
   the routing if the msi entry goes from masked to unmasked state.
   
   Linux does not mask the entries on affinity changes and never did,
   neither for MSI nor for MSI-X.
   
   I know it's probably not according to the spec, but we can't fix that
   retroactively.
  
  For MSI, this is allowed. For MSI-X, this would clearly be a Linux bug,
  waiting for hardware to dislike this spec violation.
  
  However, if this is the current behavior of such a prominent guest, I
  guess we have to stop optimizing the QEMU MSI-X code that it only
  updates routings on mask changes. Possibly other OSes get this wrong too...
 
 Very strange, a clear spec violation. I'll have to dig in the source to
 verify this.

Stop digging. MSI-X is correct.


Re: [PATCH 2/2] Device assignment: Fix MSI IRQ affinity setting

2012-05-24 Thread Thomas Gleixner
On Fri, 25 May 2012, Thomas Gleixner wrote:

 On Fri, 25 May 2012, Michael S. Tsirkin wrote:
  On Thu, May 24, 2012 at 06:53:15PM -0300, Jan Kiszka wrote:
   On 2012-05-24 18:39, Thomas Gleixner wrote:
On Thu, 24 May 2012, Alex Williamson wrote:
On Thu, 2012-05-24 at 18:02 +0100, Richard Weinberger wrote:
+if (address == msi_start + PCI_MSI_DATA_32)
+handle_cfg_write_msi(pci_dev, assigned_dev);
   
Why didn't we just use range_covers_byte(address, len, pci_dev->msi_cap
+ PCI_MSI_DATA_32) to start with?  But how does this handle the enable
bit?

The problem with the current implementation is that it only changes
the routing if the msi entry goes from masked to unmasked state.

Linux does not mask the entries on affinity changes and never did,
neither for MSI nor for MSI-X.

I know it's probably not according to the spec, but we can't fix that
retroactively.
   
   For MSI, this is allowed. For MSI-X, this would clearly be a Linux bug,
   waiting for hardware to dislike this spec violation.
   
   However, if this is the current behavior of such a prominent guest, I
   guess we have to stop optimizing the QEMU MSI-X code that it only
   updates routings on mask changes. Possibly other OSes get this wrong 
   too...
  
  Very strange, a clear spec violation. I'll have to dig in the source to
  verify this.
 
 Stop digging. MSI-X is correct.

This was based off an older version of qemu-kvm, where the routing for
MSI-X was broken for other reasons. But that seems to be fixed now.

I use the age excuse :)

Thanks,

tglx



Re: [PATCH 2/2] Device assignment: Fix MSI IRQ affinity setting

2012-05-24 Thread Alex Williamson
On Fri, 2012-05-25 at 01:01 +0200, Thomas Gleixner wrote:
 So the proper fix is that qemu tells the guest that mask bit is
 supported and catches the mask bit toggling before writing it out to
 the hardware for those devices which do not support it.

We can't necessarily do that, we have to work with the config space
we're given.  Using the smallest possible MSI capability always works.
Adding mask bits may not fit in with the existing capabilities of the
physical device.  Thanks,

Alex



Re: [GIT PULL] KVM updates for Linux 3.5

2012-05-24 Thread Linus Torvalds
On Tue, May 22, 2012 at 6:08 AM, Avi Kivity a...@redhat.com wrote:
 Linus, please pull the KVM updates for the 3.5 cycle from

Ok. The way I resolved the x86 kvm_para.h conflict, I suspect it is
now being *unnecessarily* careful about doing the CPUID thing, and I
think that the "cpuid_level < 0" check that came in through the x86
tree can be removed in favor of the KVM one.

But I left both checks in as the obvious non-semantic resolution, and
if you guys decide that it's redundant, you can remove it later.

Linus


Re: [PATCH 2/2] Device assignment: Fix MSI IRQ affinity setting

2012-05-24 Thread Thomas Gleixner
On Thu, 24 May 2012, Alex Williamson wrote:

 On Fri, 2012-05-25 at 01:01 +0200, Thomas Gleixner wrote:
  So the proper fix is that qemu tells the guest that mask bit is
  supported and catches the mask bit toggling before writing it out to
  the hardware for those devices which do not support it.
 
 We can't necessarily do that, we have to work with the config space
 we're give.  Using the smallest possible MSI capability always works.
 Adding mask bits may not fit in with the existing capabilities of the
 physical device.  Thanks,

I see what you mean. A random device driver of a random guest OS might
rely on that information. Unlikely, but 

So we need some logic to circumvent the masked/unmasked logic in case
that property is not set, right ?
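
One way to picture such logic, as a rough sketch only:
needs_route_update() and struct msi_state are hypothetical names and do
not correspond to the actual qemu-kvm device-assignment code. The
assumption is that the route should be refreshed whenever the vector is
deliverable and either just became deliverable or carries a different
message, so that a device without a mask bit is still handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the emulated MSI capability state. */
struct msi_state {
	bool enabled;      /* MSI enable bit */
	bool has_maskbit;  /* capability offers per-vector masking */
	bool masked;       /* only meaningful if has_maskbit is set */
	uint64_t addr;     /* message address */
	uint16_t data;     /* message data */
};

/* A vector can fire only if MSI is enabled and it is not masked. */
static bool msi_deliverable(const struct msi_state *s)
{
	assert(s != NULL);
	return s->enabled && !(s->has_maskbit && s->masked);
}

/*
 * Decide whether the interrupt route must be refreshed after a
 * config-space write that changed the state from prev to cur.
 */
static bool needs_route_update(const struct msi_state *prev,
			       const struct msi_state *cur)
{
	if (!msi_deliverable(cur))
		return false;
	if (!msi_deliverable(prev))
		return true; /* enable or masked-to-unmasked transition */
	/* No mask transition: compare the message itself. */
	return prev->addr != cur->addr || prev->data != cur->data;
}
```

The last comparison is what covers the case discussed here: a guest
that rewrites the message of an enabled vector without ever touching a
mask bit.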

Thanks,

tglx


Re: [Qemu-devel] [PATCH v3 00/16] net: hub-based networking

2012-05-24 Thread Zhi Yong Wu
On Fri, May 25, 2012 at 4:53 AM, Luiz Capitulino lcapitul...@redhat.com wrote:
 On Fri, 25 May 2012 01:59:06 +0800
 zwu.ker...@gmail.com wrote:

 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

 The patchset implements network hub stead of vlan. The main work was done by 
 stefan, and i rebased it to latest QEMU upstream, did some testings and am 
 responsible for pushing it to QEMU upstream.

 Honest question: does it really pay off to have this in qemu vs. using one of
It's said that it can speed up packet delivery, but I have not done
all of the benchmark testing.
For more details, please refer to
http://thread.gmane.org/gmane.comp.emulators.qemu/133362
 the externaly available solutions?
Are there externally available solutions? :) What are they? Open vSwitch?


-- 
Regards,

Zhi Yong Wu


Re: [PATCH v3 13/16] net: Make the monitor output more reasonable hub info

2012-05-24 Thread Zhi Yong Wu
On Fri, May 25, 2012 at 4:34 AM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2012-05-24 14:59, zwu.ker...@gmail.com wrote:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  net.c     |    7 ++-
  net/hub.c |    2 +-
  2 files changed, 7 insertions(+), 2 deletions(-)

 diff --git a/net.c b/net.c
 index 61dc28d..79ac51f 100644
 --- a/net.c
 +++ b/net.c
 @@ -887,6 +887,12 @@ static const struct {
          },
      },
  #endif /* CONFIG_NET_BRIDGE */
 +    [NET_CLIENT_TYPE_HUB] = {
  +        .type = "hubport",
 +        .desc = {
 +            { /* end of list */ }
 +        },
 +    },
  };

  int net_client_init(Monitor *mon, QemuOpts *opts, int is_netdev)
 @@ -1079,7 +1085,6 @@ void do_info_network(Monitor *mon)
      NetClientState *nc, *peer;
      net_client_type type;

  -    monitor_printf(mon, "Devices not on any VLAN:\n");
       QTAILQ_FOREACH(nc, &net_clients, next) {
           peer = nc->peer;
           type = nc->info->type;
 diff --git a/net/hub.c b/net/hub.c
 index 0cc385e..8a583ab 100644
 --- a/net/hub.c
 +++ b/net/hub.c
 @@ -193,7 +193,7 @@ void net_hub_info(Monitor *mon)
       QLIST_FOREACH(hub, &hubs, next) {
           monitor_printf(mon, "hub %u\n", hub->id);
           QLIST_FOREACH(port, &hub->ports, next) {
  -            monitor_printf(mon, "    port %u peer %s\n", port->id,
  +            monitor_printf(mon, "   \\ %s\n",
                              port->nc.peer ? port->nc.peer->name : "none");
          }
      }

 I still do not agree with this formatting (peer - hubport + hub -
 abbreviated peers instead of just hub - peers). But the series has a
 higher value than this, and we can fix on top - unless there is a need
 for another round anyway.
OK, I agree.

 Jan

 --
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux



-- 
Regards,

Zhi Yong Wu


Re: [PATCH v3] KVM: x86: Implement PCID/INVPCID for guests with EPT

2012-05-24 Thread Marcelo Tosatti
On Wed, May 23, 2012 at 05:02:28AM +, Mao, Junjie wrote:
  
   The previous patch regards PCID and INVPCID as a whole because qemu
  doesn't support cpuid leaf 7 configuration at present. This is not the case 
  in this
  version.
The problem with cpu_has_hypervisor check is that its Linux specific.
Any solution should also take into account other OSes running as L1
guest and virtualizing L2 guest.
  
   Is there any other way, which applies to all host hypervisors, to know if 
   kvm is
  running as a guest hypervisor?
  
  The point is that other hypervisor might be running as L1 guest.
  
  The problem with enabling PCID for the L2 guest is that it can share same 
  PCID
  values with the L1 hypervisor.
  
  However, if the L1 hypervisor enables and configures VPID (given that
  the L0 hypervisor emulates and exposes it), there is no problem in enabling
  PCID for both L1 and L2 guests because TLB entries will be differentiated by
  their VPID values, even if their PCID values are the same.
  
 
 This may not be a problem because:
 
 1. If both L1 and L2 use VPID, there's no problem as you have mentioned.
 2. If neither L1 and L2 use VPID, the TLB entries are all tagged with VPID 0 
 and any VM entries or exits will invalidate them.
 3. If one of L1 and L2 uses VPID but the other don't, the TLB entries still 
 have different VPID and won't affect each other.
 
 I haven't thought over exposing PCID to L2 guests before but it seems that no 
 problem exists in exposing PCID to L2 guests. Is it looks ok to you if PCID 
 is always exposed, no matter for L1 or L2 guests?

Yes, it appears to be OK, because of the TLB flush on vm-entry/vm-exit
without VPID (2 above).
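
The three cases can be checked against a toy model of a tagged TLB.
This is only a sketch of the argument in this mail, not of the full
nested-virtualization picture: it assumes a lookup hits when both the
VPID and PCID tags match and that running without VPID means tag 0 plus
a flush on every VM entry and exit. struct tlb_tag and
stale_hit_possible() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* VPID 0 stands for "VPID not in use", as in case 2 above. */
struct tlb_tag {
	uint16_t vpid;
	uint16_t pcid;
};

/* A cached translation is usable only if both tags match. */
static bool tlb_tag_match(struct tlb_tag entry, struct tlb_tag lookup)
{
	return entry.vpid == lookup.vpid && entry.pcid == lookup.pcid;
}

/*
 * Can an entry cached by one level be hit by the other level after a
 * VM entry or VM exit?
 */
static bool stale_hit_possible(struct tlb_tag cached, struct tlb_tag lookup)
{
	/* PCIDs are 12 bits wide. */
	assert(cached.pcid < 4096 && lookup.pcid < 4096);

	/* Case 2: without VPID the transition flushes VPID 0 entries. */
	if (cached.vpid == 0 && lookup.vpid == 0)
		return false;
	return tlb_tag_match(cached, lookup);
}
```

In this model, two levels that share a PCID value never hit each
other's entries in any of the three cases; a hit is only possible when
both VPID and PCID are identical, which is the same address space.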

Avi ?

  So i think that checking whether EPT _and_ VPID are supported should be a
  precondition to enable PCID support for guests.
  
 
 Should this check be carried out on L0 too? If so, this will add one more 
 dependency of PCID on VPID, which doesn't exist in the manual.

Actually it is not necessary because L0 cannot access data which is
EPT-tagged (and you already require EPT). 

So it is fine to remove the cpu_has_hypervisor check along with
vmx_pcid_supported().




Re: [RFC PATCH] PCI: Introduce INTx check mask API

2012-05-24 Thread Alexey Kardashevskiy
On 25/05/12 00:41, Alex Williamson wrote:

 [Found while debugging VFIO on POWER but it is platform independent]

 There is a feature in PCI (>=2.3?) to mask/unmask INTx via PCI_COMMAND and
 PCI_STATUS registers.

 Yes, 2.3 introduced this. Masking is done via command register, checking
 if the source was the PCI in question via the status register. The
 latter is important for supporting IRQ sharing - and that's why we
 introduced this masking API to the PCI layer.
 And there is some API to support that (commit 
 a2e27787f893621c5a6b865acf6b7766f8671328).

 I have a network adapter:
 0001:00:01.0 Ethernet controller: Chelsio Communications Inc T310 10GbE 
 Single Port Adapter
 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ 
 Stepping- SERR+ FastB2B- DisINTx-
 Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast TAbort- TAbort- 
 MAbort- SERR- PERR- INTx-

 pci_intx_mask_supported() reports that the feature is supported for this 
 adapter
 BUT the adapter does not set PCI_STATUS_INTERRUPT so 
 pci_check_and_set_intx_mask()
 never changes PCI_COMMAND and INTx does not work on it when we use it as 
 VFIO-PCI device.

 If I remove the check of this bit, it works fine as it is called from an 
 interrupt handler and
 Status bit check is redundant.

 Opened a spec:
 PCI LOCAL BUS SPECIFICATION, REV. 3.0, Table 6-2: Status Register Bits
 ===
 3   This read-only bit reflects the state of the interrupt in the
 device/function. Only when the Interrupt Disable bit in the command
 register is a 0 and this Interrupt Status bit is a 1, will the
 device’s/function’s INTx# signal be asserted. Setting the Interrupt
Disable bit to a 1 has no effect on the state of this bit.
 ===
 With this adapter, INTx# is asserted but Status bit is still 0.

 Is it mandatory for a device to set Status bit if it supports INTx masking?

 2 Alex: if it is mandatory, then we need to be able to disable pci_2_3 in 
 VFIO-PCI
 somehow.

 Since PCI 2.3, this bit is mandatory, and it should be independent of
 the masking bit. The question is, if your device is supposed to support
 2.3, thus is just buggy, or if our detection algorithm is unreliable. It
 basically builds on the assumption that, if we can flip the mask bit,
 the feature should be present. I guess that is the best we can do. Maybe
 we can augment this with a blacklist of devices that support flipping
 without actually providing the feature.
 
 Yep, that's what I'd suggest as well, add a blacklist to
 pci_intx_mask_supported() so this device returns false and we require an
 exclusive interrupt for it.  Thanks,

Okay, here is one for a start:

aik@vpl2:~$ lspci -s 1:1:0.0
0001:01:00.0 Ethernet controller: Chelsio Communications Inc T310 10GbE Single 
Port Adapter
aik@vpl2:~$ lspci -ns 1:1:0.0
0001:01:00.0 0200: 1425:0030



-- 
Alexey


Re: [RFC PATCH] PCI: Introduce INTx check mask API

2012-05-24 Thread Alexey Kardashevskiy
On 24/05/12 22:02, Jan Kiszka wrote:
 On 2012-05-24 04:44, Alexey Kardashevskiy wrote:
 [Found while debugging VFIO on POWER but it is platform independent]

 There is a feature in PCI (>=2.3?) to mask/unmask INTx via PCI_COMMAND and
 PCI_STATUS registers.
 
 Yes, 2.3 introduced this. Masking is done via command register, checking
 if the source was the PCI in question via the status register. The
 latter is important for supporting IRQ sharing - and that's why we
 introduced this masking API to the PCI layer.


Isn't it just a fairly small optimization: disabling interrupts only on
the devices that fired an interrupt rather than on all devices that
share the same IRQ? If so, do PCI devices really share IRQs that often?
Does not supporting this mean a real slowdown on such devices?

As far as I understand, everyone who cares about performance uses MSI/MSIX, no?


 And there is some API to support that (commit 
 a2e27787f893621c5a6b865acf6b7766f8671328).

 I have a network adapter:
 0001:00:01.0 Ethernet controller: Chelsio Communications Inc T310 10GbE 
 Single Port Adapter
  Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ 
 Stepping- SERR+ FastB2B- DisINTx-
  Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast TAbort- TAbort- 
 MAbort- SERR- PERR- INTx-

 pci_intx_mask_supported() reports that the feature is supported for this 
 adapter
 BUT the adapter does not set PCI_STATUS_INTERRUPT so 
 pci_check_and_set_intx_mask()
 never changes PCI_COMMAND and INTx does not work on it when we use it as 
 VFIO-PCI device.

 If I remove the check of this bit, it works fine as it is called from an 
 interrupt handler and
 Status bit check is redundant.

 Opened a spec:
 PCI LOCAL BUS SPECIFICATION, REV. 3.0, Table 6-2: Status Register Bits
 ===
 3This read-only bit reflects the state of the interrupt in the
 device/function. Only when the Interrupt Disable bit in the command
 register is a 0 and this Interrupt Status bit is a 1, will the
 device’s/function’s INTx# signal be asserted. Setting the Interrupt
Disable bit to a 1 has no effect on the state of this bit.
 ===
 With this adapter, INTx# is asserted but Status bit is still 0.

 Is it mandatory for a device to set Status bit if it supports INTx masking?

 2 Alex: if it is mandatory, then we need to be able to disable pci_2_3 in 
 VFIO-PCI
 somehow.
 
 Since PCI 2.3, this bit is mandatory, and it should be independent of
 the masking bit. The question is, if your device is supposed to support
 2.3, thus is just buggy, or if our detection algorithm is unreliable. It
 basically builds on the assumption that, if we can flip the mask bit,
 the feature should be present. I guess that is the best we can do. Maybe
 we can augment this with a blacklist of devices that support flipping
 without actually providing the feature.

It is a good moment to start :)
Not sure where - in VFIO or along with that PCI INTx API.

Here is that broken device:
aik@vpl2:~$ lspci -s 1:1:0.0
0001:01:00.0 Ethernet controller: Chelsio Communications Inc T310 10GbE Single 
Port Adapter
aik@vpl2:~$ lspci -ns 1:1:0.0
0001:01:00.0 0200: 1425:0030
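
To make the failure concrete, here is a minimal sketch of the
check-then-mask idea, assuming the helper only masks when the status
register says this device is the interrupt source. The two bit values
match PCI_COMMAND_INTX_DISABLE and PCI_STATUS_INTERRUPT in
linux/pci_regs.h; struct pci_cfg_model and model_check_and_mask() are
hypothetical and are not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_COMMAND_INTX_DISABLE 0x400 /* command register: INTx disable */
#define PCI_STATUS_INTERRUPT     0x08  /* status register: INTx pending */

/* Hypothetical view of the two config-space registers involved. */
struct pci_cfg_model {
	uint16_t command;
	uint16_t status;
};

/*
 * Mask INTx only if the status bit claims the interrupt came from this
 * device. Returns true if the device was masked.
 */
static bool model_check_and_mask(struct pci_cfg_model *cfg)
{
	assert(cfg != NULL);
	if (!(cfg->status & PCI_STATUS_INTERRUPT))
		return false; /* treated as "some other device on the line" */
	cfg->command |= PCI_COMMAND_INTX_DISABLE;
	return true;
}
```

A device that asserts INTx# but leaves the status bit at 0, as this
adapter does, always takes the early return, so it is never masked and
the interrupt is never reported as its own.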


-- 
Alexey


Re: [PATCH] vfio-powerpc: enabled and supported on power

2012-05-24 Thread Alexey Kardashevskiy
On 25/05/12 01:12, Alex Williamson wrote:
 On Thu, 2012-05-24 at 13:10 +1000, Alexey Kardashevskiy wrote:
 The patch introduces support of VFIO on POWER.

 The patch consists of:

 1. IOMMU driver for VFIO.
 It does not use IOMMU API at all, instead it calls POWER
 IOMMU API directly (ppc_md callbacks).

 2. A piece of code (module_init) which creates IOMMU groups.
 TBD: what is a better place for it?

 The patch is made on top of
 git://github.com/awilliam/linux-vfio.git iommu-group-vfio-20120523
 (which is iommu-group-vfio-20120521 + some fixes)

 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
  arch/powerpc/Kconfig |6 +
  arch/powerpc/include/asm/iommu.h |3 +
  arch/powerpc/kernel/Makefile |1 +
  arch/powerpc/kernel/iommu_vfio.c |  371 
 ++
  4 files changed, 381 insertions(+), 0 deletions(-)
  create mode 100644 arch/powerpc/kernel/iommu_vfio.c

 diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
 index feab3ba..13d12ac 100644
 --- a/arch/powerpc/Kconfig
 +++ b/arch/powerpc/Kconfig
 @@ -319,6 +319,12 @@ config 8XX_MINIMAL_FPEMU
  config IOMMU_HELPER
  def_bool PPC64
  
 +config IOMMU_VFIO
 +select IOMMU_API
 +depends on PPC64
 
  VFIO?

And get a loop:

drivers/vfio/Kconfig|6| error: recursive dependency detected!
drivers/vfio/Kconfig|6| symbol VFIO depends on IOMMU_API
drivers/iommu/Kconfig|2| symbol IOMMU_API is selected by IOMMU_VFIO
arch/powerpc/Kconfig|322| symbol IOMMU_VFIO depends on VFIO

because:

menuconfig VFIO
	tristate "VFIO Non-Privileged userspace driver framework"
	depends on IOMMU_API

But this is a minor issue, read below.


 +tristate Enable IOMMU chardev to support user-space PCI
 +default n
 +
  config SWIOTLB
  bool SWIOTLB support
  default n
 diff --git a/arch/powerpc/include/asm/iommu.h 
 b/arch/powerpc/include/asm/iommu.h
 index 957a83f..c64bce7 100644
 --- a/arch/powerpc/include/asm/iommu.h
 +++ b/arch/powerpc/include/asm/iommu.h
 @@ -66,6 +66,9 @@ struct iommu_table {
  unsigned long  it_halfpoint; /* Breaking point for small/large allocs */
  spinlock_t it_lock;  /* Protects it_map */
  unsigned long *it_map;   /* A simple allocation bitmap for now */
 +#ifdef CONFIG_IOMMU_API
 +struct iommu_group *it_group;
 +#endif
  };
  
  struct scatterlist;
 diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
 index f5808a3..7cfd68e 100644
 --- a/arch/powerpc/kernel/Makefile
 +++ b/arch/powerpc/kernel/Makefile
 @@ -90,6 +90,7 @@ obj-$(CONFIG_RELOCATABLE_PPC32)+= reloc_32.o
  
  obj-$(CONFIG_PPC32) += entry_32.o setup_32.o
  obj-$(CONFIG_PPC64) += dma-iommu.o iommu.o
 +obj-$(CONFIG_IOMMU_VFIO)+= iommu_vfio.o
  obj-$(CONFIG_KGDB)  += kgdb.o
  obj-$(CONFIG_PPC_OF_BOOT_TRAMPOLINE)+= prom_init.o
  obj-$(CONFIG_MODULES)   += ppc_ksyms.o
 diff --git a/arch/powerpc/kernel/iommu_vfio.c 
 b/arch/powerpc/kernel/iommu_vfio.c
 new file mode 100644
 index 000..68a93dd
 --- /dev/null
 +++ b/arch/powerpc/kernel/iommu_vfio.c
 
 Should this be drivers/vfio/vfio_iommu_powerpc.c?


I guess not.
We already have some IOMMU code in arch/powerpc/kernel/iommu.c, and
eventually, when I get my patch polished, it can all go into
arch/powerpc/kernel/iommu.c or reuse some code from it.


 @@ -0,0 +1,371 @@
 +/*
 + * VFIO: IOMMU DMA mapping support for TCE on POWER
 + *
 + * Copyright (C) 2012 IBM Corp.  All rights reserved.
 + * Author: Alexey Kardashevskiy a...@ozlabs.ru
 + *
 + * This program is free software; you can redistribute it and/or modify
 + * it under the terms of the GNU General Public License version 2 as
 + * published by the Free Software Foundation.
 + *
 + * Derived from original vfio_iommu_x86.c:
 + * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
 + * Author: Alex Williamson alex.william...@redhat.com
 + */
 +
 +#include <linux/module.h>
 +#include <linux/pci.h>
 +#include <linux/slab.h>
 +#include <linux/uaccess.h>
 +#include <linux/vfio.h>
 +#include <linux/err.h>
 +#include <linux/spinlock.h>
 +#include <asm/iommu.h>
 +
 +#define DRIVER_VERSION  "0.1"
 +#define DRIVER_AUTHOR   "a...@ozlabs.ru"
 +#define DRIVER_DESC "POWER IOMMU chardev for VFIO"
 +
 +#define IOMMU_CHECK_EXTENSION   _IO(VFIO_TYPE, VFIO_BASE + 1)
 +
 +/*  API for POWERPC IOMMU  */
 +
 +#define POWERPC_IOMMU   2
 +
 +struct tce_iommu_info {
 +__u32 argsz;
 +__u32 dma32_window_start;
 +__u32 dma32_window_size;
 +};
 +
 +#define POWERPC_IOMMU_GET_INFO  _IO(VFIO_TYPE, VFIO_BASE + 12)
 +
 +struct tce_iommu_dma_map {
 +__u32 argsz;
 +__u64 va;
 +__u64 dmaaddr;
 +};
 +
 +#define POWERPC_IOMMU_MAP_DMA   _IO(VFIO_TYPE, VFIO_BASE + 13)
 +#define POWERPC_IOMMU_UNMAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 14)
 
 We'd probably want to merge this into include/linux/vfio.h too?


Why? The vfio_iommu_driver_ops has nothing to do with VFIO actually, it is what 
IOMMU 

Re: [RFC PATCH] PCI: Introduce INTx check mask API

2012-05-24 Thread Jan Kiszka
On 2012-05-24 22:18, Alexey Kardashevskiy wrote:
 On 24/05/12 22:02, Jan Kiszka wrote:
 On 2012-05-24 04:44, Alexey Kardashevskiy wrote:
 [Found while debugging VFIO on POWER but it is platform independent]

 There is a feature in PCI (>=2.3?) to mask/unmask INTx via PCI_COMMAND and
 PCI_STATUS registers.

 Yes, 2.3 introduced this. Masking is done via command register, checking
 if the source was the PCI in question via the status register. The
 latter is important for supporting IRQ sharing - and that's why we
 introduced this masking API to the PCI layer.
 
 
 Is not it just a quite small optimization to not to disable interrupts on all 
 devices which share
 the same IRQ but just on those who fired an interrupt? If so, do PCI devices 
 really often share
 IRQs? Does not supporting this mean real slowdown on such devices?
 
 As far as I understand, everyone who cares about performance uses MSI/MSIX, 
 no?

Not everyone is blessed with MSI-only PCI devices. From my notebook:

# cat /proc/interrupts
[...]
 22: [...] IO-APIC-fasteoi   ehci_hcd:usb1, ehci_hcd:usb2

So, if I want to assign one EHCI controller to a guest, I have to
disable the other as well. The same can happen quickly if you attach a
few legacy PCI adapters to a system and want to pass them through.

 
 
 And there is some API to support that (commit 
 a2e27787f893621c5a6b865acf6b7766f8671328).

 I have a network adapter:
 0001:00:01.0 Ethernet controller: Chelsio Communications Inc T310 10GbE 
 Single Port Adapter
 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ 
 Stepping- SERR+ FastB2B- DisINTx-
 Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast TAbort- TAbort- 
 MAbort- SERR- PERR- INTx-

 pci_intx_mask_supported() reports that the feature is supported for this 
 adapter
 BUT the adapter does not set PCI_STATUS_INTERRUPT so 
 pci_check_and_set_intx_mask()
 never changes PCI_COMMAND and INTx does not work on it when we use it as 
 VFIO-PCI device.

 If I remove the check of this bit, it works fine as it is called from an 
 interrupt handler and
 Status bit check is redundant.

 Opened a spec:
 PCI LOCAL BUS SPECIFICATION, REV. 3.0, Table 6-2: Status Register Bits
 ===
 3   This read-only bit reflects the state of the interrupt in the
 device/function. Only when the Interrupt Disable bit in the command
 register is a 0 and this Interrupt Status bit is a 1, will the
 device’s/function’s INTx# signal be asserted. Setting the Interrupt
Disable bit to a 1 has no effect on the state of this bit.
 ===
 With this adapter, INTx# is asserted but Status bit is still 0.

 Is it mandatory for a device to set Status bit if it supports INTx masking?

 2 Alex: if it is mandatory, then we need to be able to disable pci_2_3 in 
 VFIO-PCI
 somehow.

 Since PCI 2.3, this bit is mandatory, and it should be independent of
 the masking bit. The question is, if your device is supposed to support
 2.3, thus is just buggy, or if our detection algorithm is unreliable. It
 basically builds on the assumption that, if we can flip the mask bit,
 the feature should be present. I guess that is the best we can do. Maybe
 we can augment this with a blacklist of devices that support flipping
 without actually providing the feature.
 
 It is a good moment to start :)
 Not sure where - in VFIO or along with that PCI INTx API.

At PCI level as the API is VFIO agnostic (it was introduced for
classic KVM device assignment, in fact).

 
 Here is that broken device:
 aik@vpl2:~$ lspci -s 1:1:0.0
 0001:01:00.0 Ethernet controller: Chelsio Communications Inc T310 10GbE 
 Single Port Adapter
 aik@vpl2:~$ lspci -ns 1:1:0.0
 0001:01:00.0 0200: 1425:0030

A patch to add the infrastructure as well would be even more welcome. :)
You could have a look at drivers/pci/quirks.c for patterns of how to do this.
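
As a rough illustration of what such a blacklist could look like, and
explicitly not the eventual kernel patch: the table and both helper
names below are hypothetical, and the only entry is the vendor/device
pair reported in this thread (1425:0030).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ID pair; a real patch would use the kernel's own types. */
struct pci_id_model {
	uint16_t vendor;
	uint16_t device;
};

/* Devices whose mask bit can be flipped but whose status bit is broken. */
static const struct pci_id_model intx_mask_broken[] = {
	{ 0x1425, 0x0030 }, /* Chelsio T310 10GbE, as reported above */
};

static bool intx_mask_blacklisted(const struct pci_id_model *id)
{
	size_t i;

	assert(id != NULL);
	for (i = 0; i < sizeof(intx_mask_broken) / sizeof(intx_mask_broken[0]); i++) {
		if (intx_mask_broken[i].vendor == id->vendor &&
		    intx_mask_broken[i].device == id->device)
			return true;
	}
	return false;
}

/*
 * The existing probe result (mask bit can be flipped) is passed in;
 * the blacklist only overrides a positive answer.
 */
static bool model_intx_mask_supported(const struct pci_id_model *id,
				      bool mask_bit_flippable)
{
	return mask_bit_flippable && !intx_mask_blacklisted(id);
}
```

A listed device then reports the feature as unsupported even though
the probe succeeds, so callers fall back to requiring an exclusive
interrupt line.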

Jan





[PATCH 0/3] Fix hot-unplug race in virtio-blk

2012-05-24 Thread Asias He
This patch set fixes races when hot-unplugging a disk under I/O stress.

Asias He (3):
  virtio-blk: Call del_gendisk() before disable guest kick
  virtio-blk: Reset device after blk_cleanup_queue()
  virtio-blk: Use block layer provided spinlock

 drivers/block/virtio_blk.c |   25 ++---
 1 file changed, 6 insertions(+), 19 deletions(-)

-- 
1.7.10.2



[PATCH 1/3] virtio-blk: Call del_gendisk() before disable guest kick

2012-05-24 Thread Asias He
del_gendisk() might not return due to failing to remove the
/sys/block/vda/serial sysfs entry when another thread (udev) is
trying to read it.

virtblk_remove()
  vdev->config->reset() : guest will not kick us through interrupt
  del_gendisk()
    device_del()
      kobject_del(): got stuck, sysfs entry ref count non zero

sysfs_open_file(): user space process read /sys/block/vda/serial
  sysfs_get_active() : got sysfs entry ref count
    dev_attr_show()
      virtblk_serial_show()
        blk_execute_rq() : got stuck, interrupt is disabled
                           request cannot be finished

This patch fixes it by calling del_gendisk() before we disable guest's
interrupt so that the request sent in virtblk_serial_show() will be
finished and del_gendisk() will succeed.

This fixes another race in hot-unplug process.

It is safe to call del_gendisk(vblk->disk) before
flush_work(&vblk->config_work) which might access vblk->disk, because
vblk->disk is not freed until put_disk(vblk->disk).

Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Michael S. Tsirkin m...@redhat.com
Cc: virtualizat...@lists.linux-foundation.org
Cc: kvm@vger.kernel.org
Signed-off-by: Asias He as...@redhat.com
---
 drivers/block/virtio_blk.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 693187d..1bed517 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -584,13 +584,13 @@ static void __devexit virtblk_remove(struct virtio_device *vdev)
 	vblk->config_enable = false;
 	mutex_unlock(&vblk->config_lock);
 
+	del_gendisk(vblk->disk);
+
 	/* Stop all the virtqueues. */
 	vdev->config->reset(vdev);
 
 	flush_work(&vblk->config_work);
 
-	del_gendisk(vblk->disk);
-
 	/* Abort requests dispatched to driver. */
 	spin_lock_irqsave(&vblk->lock, flags);
 	while ((vbr = virtqueue_detach_unused_buf(vblk->vq))) {
-- 
1.7.10.2



[PATCH 2/3] virtio-blk: Reset device after blk_cleanup_queue()

2012-05-24 Thread Asias He
blk_cleanup_queue() will call blk_drain_queue() to drain all the
requests before marking the queue DEAD. If we reset the device before
blk_cleanup_queue(), the drain would fail.

1) If the queue is stopped in do_virtblk_request() because the device
is full, q->request_fn() will not be called.

blk_drain_queue() {
   while (true) {
      ...
      if (!list_empty(&q->queue_head))
         __blk_run_queue(q) {
            if (queue is not stopped)
               q->request_fn()
         }
      ...
   }
}

Not resetting the device before blk_cleanup_queue() gives the interrupt
handler blk_done() the chance to start the queue.

2) In commit b79d866c8b7014a51f611a64c40546109beaf24a, we abort requests
dispatched to the driver before blk_cleanup_queue(). There is a race if
requests are dispatched to the driver after the abort and before the
queue is marked DEAD. To fix this, instead of aborting the requests
explicitly, we can just reset the device after blk_cleanup_queue() so
that the device can complete all the requests before the queue is
marked DEAD in the drain process.
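
The stopped-queue case in 1) can be sketched with a small user-space
model. This is an illustration under assumptions, not the block layer:
all names are hypothetical, the ring holds two requests, and the drain
loop is bounded so that the model terminates where the real
blk_drain_queue() would keep looping.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a request queue feeding a small device ring. */
struct model_queue {
	int  queued;        /* requests still sitting on the queue */
	int  in_flight;     /* requests handed to the device */
	int  device_slots;  /* ring capacity */
	bool stopped;       /* set when the ring is full */
	bool device_alive;  /* false after the device has been reset */
};

/* Like the request function: feed the device until the ring is full. */
static void model_request_fn(struct model_queue *q)
{
	while (q->queued > 0) {
		if (q->in_flight == q->device_slots) {
			q->stopped = true;
			return;
		}
		q->queued--;
		q->in_flight++;
	}
}

/* Like blk_done(): complete in-flight requests and restart the queue. */
static void model_blk_done(struct model_queue *q)
{
	if (!q->device_alive)
		return;		/* a reset device raises no interrupt */
	q->in_flight = 0;
	q->stopped = false;
}

/* Bounded stand-in for the drain loop; true if everything drained. */
static bool model_drain(struct model_queue *q, int max_rounds)
{
	for (int i = 0; i < max_rounds; i++) {
		if (q->queued > 0 && !q->stopped)
			model_request_fn(q);
		if (q->queued == 0 && q->in_flight == 0)
			return true;
		model_blk_done(q);
	}
	return false;
}

/* Runs one removal; true if the queue could be drained. */
bool model_remove(bool reset_before_cleanup)
{
	struct model_queue q = {
		.queued = 5, .in_flight = 0, .device_slots = 2,
		.stopped = false, .device_alive = true,
	};

	model_request_fn(&q);	/* ring fills up, queue becomes stopped */
	assert(q.stopped);

	if (reset_before_cleanup)
		q.device_alive = false;

	return model_drain(&q, 16);
}
```

With the reset done first, the model never leaves the stopped state and
model_remove(true) fails to drain; with the reset deferred,
model_remove(false) drains all five requests.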

Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Michael S. Tsirkin m...@redhat.com
Cc: virtualizat...@lists.linux-foundation.org
Cc: kvm@vger.kernel.org
Signed-off-by: Asias He as...@redhat.com
---
 drivers/block/virtio_blk.c |   12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 1bed517..b4fa2d7 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -576,8 +576,6 @@ static void __devexit virtblk_remove(struct virtio_device *vdev)
 {
 	struct virtio_blk *vblk = vdev->priv;
 	int index = vblk->index;
-	struct virtblk_req *vbr;
-	unsigned long flags;
 
 	/* Prevent config work handler from accessing the device. */
 	mutex_lock(&vblk->config_lock);
@@ -585,21 +583,13 @@ static void __devexit virtblk_remove(struct virtio_device *vdev)
 	mutex_unlock(&vblk->config_lock);
 
 	del_gendisk(vblk->disk);
+	blk_cleanup_queue(vblk->disk->queue);
 
 	/* Stop all the virtqueues. */
 	vdev->config->reset(vdev);
 
 	flush_work(&vblk->config_work);
 
-	/* Abort requests dispatched to driver. */
-	spin_lock_irqsave(&vblk->lock, flags);
-	while ((vbr = virtqueue_detach_unused_buf(vblk->vq))) {
-		__blk_end_request_all(vbr->req, -EIO);
-		mempool_free(vbr, vblk->pool);
-	}
-	spin_unlock_irqrestore(&vblk->lock, flags);
-
-	blk_cleanup_queue(vblk->disk->queue);
 	put_disk(vblk->disk);
 	mempool_destroy(vblk->pool);
 	vdev->config->del_vqs(vdev);
-- 
1.7.10.2



[PATCH 3/3] virtio-blk: Use block layer provided spinlock

2012-05-24 Thread Asias He
Block layer will allocate a spinlock for the queue if the driver does
not provide one in blk_init_queue().

The reason to use the internal spinlock is that blk_cleanup_queue()
switches to the internal spinlock in the cleanup code path:

	if (q->queue_lock != &q->__queue_lock)
		q->queue_lock = &q->__queue_lock;

However, processes in D state might have taken the driver-provided
spinlock; when they wake up, they would release the block layer
provided spinlock instead.

=
[ BUG: bad unlock balance detected! ]
3.4.0-rc7+ #238 Not tainted
-
fio/3587 is trying to release lock (&(&q->__queue_lock)->rlock) at:
[813274d2] blk_queue_bio+0x2a2/0x380
but there are no more locks to release!

other info that might help us debug this:
1 lock held by fio/3587:
 #0:  (&(&vblk->lock)->rlock){..}, at:
[8132661a] get_request_wait+0x19a/0x250

Other drivers use the block layer provided spinlock as well, e.g. SCSI.
I do not see any reason why we shouldn't, even if the lock imbalance
issue can be fixed in the block layer.
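
The unlock imbalance can be reproduced in a small user-space model. This
is a sketch under assumptions: the lock is just a depth counter, and all
names are hypothetical stand-ins for q->queue_lock, q->__queue_lock and
the driver's own lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock: only tracks how often it is currently held. */
struct model_lock {
	int depth;
};

struct model_queue {
	struct model_lock  internal_lock;  /* stands in for q->__queue_lock */
	struct model_lock *queue_lock;     /* stands in for q->queue_lock */
};

static void model_lock_acquire(struct model_lock *l)
{
	l->depth++;
}

/* Returns false on a "bad unlock balance": releasing a lock nobody holds. */
static bool model_lock_release(struct model_lock *l)
{
	if (l->depth == 0)
		return false;
	l->depth--;
	return true;
}

/* Like queue init: use the driver's lock if given, else the internal one. */
static void model_init_queue(struct model_queue *q, struct model_lock *driver_lock)
{
	q->internal_lock.depth = 0;
	q->queue_lock = driver_lock ? driver_lock : &q->internal_lock;
}

/* Like the cleanup path: always switch back to the internal lock. */
static void model_cleanup_queue(struct model_queue *q)
{
	if (q->queue_lock != &q->internal_lock)
		q->queue_lock = &q->internal_lock;
}

/* Returns true if a task that locked before cleanup can unlock cleanly. */
bool model_unlock_balanced(bool use_driver_lock)
{
	struct model_lock driver_lock = { 0 };
	struct model_queue q;
	bool ok;

	model_init_queue(&q, use_driver_lock ? &driver_lock : NULL);

	model_lock_acquire(q.queue_lock);	/* task locks through the pointer */
	model_cleanup_queue(&q);		/* pointer is switched meanwhile */
	assert(q.queue_lock == &q.internal_lock);
	ok = model_lock_release(q.queue_lock);	/* task unlocks through the pointer */

	return ok;
}
```

With a driver-provided lock the release hits a lock that was never
taken, so model_unlock_balanced(true) is false; passing no driver lock,
as this patch does with blk_init_queue(do_virtblk_request, NULL), keeps
acquire and release on the same lock.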

Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Michael S. Tsirkin m...@redhat.com
Cc: virtualizat...@lists.linux-foundation.org
Cc: kvm@vger.kernel.org
Signed-off-by: Asias He as...@redhat.com
---
 drivers/block/virtio_blk.c |9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index b4fa2d7..774c31d 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -21,8 +21,6 @@ struct workqueue_struct *virtblk_wq;
 
 struct virtio_blk
 {
-	spinlock_t lock;
-
 	struct virtio_device *vdev;
 	struct virtqueue *vq;
 
@@ -65,7 +63,7 @@ static void blk_done(struct virtqueue *vq)
 	unsigned int len;
 	unsigned long flags;
 
-	spin_lock_irqsave(&vblk->lock, flags);
+	spin_lock_irqsave(vblk->disk->queue->queue_lock, flags);
 	while ((vbr = virtqueue_get_buf(vblk->vq, &len)) != NULL) {
 		int error;
 
@@ -99,7 +97,7 @@ static void blk_done(struct virtqueue *vq)
 	}
 	/* In case queue is stopped waiting for more buffers. */
 	blk_start_queue(vblk->disk->queue);
-	spin_unlock_irqrestore(&vblk->lock, flags);
+	spin_unlock_irqrestore(vblk->disk->queue->queue_lock, flags);
 }
 
 static bool do_req(struct request_queue *q, struct virtio_blk *vblk,
@@ -431,7 +429,6 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
 		goto out_free_index;
 	}
 
-	spin_lock_init(&vblk->lock);
 	vblk->vdev = vdev;
 	vblk->sg_elems = sg_elems;
 	sg_init_table(vblk->sg, vblk->sg_elems);
@@ -456,7 +453,7 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
 		goto out_mempool;
 	}
 
-	q = vblk->disk->queue = blk_init_queue(do_virtblk_request, &vblk->lock);
+	q = vblk->disk->queue = blk_init_queue(do_virtblk_request, NULL);
 	if (!q) {
 		err = -ENOMEM;
 		goto out_put_disk;
-- 
1.7.10.2



Re: [PATCH 2/2] Device assignment: Fix MSI IRQ affinity setting

2012-05-24 Thread Jan Kiszka
On 2012-05-24 20:56, Thomas Gleixner wrote:
> On Thu, 24 May 2012, Alex Williamson wrote:
>
>> On Fri, 2012-05-25 at 01:01 +0200, Thomas Gleixner wrote:
>>> So the proper fix is that qemu tells the guest that mask bit is
>>> supported and catches the mask bit toggling before writing it out to
>>> the hardware for those devices which do not support it.
>>
>> We can't necessarily do that, we have to work with the config space
>> we're given.  Using the smallest possible MSI capability always works.
>> Adding mask bits may not fit in with the existing capabilities of the
>> physical device.  Thanks,
>
> I see what you mean. A random device driver of a random guest OS might
> rely on that information. Unlikely, but
>
> So we need some logic to circumvent the masked/unmasked logic in case
> that property is not set, right?

For MSI emulation in QEMU (including device assignment) it is quite
simple: don't assume that the guest will always mask or even disable
before fiddling with some MSI vector configuration. That is not required
by the spec, so we can't rely on it. The patches I have in a
semi-finished state will do precisely this. But there is still some use
for a dev-assign fix based on the current code for qemu-kvm-1.1.

BTW, along with the switch of device assignment to generic MSI support,
we should also gain support for MSI vector masking - provided the
underlying device comes with that feature as well.

Jan


