[PATCH v6 0/4] irqfd support for arm/arm64

2015-01-13 Thread Eric Auger
This patch series enables irqfd on arm and arm64.

The irqfd framework makes it possible to inject a virtual IRQ into a guest
when an eventfd is triggered. User space uses the KVM_IRQFD VM ioctl to
provide KVM with a kvm_irqfd struct that associates a VM, an eventfd and a
virtual IRQ number (aka the gsi). When an actor signals the eventfd
(typically a VFIO platform driver), the KVM irqfd subsystem injects the gsi
into the VM.

Resamplefd is also supported for level-sensitive interrupts, i.e. the
user can provide another eventfd that is triggered when the completion
of the virtual IRQ (gsi) is detected by the GIC.
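
As a rough illustration of what user space hands to the KVM_IRQFD ioctl, the
sketch below packs the kvm_irqfd structure as laid out in the Linux UAPI
header <linux/kvm.h> (fd, gsi, flags, resamplefd, 16 bytes of padding). The
helper name pack_kvm_irqfd is hypothetical, and the ioctl call itself is left
out; this only shows how the fields and the resample flag fit together.

```python
import struct

# Flag values as defined in the Linux UAPI header <linux/kvm.h>.
KVM_IRQFD_FLAG_DEASSIGN = 1 << 0
KVM_IRQFD_FLAG_RESAMPLE = 1 << 1

# struct kvm_irqfd { __u32 fd; __u32 gsi; __u32 flags; __u32 resamplefd;
#                    __u8 pad[16]; }
KVM_IRQFD_FORMAT = "=IIII16x"


def pack_kvm_irqfd(fd, gsi, resamplefd=None):
    """Build the byte image of a kvm_irqfd (hypothetical helper).

    fd         -- eventfd whose signalling triggers the injection
    gsi        -- virtual IRQ number to inject
    resamplefd -- optional second eventfd, for level-sensitive interrupts
    """
    flags = 0
    rfd = 0
    if resamplefd is not None:
        flags |= KVM_IRQFD_FLAG_RESAMPLE
        rfd = resamplefd
    return struct.pack(KVM_IRQFD_FORMAT, fd, gsi, flags, rfd)
```

The resulting buffer is what a VMM would pass as the ioctl argument on the VM
file descriptor.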

The gsi must correspond to a shared peripheral interrupt (SPI), i.e. the
GIC interrupt ID is gsi + 32.
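
The gsi-to-interrupt-ID mapping is simple arithmetic; a minimal sketch,
assuming the GICv2 ID layout (IDs 0-15 are SGIs, 16-31 are PPIs, SPIs run
from 32 up to 1019). The function name gsi_to_gic_irq is made up for this
example.

```python
SPI_BASE = 32      # first SPI; lower IDs are SGIs (0-15) and PPIs (16-31)
MAX_INTID = 1019   # highest interrupt ID in the assumed GICv2 layout


def gsi_to_gic_irq(gsi):
    """Map an irqfd gsi to the GIC interrupt ID of the matching SPI."""
    irq = gsi + SPI_BASE
    if gsi < 0 or irq > MAX_INTID:
        raise ValueError("gsi %d does not map to an SPI" % gsi)
    return irq
```

For instance, gsi 0 is the first SPI (interrupt ID 32).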

The rationale behind not supporting PPI irqfd injection is that
any device using a PPI would be a private-to-the-CPU device (timer for
instance), so its state would have to be context-switched along with the
VCPU and would require in-kernel wiring anyhow. It is not a relevant use
case for irqfds.

This patch enables CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQFD.

No IRQ routing table is used, which allows CONFIG_HAVE_KVM_IRQCHIP to be
removed.

The ARM virtual interrupt controller, the VGIC, is dynamically
instantiated. User space may attempt to assign an irqfd before
the virtual interrupt controller is ready. For that reason, a
check is added in the generic irqfd code to test whether the virtual
interrupt controller is ready.
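
The readiness check can be pictured with a small user-space model. This is
only a sketch: the Vm class and irqfd_assign function are hypothetical
stand-ins for the kernel objects, and returning -EAGAIN is an assumption
about the error code, not something stated in this cover letter.

```python
import errno


class Vm:
    """Toy stand-in for a VM: tracks VGIC readiness and assigned irqfds."""

    def __init__(self):
        self.intc_initialized = False   # what kvm_arch_intc_initialized reports
        self.irqfds = {}                # eventfd -> gsi


def irqfd_assign(vm, eventfd, gsi):
    """Refuse the assignment while the interrupt controller is not ready."""
    if not vm.intc_initialized:
        return -errno.EAGAIN            # assumed error code
    vm.irqfds[eventfd] = gsi
    return 0
```

User space that sees the failure can force VGIC initialization and retry the
assignment.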

This work was tested with the Calxeda Midway xgmac main interrupt, using
qemu-system-arm and the QEMU VFIO platform device. irqfd was also proven
functional on several vhost-net prototypes.

Available on ssh://git.linaro.org/people/eric.auger/linux.git
branch irqfd_v6_integrated_official_release

v5 -> v6:
- take into account Christoffer's comments:
  - rename the macro and function that check the state of the virtual
    interrupt controller (kvm_arch_intc_initialized)
  - kvm_arch_intc_initialized is declared in kvm_host.h regardless of
    architecture support.
  - squash v5 patch files 3 & 4
  - KVM_CAP_IRQFD support depends on vgic_present
  - add Christoffer's Reviewed-by on last patch file

v4 -> v5:
- add the capability to check whether vgic is initialized when
  assigning an irqfd.  objective is to avoid injecting IRQ before
  this vgic is ready: this corresponds to new patch files 2, 3, 4.
- do not specifically handle early virtual IRQ injections in
  kvm_set_irq.  In case of injection when vgic is not yet ready,
  simply return an error.  User-space now has means to force vgic
  init and get notified if irqfd assign takes place too early.
- squash [PATCH v4 2/3] KVM: arm: add irqfd support and
 [PATCH v4 3/3] KVM: arm64: add irqfd support
- add Acked-by's in KVM: arm/arm64: unset CONFIG_HAVE_KVM_IRQCHIP
- some comment rewording in vgic

v3 -> v4:
- rebase on 3.18rc5
- vgic dynamic instantiation brought new challenges:
  handling of irqfd injection when vgic is not ready
- unset of CONFIG_HAVE_KVM_IRQCHIP in a separate patch
- add arm64 enable
- vgic.c style modifications according to Christoffer's comments

v2 -> v3:
- removal of irq.h from eventfd.c put in a separate patch to increase
  visibility
- properly expose KVM_CAP_IRQFD capability in arm.c
- remove CONFIG_HAVE_KVM_IRQCHIP, meaningful only if irq_comm.c is used

v1 -> v2:
- rebase on 3.17rc1
- move the dist unlock in process_maintenance
- remove the dist lock in __kvm_vgic_sync_hwstate
- rewording of the commit message (add resamplefd reference)
- remove irq.h


Eric Auger (4):
  KVM: arm/arm64: unset CONFIG_HAVE_KVM_IRQCHIP
  KVM: introduce kvm_arch_intc_initialized
  KVM: arm/arm64: implement kvm_arch_intc_initialized and use it in
    irqfd
  KVM: arm/arm64: add irqfd support

 Documentation/virtual/kvm/api.txt |  6 +++-
 arch/arm/include/asm/kvm_host.h   |  2 ++
 arch/arm/include/uapi/asm/kvm.h   |  3 ++
 arch/arm/kvm/Kconfig              |  4 +--
 arch/arm/kvm/Makefile             |  2 +-
 arch/arm/kvm/arm.c                | 10 +++
 arch/arm64/include/asm/kvm_host.h |  2 ++
 arch/arm64/include/uapi/asm/kvm.h |  3 ++
 arch/arm64/kvm/Kconfig            |  3 +-
 arch/arm64/kvm/Makefile           |  2 +-
 include/linux/kvm_host.h          | 14 +
 virt/kvm/arm/vgic.c               | 63 ---
 virt/kvm/eventfd.c                |  3 ++
 13 files changed, 107 insertions(+), 10 deletions(-)

-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v6 0/4] kvmtool: ARM/ARM64: Misc updates

2014-10-06 Thread Anup Patel
This patchset updates KVMTOOL to use some of the features
supported by Linux-3.16 KVM ARM/ARM64, such as:

1. Target CPU == Host using KVM_ARM_PREFERRED_TARGET vm ioctl
2. Target CPU type Potenza for using KVMTOOL on X-Gene
3. PSCI v0.2 support for AArch32 and AArch64 guests
4. System event exit reason

Changes since v5:
- Use pr_info() and pr_warning() instead of printf() when
  handling system event exit reason

Changes since v4:
- Avoid using magic '0' target for kvm arm generic target
- Added comment for why we need Potenza target in KVMTOOL

Changes since v3:
- Add generic targets for aarch32 and aarch64 which are used
  by KVMTOOL when target type returned by KVM_ARM_PREFERRED_TARGET
  vm ioctl is not known to KVMTOOL
- Print more info when handling system reset event

Changes since v2:
- Use target type returned by KVM_ARM_PREFERRED_TARGET vm ioctl
  for VCPU init such that we don't need to update KVMTOOL for
  every new host hardware
- Simplify DTB generation for PSCI node

Changes since v1:
- Drop the patch to fix compile error for aarch64
- Fallback to old method of trying all target types if
  KVM_ARM_PREFERRED_TARGET vm ioctl fails
- Print more info when handling KVM_EXIT_SYSTEM_EVENT
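
The target selection described across these changelogs (use the preferred
target, fall back to trying every known type if the ioctl fails, and use a
generic target when the preferred type is unknown) can be sketched as below.
All names here (choose_target, init_succeeds, the string target labels) are
hypothetical and only model the decision order, not KVMTOOL's actual code.

```python
def choose_target(preferred, known_targets, init_succeeds, generic_target):
    """Pick the VCPU target type.

    preferred      -- target reported by KVM_ARM_PREFERRED_TARGET, or None
                      if the ioctl failed
    known_targets  -- target types the tool knows about, in trial order
    init_succeeds  -- callable telling whether VCPU init works for a target
    generic_target -- fallback for a preferred target the tool does not know
    """
    if preferred is None:
        # Old method: try all known target types until one initializes.
        for target in known_targets:
            if init_succeeds(target):
                return target
        raise RuntimeError("no usable target type")
    if preferred in known_targets:
        return preferred
    # Host CPU unknown to the tool: use the generic target.
    return generic_target
```

With this ordering the tool does not need an update for every new host CPU,
which is the motivation given in the v2 changes.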

Anup Patel (4):
  kvmtool: ARM: Use KVM_ARM_PREFERRED_TARGET vm ioctl to determine
    target cpu
  kvmtool: ARM64: Add target type potenza for aarch64
  kvmtool: Handle exit reason KVM_EXIT_SYSTEM_EVENT
  kvmtool: ARM/ARM64: Provide PSCI-0.2 to guest when KVM supports it

 tools/kvm/arm/aarch32/arm-cpu.c |8 +++
 tools/kvm/arm/aarch64/arm-cpu.c |   23 -
 tools/kvm/arm/fdt.c |   51 +--
 tools/kvm/arm/include/arm-common/kvm-cpu-arch.h |2 +
 tools/kvm/arm/kvm-cpu.c |   61 +++
 tools/kvm/kvm-cpu.c |   21 
 6 files changed, 149 insertions(+), 17 deletions(-)

-- 
1.7.9.5



Re: [PATCH v6 0/4] live migration dirty bitmap support for ARMv7

2014-05-15 Thread Mario Smarduch
Will do that, I'm sure there will be another iteration :).

On 05/15/2014 11:51 AM, Christoffer Dall wrote:
> On Thu, May 15, 2014 at 11:27:27AM -0700, Mario Smarduch wrote:
>> This is v6 patcheset of live mgiration support for ARMv7.
> 
> migration
> 
> This is an extremely terse cover letter.  It would have been nice with a
> few sentences of which existing features this leverages, which support
> was missing, what the preferred approach is, etc.  Also, links to a wiki
> page or just a few notes on how you did the testing below with which
> user space tools etc. would also have been great.
> 
>>
>> - Tested on two 4-way A15 hardware systems, QEMU 2-way/4-way SMP guests up to 2GB
>> - Various dirty data rates tested - 2GB/1s ... 2048 pgs/5ms
>> - Validated source/destination memory image integrity
>>
>> Changes since v1:
>> - add unlock of VM mmu_lock to prevent a deadlock
>> - moved migration active inside mmu_lock acquire for visibility in 2nd stage
>>   data abort handler
>> - Added comments
>>
>> Changes since v2:
>> - move initial VM write protect to memory region architecture prepare function
>>   (needed to make dirty logging function generic)
>> - added stage2_mark_pte_ro() - to mark ptes ro - Marc's comment
>> - optimized initial VM memory region write protect to do fewer table lookups -
>>   applied Marc's comment for walking dirty bitmap mask
>> - added pud_addr_end() for stage2 tables, to make the walk 4-level
>> - added kvm_flush_remote_tlbs() to use ARM TLB invalidation, made the generic
>>   one weak, Marc's comment for generic dirty bitmap log function
>> - optimized walking dirty bitmap mask to skip upper tables - Marc's comment
>> - deleted x86, arm kvm_vm_ioctl_get_dirty_log(), moved to kvm_main.c, tagged
>>   the function weak - Marc's comment
>> - changed Data Abort handler pte index handling - Marc's comment
>>
>> Changes since v3:
>> - changed pte updates to reset write bit instead of setting default
>>   value for existing ptes - Steve's comment
>> - In addition to PUD add 2nd stage >4GB range functions - Steve's
>>   suggestion
>> - Restructured initial memory slot write protect function for PGD, PUD, PMD
>>   table walking - Steve's suggestion
>> - Renamed variable types to resemble their use - Steve's suggestions
>> - Added a couple of pte helpers for 2nd stage tables - Steve's suggestion
>> - Updated unmap_range() that handles 2nd stage tables and identity mappings
>>   to handle 2nd stage addresses >4GB. Left ARMv8 unchanged.
>>
>> Changes since v4:
>> - rebased to 3.15.0-rc1 - 'next' to pick up p*addr_end patches - Gavin's comment
>> - Update PUD address end function to support 4-level page table walk
>> - Eliminated 5th patch of the series that fixed unmap_range(), since it was
>>   fixed by Marc's patches.
>>
>> Changes since v5:
>> - Created separate entry point for VMID TLB flush with no param - Christoffer's
>>   comment
>> - Update documentation for kvm_flush_remote_tlbs() - Christoffer's comment
>> - Simplified splitting of huge pages - initial WP and 2nd stage DABT handler
>>   clear the huge page PMD, and use current code to fault in small pages.
>>   Removed kvm_split_pmd().
>>
>> Mario Smarduch (4):
>>   add ARMv7 HYP API to flush VM TLBs without address param
>>   live migration support for initial write protect of VM
>>   live migration support for VM dirty log management
>>   add 2nd stage page fault handling during live migration
>>
>>  arch/arm/include/asm/kvm_asm.h  |1 +
>>  arch/arm/include/asm/kvm_host.h |   11 ++
>>  arch/arm/include/asm/kvm_mmu.h  |   10 ++
>>  arch/arm/kvm/arm.c  |8 +-
>>  arch/arm/kvm/interrupts.S   |   11 ++
>>  arch/arm/kvm/mmu.c  |  292 ++-
>>  arch/x86/kvm/x86.c  |   86 
>>  virt/kvm/kvm_main.c |   84 ++-
>>  8 files changed, 409 insertions(+), 94 deletions(-)
>>
>> -- 
>> 1.7.9.5
>>



Re: [PATCH v6 0/4] live migration dirty bitmap support for ARMv7

2014-05-15 Thread Christoffer Dall
On Thu, May 15, 2014 at 11:27:27AM -0700, Mario Smarduch wrote:
> This is v6 patcheset of live mgiration support for ARMv7.

migration

This is an extremely terse cover letter.  It would have been nice with a
few sentences of which existing features this leverages, which support
was missing, what the preferred approach is, etc.  Also, links to a wiki
page or just a few notes on how you did the testing below with which
user space tools etc. would also have been great.

> 
> - Tested on two 4-way A15 hardware systems, QEMU 2-way/4-way SMP guests up to 2GB
> - Various dirty data rates tested - 2GB/1s ... 2048 pgs/5ms
> - Validated source/destination memory image integrity
>
> Changes since v1:
> - add unlock of VM mmu_lock to prevent a deadlock
> - moved migration active inside mmu_lock acquire for visibility in 2nd stage
>   data abort handler
> - Added comments
>
> Changes since v2:
> - move initial VM write protect to memory region architecture prepare function
>   (needed to make dirty logging function generic)
> - added stage2_mark_pte_ro() - to mark ptes ro - Marc's comment
> - optimized initial VM memory region write protect to do fewer table lookups -
>   applied Marc's comment for walking dirty bitmap mask
> - added pud_addr_end() for stage2 tables, to make the walk 4-level
> - added kvm_flush_remote_tlbs() to use ARM TLB invalidation, made the generic
>   one weak, Marc's comment for generic dirty bitmap log function
> - optimized walking dirty bitmap mask to skip upper tables - Marc's comment
> - deleted x86, arm kvm_vm_ioctl_get_dirty_log(), moved to kvm_main.c, tagged
>   the function weak - Marc's comment
> - changed Data Abort handler pte index handling - Marc's comment
>
> Changes since v3:
> - changed pte updates to reset write bit instead of setting default
>   value for existing ptes - Steve's comment
> - In addition to PUD add 2nd stage >4GB range functions - Steve's
>   suggestion
> - Restructured initial memory slot write protect function for PGD, PUD, PMD
>   table walking - Steve's suggestion
> - Renamed variable types to resemble their use - Steve's suggestions
> - Added a couple of pte helpers for 2nd stage tables - Steve's suggestion
> - Updated unmap_range() that handles 2nd stage tables and identity mappings
>   to handle 2nd stage addresses >4GB. Left ARMv8 unchanged.
>
> Changes since v4:
> - rebased to 3.15.0-rc1 - 'next' to pick up p*addr_end patches - Gavin's comment
> - Update PUD address end function to support 4-level page table walk
> - Eliminated 5th patch of the series that fixed unmap_range(), since it was
>   fixed by Marc's patches.
>
> Changes since v5:
> - Created separate entry point for VMID TLB flush with no param - Christoffer's
>   comment
> - Update documentation for kvm_flush_remote_tlbs() - Christoffer's comment
> - Simplified splitting of huge pages - initial WP and 2nd stage DABT handler
>   clear the huge page PMD, and use current code to fault in small pages.
>   Removed kvm_split_pmd().
> 
> Mario Smarduch (4):
>   add ARMv7 HYP API to flush VM TLBs without address param
>   live migration support for initial write protect of VM
>   live migration support for VM dirty log management
>   add 2nd stage page fault handling during live migration
> 
>  arch/arm/include/asm/kvm_asm.h  |1 +
>  arch/arm/include/asm/kvm_host.h |   11 ++
>  arch/arm/include/asm/kvm_mmu.h  |   10 ++
>  arch/arm/kvm/arm.c  |8 +-
>  arch/arm/kvm/interrupts.S   |   11 ++
>  arch/arm/kvm/mmu.c  |  292 ++-
>  arch/x86/kvm/x86.c  |   86 
>  virt/kvm/kvm_main.c |   84 ++-
>  8 files changed, 409 insertions(+), 94 deletions(-)
> 
> -- 
> 1.7.9.5
> 


[PATCH v6 0/4] live migration dirty bitmap support for ARMv7

2014-05-15 Thread Mario Smarduch
This is v6 patcheset of live mgiration support for ARMv7.

- Tested on two 4-way A15 hardware systems, QEMU 2-way/4-way SMP guests up to 2GB
- Various dirty data rates tested - 2GB/1s ... 2048 pgs/5ms
- Validated source/destination memory image integrity

Changes since v1:
- add unlock of VM mmu_lock to prevent a deadlock
- moved migration active inside mmu_lock acquire for visibility in 2nd stage
  data abort handler
- Added comments

Changes since v2:
- move initial VM write protect to memory region architecture prepare function
  (needed to make dirty logging function generic)
- added stage2_mark_pte_ro() - to mark ptes ro - Marc's comment
- optimized initial VM memory region write protect to do fewer table lookups -
  applied Marc's comment for walking dirty bitmap mask
- added pud_addr_end() for stage2 tables, to make the walk 4-level
- added kvm_flush_remote_tlbs() to use ARM TLB invalidation, made the generic
  one weak, Marc's comment for generic dirty bitmap log function
- optimized walking dirty bitmap mask to skip upper tables - Marc's comment
- deleted x86, arm kvm_vm_ioctl_get_dirty_log(), moved to kvm_main.c, tagged
  the function weak - Marc's comment
- changed Data Abort handler pte index handling - Marc's comment

Changes since v3:
- changed pte updates to reset write bit instead of setting default
  value for existing ptes - Steve's comment
- In addition to PUD add 2nd stage >4GB range functions - Steve's
  suggestion
- Restructured initial memory slot write protect function for PGD, PUD, PMD
  table walking - Steve's suggestion
- Renamed variable types to resemble their use - Steve's suggestions
- Added a couple of pte helpers for 2nd stage tables - Steve's suggestion
- Updated unmap_range() that handles 2nd stage tables and identity mappings
  to handle 2nd stage addresses >4GB. Left ARMv8 unchanged.

Changes since v4:
- rebased to 3.15.0-rc1 - 'next' to pick up p*addr_end patches - Gavin's comment
- Update PUD address end function to support 4-level page table walk
- Eliminated 5th patch of the series that fixed unmap_range(), since it was
  fixed by Marc's patches.

Changes since v5:
- Created separate entry point for VMID TLB flush with no param - Christoffer's
  comment
- Update documentation for kvm_flush_remote_tlbs() - Christoffer's comment
- Simplified splitting of huge pages - initial WP and 2nd stage DABT handler
  clear the huge page PMD, and use current code to fault in small pages.
  Removed kvm_split_pmd().
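
For readers unfamiliar with the dirty bitmap mask walk mentioned in the v2
changes, here is a generic sketch of visiting only the set bits of one bitmap
word. It is not taken from the patches; the function name dirty_gfns and the
idea of pairing a base guest frame number with a mask word are assumptions
made for illustration.

```python
def dirty_gfns(base_gfn, mask):
    """Yield the guest frame numbers whose bits are set in one bitmap word.

    Bit i of mask stands for page base_gfn + i. Only set bits are visited,
    so a sparse mask costs one iteration per dirty page.
    """
    while mask:
        lowest = mask & -mask               # isolate the lowest set bit
        yield base_gfn + lowest.bit_length() - 1
        mask &= mask - 1                    # clear that bit and continue
```

Each yielded frame is a page that would need to be write protected again
after its dirty state has been reported.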

Mario Smarduch (4):
  add ARMv7 HYP API to flush VM TLBs without address param
  live migration support for initial write protect of VM
  live migration support for VM dirty log management
  add 2nd stage page fault handling during live migration

 arch/arm/include/asm/kvm_asm.h  |1 +
 arch/arm/include/asm/kvm_host.h |   11 ++
 arch/arm/include/asm/kvm_mmu.h  |   10 ++
 arch/arm/kvm/arm.c  |8 +-
 arch/arm/kvm/interrupts.S   |   11 ++
 arch/arm/kvm/mmu.c  |  292 ++-
 arch/x86/kvm/x86.c  |   86 
 virt/kvm/kvm_main.c |   84 ++-
 8 files changed, 409 insertions(+), 94 deletions(-)

-- 
1.7.9.5



[PATCH v6 0/4] KVM/ARM Architected Timers support

2013-01-16 Thread Christoffer Dall
The following series implements support for the architected generic
timers for KVM/ARM.

This patch series can also be pulled from:
git://github.com/virtualopensystems/linux-kvm-arm.git
branch: kvm-arm-v16-vgic-timers

Changes since v5:
 - Renamed sync_{to,from} to {flush,sync}_hwstate
 - Removed ISB's in world-switch code
 - Avoid add/sub on vcpu pointer in world-switch

Changes since v1-v4:
 - Get virtual IRQ number from DT
 - Simplify access to cntvoff and cntv_cval
 - Remove extraneous bit clearing
 - Abstract timer arming/disarming to improve code readability
 - Context switch CNTKCTL across world-switches
 - Add CPU hotplug notifier

---

Marc Zyngier (4):
  ARM: arch_timers: switch to physical timers if HYP mode is available
  ARM: KVM: arch_timers: Add guest timer core support
  ARM: KVM: arch_timers: Add timer world switch
  ARM: KVM: arch_timers: Wire the init code and config option


 arch/arm/include/asm/kvm_arch_timer.h |   85 ++
 arch/arm/include/asm/kvm_asm.h|3 
 arch/arm/include/asm/kvm_host.h   |5 +
 arch/arm/kernel/arch_timer.c  |7 +
 arch/arm/kernel/asm-offsets.c |6 +
 arch/arm/kvm/Kconfig  |8 +
 arch/arm/kvm/Makefile |1 
 arch/arm/kvm/arch_timer.c |  271 +
 arch/arm/kvm/arm.c|   14 ++
 arch/arm/kvm/coproc.c |4 
 arch/arm/kvm/interrupts.S |2 
 arch/arm/kvm/interrupts_head.S|   90 +++
 arch/arm/kvm/vgic.c   |1 
 13 files changed, 495 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm_arch_timer.h
 create mode 100644 arch/arm/kvm/arch_timer.c

-- 


Re: [PATCH v6 0/4] VFIO-based PCI device assignment

2012-10-01 Thread Anthony Liguori
Alex Williamson writes:

> v6:
>   Update patch 4/4 so Makefile just uses CONFIG_LINUX and
>   avoids all the noise in configure.
>
> Also available in git here:
>
> git://github.com/awilliam/qemu-vfio.git
> branch: vfio-for-qemu
> tag: vfio-pci-for-qemu-v6

Applied. Thanks.

Regards,

Anthony Liguori

>
> ---
>
> Alex Williamson (4):
>   vfio: Enable vfio-pci and mark supported
>   vfio: vfio-pci device assignment driver
>   Update Linux kernel headers
>   Update kernel header script to include vfio
>
>
>  MAINTAINERS |5 
>  hw/Makefile.objs|3 
>  hw/vfio_pci.c   | 1864 +++
>  hw/vfio_pci_int.h   |  114 ++
>  linux-headers/linux/vfio.h  |  368 
>  scripts/update-linux-headers.sh |2 
>  6 files changed, 2354 insertions(+), 2 deletions(-)
>  create mode 100644 hw/vfio_pci.c
>  create mode 100644 hw/vfio_pci_int.h
>  create mode 100644 linux-headers/linux/vfio.h


[PATCH v6 0/4] VFIO-based PCI device assignment

2012-09-26 Thread Alex Williamson
v6:
  Update patch 4/4 so Makefile just uses CONFIG_LINUX and
  avoids all the noise in configure.

Also available in git here:

git://github.com/awilliam/qemu-vfio.git
branch: vfio-for-qemu
tag: vfio-pci-for-qemu-v6

---

Alex Williamson (4):
  vfio: Enable vfio-pci and mark supported
  vfio: vfio-pci device assignment driver
  Update Linux kernel headers
  Update kernel header script to include vfio


 MAINTAINERS |5 
 hw/Makefile.objs|3 
 hw/vfio_pci.c   | 1864 +++
 hw/vfio_pci_int.h   |  114 ++
 linux-headers/linux/vfio.h  |  368 
 scripts/update-linux-headers.sh |2 
 6 files changed, 2354 insertions(+), 2 deletions(-)
 create mode 100644 hw/vfio_pci.c
 create mode 100644 hw/vfio_pci_int.h
 create mode 100644 linux-headers/linux/vfio.h


[PATCH v6 0/4] The intro of QEMU block I/O throttling

2011-09-01 Thread Zhi Yong Wu
The main goal of the patch is to effectively cap the disk I/O speed or request
count of a single VM. It is only a draft, so it unavoidably has some drawbacks;
if you catch them, please let me know.

The patch mainly introduces one block I/O throttling algorithm, one timer
and one block queue for each drive that has I/O limits enabled.

When a block request comes in, the throttling algorithm checks whether its
I/O rate or count exceeds the limits; if so, the request is enqueued on the
block queue, and the timer later handles the queued I/O requests.
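
The check-then-enqueue flow can be modelled in a few lines. This is a
simplified sketch, not the algorithm from the patches: it accounts bytes per
fixed time slice, and the class name ThrottledDrive, the slice length and the
explicit `now` arguments are all assumptions chosen to keep the example
deterministic.

```python
from collections import deque


class ThrottledDrive:
    """Toy bytes-per-second limiter with a request queue and a timer hook."""

    def __init__(self, bps_limit, slice_seconds=1):
        self.bps_limit = bps_limit
        self.slice_seconds = slice_seconds
        self.slice_start = 0
        self.bytes_in_slice = 0
        self.queue = deque()      # requests waiting for budget
        self.completed = []       # request ids in dispatch order

    def _roll_slice(self, now):
        # Start a fresh accounting slice once the current one has elapsed.
        if now - self.slice_start >= self.slice_seconds:
            self.slice_start = now
            self.bytes_in_slice = 0

    def _try_dispatch(self, req_id, nbytes):
        budget = self.bps_limit * self.slice_seconds
        if self.bytes_in_slice + nbytes > budget:
            return False
        self.bytes_in_slice += nbytes
        self.completed.append(req_id)
        return True

    def submit(self, req_id, nbytes, now):
        """Dispatch at once if within the limit, otherwise enqueue."""
        self._roll_slice(now)
        # Keep ordering: never overtake requests that are already queued.
        if self.queue or not self._try_dispatch(req_id, nbytes):
            self.queue.append((req_id, nbytes))

    def on_timer(self, now):
        """Timer callback: drain as many queued requests as the budget allows."""
        self._roll_slice(now)
        while self.queue and self._try_dispatch(*self.queue[0]):
            self.queue.popleft()
```

In this simplified model a request larger than one slice's budget is never
dispatched, so a real implementation needs a way to stretch the slice or
split the wait.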

The available features are as follows:
(1) global bps limit
   -drive bps=xxx            in bytes/s
(2) bps limit for reads only
   -drive bps_rd=xxx         in bytes/s
(3) bps limit for writes only
   -drive bps_wr=xxx         in bytes/s
(4) global iops limit
   -drive iops=xxx           in ios/s
(5) iops limit for reads only
   -drive iops_rd=xxx        in ios/s
(6) iops limit for writes only
   -drive iops_wr=xxx        in ios/s
(7) a combination of some limits
   -drive bps=xxx,iops=xxx

Known Limitations:
(1) #1 cannot coexist with #2 or #3
(2) #4 cannot coexist with #5 or #6
(3) When bps/iops limits are set to a small value such as 511 bytes/s,
    the VM will hang. We are considering how to handle this scenario.
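
The first two limitations amount to a validity check on the option
combination. A minimal sketch, assuming a value of 0 means "limit not set";
the function name limits_are_compatible is hypothetical.

```python
def limits_are_compatible(bps=0, bps_rd=0, bps_wr=0,
                          iops=0, iops_rd=0, iops_wr=0):
    """Return True when the -drive throttling options can be combined.

    A global limit excludes the per-direction limits of the same kind;
    bps and iops limits may be mixed freely.
    """
    bps_conflict = bps != 0 and (bps_rd != 0 or bps_wr != 0)
    iops_conflict = iops != 0 and (iops_rd != 0 or iops_wr != 0)
    return not (bps_conflict or iops_conflict)
```

For example, bps together with iops is accepted, while bps together with
bps_rd is rejected.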

Changes since code V5:
  Mainly fix the aio callback issue for the block queue.
  Adjust the code based on Ram Pai's comments.

Zhi Yong Wu (4):
  block: add the command line support
  block: add the block queue support
  block: add block timer and block throttling algorithm
  qmp/hmp: add block_set_io_throttle

 v5: add qmp/hmp support.
     Adjust the code based on Stefan's comments
     qmp/hmp: add block_set_io_throttle

 v4: fix memory leak based on Ryan's feedback.

 v3: Added the code for extending slice time, and modified the method to
     compute wait time for the timer.

 v2: The code V2 for QEMU disk I/O limits.
     Modified the code mainly based on Stefan's comments.

 v1: Submit the code for QEMU disk I/O limits.
     Only a code draft.

 Makefile.objs |2 +-
 block.c   |  324 +++--
 block.h   |6 +-
 block/blk-queue.c |  226 +
 block/blk-queue.h |   63 ++
 block_int.h   |   30 +
 blockdev.c|   98 
 blockdev.h|2 +
 hmp-commands.hx   |   15 +++
 qemu-config.c |   24 
 qemu-options.hx   |1 +
 qerror.c  |4 +
 qerror.h  |3 +
 qmp-commands.hx   |   52 +-
 14 files changed, 837 insertions(+), 13 deletions(-)
 create mode 100644 block/blk-queue.c
 create mode 100644 block/blk-queue.h

-- 
1.7.6



[PATCH v6 0/4] Enable SMEP feature support for KVM

2011-05-30 Thread Yang, Wei Y
This patchset enables a new CPU feature, SMEP (Supervisor Mode Execution
Protection), in KVM. SMEP prevents the kernel from executing code located in
user-space (application) pages. The updated Intel SDM describes this CPU
feature. The document will be published soon.
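
The rule SMEP enforces can be stated as a small predicate. This is a sketch
of the architectural condition only, not of the KVM page-table walker; the
function and parameter names are made up for the example.

```python
def smep_blocks_fetch(cr4_smep, cpl, page_is_user, is_instruction_fetch):
    """Return True when SMEP makes an access fault.

    cr4_smep             -- CR4.SMEP is set
    cpl                  -- current privilege level (0-2 is supervisor mode)
    page_is_user         -- the page is user-accessible (U/S = 1)
    is_instruction_fetch -- the access is an instruction fetch, not data
    """
    supervisor = cpl < 3
    return bool(cr4_smep and is_instruction_fetch
                and supervisor and page_is_user)
```

Data accesses and user-mode fetches are unaffected, which is why the guest
page table walk needs to know whether an access is an instruction fetch.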

This patchset is based on Fenghua's SMEP patch series, as referred by:
https://lkml.org/lkml/2011/5/17/523

changes since v5:
Add kvm_supported_word9_x86_features and mask against it
before masking against host capability

changes since v4:
Update patch 1/4 comment
Change PT_USER_MASK to ACC_USER_MASK

changes since v3:
Add SMEP bit in CR4_RESERVED_BITS while removing cr4_reserved_bits;
Mask CPUID leaf 7 ebx against host capability word9 in do_cpuid_ent;

Changes since v2:
add instruction fetch checking when walking guest page table.

---
 arch/x86/include/asm/kvm_host.h |2 +-
 arch/x86/kvm/paging_tmpl.h  |9 -
 arch/x86/kvm/x86.c  |   30 +++---
 3 files changed, 36 insertions(+), 5 deletions(-)


Re: [PATCH V6 0/4 net-next] macvtap/vhost TX zero-copy support

2011-05-28 Thread Shirley Ma
On Thu, 2011-05-26 at 13:31 -0700, Shirley Ma wrote:
> On Thu, 2011-05-26 at 23:28 +0300, Michael S. Tsirkin wrote:
> > On Thu, May 26, 2011 at 01:00:20PM -0700, Shirley Ma wrote:
> > > 3. Add sleep in vhost shutting down instead of busy-wait for
> > >    outstanding DMAs.
> > 
> > I still think this is not much better. We need to use a
> > completion structure and wait on it instead.
> > If this gets blocked, conceivably a tx watchdog can fire and save us
> > from blocking forever :)
> 
> Ok, I can add a completion structure here. 

The code here doesn't block forever during shutdown; it releases all
outstanding userspace buffers anyway, see the vhost_zerocopy_signal_used()
shutdown case.

Thanks
Shirley



Re: [PATCH v6 0/4] rbd improvements

2011-05-26 Thread Kevin Wolf
On 27.05.2011 01:07, Josh Durgin wrote:
> This patchset moves the complexity of the rbd format into librbd and
> adds truncation support.
> 
> Changes since v5:
>  * compare full string, not prefix, with "conf" in 2/4
>  * when truncate fails, just return librbd's error
> 
> Changes since v4:
>  * fixed cosmetic issues pointed out by Christian Brunner
> 
> Changes since v3:
>  * trivially rebased
>  * updated copyright header
> 
> Changes since v2:
>  * return values are checked in rbd_aio_rw_vector
>  * bdrv_truncate added
> 
> Josh Durgin (4):
>   rbd: use the higher level librbd instead of just librados
>   rbd: allow configuration of rados from the rbd filename
>   rbd: check return values when scheduling aio
>   rbd: Add bdrv_truncate implementation
> 
>  block/rbd.c   |  896 +++--
>  block/rbd_types.h |   71 -
>  configure |   33 +--
>  3 files changed, 334 insertions(+), 666 deletions(-)
>  delete mode 100644 block/rbd_types.h

Thanks, applied to the block branch.

Kevin



[PATCH v6 0/4] rbd improvements

2011-05-26 Thread Josh Durgin
This patchset moves the complexity of the rbd format into librbd and
adds truncation support.

Changes since v5:
 * compare full string, not prefix, with "conf" in 2/4
 * when truncate fails, just return librbd's error

Changes since v4:
 * fixed cosmetic issues pointed out by Christian Brunner

Changes since v3:
 * trivially rebased
 * updated copyright header

Changes since v2:
 * return values are checked in rbd_aio_rw_vector
 * bdrv_truncate added

Josh Durgin (4):
  rbd: use the higher level librbd instead of just librados
  rbd: allow configuration of rados from the rbd filename
  rbd: check return values when scheduling aio
  rbd: Add bdrv_truncate implementation

 block/rbd.c   |  896 +++--
 block/rbd_types.h |   71 -
 configure |   33 +--
 3 files changed, 334 insertions(+), 666 deletions(-)
 delete mode 100644 block/rbd_types.h

-- 
1.7.2.3



Re: [PATCH V6 0/4 net-next] macvtap/vhost TX zero-copy support

2011-05-26 Thread Shirley Ma
On Thu, 2011-05-26 at 23:28 +0300, Michael S. Tsirkin wrote:
> On Thu, May 26, 2011 at 01:00:20PM -0700, Shirley Ma wrote:
> > 3. Add sleep in vhost shutting down instead of busy-wait for
> >    outstanding DMAs.
> 
> I still think this is not much better. We need to use a
> completion structure and wait on it instead.
> If this gets blocked, conceivably a tx watchdog can fire and save us
> from blocking forever :)

Ok, I can add a completion structure here.

Thanks
Shirley



Re: [PATCH V6 0/4 net-next] macvtap/vhost TX zero-copy support

2011-05-26 Thread Michael S. Tsirkin
On Thu, May 26, 2011 at 01:00:20PM -0700, Shirley Ma wrote:
> 3. Add sleep in vhost shutting down instead of busy-wait for outstanding
>    DMAs.

I still think this is not much better. We need to use a
completion structure and wait on it instead.
If this gets blocked, conceivably a tx watchdog can fire and save us
from blocking forever :)


[PATCH V6 0/4 net-next] macvtap/vhost TX zero-copy support

2011-05-26 Thread Shirley Ma
This patchset adds support for TX zero-copy between guest and host
kernel through vhost. It significantly reduces CPU utilization on the
local host on which the guest is located (it reduced CPU usage by about
50% for a single stream test on the host, while BW for the 4K message size
increased by about 50%). The patchset is based on the previous submission
and comments from the community regarding when/how to handle the release
of guest kernel buffers. This is the simplest approach I can think of
after comparing with several other solutions.

This patchset has integrated V3 review comments from the community:

1. Add more comments on how to use device ZEROCOPY flag;

2. Change device ZEROCOPY to available bit 31

3. Fix skb header linear allocation when virtio_net GSO is not enabled

It has integrated V4 review comments from MST and Sridhar:

1. In vhost, using socket poll wake up for outstanding DMAs

2. Add detailed comments for vhost_zerocopy_signal_used call

3. Add sleep in vhost shutting down instead of busy-wait for outstanding
   DMAs.

4. Copy small packets, don't do zero-copy callback in macvtap, mark its
   DMA done in vhost

5. Change zerocopy to bool in macvtap.

It integrates V5 review comments from MST and Michał Mirosław:

1. Prevent userspace apps from holding skb userspace buffers by copying
userspace buffers to kernel in skb_clone, skb_copy, pskb_copy,
pskb_expand_head.

2. Use the HIGHDMA and SG feature bits to enable ZEROCOPY, which removes
the dependency on a new feature bit; we can add it later when a new
feature bit is available.

This patchset includes:

1/4: Add a new sock zero-copy flag, SOCK_ZEROCOPY;

2/4: Add a new struct skb_ubuf_info in skb_shared_info for the userspace
buffer release callback, invoked when the lower device has finished DMA
for that skb, i.e. when the last reference is gone. Also, whenever
skb_clone, skb_copy, pskb_copy, or pskb_expand_head is called (from
tcpdump or filtering, for example), these userspace buffers are copied
into the kernel, since we don't want userspace apps to hold userspace
buffers for too long.
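
The release rule described in 2/4 can be modelled in a few lines. This
is only a toy sketch in Python under stated assumptions, not the skb
code: Packet, dma_done, clone and payload are hypothetical names. It
shows the two events that hand the buffer back to userspace: DMA
completion, or a clone that first copies the data into kernel-owned
memory.

```python
class Packet:
    """Toy model of an skb that borrows a userspace buffer."""

    def __init__(self, user_buf, on_release):
        self._user_buf = user_buf        # bytearray still owned by userspace
        self._on_release = on_release    # release callback, fired at most once
        self._kernel_copy = None

    def _release(self):
        if self._user_buf is not None:
            buf, self._user_buf = self._user_buf, None
            self._on_release(buf)

    def dma_done(self):
        # The lower device finished with the data: give the buffer back.
        self._release()

    def clone(self):
        # A clone could outlive the DMA, so copy the data first and
        # release the userspace buffer immediately.
        if self._user_buf is not None:
            self._kernel_copy = bytes(self._user_buf)
            self._release()
        twin = Packet(None, None)
        twin._kernel_copy = self._kernel_copy
        return twin

    def payload(self):
        # Returns None once the buffer was released without a copy.
        if self._user_buf is not None:
            return bytes(self._user_buf)
        return self._kernel_copy
```

After clone(), later writes by userspace to its buffer no longer affect
either packet, which is the point of copying before releasing.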

3/4: Add vhost zero-copy callback in vhost when skb last refcnt is gone;
add vhost_zerocopy_signal_used to notify guest to release TX skb
buffers.

4/4: Add macvtap zero-copy in the lower device when the packet being
sent is larger than 256 bytes.
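
The size cut-off in 4/4 amounts to a simple path selection. The sketch
below is illustrative Python only; choose_tx_path and its parameters are
hypothetical names, and only the 256-byte threshold comes from the
description above.

```python
ZEROCOPY_THRESHOLD = 256  # bytes, taken from the 4/4 description


def choose_tx_path(length, zerocopy_enabled=True,
                   threshold=ZEROCOPY_THRESHOLD):
    """Return "zerocopy" for large packets, "copy" otherwise."""
    # Strictly greater than the threshold: a 256-byte packet is copied.
    if zerocopy_enabled and length > threshold:
        return "zerocopy"
    return "copy"
```

Small packets take the copy path, so their buffers can be marked done
without waiting for a DMA completion.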

The patchset is built against the most recent net-next, Linux
2.6.39-rc7. It has passed a netperf/netserver multiple-stream stress
test, a tcpdump suspended test, and a dynamic SG change test.

Single TCP_STREAM 120-second test results over an ixgbe 10Gb NIC:

Message    BW(Gb/s)  qemu-kvm (NumCPU)  vhost-net (NumCPU)  PerfTop irq/s
4K         7408.57   92.1%              22.6%               1229
4K(Orig)   4913.17   118.1%             84.1%               2086
8K         9129.90   89.3%              23.3%               1141
8K(Orig)   7094.55   115.9%             84.7%               2157
16K        9178.81   89.1%              23.3%               1139
16K(Orig)  8927.1    118.7%             83.4%               2262
64K        9171.43   88.4%              24.9%               1253
64K(Orig)  9085.85   115.9%             82.4%               2229
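
The relative gains can be read off the table with a short calculation.
The Python below is a sketch: the numbers are copied from the table,
while RESULTS, percent_change and summarize are hypothetical helper
names, and summing the qemu-kvm and vhost-net columns as "host CPU" is
an assumption made here for illustration.

```python
# size -> variant -> (bandwidth, qemu-kvm CPU %, vhost-net CPU %)
RESULTS = {
    "4K": {"zerocopy": (7408.57, 92.1, 22.6), "orig": (4913.17, 118.1, 84.1)},
    "8K": {"zerocopy": (9129.90, 89.3, 23.3), "orig": (7094.55, 115.9, 84.7)},
    "16K": {"zerocopy": (9178.81, 89.1, 23.3), "orig": (8927.1, 118.7, 83.4)},
    "64K": {"zerocopy": (9171.43, 88.4, 24.9), "orig": (9085.85, 115.9, 82.4)},
}


def percent_change(new, old):
    """Signed change of new relative to old, in percent."""
    return (new - old) / old * 100.0


def summarize(size):
    bw_new, qemu_new, vhost_new = RESULTS[size]["zerocopy"]
    bw_old, qemu_old, vhost_old = RESULTS[size]["orig"]
    return {
        "bw_gain": percent_change(bw_new, bw_old),
        "vhost_cpu_change": percent_change(vhost_new, vhost_old),
        "host_cpu_change": percent_change(qemu_new + vhost_new,
                                          qemu_old + vhost_old),
    }
```

For the 4K row this gives a bandwidth gain of roughly 51%, in line with
the "about 50%" figure quoted in the cover letter; the gain shrinks for
larger messages where the original path is already close to line rate.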

For message sizes less than or equal to 2K, there is a known KVM guest
TX overrun issue. With this zero-copy patch the issue becomes more
severe: guest io_exits have tripled, so performance is poor. Once the
TX overrun problem has been addressed, I will retest the small message
size performance.

 drivers/net/macvtap.c     |  132 ---
 drivers/vhost/net.c       |   44 +-
 drivers/vhost/vhost.c     |   49 +++
 drivers/vhost/vhost.h     |   13
 include/linux/netdevice.h |   10 +++
 include/linux/skbuff.h    |   26
 include/net/sock.h        |    1 +
 net/core/skbuff.c         |   81 -
 8 files changed, 345 insertions(+), 17 deletions(-)




Re: [KVM PATCH v6 0/4] irqfd fixes and enhancements

2009-06-29 Thread Gregory Haskins
Gregory Haskins wrote:
> (Applies to kvm.git/master:4631e094)
>
> The following is the latest attempt to fix the races in irqfd/eventfd, as
> well as restore DEASSIGN support.  For more details, please read the patch
> headers.
>
> You can also find this applied as a git tree:
>
> git pull git://git.kernel.org/pub/scm/linux/kernel/git/ghaskins/linux-2.6-hacks.git kvm/irqfd
>
> For reviewing convenience, here is a link to the entire virt/kvm/eventfd.c
> file after the patches are applied:
>
> http://git.kernel.org/?p=linux/kernel/git/ghaskins/linux-2.6-hacks.git;a=blob;f=virt/kvm/eventfd.c;h=409d9e160f1f85618a5e3772937b2721a249399a;hb=85cfd57e33dcaea29971513334ca003764653b21
>
> As always, this series has been tested against the kvm-eventfd unit test, and
> appears to be functioning properly. You can download this test here:
>
> ftp://ftp.novell.com/dev/ghaskins/kvm-eventfd.tar.bz2
>
> I've included version 4 of Davide's eventfd patch (ported to kvm.git) so
> that it's a complete reviewable series.  Note, however, that there may be
> later versions of his patch to consider for merging, so we should
> coordinate with him.
>
> -Greg
>
>
> [Changelog:
>
>   v6:
>  *) Removed slow-work in favor of using a dedicated single-thread
>   workqueue.
>  *) Condensed cleanup path to always use deferred shutdown
>  *) Saved about 56 lines over v5, with the following diffstat:
>
>  include/linux/kvm_host.h |2 
>  virt/kvm/eventfd.c       |  248 ++-
>  2 files changed, 97 insertions(+), 153 deletions(-)
>   

Forgot another change:

  *) Fixed a race in ASSIGN by using the proper acquisition order
     of the irqfd->eventfd

>   v5:
>Untracked..
> ]
>
> ---
>
> Davide Libenzi (1):
>   eventfd - revised interface and cleanups (4th rev)
>
> Gregory Haskins (3):
>   KVM: add irqfd DEASSIGN feature
>   KVM: Fix races in irqfd using new eventfd_kref_get interface
>   kvm: prepare irqfd for having interrupts disabled during eventfd->release
>
>
>  drivers/lguest/lg.h  |2 
>  drivers/lguest/lguest_user.c |4 -
>  fs/aio.c |   24 +---
>  fs/eventfd.c |  126 ---
>  include/linux/aio.h  |4 -
>  include/linux/eventfd.h  |   35 +-
>  include/linux/kvm.h  |2 
>  include/linux/kvm_host.h |5 +
>  virt/kvm/Kconfig |1 
>  virt/kvm/eventfd.c           |  229 +++---
>  10 files changed, 321 insertions(+), 111 deletions(-)
>
>   






[KVM PATCH v6 0/4] irqfd fixes and enhancements

2009-06-29 Thread Gregory Haskins
(Applies to kvm.git/master:4631e094)

The following is the latest attempt to fix the races in irqfd/eventfd, as
well as restore DEASSIGN support.  For more details, please read the patch
headers.

You can also find this applied as a git tree:

git pull git://git.kernel.org/pub/scm/linux/kernel/git/ghaskins/linux-2.6-hacks.git kvm/irqfd

For reviewing convenience, here is a link to the entire virt/kvm/eventfd.c
file after the patches are applied:

http://git.kernel.org/?p=linux/kernel/git/ghaskins/linux-2.6-hacks.git;a=blob;f=virt/kvm/eventfd.c;h=409d9e160f1f85618a5e3772937b2721a249399a;hb=85cfd57e33dcaea29971513334ca003764653b21

As always, this series has been tested against the kvm-eventfd unit test, and
appears to be functioning properly. You can download this test here:

ftp://ftp.novell.com/dev/ghaskins/kvm-eventfd.tar.bz2

I've included version 4 of Davide's eventfd patch (ported to kvm.git) so
that it's a complete reviewable series.  Note, however, that there may be
later versions of his patch to consider for merging, so we should
coordinate with him.

-Greg


[Changelog:

v6:
   *) Removed slow-work in favor of using a dedicated single-thread
  workqueue.
   *) Condensed cleanup path to always use deferred shutdown
   *) Saved about 56 lines over v5, with the following diffstat:

   include/linux/kvm_host.h |2 
   virt/kvm/eventfd.c       |  248 ++-
   2 files changed, 97 insertions(+), 153 deletions(-)

v5:
   Untracked..
]
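
The two v6 changes (a dedicated single-thread workqueue, and a cleanup
path that always defers shutdown) can be pictured with a small userspace
model. This is a sketch in Python, not the code in virt/kvm/eventfd.c:
SingleThreadWorkqueue, Irqfd, deactivate and flush_and_stop are
hypothetical names chosen for illustration.

```python
import queue
import threading


class SingleThreadWorkqueue:
    """One worker thread that runs queued jobs strictly in order."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._worker = threading.Thread(target=self._run)
        self._worker.start()

    def _run(self):
        while True:
            job = self._jobs.get()
            if job is None:      # sentinel: stop after earlier jobs ran
                return
            job()

    def queue_work(self, job):
        self._jobs.put(job)

    def flush_and_stop(self):
        self._jobs.put(None)
        self._worker.join()


class Irqfd:
    """Hypothetical stand-in for one irqfd binding."""

    def __init__(self, gsi, workqueue, shutdown_log):
        self.gsi = gsi
        self._wq = workqueue
        self._log = shutdown_log
        self._lock = threading.Lock()
        self._active = True

    def deactivate(self):
        # Every teardown path calls this; it never cleans up inline.
        with self._lock:
            if not self._active:
                return False     # a shutdown is already queued
            self._active = False
        self._wq.queue_work(self._shutdown)
        return True

    def _shutdown(self):
        # Runs only on the workqueue thread, so it may block safely.
        self._log.append(self.gsi)
```

Because all teardown paths funnel through deactivate(), each binding is
shut down exactly once, and the actual cleanup always runs in a context
that is allowed to sleep.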

---

Davide Libenzi (1):
  eventfd - revised interface and cleanups (4th rev)

Gregory Haskins (3):
  KVM: add irqfd DEASSIGN feature
  KVM: Fix races in irqfd using new eventfd_kref_get interface
  kvm: prepare irqfd for having interrupts disabled during eventfd->release


 drivers/lguest/lg.h  |2 
 drivers/lguest/lguest_user.c |4 -
 fs/aio.c |   24 +---
 fs/eventfd.c |  126 ---
 include/linux/aio.h  |4 -
 include/linux/eventfd.h  |   35 +-
 include/linux/kvm.h  |2 
 include/linux/kvm_host.h |5 +
 virt/kvm/Kconfig |1 
 virt/kvm/eventfd.c   |  229 +++---
 10 files changed, 321 insertions(+), 111 deletions(-)
