date:20161130

Re: [Qemu-devel] [PATCH v2 2/6] hw/i2c: Add a NULL check for i2c slave init callbacks

2016-11-30 Thread Cédric Le Goater

On 11/30/2016 06:36 AM, Alastair D'Silva wrote:
> From: Alastair D'Silva 
> 
> Add a NULL check for i2c slave init callbacks, so that we no longer
> need to implement empty init functions.
> 
> Signed-off-by: Alastair D'Silva 

Reviewed-by:  Cédric Le Goater 

> ---
>  hw/arm/pxa2xx.c   | 9 +
>  hw/arm/tosa.c | 7 ---
>  hw/arm/z2.c   | 7 ---
>  hw/i2c/core.c | 6 +-
>  hw/timer/ds1338.c | 6 --
>  5 files changed, 6 insertions(+), 29 deletions(-)
> 
> diff --git a/hw/arm/pxa2xx.c b/hw/arm/pxa2xx.c
> index 21ea1d6..bdcf6bc 100644
> --- a/hw/arm/pxa2xx.c
> +++ b/hw/arm/pxa2xx.c
> @@ -1449,17 +1449,10 @@ static const VMStateDescription vmstate_pxa2xx_i2c = {
>  }
>  };
>  
> -static int pxa2xx_i2c_slave_init(I2CSlave *i2c)
> -{
> -/* Nothing to do.  */
> -return 0;
> -}
> -
>  static void pxa2xx_i2c_slave_class_init(ObjectClass *klass, void *data)
>  {
>  I2CSlaveClass *k = I2C_SLAVE_CLASS(klass);
>  
> -k->init = pxa2xx_i2c_slave_init;
>  k->event = pxa2xx_i2c_event;
>  k->recv = pxa2xx_i2c_rx;
>  k->send = pxa2xx_i2c_tx;
> @@ -2070,7 +2063,7 @@ PXA2xxState *pxa270_init(MemoryRegion *address_space,
>  }
>  if (!revision)
>  revision = "pxa270";
> -
> +
>  s->cpu = cpu_arm_init(revision);
>  if (s->cpu == NULL) {
>  fprintf(stderr, "Unable to find CPU definition\n");
> diff --git a/hw/arm/tosa.c b/hw/arm/tosa.c
> index 1ee12f4..39d9dbb 100644
> --- a/hw/arm/tosa.c
> +++ b/hw/arm/tosa.c
> @@ -202,12 +202,6 @@ static int tosa_dac_recv(I2CSlave *s)
>  return -1;
>  }
>  
> -static int tosa_dac_init(I2CSlave *i2c)
> -{
> -/* Nothing to do.  */
> -return 0;
> -}
> -
>  static void tosa_tg_init(PXA2xxState *cpu)
>  {
>  I2CBus *bus = pxa2xx_i2c_bus(cpu->i2c[0]);
> @@ -275,7 +269,6 @@ static void tosa_dac_class_init(ObjectClass *klass, void *data)
>  {
>  I2CSlaveClass *k = I2C_SLAVE_CLASS(klass);
>  
> -k->init = tosa_dac_init;
>  k->event = tosa_dac_event;
>  k->recv = tosa_dac_recv;
>  k->send = tosa_dac_send;
> diff --git a/hw/arm/z2.c b/hw/arm/z2.c
> index 68a92f3..b3a6bbd 100644
> --- a/hw/arm/z2.c
> +++ b/hw/arm/z2.c
> @@ -263,12 +263,6 @@ static int aer915_recv(I2CSlave *slave)
>  return retval;
>  }
>  
> -static int aer915_init(I2CSlave *i2c)
> -{
> -/* Nothing to do.  */
> -return 0;
> -}
> -
>  static VMStateDescription vmstate_aer915_state = {
>  .name = "aer915",
>  .version_id = 1,
> @@ -285,7 +279,6 @@ static void aer915_class_init(ObjectClass *klass, void *data)
>  DeviceClass *dc = DEVICE_CLASS(klass);
>  I2CSlaveClass *k = I2C_SLAVE_CLASS(klass);
>  
> -k->init = aer915_init;
>  k->event = aer915_event;
>  k->recv = aer915_recv;
>  k->send = aer915_send;
> diff --git a/hw/i2c/core.c b/hw/i2c/core.c
> index abd4c4c..ae3ca94 100644
> --- a/hw/i2c/core.c
> +++ b/hw/i2c/core.c
> @@ -260,7 +260,11 @@ static int i2c_slave_qdev_init(DeviceState *dev)
>  I2CSlave *s = I2C_SLAVE(dev);
>  I2CSlaveClass *sc = I2C_SLAVE_GET_CLASS(s);
>  
> -return sc->init(s);
> +if (sc->init) {
> +return sc->init(s);
> +} else {
> +return 0;
> +}
>  }
>  
>  DeviceState *i2c_create_slave(I2CBus *bus, const char *name, uint8_t addr)
> diff --git a/hw/timer/ds1338.c b/hw/timer/ds1338.c
> index 0112949..f5d04dd 100644
> --- a/hw/timer/ds1338.c
> +++ b/hw/timer/ds1338.c
> @@ -198,11 +198,6 @@ static int ds1338_send(I2CSlave *i2c, uint8_t data)
>  return 0;
>  }
>  
> -static int ds1338_init(I2CSlave *i2c)
> -{
> -return 0;
> -}
> -
>  static void ds1338_reset(DeviceState *dev)
>  {
>  DS1338State *s = DS1338(dev);
> @@ -220,7 +215,6 @@ static void ds1338_class_init(ObjectClass *klass, void *data)
>  DeviceClass *dc = DEVICE_CLASS(klass);
>  I2CSlaveClass *k = I2C_SLAVE_CLASS(klass);
>  
> -k->init = ds1338_init;
>  k->event = ds1338_event;
>  k->recv = ds1338_recv;
>  k->send = ds1338_send;
>
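
For readers skimming the diff, the core.c change boils down to treating the init callback as optional. The following is a minimal standalone sketch of that pattern, not the QEMU code: Slave, SlaveClass, slave_init and demo_init are hypothetical stand-ins for I2CSlave, I2CSlaveClass and the real hooks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for I2CSlave / I2CSlaveClass; not the QEMU types. */
typedef struct Slave Slave;

typedef struct SlaveClass {
    int (*init)(Slave *s);   /* optional hook: may be left NULL */
} SlaveClass;

struct Slave {
    const SlaveClass *klass;
    int initialised;
};

/* Call the optional init hook; a missing hook counts as success. */
int slave_init(Slave *s)
{
    assert(s != NULL && s->klass != NULL);
    if (s->klass->init) {
        return s->klass->init(s);
    }
    return 0;
}

/* Example hook for a device that does need some setup. */
int demo_init(Slave *s)
{
    s->initialised = 1;
    return 0;
}
```

A class that leaves init unset simply skips the call, which is what lets the empty init functions in the other four files go away.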

Re: [Qemu-devel] [PATCH v2 4/6] hw/timer: Add Epson RX8900 RTC support

2016-11-30 Thread Cédric Le Goater

On 11/30/2016 06:36 AM, Alastair D'Silva wrote:
> From: Alastair D'Silva 
> 
> This patch adds support for the Epson RX8900 I2C RTC.
> 
> The following chip features are implemented:
>  - RTC (wallclock based, ptimer 10x oversampling to pick up
>   wallclock transitions)
>  - Time update interrupt (per second/minute, wallclock based)
>  - Alarms (wallclock based)
>  - Temperature (set via a property)
>  - Countdown timer (emulated clock via ptimer)
>  - FOUT via GPIO (emulated clock via ptimer)
> 
> The following chip features are unimplemented:
>  - Low voltage detection
>  - i2c timeout
> 
> The implementation exports the following named GPIOs:
> rx8900-interrupt-out
> rx8900-fout-enable
> rx8900-fout
> 
> Signed-off-by: Alastair D'Silva 
> Signed-off-by: Chris Smart 

Reviewed-by:  Cédric Le Goater 


> ---
>  default-configs/arm-softmmu.mak |   1 +
>  hw/timer/Makefile.objs  |   2 +
>  hw/timer/rx8900.c   | 890 
> 
>  hw/timer/rx8900_regs.h  | 139 +++
>  hw/timer/trace-events   |  31 ++
>  5 files changed, 1063 insertions(+)
>  create mode 100644 hw/timer/rx8900.c
>  create mode 100644 hw/timer/rx8900_regs.h
> 
> diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
> index 6de3e16..adb600e 100644
> --- a/default-configs/arm-softmmu.mak
> +++ b/default-configs/arm-softmmu.mak
> @@ -29,6 +29,7 @@ CONFIG_SMC91C111=y
>  CONFIG_ALLWINNER_EMAC=y
>  CONFIG_IMX_FEC=y
>  CONFIG_DS1338=y
> +CONFIG_RX8900=y
>  CONFIG_PFLASH_CFI01=y
>  CONFIG_PFLASH_CFI02=y
>  CONFIG_MICRODRIVE=y
> diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs
> index 7ba8c23..fa028ac 100644
> --- a/hw/timer/Makefile.objs
> +++ b/hw/timer/Makefile.objs
> @@ -3,6 +3,7 @@ common-obj-$(CONFIG_ARM_MPTIMER) += arm_mptimer.o
>  common-obj-$(CONFIG_A9_GTIMER) += a9gtimer.o
>  common-obj-$(CONFIG_CADENCE) += cadence_ttc.o
>  common-obj-$(CONFIG_DS1338) += ds1338.o
> +common-obj-$(CONFIG_RX8900) += rx8900.o
>  common-obj-$(CONFIG_HPET) += hpet.o
>  common-obj-$(CONFIG_I8254) += i8254_common.o i8254.o
>  common-obj-$(CONFIG_M48T59) += m48t59.o
> @@ -17,6 +18,7 @@ common-obj-$(CONFIG_IMX) += imx_epit.o
>  common-obj-$(CONFIG_IMX) += imx_gpt.o
>  common-obj-$(CONFIG_LM32) += lm32_timer.o
>  common-obj-$(CONFIG_MILKYMIST) += milkymist-sysctl.o
> +common-obj-$(CONFIG_RX8900) += rx8900.o
>  
>  obj-$(CONFIG_EXYNOS4) += exynos4210_mct.o
>  obj-$(CONFIG_EXYNOS4) += exynos4210_pwm.o
> diff --git a/hw/timer/rx8900.c b/hw/timer/rx8900.c
> new file mode 100644
> index 000..e634819
> --- /dev/null
> +++ b/hw/timer/rx8900.c
> @@ -0,0 +1,890 @@
> +/*
> + * Epson RX8900SA/CE Realtime Clock Module
> + *
> + * Copyright (c) 2016 IBM Corporation
> + * Authors:
> + *  Alastair D'Silva 
> + *  Chris Smart 
> + *
> + * This code is licensed under the GPL version 2 or later.  See
> + * the COPYING file in the top-level directory.
> + *
> + * Datasheet available at:
> + *  https://support.epson.biz/td/api/doc_check.php?dl=app_RX8900CE&lang=en
> + *
> + * Not implemented:
> + *  Implement i2c timeout
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +#include "hw/i2c/i2c.h"
> +#include "hw/timer/rx8900_regs.h"
> +#include "hw/ptimer.h"
> +#include "qemu/main-loop.h"
> +#include "qemu/bcd.h"
> +#include "qemu/log.h"
> +#include "qapi/error.h"
> +#include "qapi/visitor.h"
> +#include "trace.h"
> +
> + #include 
> +
> + #include 
> +
> +#define TYPE_RX8900 "rx8900"
> +#define RX8900(obj) OBJECT_CHECK(RX8900State, (obj), TYPE_RX8900)
> +
> +typedef struct RX8900State {
> +I2CSlave parent_obj;
> +
> +ptimer_state *sec_timer; /* triggered once per second */
> +ptimer_state *fout_timer;
> +ptimer_state *countdown_timer;
> +bool fout;
> +int64_t offset;
> +uint8_t weekday; /* Saved for deferred offset calculation, 0-6 */
> +uint8_t wday_offset;
> +uint8_t nvram[RX8900_NVRAM_SIZE];
> +int32_t ptr; /* Wrapped to stay within RX8900_NVRAM_SIZE */
> +bool addr_byte;
> +uint8_t last_interrupt_seconds;
> +uint8_t last_update_interrupt_minutes;
> +double supply_voltage;
> +qemu_irq interrupt_pin;
> +qemu_irq fout_pin;
> +} RX8900State;
> +
> +static const VMStateDescription vmstate_rx8900 = {
> +.name = "rx8900",
> +.version_id = 2,
> +.minimum_version_id = 1,
> +.fields = (VMStateField[]) {
> +VMSTATE_I2C_SLAVE(parent_obj, RX8900State),
> +VMSTATE_PTIMER(sec_timer, RX8900State),
> +VMSTATE_PTIMER(fout_timer, RX8900State),
> +VMSTATE_PTIMER(countdown_timer, RX8900State),
> +VMSTATE_BOOL(fout, RX8900State),
> +VMSTATE_INT64(offset, RX8900State),
> +VMSTATE_UINT8_V(weekday, RX8900State, 2),
> +VMSTATE_UINT8_V(wday_offset, RX8900State, 2),
> +VMSTATE_UINT8_ARRAY(nvram, RX8900State, RX8900_NVRAM_SIZE),
> +VMSTATE_INT32(ptr, RX8900State),
> +VMSTATE_BOOL(addr_byte, RX8900State),
> +VM

Re: [Qemu-devel] [PATCH v2 6/6] arm: Add an RX8900 RTC to the ASpeed board

2016-11-30 Thread Cédric Le Goater

On 11/30/2016 06:36 AM, Alastair D'Silva wrote:
> From: Alastair D'Silva 
> 
> Connect an RX8900 RTC to i2c12 of the AST2500 SOC at address 0x32
> 
> Signed-off-by: Alastair D'Silva 
> ---
>  hw/arm/aspeed.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
> index c7206fd..8de95f2 100644
> --- a/hw/arm/aspeed.c
> +++ b/hw/arm/aspeed.c
> @@ -166,7 +166,19 @@ static const TypeInfo palmetto_bmc_type = {
>  
>  static void ast2500_evb_init(MachineState *machine)
>  {
> +AspeedSoCState *soc;
> +I2CBus *i2c12;
> +DeviceState *rx8900;
> +
>  aspeed_board_init(machine, &aspeed_boards[AST2500_EVB]);
> +
> +soc = ASPEED_SOC(object_resolve_path_component(OBJECT(machine), "soc"));
> +
> +i2c12 = aspeed_i2c_get_bus((DeviceState *)&soc->i2c, 11);
> +rx8900 = i2c_create_slave(i2c12, "rx8900", 0x32);
> +
> +qdev_connect_gpio_out_named(rx8900, "rx8900-interrupt-out", 0,
> +qdev_get_gpio_in(DEVICE(&soc->vic), 22));
>  }
>  
>  static void ast2500_evb_class_init(ObjectClass *oc, void *data)
> 

I think it would be nicer to define a handler under AspeedBoardConfig,
something like:

int (*i2c_init)(AspeedBoardState *bmc);

for when a board needs I2C devices. The handler would be called
directly from aspeed_board_init(). This is similar to what
we do for the flash modules, although we didn't have to add a
specific handler for the flash.

Thoughts?


C.
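
The handler idea above could look roughly like the sketch below. This is only an illustration under simplified assumptions: BoardState, BoardConfig, board_init and evb_i2c_init are hypothetical names, and the real AspeedBoardConfig and aspeed_board_init() carry more state than shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified board types; the real Aspeed structures differ. */
typedef struct BoardState {
    int i2c_devices;   /* counts devices attached by the handler */
} BoardState;

typedef struct BoardConfig {
    const char *name;
    int (*i2c_init)(BoardState *bmc);   /* optional per-board hook */
} BoardConfig;

/* Board-specific handler: this is where an RTC would be attached. */
int evb_i2c_init(BoardState *bmc)
{
    bmc->i2c_devices++;
    return 0;
}

/* Common init path: shared setup first, then the optional board hook. */
int board_init(BoardState *bmc, const BoardConfig *cfg)
{
    assert(bmc != NULL && cfg != NULL);
    bmc->i2c_devices = 0;
    return cfg->i2c_init ? cfg->i2c_init(bmc) : 0;
}
```

Boards without I2C devices leave the hook unset and go through the common path unchanged.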

Re: [Qemu-devel] [PATCH v2 1/6] arm: Uniquely name imx25 I2C buses.

2016-11-30 Thread Cédric Le Goater

On 11/30/2016 06:36 AM, Alastair D'Silva wrote:
> From: Alastair D'Silva 
> 
> The imx25 chip provides 3 i2c buses, but they have all been named
> "i2c", which makes it difficult to predict which bus a device will
> be connected to when specified on the command line.
> 
> This patch addresses the issue by naming the buses uniquely:
>   i2c.0 i2c.1 i2c.2
> 
> Signed-off-by: Alastair D'Silva 
> ---
>  hw/arm/imx25_pdk.c | 4 +---
>  hw/i2c/imx_i2c.c   | 6 +-
>  2 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/arm/imx25_pdk.c b/hw/arm/imx25_pdk.c
> index 025b608..c6f04d3 100644
> --- a/hw/arm/imx25_pdk.c
> +++ b/hw/arm/imx25_pdk.c
> @@ -138,9 +138,7 @@ static void imx25_pdk_init(MachineState *machine)
>   * We add it here (only on qtest usage) to be able to do a bit
>   * of simple qtest. See "make check" for details.
>   */
> -i2c_create_slave((I2CBus *)qdev_get_child_bus(DEVICE(&s->soc.i2c[0]),
> -  "i2c"),
> - "ds1338", 0x68);
> +i2c_create_slave(s->soc.i2c[0].bus, "ds1338", 0x68);
>  }
>  }
>  
> diff --git a/hw/i2c/imx_i2c.c b/hw/i2c/imx_i2c.c
> index 37e5a62..7be10fb 100644
> --- a/hw/i2c/imx_i2c.c
> +++ b/hw/i2c/imx_i2c.c
> @@ -305,12 +305,16 @@ static const VMStateDescription imx_i2c_vmstate = {
>  static void imx_i2c_realize(DeviceState *dev, Error **errp)
>  {
>  IMXI2CState *s = IMX_I2C(dev);
> +static int bus_count;

Hmm, the static is ugly :/

Aren't there other ways to achieve this naming?

Thanks,

C.  

> +char name[16];
> +
> +snprintf(name, sizeof(name), "i2c.%d", bus_count++);
>  
>  memory_region_init_io(&s->iomem, OBJECT(s), &imx_i2c_ops, s, 
> TYPE_IMX_I2C,
>IMX_I2C_MEM_SIZE);
>  sysbus_init_mmio(SYS_BUS_DEVICE(dev), &s->iomem);
>  sysbus_init_irq(SYS_BUS_DEVICE(dev), &s->irq);
> -s->bus = i2c_init_bus(DEVICE(dev), "i2c");
> +s->bus = i2c_init_bus(DEVICE(dev), name);
>  }
>  
>  static void imx_i2c_class_init(ObjectClass *klass, void *data)
>
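
One possible direction for the question above, sketched under the assumption that the SoC code already knows each controller's index and could hand it down explicitly (the patch does not do this): derive the name from that index rather than from a function-local static counter. i2c_bus_name is a hypothetical helper, not an existing QEMU function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "i2c.<index>" into buf.
 * Returns 0 on success, -1 if the name did not fit or formatting failed.
 */
int i2c_bus_name(char *buf, size_t len, unsigned index)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, len, "i2c.%u", index);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With an explicit index the resulting names no longer depend on the order in which the controllers are realized.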

Re: [Qemu-devel] [PATCH v7 RFC] block/vxhs: Initial commit to add Veritas HyperScale VxHS block device support

2016-11-30 Thread Stefan Hajnoczi

On Wed, Nov 30, 2016 at 04:20:03AM +, Rakesh Ranjan wrote:
> > Why does the client have to know about failover if it's connected to
> >a server process on the same host?  I thought the server process
> >manages networking issues (like the actual protocol to speak to other
> >VxHS nodes and for failover).
> 
> Just to comment on this, the model being followed within HyperScale is to
> allow application I/O continuity (resiliency) in various cases as
> mentioned below. It really adds value for consumer/customer and tries to
> avoid culprits for single points of failure.
> 
> 1. HyperScale storage service failure (QNIO Server)
>   - Daemon managing local storage for VMs and runs on each compute node
>   - Daemon can run as a service on Hypervisor itself as well as within VSA
> (Virtual Storage Appliance or Virtual Machine running on the hypervisor),
> which depends on ecosystem where HyperScale is supported
>   - Daemon or storage service down/crash/crash-in-loop shouldn't lead to a
> huge impact on all the VMs running on that hypervisor or compute node,
> hence providing service-level resiliency is very useful for
> application I/O continuity in such a case.
> 
>Solution:
>   - The service failure handling can be only done at the client side and
> not at the server side since service running as a server itself is down.
>   - Client detects an I/O error and depending on the logic, it does
> application I/O failover to another available/active QNIO server or
> HyperScale Storage service running on different compute node
> (reflection/replication node)
>   - Once the orig/old server comes back online, client gets/receives
> negotiated error (not a real application error) to do the application I/O
> failback to the original server or local HyperScale storage service to get
> better I/O performance.
>   
> 2. Local physical storage or media failure
>   - Once server or HyperScale storage service detects the media or local
> disk failure, depending on the vDisk (guest disk) configuration, if
> another storage copy is available
>   on different compute node then it internally handles the local
> fault and serves the application read and write requests otherwise
> application or client gets the fault.
>   - Client doesn't know about any I/O failure since Server or Storage
> service manages/handles the fault tolerance.
> - In such case, in order to get some I/O performance benefit, once
> client gets a negotiated error (not an application error) from local
> server or storage service,
>   client can initiate I/O failover and can directly send
> application I/O to another compute node where storage copy is available to
> serve the application need instead of sending it locally where media is
> faulted.   

Thanks for explaining the model.

The new information for me here is that the qnio server may run in a VM
instead of on the host and that the client will attempt to use a remote
qnio server if the local qnio server fails.

This means that although the discussion most recently focussed on local
I/O tap performance, there is a requirement for a network protocol too.
The local I/O tap stuff is just an optimization for when the local qnio
server can be used.

Stefan
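
To make the model concrete, here is a minimal sketch of the client-side behaviour as summarized above: use the local qnio server while it works, and switch to a remote one when a request fails. All names (Client, io_fn, client_write, demo_submit) are hypothetical and do not come from the VxHS or libqnio code; failback to the local server is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical I/O callback: returns 0 on success, non-zero on error. */
typedef int (*io_fn)(void *server, const void *buf, size_t len);

typedef struct Client {
    void *local;        /* local qnio server (fast path) */
    void *remote;       /* remote qnio server reached over the network */
    io_fn submit;
    int using_remote;   /* set once the client has failed over */
} Client;

/* Try the local server first; on error fail over to the remote one. */
int client_write(Client *c, const void *buf, size_t len)
{
    assert(c != NULL && c->submit != NULL);
    if (!c->using_remote) {
        if (c->submit(c->local, buf, len) == 0) {
            return 0;
        }
        c->using_remote = 1;   /* failover */
    }
    return c->submit(c->remote, buf, len);
}

/* Demo backend: a "server" is just an int flag, 1 = healthy, 0 = down. */
int demo_submit(void *server, const void *buf, size_t len)
{
    (void)buf;
    (void)len;
    return *(int *)server ? 0 : -1;
}
```

The remote path is the part that needs a real network protocol; the local path is where the I/O tap optimization would apply.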



[Qemu-devel] [RFC Design Doc v3] Enable Shared Virtual Memory feature in pass-through scenarios

2016-11-30 Thread Liu, Yi L

What's changed from v2:
a) Detailed feature description
b) Refined description in "Address translation in virtual SVM"
c) "Terms" is added

Content
===
1. Feature description
2. Why use it?
3. How to enable it
4. How to test
5. Terms

Details
===
1. Feature description
Shared virtual memory (SVM) lets an application program share its virtual
address space with SVM capable devices.

Shared virtual memory details:
a) The SVM feature requires ATS/PRQ/PASID support on both the device side and
the IOMMU side.
b) SVM capable devices can send DMA requests with a PASID; the address
in the request is a virtual address within a program's virtual address
space.
c) The IOMMU uses the first level page table to translate the address in the
request.
d) The first level page table is a HVA->HPA mapping on bare metal.

The Shared Virtual Memory feature in pass-through scenarios is actually SVM
virtualization. It lets application programs (running in a guest) share their
virtual address space with an assigned device (e.g. graphics processors or
accelerators).

In virtualization, SVM would:
a) Require a vIOMMU exposed to the guest.
b) Let an assigned SVM capable device send DMA requests with a PASID; the
address in the request would be a virtual address within a guest
program's virtual address space (GVA).
c) Need the physical IOMMU to do GVA->GPA->HPA translation. Nested mode
would be enabled: the first level page table would achieve the GVA->GPA
mapping, while the second level page table would achieve the GPA->HPA
translation.

For more SVM detail, you may want to refer to section 2.5.1.1 of the Intel VT-d
spec and section 5.6 of the OpenCL spec. For details about SVM address
translation, please refer to section 3 of the Intel VT-d spec.
Discussion directly in this thread is also welcome.

Link to related specs:
http://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/vt-directed-io-spec.pdf
https://www.khronos.org/registry/cl/specs/opencl-2.0.pdf

2. Why use it?
It is common to pass devices through to a guest and expect performance as
close as possible to that on the host. With this feature enabled,
the application programs in the guest would be able to share data structures
with assigned devices without unnecessary overhead.

3. How to enable it
As mentioned above, SVM virtualization requires a vIOMMU exposed to the guest.
Since there is an existing IOMMU emulator in host user space (QEMU), it is
more acceptable to extend the IOMMU emulator to support SVM for assigned
devices. So far, the vIOMMU exposed to the guest is only for emulated devices.
This design focuses on virtual SVM for assigned devices. Virtual
IOVA and virtual interrupt remapping will not be included here.

The enabling work would include the following items.

a) IOMMU Register Access Emulation
Already exists in QEMU; needs some extensions to support SVM, e.g. support
for page request service related registers (PQA_REG).

b) vIOMMU Capability
Report SVM related capabilities (PASID, PRS, DT, PT, ECS etc.) in the
ex-capability register, and cache mode, DWD, DRD in the capability register.

c) QI Handling Emulation
Already exists in QEMU; needs to shadow the QIs related to assigned devices to
the physical IOMMU.
i. ex-context entry cache invalidation (nested mode setting, guest PASID
table pointer shadowing)
ii. 1st level translation cache invalidation
iii. Response for recoverable faults

d) Address translation in virtual SVM
In virtualization, for requests with PASID from an assigned device, the
address translation is subject to the first level page table and then the
second level page table, which is named nested mode. Extended context mode
should be supported by the hardware. DMA remapping in SVM virtualization
would be:
i. For requests with PASID, the related extended context entry should have
the NESTE bit set.
ii. The guest PASID table pointer should be shadowed to the host IOMMU driver.
The PASID table pointer field in the extended context entry would be a GPA as
nested mode is on.

First level page table would be maintained by guest IOMMU driver. Second level
page table would be maintained by host IOMMU driver.

e) Recoverable Address Translation Faults Handling Emulation
It is serviced by page requests when the device supports PRS. For assigned
devices, the host IOMMU driver would get page requests from the pIOMMU. Here,
we need a mechanism to drain the page requests from devices which are assigned
to a guest. In this design it would be done through VFIO. Page request
descriptors would be propagated to user space and then exposed to the guest
IOMMU driver. This requires the following support:
i. a mechanism to notify vIOMMU emulator to fetch PRQ descriptor
ii. a notify framework in QEMU to signal the PRQ descriptor fetching when
notified by pIOMMU

f) Non-Recoverable Address Translation Handling Emulation
The non-recoverable fault propagation is similar to recoverable faults. In
this design it would propagate fault data to user space

Re: [Qemu-devel] [PATCH 1/2] virtio-net rsc: support coalescing ipv4 tcp traffic

2016-11-30 Thread Wei Xu


On 2016-11-24 12:17, Jason Wang wrote:

On 2016-11-01 01:41, w...@redhat.com wrote:

From: Wei Xu 

All the data packets in a TCP connection are cached
in a single buffer during each receive interval and are
sent out by a timer. 'virtio_net_rsc_timeout'
controls the interval; this value may affect the
performance and response time of the TCP connection.
5 (50us) is an empirical value that gains a performance
improvement. Since the WHQL test sends packets every 100us,
'30' (300us) passes the test case and is the default
value as well. Tune it via the command line parameter
'rsc_interval' of the 'virtio-net-pci' device; for example,
to launch a guest with the interval set to '50':

'virtio-net-pci,netdev=hostnet1,bus=pci.0,id=net1,mac=00,rsc_interval=50'


The timer will only be triggered if the packets pool is not empty,
and it'll drain off all the cached packets.

'NetRscChain' is used to save the segments of IPv4/6 in a
VirtIONet device.

A new segment becomes a 'Candidate' once it has passed the sanity check.
The main TCP handler includes TCP window update, duplicated
ACK check and the real data coalescing.

A 'Candidate' segment means:
1. Segment is within current window and the sequence is the expected one.
2. 'ACK' of the segment is in the valid window.

Sanity check includes:
1. Incorrect version in IP header
2. IP options or an IP fragment
3. Not a TCP packet
4. Sanity size check to prevent buffer overflow attack.
5. An ECN packet
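
As a rough illustration of the sanity checks listed above, the sketch below inspects a raw IPv4 header and decides whether the packet may be coalesced. It is not the code from the patch: the names (rsc_verdict, rsc_sanity_check_ip4) are hypothetical, and reading "an ECN packet" as "non-zero ECN bits in the IP header" is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum rsc_verdict { RSC_CANDIDATE, RSC_BYPASS };

/*
 * ip points at a raw IPv4 header in network byte order, len is the
 * packet length in bytes, max_len the largest size the caller accepts.
 */
enum rsc_verdict rsc_sanity_check_ip4(const uint8_t *ip, size_t len,
                                      size_t max_len)
{
    unsigned frag;

    assert(ip != NULL);
    if (len < 20 || len > max_len) {
        return RSC_BYPASS;              /* size check */
    }
    if ((ip[0] >> 4) != 4) {
        return RSC_BYPASS;              /* incorrect IP version */
    }
    if ((ip[0] & 0x0F) != 5) {
        return RSC_BYPASS;              /* header length != 5 words: options */
    }
    if ((ip[1] & 0x03) != 0) {
        return RSC_BYPASS;              /* ECN bits set (assumed meaning) */
    }
    frag = ((unsigned)ip[6] << 8) | ip[7];
    if (frag & 0x3FFF) {
        return RSC_BYPASS;              /* MF flag or fragment offset set */
    }
    if (ip[9] != 6) {
        return RSC_BYPASS;              /* protocol is not TCP */
    }
    return RSC_CANDIDATE;
}
```

The DF flag alone does not make a packet a fragment, so it is masked out of the fragment test.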

There might be more cases that should be considered, such as the
IP identification and other flags, but checking them breaks the test because
Windows sets the identification to the same value even when the packet is not
a fragment.

There are normally two ways to handle a TCP control flag:
'bypass' and 'finalize'. 'Bypass' means the packet should be sent out
directly. 'Finalize' means the packet should also be bypassed, but only
after searching the pool for packets of the same connection and
draining all of them, to avoid out-of-order segments.

All 'SYN' packets are bypassed, since a SYN always begins a new
connection. Other flags such as 'URG/FIN/RST/CWR/ECE' trigger a
finalization, because they normally appear when a connection is about
to be closed; a 'URG' packet also finalizes the current coalescing unit.

Statistics can be used to monitor the basic coalescing status. The
'out of order' and 'out of window' counters show how many packets were
retransmitted, and thus describe the performance intuitively.

Signed-off-by: Wei Xu 
---
  hw/net/virtio-net.c | 602
++--
  include/hw/virtio/virtio-net.h  |   5 +-
  include/hw/virtio/virtio.h  |  76 
  include/net/eth.h   |   2 +
  include/standard-headers/linux/virtio_net.h |  14 +
  net/tap.c   |   3 +-
  6 files changed, 670 insertions(+), 32 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 06bfe4b..d1824d9 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -15,10 +15,12 @@
  #include "qemu/iov.h"
  #include "hw/virtio/virtio.h"
  #include "net/net.h"
+#include "net/eth.h"
  #include "net/checksum.h"
  #include "net/tap.h"
  #include "qemu/error-report.h"
  #include "qemu/timer.h"
+#include "qemu/sockets.h"
  #include "hw/virtio/virtio-net.h"
  #include "net/vhost_net.h"
  #include "hw/virtio/virtio-bus.h"
@@ -43,6 +45,24 @@
  #define endof(container, field) \
  (offsetof(container, field) + sizeof(((container *)0)->field))
+#define VIRTIO_NET_IP4_ADDR_SIZE   8/* ipv4 saddr + daddr */


Only used once in the code, I don't see much value of this macro.


Just to keep it a bit readable.




+
+#define VIRTIO_NET_TCP_FLAG 0x3F
+#define VIRTIO_NET_TCP_HDR_LENGTH   0xF000
+
+/* IPv4 max payload, 16 bits in the header */
+#define VIRTIO_NET_MAX_IP4_PAYLOAD (65535 - sizeof(struct ip_header))
+#define VIRTIO_NET_MAX_TCP_PAYLOAD 65535
+
+/* header length value in ip header without option */
+#define VIRTIO_NET_IP4_HEADER_LENGTH 5
+
+/* Purge coalesced packets timer interval, This value affects the
performance
+   a lot, and should be tuned carefully, '30'(300us) is the
recommended
+   value to pass the WHQL test, '5' can gain 2x netperf
throughput with
+   tso/gso/gro 'off'. */
+#define VIRTIO_NET_RSC_INTERVAL  30


This should be a property for virtio-net and the above comment can be
the description of the property.


This is a value for a property, actually I hadn't found a place to put
it.




+
  typedef struct VirtIOFeature {
  uint32_t flags;
  size_t end;
@@ -589,7 +609,12 @@ static uint64_t
virtio_net_guest_offloads_by_features(uint32_t features)
  (1ULL << VIRTIO_NET_F_GUEST_ECN)  |
  (1ULL << VIRTIO_NET_F_GUEST_UFO);
-return guest_offloads_mask & features;
+if (features & VIRTIO_NET_F_CTRL_GUEST_OFFLOADS) {
+return (guest_offloads_mask & features) |
+   (1ULL << VIRTIO_NET_F_GUEST_RSC

[Qemu-devel] [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration

2016-11-30 Thread Liang Li

This patch set contains two parts of changes to the virtio-balloon.
 
One is the change for speeding up the inflating & deflating process.
The main idea of this optimization is to use a bitmap instead of the PFNs
to send the page information to the host, which reduces the overhead of
virtio data transmission, address translation and madvise(). This can
help to improve the performance by about 85%.
 
The other change is for speeding up live migration. Skipping the
guest's unused pages in the first round of data copy reduces needless
data processing, which can save quite a lot of CPU cycles and
network bandwidth. We put the guest's unused page information in a bitmap
and send it to the host through a virtqueue of virtio-balloon. For an idle
guest with 8GB RAM, this can help to shorten the total live migration
time from 2 s to about 500 ms in a 10Gbps network environment.
 
Changes from v4 to v5:
* Drop the code to get the max_pfn, use another way instead.
* Simplify the API to get the unused page information from mm. 

Changes from v3 to v4:
* Use the new scheme suggested by Dave Hansen to encode the bitmap.
* Add code which was missing in v3 to handle page migration.
* Free the memory for the bitmap in time once the operation is done.
* Address some of the comments in v3.

Changes from v2 to v3:
* Change the name of 'free page' to 'unused page'.
* Use the scatter & gather bitmap instead of a 1MB page bitmap.
* Fix overwriting the page bitmap after kicking.
* Some of MST's comments for v2.
 
Changes from v1 to v2:
* Abandon the patch for dropping page cache.
* Put some structures into the uapi header file.
* Use a new way to determine the page bitmap size.
* Use a unified way to send the free page information with the bitmap.
* Address the issues raised in MST's comments.

Liang Li (5):
  virtio-balloon: rework deflate to add page to a list
  virtio-balloon: define new feature bit and head struct
  virtio-balloon: speed up inflate/deflate process
  virtio-balloon: define flags and head for host request vq
  virtio-balloon: tell host vm's unused page info

 drivers/virtio/virtio_balloon.c | 539 
 include/linux/mm.h  |   3 +-
 include/uapi/linux/virtio_balloon.h |  41 +++
 mm/page_alloc.c |  72 +
 4 files changed, 607 insertions(+), 48 deletions(-)

-- 
1.8.3.1

[Qemu-devel] [PATCH kernel v5 1/5] virtio-balloon: rework deflate to add page to a list

2016-11-30 Thread Liang Li

When doing the inflating/deflating operation, the current virtio-balloon
implementation uses an array to save 256 PFNs, then sends these PFNs to the
host through virtio and processes each PFN one by one. This is not
efficient when inflating/deflating a large amount of memory because the
following operations are repeated too many times:

1. Virtio data transmission
2. Page allocate/free
3. Address translation(GPA->HVA)
4. madvise

The overhead of these operations will consume a lot of CPU cycles and
will take a long time to complete; it may impact the QoS of the guest as
well as the host. The overhead will be reduced a lot if batch processing
is used. E.g. if there are several pages whose addresses are physically
contiguous in the guest, these bulk pages can be processed in one
operation.

The main idea for the optimization is to reduce the above operations as
much as possible. This can be achieved by using a bitmap instead of a
PFN array. Compared with a PFN array, for a buffer of a given size, a bitmap
can represent more pages, which is very important for batch processing.
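
The capacity argument can be checked with a few lines of arithmetic. The sketch below assumes 32-bit PFN entries and a plain one-bit-per-page bitmap starting at a base PFN; the function names are hypothetical, and the encoding actually used later in this series may differ.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Pages describable by a buffer of `bytes` bytes holding 32-bit PFNs. */
size_t pfn_array_capacity(size_t bytes)
{
    return bytes / sizeof(uint32_t);
}

/* Pages describable by the same buffer used as a one-bit-per-page bitmap. */
size_t bitmap_capacity(size_t bytes)
{
    return bytes * CHAR_BIT;
}

/* Mark page `pfn` in a bitmap that starts at `base_pfn`; -1 if out of range. */
int bitmap_set_pfn(uint8_t *bitmap, size_t bytes, uint64_t base_pfn,
                   uint64_t pfn)
{
    uint64_t bit;

    assert(bitmap != NULL);
    if (pfn < base_pfn) {
        return -1;
    }
    bit = pfn - base_pfn;
    if (bit >= (uint64_t)bytes * CHAR_BIT) {
        return -1;
    }
    bitmap[bit / CHAR_BIT] |= (uint8_t)(1u << (bit % CHAR_BIT));
    return 0;
}
```

For the 1024 bytes that hold 256 PFNs today, a bitmap covers 8192 pages, provided those pages fall within one contiguous PFN range.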

Using a bitmap instead of PFNs is not very helpful when inflating/deflating
a small number of pages; in this case, using PFNs is better. But using a
bitmap will not impact the QoS of the guest or host heavily because the
operation will complete very soon for a small number of pages, and we
will use some methods to make sure the efficiency does not drop too much.

This patch saves the deflated pages to a list instead of the PFN array,
which will allow faster notifications using a bitmap down the road.
balloon_pfn_to_page() can be removed because it's useless.

Signed-off-by: Liang Li 
Signed-off-by: Michael S. Tsirkin 
Cc: Paolo Bonzini 
Cc: Cornelia Huck 
Cc: Amit Shah 
Cc: Dave Hansen 
---
 drivers/virtio/virtio_balloon.c | 22 --
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 181793f..f59cb4f 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -103,12 +103,6 @@ static u32 page_to_balloon_pfn(struct page *page)
return pfn * VIRTIO_BALLOON_PAGES_PER_PAGE;
 }
 
-static struct page *balloon_pfn_to_page(u32 pfn)
-{
-   BUG_ON(pfn % VIRTIO_BALLOON_PAGES_PER_PAGE);
-   return pfn_to_page(pfn / VIRTIO_BALLOON_PAGES_PER_PAGE);
-}
-
 static void balloon_ack(struct virtqueue *vq)
 {
struct virtio_balloon *vb = vq->vdev->priv;
@@ -181,18 +175,16 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
return num_allocated_pages;
 }
 
-static void release_pages_balloon(struct virtio_balloon *vb)
+static void release_pages_balloon(struct virtio_balloon *vb,
+struct list_head *pages)
 {
-   unsigned int i;
-   struct page *page;
+   struct page *page, *next;
 
-   /* Find pfns pointing at start of each page, get pages and free them. */
-   for (i = 0; i < vb->num_pfns; i += VIRTIO_BALLOON_PAGES_PER_PAGE) {
-   page = balloon_pfn_to_page(virtio32_to_cpu(vb->vdev,
-  vb->pfns[i]));
+   list_for_each_entry_safe(page, next, pages, lru) {
if (!virtio_has_feature(vb->vdev,
VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
adjust_managed_page_count(page, 1);
+   list_del(&page->lru);
put_page(page); /* balloon reference */
}
 }
@@ -202,6 +194,7 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
unsigned num_freed_pages;
struct page *page;
struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
+   LIST_HEAD(pages);
 
/* We can only do one array worth at a time. */
num = min(num, ARRAY_SIZE(vb->pfns));
@@ -215,6 +208,7 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
if (!page)
break;
set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
+   list_add(&page->lru, &pages);
vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
}
 
@@ -226,7 +220,7 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 */
if (vb->num_pfns != 0)
tell_host(vb, vb->deflate_vq);
-   release_pages_balloon(vb);
+   release_pages_balloon(vb, &pages);
mutex_unlock(&vb->balloon_lock);
return num_freed_pages;
 }
-- 
1.8.3.1
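
The hunk above releases pages from a local list and unlinks each entry during the walk, which is why it needs the "safe" iterator. A minimal userspace sketch of that pattern follows, assuming a hand-rolled circular list. The names `node`, `fake_page`, `list_init`, `list_add_head`, `list_unlink` and `release_all` are hypothetical stand-ins, not the kernel's `list.h` API:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct list_head: circular, doubly linked. */
struct node {
    struct node *prev, *next;
};

/* Stand-in for a page that embeds its list linkage (like page->lru). */
struct fake_page {
    int id;
    struct node lru;
};

static void list_init(struct node *head)
{
    head->prev = head->next = head;
}

static void list_add_head(struct node *n, struct node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

static void list_unlink(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = NULL;
}

/*
 * Release every entry on the list. The successor is saved before the
 * current entry is unlinked and freed, so the walk never reads freed
 * memory. Returns the number of entries released.
 */
static int release_all(struct node *head)
{
    struct node *cur, *next;
    int released = 0;

    for (cur = head->next; cur != head; cur = next) {
        struct fake_page *p;

        next = cur->next;
        p = (struct fake_page *)((char *)cur - offsetof(struct fake_page, lru));
        list_unlink(cur);
        free(p);
        released++;
    }
    return released;
}
```

After `release_all()` returns, the head points back at itself, which matches what the caller's on-stack list looks like once every page has been put.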

[Qemu-devel] [PATCH kernel v5 5/5] virtio-balloon: tell host vm's unused page info

2016-11-30 Thread Liang Li

This patch contains two parts:

The first adds a new API to mm to get the unused page information.
The virtio balloon driver uses this new API to get the unused page
info and send it to the hypervisor (QEMU) to speed up live migration.
While the bitmap is being sent, some of the pages may be modified and
used by the guest; this inaccuracy can be corrected by the dirty page
logging mechanism.

The second adds support for the request for the VM's unused page
information. QEMU can use the unused page information together with
the dirty page logging mechanism to skip the transfer of some of
these unused pages, which helps reduce network traffic and speeds up
the live migration process.

Signed-off-by: Liang Li 
Cc: Andrew Morton 
Cc: Mel Gorman 
Cc: Michael S. Tsirkin 
Cc: Paolo Bonzini 
Cc: Cornelia Huck 
Cc: Amit Shah 
Cc: Dave Hansen 
---
 drivers/virtio/virtio_balloon.c | 126 +---
 include/linux/mm.h  |   3 +-
 mm/page_alloc.c |  72 +++
 3 files changed, 193 insertions(+), 8 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index c3ddec3..2626cc0 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -56,7 +56,7 @@
 
 struct virtio_balloon {
struct virtio_device *vdev;
-   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *req_vq;
 
/* The balloon servicing is delegated to a freezable workqueue. */
struct work_struct update_balloon_stats_work;
@@ -75,6 +75,8 @@ struct virtio_balloon {
void *resp_hdr;
/* Pointer to the start address of response data. */
unsigned long *resp_data;
+   /* Size of response data buffer. */
+   unsigned long resp_buf_size;
/* Pointer offset of the response data. */
unsigned long resp_pos;
/* Bitmap and bitmap count used to tell the host the pages */
@@ -83,6 +85,8 @@ struct virtio_balloon {
unsigned int nr_page_bmap;
/* Used to record the processed pfn range */
unsigned long min_pfn, max_pfn, start_pfn, end_pfn;
+   /* Request header */
+   struct virtio_balloon_req_hdr req_hdr;
/*
 * The pages we've told the Host we're not using are enqueued
 * at vb_dev_info->pages list.
@@ -551,6 +555,58 @@ static void update_balloon_stats(struct virtio_balloon *vb)
pages_to_bytes(available));
 }
 
+static void send_unused_pages_info(struct virtio_balloon *vb,
+   unsigned long req_id)
+{
+   struct scatterlist sg_in;
+   unsigned long pos = 0;
+   struct virtqueue *vq = vb->req_vq;
+   struct virtio_balloon_resp_hdr *hdr = vb->resp_hdr;
+   int ret, order;
+
+   mutex_lock(&vb->balloon_lock);
+
+   for (order = MAX_ORDER - 1; order >= 0; order--) {
+   pos = 0;
+   ret = get_unused_pages(vb->resp_data,
+vb->resp_buf_size / sizeof(unsigned long),
+order, &pos);
+   if (ret == -ENOSPC) {
+   void *new_resp_data;
+
+   new_resp_data = kmalloc(2 * vb->resp_buf_size,
+   GFP_KERNEL);
+   if (new_resp_data) {
+   kfree(vb->resp_data);
+   vb->resp_data = new_resp_data;
+   vb->resp_buf_size *= 2;
+   order++;
+   continue;
+   } else
+   dev_warn(&vb->vdev->dev,
+"%s: omit some %d order pages\n",
+__func__, order);
+   }
+
+   if (pos > 0) {
+   vb->resp_pos = pos;
+   hdr->cmd = BALLOON_GET_UNUSED_PAGES;
+   hdr->id = req_id;
+   if (order > 0)
+   hdr->flag = BALLOON_FLAG_CONT;
+   else
+   hdr->flag = BALLOON_FLAG_DONE;
+
+   send_resp_data(vb, vq, true);
+   }
+   }
+
+   mutex_unlock(&vb->balloon_lock);
+   sg_init_one(&sg_in, &vb->req_hdr, sizeof(vb->req_hdr));
+   virtqueue_add_inbuf(vq, &sg_in, 1, &vb->req_hdr, GFP_KERNEL);
+   virtqueue_kick(vq);
+}
+
 /*
  * While most virtqueues communicate guest-initiated requests to the hypervisor,
  * the stats queue operates in reverse.  The driver initializes the virtqueue
@@ -685,18 +741,56 @@ static void update_balloon_size_func(struct work_struct *work)
queue_work(system_freezable_wq, work);
 }
 
+static void misc_handle_rq(struct virtio_balloon *vb)
+{
+   struct virtio_balloon_req_hdr *ptr_hdr;
+

Re: [Qemu-devel] [PATCH v7 RFC] block/vxhs: Initial commit to add Veritas HyperScale VxHS block device support

2016-11-30 Thread Stefan Hajnoczi

On Mon, Nov 28, 2016 at 02:17:56PM +, Stefan Hajnoczi wrote:
> Please take a look at vhost-user-scsi, which folks from Nutanix are
> currently working on.  See "[PATCH v2 0/3] Introduce vhost-user-scsi and
> sample application" on qemu-devel.  It is a true zero-copy local I/O tap
> because it shares guest RAM.  This is more efficient than cross memory
> attach's single memory copy.  It does not require running the server as
> root.  This is the #1 thing you should evaluate for your final
> architecture.
> 
> vhost-user-scsi works on the virtio-scsi emulation level.  That means
> the server must implement the virtio-scsi vring and device emulation.
> It is not a block driver.  By hooking in at this level you can achieve
> the best performance but you lose all QEMU block layer functionality and
> need to implement your own SCSI target.  You also need to consider live
> migration.

To clarify why I think vhost-user-scsi is best suited to your
requirements for performance:

With vhost-user-scsi the qnio server would be notified by kvm.ko via
eventfd when the VM submits new I/O requests to the virtio-scsi HBA.
The QEMU process is completely bypassed for I/O request submission and
the qnio server processes the SCSI command instead.  This avoids the
context switch to QEMU and then to the qnio server.  With cross memory
attach QEMU first needs to process the I/O request and hand it to
libqnio before the qnio server can be scheduled.
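
The notification primitive mentioned above can be illustrated in isolation. This is only a sketch of the counter semantics of eventfd(2) on Linux: the kvm.ko ioeventfd wiring and the vhost-user protocol are not shown, and `kick_and_drain` is a hypothetical helper name.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Signal an eventfd 'times' times, then drain it once.
 * Each write adds to a 64-bit counter; a single read returns the
 * accumulated value and resets it, so several kicks coalesce into
 * one wakeup. Returns the drained value, or 0 on error or when no
 * kick is pending.
 */
static uint64_t kick_and_drain(int times)
{
    uint64_t one = 1, count = 0;
    int fd = eventfd(0, EFD_NONBLOCK);

    if (fd < 0)
        return 0;

    for (int i = 0; i < times; i++) {
        if (write(fd, &one, sizeof(one)) != (ssize_t)sizeof(one)) {
            close(fd);
            return 0;
        }
    }

    /* With EFD_NONBLOCK, reading an empty counter fails with EAGAIN. */
    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        count = 0;

    close(fd);
    return count;
}
```

In a real server the read side would sit behind poll() or epoll in the server's event loop rather than being called right after the writes.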

The vhost-user-scsi qnio server has shared memory access to guest RAM
and is therefore able to do zero-copy I/O into guest buffers.  Cross
memory attach always incurs a memory copy.

Using this high-performance architecture requires significant changes
though.  vhost-user-scsi hooks into the stack at a different layer so a
QEMU block driver is not used at all.  QEMU also wouldn't use libqnio.
Instead everything will live in your qnio server process (not part of
QEMU).

You'd have to rethink the resiliency strategy because you currently rely
on the QEMU block driver connecting to a different qnio server if the
local qnio server fails.  In the vhost-user-scsi world it's more like
having a physical SCSI adapter - redundancy and multipathing are used to
achieve resiliency.

For example, virtio-scsi HBA #1 would connect to the local qnio server
process.  virtio-scsi HBA #2 would connect to another local process
called the "proxy process" which forwards requests to a remote qnio
server (using libqnio?).  If HBA #1 fails then I/O is sent to HBA #2
instead.  The path can reset back to HBA #1 once that becomes
operational again.
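
The path policy in this example can be sketched as a tiny selection function. This is an illustration only, assuming the policy is "prefer the local path whenever it is up"; in practice the guest's multipath layer would make this decision, and the `multipath`, `path` and `select_path` names are hypothetical.

```c
#include <stdbool.h>

enum path { PATH_NONE = -1, PATH_LOCAL = 0, PATH_PROXY = 1 };

struct multipath {
    bool local_up;  /* HBA #1: local qnio server */
    bool proxy_up;  /* HBA #2: proxy process forwarding to a remote server */
};

/*
 * Pick the path for the next I/O request. Always preferring the local
 * path when it is up gives both failover (local down, use proxy) and
 * failback (local up again, return to it).
 */
static enum path select_path(const struct multipath *mp)
{
    if (mp->local_up)
        return PATH_LOCAL;
    if (mp->proxy_up)
        return PATH_PROXY;
    return PATH_NONE;
}
```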

If the qnio server is supposed to run in a VM instead of directly in the
host environment then it's worth looking at the vhost-pci work that Wei
Wang is working on.  The email thread is called
"[PATCH v2 0/4] *** vhost-user spec extension for vhost-pci ***".  The
idea here is to allow inter-VM virtio device emulation so that instead
of terminating the virtio-scsi device in the qnio server process on the
host, you can terminate it inside another VM with good performance
characteristics.

Stefan


[Qemu-devel] [PATCH kernel v5 2/5] virtio-balloon: define new feature bit and head struct

2016-11-30 Thread Liang Li

Add a new feature which supports sending the page information with
a bitmap. The current implementation uses a PFN array, which is not
very efficient. Using a bitmap can significantly improve the
performance of inflating/deflating.

The page bitmap header will be used to tell the host some information
about the page bitmap, e.g. the page size, page bitmap length and
start pfn.

Signed-off-by: Liang Li 
Cc: Michael S. Tsirkin 
Cc: Paolo Bonzini 
Cc: Cornelia Huck 
Cc: Amit Shah 
Cc: Dave Hansen 
---
 include/uapi/linux/virtio_balloon.h | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
index 343d7dd..1be4b1f 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -34,6 +34,7 @@
 #define VIRTIO_BALLOON_F_MUST_TELL_HOST 0 /* Tell before reclaiming pages */
 #define VIRTIO_BALLOON_F_STATS_VQ  1 /* Memory Stats virtqueue */
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM 2 /* Deflate balloon on OOM */
+#define VIRTIO_BALLOON_F_PAGE_BITMAP   3 /* Send page info with bitmap */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
@@ -82,4 +83,22 @@ struct virtio_balloon_stat {
__virtio64 val;
 } __attribute__((packed));
 
+/* Response header structure */
+struct virtio_balloon_resp_hdr {
+   __le64 cmd : 8; /* Distinguish different requests type */
+   __le64 flag: 8; /* Mark status for a specific request type */
+   __le64 id : 16; /* Distinguish requests of a specific type */
+   __le64 data_len: 32; /* Length of the following data, in bytes */
+};
+
+/* Page bitmap header structure */
+struct virtio_balloon_bmap_hdr {
+   struct {
+   __le64 start_pfn : 52; /* start pfn for the bitmap */
+   __le64 page_shift : 6; /* page shift width, in bytes */
+   __le64 bmap_len : 6;  /* bitmap length, in bytes */
+   } head;
+   __le64 bmap[0];
+};
+
 #endif /* _LINUX_VIRTIO_BALLOON_H */
-- 
1.8.3.1
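
The patch declares the bitmap header with C bit-fields, whose in-memory layout is implementation-defined. As a sketch of the same three fields with an explicit layout, the following packs them with shifts and masks, assuming the first-declared field occupies the least significant bits. The `bmap_hdr_*` helper names are hypothetical and not part of the patch:

```c
#include <stdint.h>

#define START_PFN_BITS   52
#define PAGE_SHIFT_BITS  6
#define BMAP_LEN_BITS    6

#define START_PFN_MASK   ((UINT64_C(1) << START_PFN_BITS) - 1)
#define SIX_BIT_MASK     UINT64_C(0x3f)

/* Pack start_pfn (52 bits), page_shift (6 bits) and bmap_len (6 bits). */
static uint64_t bmap_hdr_pack(uint64_t start_pfn, unsigned page_shift,
                              unsigned bmap_len)
{
    uint64_t v = start_pfn & START_PFN_MASK;

    v |= ((uint64_t)page_shift & SIX_BIT_MASK) << START_PFN_BITS;
    v |= ((uint64_t)bmap_len & SIX_BIT_MASK)
         << (START_PFN_BITS + PAGE_SHIFT_BITS);
    return v;
}

static uint64_t bmap_hdr_start_pfn(uint64_t v)
{
    return v & START_PFN_MASK;
}

static unsigned bmap_hdr_page_shift(uint64_t v)
{
    return (unsigned)((v >> START_PFN_BITS) & SIX_BIT_MASK);
}

static unsigned bmap_hdr_bmap_len(uint64_t v)
{
    return (unsigned)((v >> (START_PFN_BITS + PAGE_SHIFT_BITS)) & SIX_BIT_MASK);
}
```

The packed value would still need a byte-order conversion before going on the wire. Note also that a 6-bit field can hold at most 63, so any larger length is silently truncated by the mask.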

[Qemu-devel] [PATCH kernel v5 3/5] virtio-balloon: speed up inflate/deflate process

2016-11-30 Thread Liang Li

The current virtio-balloon implementation is not very efficient.
The time spent on the different stages of inflating the balloon
to 7GB on an 8GB idle guest:

a. allocating pages (6.5%)
b. sending PFNs to host (68.3%)
c. address translation (6.1%)
d. madvise (19%)

It takes about 4126ms for the inflating process to complete.
Debugging shows that the bottlenecks are stage b and stage d.

If we use a bitmap to send the page info instead of the PFNs, we
can reduce the overhead in stage b quite a lot. Furthermore, we
can do the address translation and call madvise() on a batch of
RAM pages instead of the current page-by-page approach, so the
overhead of stage c and stage d can also be reduced a lot.

This patch is the kernel side implementation which is intended to
speed up the inflating & deflating process by adding a new feature
to the virtio-balloon device. With this new feature, inflating the
balloon to 7GB on an 8GB idle guest takes only 590ms, a
performance improvement of about 85%.
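
The figures above can be checked with a little arithmetic. The sketch below recomputes the relative time reduction from the two timings and compares the amount of data needed to describe 7GB of 4KB balloon pages as 32-bit PFNs versus one bit per page. The bitmap figure is a best case that assumes the pages are dense in the PFN range, and the helper names are hypothetical:

```c
#include <stdint.h>

/* Relative reduction in elapsed time, in percent. */
static double reduction_percent(double before_ms, double after_ms)
{
    return (before_ms - after_ms) / before_ms * 100.0;
}

/* Bytes needed to list every page as a 32-bit PFN. */
static uint64_t pfn_array_bytes(uint64_t nr_pages)
{
    return nr_pages * sizeof(uint32_t);
}

/* Bytes needed for one bit per page (dense range, rounded up). */
static uint64_t bitmap_bytes(uint64_t nr_pages)
{
    return (nr_pages + 7) / 8;
}
```

For 4126ms versus 590ms the reduction is roughly 85.7%. For 7GB, that is 1835008 pages, the PFN array takes 7340032 bytes (7MB) while a dense bitmap takes 229376 bytes (224KB).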

TODO: optimize stage a by allocating/freeing a chunk of pages
instead of a single page at a time.

Signed-off-by: Liang Li 
Suggested-by: Michael S. Tsirkin 
Cc: Michael S. Tsirkin 
Cc: Paolo Bonzini 
Cc: Cornelia Huck 
Cc: Amit Shah 
Cc: Dave Hansen 
---
 drivers/virtio/virtio_balloon.c | 395 +---
 1 file changed, 367 insertions(+), 28 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index f59cb4f..c3ddec3 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -42,6 +42,10 @@
 #define OOM_VBALLOON_DEFAULT_PAGES 256
 #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
 
+#define BALLOON_BMAP_SIZE  (8 * PAGE_SIZE)
+#define PFNS_PER_BMAP  (BALLOON_BMAP_SIZE * BITS_PER_BYTE)
+#define BALLOON_BMAP_COUNT 32
+
 static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
 module_param(oom_pages, int, S_IRUSR | S_IWUSR);
 MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
@@ -67,6 +71,18 @@ struct virtio_balloon {
 
/* Number of balloon pages we've told the Host we're not using. */
unsigned int num_pages;
+   /* Pointer to the response header. */
+   void *resp_hdr;
+   /* Pointer to the start address of response data. */
+   unsigned long *resp_data;
+   /* Pointer offset of the response data. */
+   unsigned long resp_pos;
+   /* Bitmap and bitmap count used to tell the host the pages */
+   unsigned long *page_bitmap[BALLOON_BMAP_COUNT];
+   /* Number of split page bitmaps */
+   unsigned int nr_page_bmap;
+   /* Used to record the processed pfn range */
+   unsigned long min_pfn, max_pfn, start_pfn, end_pfn;
/*
 * The pages we've told the Host we're not using are enqueued
 * at vb_dev_info->pages list.
@@ -110,20 +126,228 @@ static void balloon_ack(struct virtqueue *vq)
wake_up(&vb->acked);
 }
 
-static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
+static inline void init_bmap_pfn_range(struct virtio_balloon *vb)
 {
-   struct scatterlist sg;
+   vb->min_pfn = ULONG_MAX;
+   vb->max_pfn = 0;
+}
+
+static inline void update_bmap_pfn_range(struct virtio_balloon *vb,
+struct page *page)
+{
+   unsigned long balloon_pfn = page_to_balloon_pfn(page);
+
+   vb->min_pfn = min(balloon_pfn, vb->min_pfn);
+   vb->max_pfn = max(balloon_pfn, vb->max_pfn);
+}
+
+static void extend_page_bitmap(struct virtio_balloon *vb,
+   unsigned long nr_pfn)
+{
+   int i, bmap_count;
+   unsigned long bmap_len;
+
+   bmap_len = ALIGN(nr_pfn, BITS_PER_LONG) / BITS_PER_BYTE;
+   bmap_len = ALIGN(bmap_len, BALLOON_BMAP_SIZE);
+   bmap_count = min((int)(bmap_len / BALLOON_BMAP_SIZE),
+BALLOON_BMAP_COUNT);
+
+   for (i = 1; i < bmap_count; i++) {
+   vb->page_bitmap[i] = kmalloc(BALLOON_BMAP_SIZE, GFP_KERNEL);
+   if (vb->page_bitmap[i])
+   vb->nr_page_bmap++;
+   else
+   break;
+   }
+}
+
+static void free_extended_page_bitmap(struct virtio_balloon *vb)
+{
+   int i, bmap_count = vb->nr_page_bmap;
+
+
+   for (i = 1; i < bmap_count; i++) {
+   kfree(vb->page_bitmap[i]);
+   vb->page_bitmap[i] = NULL;
+   vb->nr_page_bmap--;
+   }
+}
+
+static void kfree_page_bitmap(struct virtio_balloon *vb)
+{
+   int i;
+
+   for (i = 0; i < vb->nr_page_bmap; i++)
+   kfree(vb->page_bitmap[i]);
+}
+
+static void clear_page_bitmap(struct virtio_balloon *vb)
+{
+   int i;
+
+   for (i = 0; i < vb->nr_page_bmap; i++)
+   memset(vb->page_bitmap[i], 0, BALLOON_BMAP_SIZE);
+}
+
+static unsigned long do_set_resp_bitmap(struct virtio_balloon *vb,
+   unsigned long *bitmap,  unsigned long base_pfn,
+   unsigned long pos, int nr_page)
+
+{
+   s

Re: [Qemu-devel] qemu-system-sh4 vs qemu-system-arm/i386 default behavior

2016-11-30 Thread Aurelien Jarno

On 2016-11-30 08:33, Thomas Huth wrote:
> On 30.11.2016 02:01, Tom Rini wrote:
> > Hey all,
> > 
> > I'm trying to make use of the r2d platform for U-Boot testing via QEMU.
> > After applying a series[1] I can use the kernel.org sh4 toolchain to get
> > a u-boot.bin that runs, mostly.  I say mostly as first of all I have to
> > pass "-monitor null -serial null -serial stdio -nographic" to
> > qemu-system-sh4 and in that order for me to get output from U-Boot on
> > the prompt.  On other platforms such as arm and vexpress or i386 and the
> > 'pc' machine I do not need to do this.  Does anyone have any idea why
> > this might be and where to start poking in the code to fix this?

The reason is that u-boot and the Linux kernel do not number the
serial ports the same way as the physical hardware. Therefore u-boot
and the Linux kernel use the second physical serial port. The question is
whether we should number our ports from the software (or part of the
software) or hardware point of view.

> The "-serial" parameter is related to the serial_hds[] array in the
> code, so you could search for that one.
> 
> The following line in hw/sh4/r2d.c looks somewhat suspicious:
> 
> sm501_init(address_space_mem, 0x1000, SM501_VRAM_SIZE,
>irq[SM501], serial_hds[2]);
> 
> Why is this machine always using serial_hds[2] and not a lower index?
> ... Maybe the maintainer of the board (Magnus) knows the answer here...

The third serial port is provided by the graphic chipset. The first two
serial ports are provided by the SH7750 CPU, see in hw/sh4/sh7750.c.
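
To make the ordering concrete, here is a small sketch of how the position of each "-serial" option maps onto the r2d devices as described in this thread. The function name and the device label strings are hypothetical, chosen only for illustration:

```c
#include <stddef.h>

/*
 * Map the position of a "-serial" option on the command line
 * (0 = first) to the device that receives it on the r2d machine,
 * following the description in this thread.
 */
static const char *r2d_serial_backend(int serial_index)
{
    switch (serial_index) {
    case 0:
        return "SH7750 serial port 0";
    case 1:
        /* The port u-boot and the Linux kernel use as console. */
        return "SH7750 serial port 1";
    case 2:
        return "SM501 serial port";
    default:
        return NULL;
    }
}
```

With "-serial null -serial stdio", index 0 is discarded and index 1 is connected to stdio, which is why the order of the options matters.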

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net



[Qemu-devel] [PATCH kernel v5 4/5] virtio-balloon: define flags and head for host request vq

2016-11-30 Thread Liang Li

Define the flags and head struct for a new host request virtual
queue. The guest can get requests from the host and then respond to
them on this new virtual queue.
The host can use this virtual queue to request that the guest perform
some operations, e.g. drop page cache, synchronize file system, etc.
The hypervisor can also get some of the guest's runtime information
through this virtual queue, e.g. the guest's unused page
information, which can be used for live migration optimization.

Signed-off-by: Liang Li 
Cc: Andrew Morton 
Cc: Mel Gorman 
Cc: Michael S. Tsirkin 
Cc: Paolo Bonzini 
Cc: Cornelia Huck 
Cc: Amit Shah 
Cc: Dave Hansen 
---
 include/uapi/linux/virtio_balloon.h | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
index 1be4b1f..5ac3a40 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -35,6 +35,7 @@
 #define VIRTIO_BALLOON_F_STATS_VQ  1 /* Memory Stats virtqueue */
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM 2 /* Deflate balloon on OOM */
 #define VIRTIO_BALLOON_F_PAGE_BITMAP   3 /* Send page info with bitmap */
+#define VIRTIO_BALLOON_F_HOST_REQ_VQ   4 /* Host request virtqueue */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
@@ -101,4 +102,25 @@ struct virtio_balloon_bmap_hdr {
__le64 bmap[0];
 };
 
+enum virtio_balloon_req_id {
+   /* Get unused page information */
+   BALLOON_GET_UNUSED_PAGES,
+};
+
+enum virtio_balloon_flag {
+   /* Have more data for a request */
+   BALLOON_FLAG_CONT,
+   /* No more data for a request */
+   BALLOON_FLAG_DONE,
+};
+
+struct virtio_balloon_req_hdr {
+   /* Used to distinguish different requests */
+   __le16 cmd;
+   /* Reserved */
+   __le16 reserved[3];
+   /* Request parameter */
+   __le64 param;
+};
+
 #endif /* _LINUX_VIRTIO_BALLOON_H */
-- 
1.8.3.1
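
As a rough illustration of the request header and the CONT/DONE flags defined above, the sketch below mirrors the header with fixed-width types, checks its size at compile time, and shows how a receiver might count response chunks until the final one. Everything with a `_sketch` or `_SKETCH` suffix, and `chunks_until_done` itself, is hypothetical; the host-side handling is not part of this patch:

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the request header, using fixed-width types. */
struct req_hdr_sketch {
    uint16_t cmd;          /* which request this is */
    uint16_t reserved[3];  /* pads param to an 8-byte offset */
    uint64_t param;        /* request parameter */
};

_Static_assert(sizeof(struct req_hdr_sketch) == 16,
               "header is expected to be 16 bytes");
_Static_assert(offsetof(struct req_hdr_sketch, param) == 8,
               "param is expected at offset 8");

/* Same order as the enum in the patch: CONT first, DONE second. */
enum flag_sketch { FLAG_CONT_SKETCH, FLAG_DONE_SKETCH };

/*
 * Given the flags of the response chunks in arrival order, return how
 * many chunks make up the response (including the DONE chunk), or -1
 * if no DONE chunk has arrived yet.
 */
static int chunks_until_done(const enum flag_sketch *flags, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (flags[i] == FLAG_DONE_SKETCH)
            return (int)(i + 1);
    }
    return -1;
}
```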

Re: [Qemu-devel] Linux kernel polling for QEMU

2016-11-30 Thread Andrew Jones

On Wed, Nov 30, 2016 at 07:19:12AM +, Peter Maydell wrote:
> On 29 November 2016 at 19:38, Andrew Jones  wrote:
> > Thanks for making me look, I was simply assuming we were in the while
> > loops above.
> >
> > I couldn't get the problem to reproduce with access to the monitor,
> > but by adding '-d exec' I was able to see cpu0 was on the wfe in
> > smp_boot_secondary. It should only stay there until cpu1 executes the
> > sev in secondary_cinit, but it looks like TCG doesn't yet implement sev
> >
> >  $ grep SEV target-arm/translate.c
> > /* TODO: Implement SEV, SEVL and WFE.  May help SMP performance.
> 
> Yes, we currently NOP SEV. We only implement WFE as "yield back
> to TCG top level loop", though, so this is fine. The idea is
> that WFE gets used in busy loops so it's a helpful hint to
> try running some other TCG vCPU instead of just spinning in
> the guest on this one. Implementing SEV as a NOP and WFE as
> a more-or-less NOP is architecturally permitted (guest code
> is required to cope with WFE returning "early"). If something
> is not working correctly then it's either buggy guest code
> or a problem with the generic TCG scheduling of CPUs.

The problem is indeed with the scheduling. The way it currently works
is to depend on the iothread to kick a reschedule once in a while, or
a cpu to issue an instruction that does so (wfe/wfi). However if
there's no io and a cpu never issues a scheduling instruction, then it
won't happen. We either need a sched tick or to never have an infinite
iothread ppoll timeout (basically using the ppoll timeout as a tick).

As for being buggy guest code, I don't think so. Here's another
unit test that illustrates the issue with wfe/sev taken out.

 #include 
 void secondary(void) {
 printf("secondary running\n");
 asm("yield");

 /* A "real" guest cpu shouldn't do this, but even if it
  * does, that shouldn't stop other cpus from running.
  */
 while(1);
 }
 int main(void) {
 smp_boot_secondary(1, secondary);
 printf("primary running\n");
 asm("yield");
 return 0;
 }

With that test we get the two print statements, but it never exits.

Now that I understand the problem much better, I think I may be
coming full circle and advocating the iothread's ppoll never be
allowed to have an infinite timeout again, but now only for tcg.
Something like

 if (timeout < 0 && tcg_enabled())
timeout = TCG_SCHED_TICK;
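
A self-contained version of that idea, as a sketch only: the helper below turns an infinite poll timeout into a periodic tick when TCG is in use and leaves every other case alone. The tick value and the names `TCG_SCHED_TICK_MS` and `effective_poll_timeout` are hypothetical, since no value is given in this thread.

```c
#include <stdbool.h>

/* Hypothetical tick length; the thread does not specify a value. */
#define TCG_SCHED_TICK_MS 10

/*
 * A negative timeout means "block forever". Under TCG that would
 * leave nothing to force a reschedule when there is no I/O, so it is
 * replaced by a finite tick. Finite timeouts are passed through.
 */
static int effective_poll_timeout(int timeout_ms, bool tcg_enabled)
{
    if (timeout_ms < 0 && tcg_enabled)
        return TCG_SCHED_TICK_MS;
    return timeout_ms;
}
```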

Thanks,
drew
