Re: [Qemu-devel] [PATCH] hw/i386: Deprecate the machines pc-0.10 to pc-0.15

2017-05-11 Thread Thomas Huth
On 11.05.2017 17:10, Markus Armbruster wrote:
> "Daniel P. Berrange"  writes:
> 
>> On Wed, May 10, 2017 at 06:15:39PM +0200, Paolo Bonzini wrote:
>>>
>>>
>>> On 10/05/2017 16:47, Thomas Huth wrote:
> So while we can delete pc-0.12, we can't delete associated features needed
> by pc-0.12, without complicating RHEL's ability to create its back-compat
> machine types. Downstream would have to un-delete the features.

 So I guess this is why Paolo said that pc-0.12 is still in "use" ... I
 think removing pc-0.12, but not removing rombar=0 will cause confusion
 in the upstream code base sooner or later,
>>>
>>> I agree.
>>>
 so I guess we should rather
 keep the pc-0.12 machine until we can get rid of it together with the
 rombar code. We should still mark it as deprecated, of course.

> I think tying removal to major versions is a mistake, unless we're
> going to set a fixed timeframe for delivery of major versions, i.e. if
> we guarantee that we'll ship a new major version every 18 months, that
> gives people a predictable lifetime.  If we carry on inventing reasons
> for major versions at arbitrary points in time, it makes it difficult
> to have any reasonable forward planning.  It is more user-friendly if
> we can set a clear fixed timeframe for machine type lifecycle / EOL.

 IMHO we should have a new major release after we've reached a .9 minor
 release, but so far it seems like I'm the only one with that wish...
>>>
>>> I actually like that, but then you've pretty much guaranteed that you
>>> _cannot_ remove anything deprecated until 4.0.  You and Daniel aren't
>>> disagreeing as heavily as it seems, I think.
>>
>> I don't think we should tie removal of features to version numbers. IMHO
>> we should just increment the first major digit on a fixed time scale,
>> either once a year, or whenever we get past .9.

Once a year sounds too often for my personal taste, I really prefer the
.9 way, but that's details...

>> For removal of features, IMHO, the only important thing is to give users
>> clear deprecation warnings for 2-3 releases, and ensure feature detection
>> works well. As long as that is done, there shouldn't be any need to batch
>> them up for "major" releases. From libvirt POV, batching up removal to
>> major releases is not beneficial. Batching to major releases gives a very
>> inconsistent timeframe for removal too - something deprecated in a .1
>> release may live on for years, until the next $major.0, while something
>> deprecated in a .9 release can be killed in 4 months. I much prefer to
>> see a consistent "deprecated for 2 releases / 8 months, then deleted"
>> rule, regardless of feature.
> 
> I concur.

Fine for me, too. My only additional point is that we *should* have
major releases from time to time (ending up with something like QEMU
2.42 is just ugly), and that we *could* use them for additional
spring-cleaning (not from the libvirt POV, but from the users' POV). For
example, we've got some parameters where we have warned since QEMU 1.3 that
they are deprecated and might go away soon. We could now use major releases to
remind ourselves from time to time to look through the code for such
deprecated interfaces and remove them with the next major version. (But
removal would of course also be allowed at any other minor version
release if the feature has been deprecated for at least two minor
releases already).
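The two removal policies compare as simple arithmetic. A minimal sketch, assuming one release about every 4 months (implied by "2 releases / 8 months" above) and minor numbers running .0 to .9 before the major bump; the constants and function names are made up for illustration:

```c
/* Assumptions for illustration only: one release roughly every 4 months
 * (2 releases ~ 8 months) and minor versions running .0 to .9. */
#define MONTHS_PER_RELEASE 4
#define MINORS_PER_MAJOR   10

/* Fixed rule: deleted a fixed number of releases after deprecation. */
static int months_until_removal_fixed(int releases_deprecated)
{
    return releases_deprecated * MONTHS_PER_RELEASE;
}

/* Major-only rule: deleted at the next $major.0, so the wait depends on
 * which minor release the deprecation landed in. */
static int months_until_removal_major(int deprecated_minor)
{
    return (MINORS_PER_MAJOR - deprecated_minor) * MONTHS_PER_RELEASE;
}
```

With these assumptions months_until_removal_fixed(2) is 8 for every feature, while months_until_removal_major() gives 36 months for a .1 deprecation and 4 months for a .9 one, which is the inconsistency described above.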

 Thomas




Re: [Qemu-devel] KVM "fake DAX" device flushing

2017-05-11 Thread Pankaj Gupta

> 
> On Wed, May 10, 2017 at 09:26:00PM +0530, Pankaj Gupta wrote:
> > We are sharing initial project proposal for
> > 'KVM "fake DAX" device flushing' project for feedback.
> > Got the idea during discussion with 'Rik van Riel'.
> 
> CCing NVDIMM folks.
> 
> > 
> > Also, we would appreciate answers to the 'Questions' section.
> > 
> > Abstract :
> > --
> > The project idea is to use fake persistent memory with direct
> > access (DAX) in virtual machines. The overall goal of the project
> > is to increase the number of virtual machines that can be
> > run on a physical machine, in order to increase the density
> > of customer virtual machines.
> > 
> > The idea is to avoid the guest page cache, and minimize the
> > memory footprint of virtual machines. By presenting a disk
> > image as an nvdimm direct access (DAX) memory region in a
> > virtual machine, the guest OS can avoid using page cache
> > memory for most file accesses.
> > 
> > Problem Statement :
> > --
> > * The guest uses its page cache to serve disk reads/writes
> >   quickly. This results in a large guest memory footprint,
> >   while the host knows little about how that guest memory
> >   is used.
> > 
> > * If guests use direct access (DAX) with fake persistent
> >   storage, the host manages the page cache for guests,
> >   allowing the host to easily reclaim/evict less frequently
> >   used page cache pages without requiring guest cooperation,
> >   like ballooning would.
> > 
> > * The host manages the guest cache as an mmap'ed disk image area
> >   in the QEMU address space. This region is passed to the guest as
> >   a fake persistent memory range. We need a new flushing interface
> >   to flush this cache to secondary storage to persist guest
> >   writes.
> > 
> > * A new asynchronous flushing interface will allow guests to
> >   make the host flush dirty data to the backing storage file.
> >   Systems with real pmem storage use the CLFLUSH instruction
> >   to flush a single cache line to persistent storage, and that
> >   takes care of flushing. With fake persistent storage in the
> >   guest we cannot depend on CLFLUSH to flush the entire
> >   dirty cache to backing storage. Even if we trap and emulate
> >   CLFLUSH, the guest vCPU has to wait until we have flushed all
> >   the dirty memory. Instead, we need to implement a new
> >   asynchronous guest flushing interface, which allows the guest
> >   to specify a larger range to be flushed at once, and allows
> >   the vCPU to run something else while the data is being synced
> >   to disk.
> > 
> > * The new flushing interface will consist of a paravirt driver for
> >   a new fake-nvdimm-like device, which will process guest flush
> >   requests such as fsync/msync instead of pmem library calls
> >   like clflush. The corresponding device on the host side will be
> >   responsible for flushing guest dirty pages on request.
> >   The guest can put the current task to sleep, and the vCPU can run
> >   other tasks while host-side flushing of guest pages is in progress.
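On the host side, serving one such flush request could look roughly like the sketch below. It assumes the image is mapped MAP_SHARED (as in design step 1 further down) and that a request carries a byte offset and length; map_disk_image() and flush_guest_range() are hypothetical names, and the transport between the guest driver and the host device is left out entirely.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>      /* open(), for callers that open the image */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map the whole disk image MAP_SHARED, so guest stores land in the host
 * page cache of the backing file. Returns NULL on failure. */
static void *map_disk_image(int fd, size_t *len_out)
{
    struct stat st;
    void *p;

    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        return NULL;
    }
    p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
             MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        return NULL;
    }
    *len_out = (size_t)st.st_size;
    return p;
}

/* Hypothetical handler for one guest flush request covering
 * [off, off + len) of the mapping. msync() needs a page-aligned start
 * address, so the start is rounded down. Returns 0 or a negative errno. */
static int flush_guest_range(void *base, size_t map_len,
                             uint64_t off, uint64_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    uint64_t start;

    if (page <= 0 || off > map_len || len > map_len - off) {
        return -EINVAL;
    }
    start = off & ~((uint64_t)page - 1);
    /* MS_SYNC waits for writeback; run this off the vCPU thread */
    if (msync((char *)base + start, (size_t)(off + len - start),
              MS_SYNC) < 0) {
        return -errno;
    }
    return 0;
}
```

Because msync(MS_SYNC) waits for writeback, a device model would presumably run it in a worker thread and complete the guest request asynchronously, which is the point of the proposed interface.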
> > 
> > Host controlled fake nvdimm DAX to avoid guest page cache :
> > -
> > * Bypass the guest page cache by using fake persistent storage
> >   like nvdimm & DAX. Guest reads/writes go directly to the
> >   fake persistent storage without involving the guest kernel in
> >   caching data.
> > 
> > * The fake nvdimm device passed to the guest is backed by a regular
> >   file on the host, stored on secondary storage.
> > 
> > * QEMU already implements a fake NVDIMM/DAX device. We use its
> >   ability to pass a regular host file (disk) to the guest as an
> >   nvdimm device.
> > 
> > * NVDIMM with DAX works with ext4/xfs filesystems; the guest
> >   filesystem must be DAX-compatible.
> > 
> > * As we are using the guest disk as a fake DAX/NVDIMM device, we
> >   need a mechanism to persist data backed by a regular
> >   host storage file.
> > 
> > * For the live migration use case, if the host-side backing file is
> >   on shared storage, we need to flush the page cache for the disk
> >   image at the destination (new fadvise interface, FADV_INVALIDATE_CACHE?)
> >   before starting execution of the guest on the destination host.
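The closest existing knob is probably posix_fadvise() with POSIX_FADV_DONTNEED. It is only advice: on Linux it drops pages the kernel can discard (dirty or mapped pages, for example, may stay cached), so it does not give the invalidation guarantee asked of a new FADV_INVALIDATE_CACHE. A minimal sketch, with a hypothetical helper name:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>

/* Hypothetical helper run on the destination before the guest starts:
 * ask the kernel to drop its cached pages for the shared image, so later
 * reads are more likely to fetch what the source host wrote.
 * Returns 0 or a negative errno. */
static int drop_image_page_cache(int fd)
{
    /* offset 0 with len 0 means "through the end of the file" */
    int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

    /* posix_fadvise() returns an error number instead of setting errno */
    return -rc;
}
```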
> 
> Good point.  QEMU currently only supports live migration with O_DIRECT.
> I think the problem was that userspace cannot guarantee consistency in
> the general case.  If you find a solution to this problem for fake
> NVDIMM then maybe the QEMU block layer can also begin supporting live
> migration with buffered I/O.
> 
> > 
> > Design :
> > -
> > * To avoid having a page cache inside the guest, QEMU would:
> > 
> >  1) mmap the guest's disk image and present that disk image to
> > the guest as a persistent memory range.
> > 
> >  2) Present information to the guest telling it that the persistent
> > memory range is not physical persistent memory.
> 
> Steps 1 & 2 are already supported by QEMU NVDIMM emulation today.

Yes. I have also tested guest 'fake DAX' device using QEMU NVD

Re: [Qemu-devel] [PATCH v2] target-ppc: Enable open-pic timers to count and generate interrupts

2017-05-11 Thread David Gibson
On Tue, May 02, 2017 at 07:57:22PM -0700, Aaron Larson wrote:
> 
> Previously, QEMU's open-pic implemented the 4 open-pic timers, including
> all timer registers, but the timers did not "count" or generate any
> interrupts.  This patch makes the timers both count and generate
> interrupts.  The timer clock frequency is fixed at 100MHz.
> 
> Signed-off-by: Aaron Larson 

Looks sound in concept, AFAICT, though I don't know the openpic hardware.

> ---
>  hw/intc/openpic.c | 135 ++
>  1 file changed, 117 insertions(+), 18 deletions(-)
> 
> diff --git a/hw/intc/openpic.c b/hw/intc/openpic.c
> index 4349e45..e0556f1 100644
> --- a/hw/intc/openpic.c
> +++ b/hw/intc/openpic.c
> @@ -45,6 +45,7 @@
>  #include "qemu/bitops.h"
>  #include "qapi/qmp/qerror.h"
>  #include "qemu/log.h"
> +#include "qemu/timer.h"
>  
>  //#define DEBUG_OPENPIC
>  
> @@ -54,8 +55,10 @@ static const int debug_openpic = 1;
>  static const int debug_openpic = 0;
>  #endif
>  
> +static int get_current_cpu(void);
>  #define DPRINTF(fmt, ...) do { \
>  if (debug_openpic) { \
> +printf("Core%d: ", get_current_cpu()); \
>  printf(fmt , ## __VA_ARGS__); \
>  } \
>  } while (0)
> @@ -246,9 +249,25 @@ typedef struct IRQSource {
>  #define IDR_EP  0x8000  /* external pin */
>  #define IDR_CI  0x4000  /* critical interrupt */
>  
> +/* Conversion between openpic clock ticks and nanosecs.  Ideally this clock
> +   frequency would follow the openpic spec, for now hard code to 100mz.
> +   A 100mhz clock, divided by 8, or 25mhz
> +   25,000,000 ticks/sec, 25,000/ms, 25/us, 1 tick/40ns
> +*/
> +#define CONV_FACTOR 40LL
> +static inline uint64_t ns_to_ticks(uint64_t ns)   { return ns   / CONV_FACTOR; }
> +static inline uint64_t ticks_to_ns(uint64_t tick) { return tick * CONV_FACTOR; }

This is a little hard to follow.  Where does the divide by 8 come
from?  Also 100MHz / 8 is 12.5 MHz, not 25MHz..

I'd prefer logic that comes from an explicit clock frequency
value, even if that's a constant 1 for now.
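Following the suggestion to derive the conversion from an explicit frequency, a sketch could look like this. The 25 MHz value is only what the patch's 40 ns CONV_FACTOR implies, not a claim about the correct OpenPIC rate, and the macro and function names are hypothetical. Inside QEMU, the existing muldiv64() helper would likely be the natural way to do the scaling.

```c
#include <stdint.h>

#define NS_PER_SEC 1000000000ULL
/* Assumption: 25 MHz, the rate the patch's 40 ns/tick CONV_FACTOR encodes.
 * Which frequency is right is the open question in this review. */
#define OPENPIC_TIMER_FREQ_HZ 25000000ULL

/* Whole seconds plus remainder: avoids the overflow a naive
 * ns * OPENPIC_TIMER_FREQ_HZ would hit after about 12 minutes. */
static uint64_t openpic_ns_to_ticks(uint64_t ns)
{
    return (ns / NS_PER_SEC) * OPENPIC_TIMER_FREQ_HZ +
           (ns % NS_PER_SEC) * OPENPIC_TIMER_FREQ_HZ / NS_PER_SEC;
}

static uint64_t openpic_ticks_to_ns(uint64_t ticks)
{
    return (ticks / OPENPIC_TIMER_FREQ_HZ) * NS_PER_SEC +
           (ticks % OPENPIC_TIMER_FREQ_HZ) * NS_PER_SEC / OPENPIC_TIMER_FREQ_HZ;
}
```

Changing the clock then means changing one constant rather than re-deriving a nanoseconds-per-tick factor by hand.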

>  typedef struct OpenPICTimer {
>  uint32_t tccr;  /* Global timer current count register */
>  uint32_t tbcr;  /* Global timer base count register */
> +int   n_IRQ;
> +bool  qemu_timer_active; /* Is the qemu_timer running? */
> +struct QEMUTimer *qemu_timer;   /* May be NULL if not created. */
> +struct OpenPICState  *opp;  /* Device timer is part of. */
> +/* The QEMU_CLOCK_VIRTUAL time (in ns) corresponding to the last
> +   current_count written or read, only defined if qemu_timer_active. */
> +uint64_t  originTime;

qemu doesn't generally use camelCase for structure fields.  I'd
consider an exception if the name 'originTime' appears exactly like
that in the documentation, otherwise not.

>  } OpenPICTimer;
>  
>  typedef struct OpenPICMSI {
> @@ -795,37 +814,102 @@ static uint64_t openpic_gbl_read(void *opaque, hwaddr addr, unsigned len)
>  return retval;
>  }
>  
> +static void openpic_tmr_set_tmr(OpenPICTimer *tmr, uint32_t val, bool enabled);
> +
> +static void qemu_timer_cb(void *opaque)
> +{
> +OpenPICTimer *tmr = opaque;
> +OpenPICState *opp = tmr->opp;
> +uint32_t n_IRQ = tmr->n_IRQ;
> +uint32_t val =   tmr->tbcr & ~TBCR_CI;
> +uint32_t tog = ((tmr->tccr & TCCR_TOG) ^ TCCR_TOG);  /* invert toggle. */
> +
> +DPRINTF("%s n_IRQ=%d\n", __func__, n_IRQ);
> +/* Reload current count from base count and setup timer. */
> +tmr->tccr = val | tog;
> +openpic_tmr_set_tmr(tmr, val, /*enabled=*/true);
> +/* Raise the interrupt. */
> +opp->src[n_IRQ].destmask = read_IRQreg_idr(opp, n_IRQ);
> +openpic_set_irq(opp, n_IRQ, 1);
> +openpic_set_irq(opp, n_IRQ, 0);
> +}
> +
> +/* If enabled is true, arranges for an interrupt to be raised val clocks into
> +   the future; if enabled is false, cancels the timer. */
> +static void openpic_tmr_set_tmr(OpenPICTimer *tmr, uint32_t val, bool enabled)
> +{
> +/* If timer doesn't exist, create it. */
> +if (tmr->qemu_timer == NULL) {
> +tmr->qemu_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, &qemu_timer_cb, tmr);
> +DPRINTF("Created timer for n_IRQ %d\n", tmr->n_IRQ);

Is there a reason to lazily create the timer, rather than always
creating it at init time and just activating it when the timer is set?
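Separately from where the timer is created, the reload step at the top of qemu_timer_cb() is plain register arithmetic and can be checked in isolation. A sketch, assuming TBCR_CI and TCCR_TOG are both bit 31; the SKETCH_ macros stand in for the TBCR_CI and TCCR_TOG macros the patch uses, and the function name is hypothetical:

```c
#include <stdint.h>

/* Assumed bit layouts (bit 31 in both registers), for illustration. */
#define SKETCH_TBCR_CI  0x80000000u  /* base count register: count inhibit */
#define SKETCH_TCCR_TOG 0x80000000u  /* current count register: toggle */

/* New TCCR value on expiry: reload the count from TBCR without the
 * count-inhibit bit, and invert the toggle bit, as qemu_timer_cb() does. */
static uint32_t openpic_timer_reload(uint32_t tccr, uint32_t tbcr)
{
    uint32_t count = tbcr & ~SKETCH_TBCR_CI;
    uint32_t tog = (tccr & SKETCH_TCCR_TOG) ^ SKETCH_TCCR_TOG;

    return count | tog;
}
```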

> +}
> +uint64_t ns = ticks_to_ns(val & ~TCCR_TOG);
> +/* A count of zero causes a timer to be set to expire immediately.  This
> +   effectively stops the simulation so we don't honor that configuration.
> +   On real hardware, this would generate an interrupt on every clock cycle
> +   if the interrupt was unmasked. */

Could you also jam up if the count is non-zero but a too-small value
to make forward progress?  It's probably worth doing an error_report()
in this case too, so the user has some idea wh

Re: [Qemu-devel] [PATCH v9 4/6] hw/ppc/spapr.c: migrate pending_dimm_unplugs of spapr state

2017-05-11 Thread David Gibson
On Fri, May 05, 2017 at 05:47:44PM -0300, Daniel Henrique Barboza wrote:
> To allow for a DIMM unplug event to resume its work if a migration
> occurs in the middle of it, this patch migrates the non-empty
> pending_dimm_unplugs QTAILQ that stores the DIMM information
> that the spapr_lmb_release() callback uses.
> 
> We considered an approach where the DIMM states would be restored
> in post_load after a migration. The problem is that there is
> no way of knowing, from the sPAPRMachineState, if a given DIMM is going
> through an unplug process and the callback needs the updated DIMM State.
> 
> We could migrate a flag indicating that there is an unplug event going
> on for a certain DIMM, fetching this information from the start of the
> spapr_del_lmbs call. But this would also require a scan on post_load to
> figure out how many nr_lmbs are left. At this point we can just
> migrate the nr_lmbs information as well, given that it is being calculated
> at spapr_del_lmbs already, and spare a scanning/discovery in the
> post-load. All that we need is inside the sPAPRDIMMState structure
> that is added to the pending_dimm_unplugs queue at the start of the
> spapr_del_lmbs, so it's convenient to just migrate this queue if it's
> not empty.
> 
> Signed-off-by: Daniel Henrique Barboza 

NACK.

As I believe I suggested previously, you can reconstruct this state on
the receiving side by doing a full scan of the DIMM and LMB DRC states.
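The suggested reconstruction can be sketched as a scan over the LMB DRCs on the destination. This assumes each LMB's awaiting_release flag arrives with the per-DRC migration from patch 3/6 and that LMBs are identified by guest address; the struct and function are hypothetical, and real code would look DRCs up through spapr_dr_connector_by_id() as spapr_del_lmbs() does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one LMB DRC after migration. */
typedef struct {
    uint64_t addr;            /* guest physical address of the LMB */
    bool awaiting_release;    /* unplug requested, guest not done yet */
} LmbDrcView;

/* Count LMBs of the DIMM [dimm_addr, dimm_addr + dimm_size) still awaiting
 * release: the nr_lmbs value the patch would otherwise migrate. */
static uint32_t count_pending_lmbs(const LmbDrcView *lmbs, size_t n,
                                   uint64_t dimm_addr, uint64_t dimm_size)
{
    uint32_t pending = 0;

    for (size_t i = 0; i < n; i++) {
        if (lmbs[i].addr >= dimm_addr &&
            lmbs[i].addr - dimm_addr < dimm_size &&
            lmbs[i].awaiting_release) {
            pending++;
        }
    }
    return pending;
}
```

A non-zero count would mean an unplug was in flight, so the destination could recreate the sPAPRDIMMState entry in post_load.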

> ---
>  hw/ppc/spapr.c | 31 +++
>  1 file changed, 31 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index e190eb9..30f0b7b 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1437,6 +1437,36 @@ static bool version_before_3(void *opaque, int version_id)
>  return version_id < 3;
>  }
>  
> +static bool spapr_pending_dimm_unplugs_needed(void *opaque)
> +{
> +sPAPRMachineState *spapr = (sPAPRMachineState *)opaque;
> +return !QTAILQ_EMPTY(&spapr->pending_dimm_unplugs);
> +}
> +
> +static const VMStateDescription vmstate_spapr_dimmstate = {
> +.name = "spapr_dimm_state",
> +.version_id = 1,
> +.minimum_version_id = 1,
> +.fields = (VMStateField[]) {
> +VMSTATE_UINT64(addr, sPAPRDIMMState),
> +VMSTATE_UINT32(nr_lmbs, sPAPRDIMMState),
> +VMSTATE_END_OF_LIST()
> +},
> +};
> +
> +static const VMStateDescription vmstate_spapr_pending_dimm_unplugs = {
> +.name = "spapr_pending_dimm_unplugs",
> +.version_id = 1,
> +.minimum_version_id = 1,
> +.needed = spapr_pending_dimm_unplugs_needed,
> +.fields = (VMStateField[]) {
> +VMSTATE_QTAILQ_V(pending_dimm_unplugs, sPAPRMachineState, 1,
> + vmstate_spapr_dimmstate, sPAPRDIMMState,
> + next),
> +VMSTATE_END_OF_LIST()
> +},
> +};
> +
>  static bool spapr_ov5_cas_needed(void *opaque)
>  {
>  sPAPRMachineState *spapr = opaque;
> @@ -1535,6 +1565,7 @@ static const VMStateDescription vmstate_spapr = {
>  .subsections = (const VMStateDescription*[]) {
>  &vmstate_spapr_ov5_cas,
>  &vmstate_spapr_patb_entry,
> +&vmstate_spapr_pending_dimm_unplugs,
>  NULL
>  }
>  };

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [PATCH v9 6/6] migration: spapr: migrate pending_events of spapr state

2017-05-11 Thread David Gibson
On Fri, May 05, 2017 at 05:47:46PM -0300, Daniel Henrique Barboza wrote:
> From: Jianjun Duan 
> 
> In racing situations between hotplug events and migration operation,
> an RTAS hotplug event may not yet have been delivered to the source
> guest when migration starts. In this case the pending_events of the
> spapr state need to be transmitted to the target so that the hotplug
> event can be finished on the target.
> 
> All the different fields of the events are encoded as defined by
> PAPR. We can migrate them as uint8_t binary stream without any
> concerns about data padding or endianness.
> 
> pending_events is put in a subsection in the spapr state VMSD to make
> sure migration across different versions is not broken.
> 
> Signed-off-by: Jianjun Duan 
> Signed-off-by: Daniel Henrique Barboza 

This seems like it's probably a good idea, even independent of the
hotplug migration stuff.  I suspect there are other races where we
could lose a shutdown event or similar if there's a migration.

> ---
>  hw/ppc/spapr.c | 33 +
>  hw/ppc/spapr_events.c  | 24 +---
>  include/hw/ppc/spapr.h |  3 ++-
>  3 files changed, 48 insertions(+), 12 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index bc56249..e924fd4 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1498,6 +1498,38 @@ static const VMStateDescription vmstate_spapr_ccs_list = {
>  },
>  };
>  
> +static bool spapr_pending_events_needed(void *opaque)
> +{
> +sPAPRMachineState *spapr = (sPAPRMachineState *)opaque;
> +return !QTAILQ_EMPTY(&spapr->pending_events);
> +}
> +
> +static const VMStateDescription vmstate_spapr_event_entry = {
> +.name = "spapr_event_log_entry",
> +.version_id = 1,
> +.minimum_version_id = 1,
> +.fields = (VMStateField[]) {
> +VMSTATE_INT32(log_type, sPAPREventLogEntry),

This requires changing the actual type to int32_t in the structure.

> +VMSTATE_BOOL(exception, sPAPREventLogEntry),

So, at the moment, AFAICT every event is marked as exception == true,
so this doesn't actually tell us anything.   If that becomes not the
case in future, can the exception flag be derived from the log_type or
> information in the event buffer?

> +VMSTATE_UINT32(data_size, sPAPREventLogEntry),
> +VMSTATE_VARRAY_UINT32_ALLOC(data, sPAPREventLogEntry, data_size,
> +0, vmstate_info_uint8, uint8_t),

So, data_size duplicates information that's in the event header, which
is a bit sad.  I suppose I'm ok with that, since setting up the VARRAY
thing is going to be pretty awkward otherwise.

> +VMSTATE_END_OF_LIST()
> +},
> +};
> +
> +static const VMStateDescription vmstate_spapr_pending_events = {
> +.name = "spapr_pending_events",
> +.version_id = 1,
> +.minimum_version_id = 1,
> +.needed = spapr_pending_events_needed,
> +.fields = (VMStateField[]) {
> +VMSTATE_QTAILQ_V(pending_events, sPAPRMachineState, 1,
> + vmstate_spapr_event_entry, sPAPREventLogEntry, next),
> +VMSTATE_END_OF_LIST()
> +},
> +};
> +
>  static bool spapr_ov5_cas_needed(void *opaque)
>  {
>  sPAPRMachineState *spapr = opaque;
> @@ -1598,6 +1630,7 @@ static const VMStateDescription vmstate_spapr = {
>  &vmstate_spapr_patb_entry,
>  &vmstate_spapr_pending_dimm_unplugs,
>  &vmstate_spapr_ccs_list,
> +&vmstate_spapr_pending_events,
>  NULL
>  }
>  };
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index f0b28d8..70c7cfc 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -342,7 +342,8 @@ static int rtas_event_log_to_irq(sPAPRMachineState *spapr, int log_type)
>  return source->irq;
>  }
>  
> -static void rtas_event_log_queue(int log_type, void *data, bool exception)
> +static void rtas_event_log_queue(int log_type, void *data, bool exception,
> + int data_size)
>  {
>  sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>  sPAPREventLogEntry *entry = g_new(sPAPREventLogEntry, 1);
> @@ -351,6 +352,7 @@ static void rtas_event_log_queue(int log_type, void *data, bool exception)
>  entry->log_type = log_type;
>  entry->exception = exception;
>  entry->data = data;
> +entry->data_size = data_size;

I think it would make more sense to derive data_size from the buffer
header contents here, rather than in all the callers.
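Deriving data_size from the header could be done once in rtas_event_log_queue(). A sketch, assuming the event buffer starts with a 32-bit summary word followed by a big-endian 32-bit extended_length that counts the bytes after the 8-byte header; the macro and helper names are hypothetical:

```c
#include <stdint.h>

/* Assumed layout: 32-bit summary word, then a big-endian 32-bit
 * extended_length counting the bytes that follow the header. */
#define RTAS_LOG_HDR_SIZE     8u
#define RTAS_LOG_EXT_LEN_OFF  4u

static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* data_size derived once from the buffer itself, instead of being
 * computed and passed by every caller. */
static uint32_t rtas_event_data_size(const void *data)
{
    const uint8_t *hdr = data;

    return RTAS_LOG_HDR_SIZE + load_be32(hdr + RTAS_LOG_EXT_LEN_OFF);
}
```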

>  QTAILQ_INSERT_TAIL(&spapr->pending_events, entry, next);
>  }
>  
> @@ -445,6 +447,7 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)
>  struct rtas_event_log_v6_mainb *mainb;
>  struct rtas_event_log_v6_epow *epow;
>  struct epow_log_full *new_epow;
> +uint32_t data_size;
>  
>  new_epow = g_malloc0(sizeof(*new_epow));
>  hdr = &new_epow->hdr;
> @@ -453,14 +456,13 @@ static void spapr_powerdown_req(Notifier *n, void *opaque)

Re: [Qemu-devel] [PATCH v9 2/6] hw/ppc: removing drc->detach_cb and drc->detach_cb_opaque

2017-05-11 Thread David Gibson
On Fri, May 05, 2017 at 05:47:42PM -0300, Daniel Henrique Barboza wrote:
> The pointer drc->detach_cb is being used as a way of informing
> the detach() function inside spapr_drc.c which cb to execute. This
> information can also be retrieved simply by checking drc->type and
> choosing the right callback based on it. In this context, detach_cb
> is redundant information that must be managed.
> 
> After the previous spapr_lmb_release change, none of the three callback
> functions uses its detach_cb_opaque. This is yet another piece of
> information that is now unused and, on top of that, can't
> be migrated either.
> 
> This patch makes the following changes:
> 
> - removal of detach_cb_opaque. The 'opaque' argument was removed from
> the callbacks and from the detach() function of sPAPRConnectorClass. The
> attribute detach_cb_opaque of sPAPRConnector was removed.
> 
> - removal of detach_cb from the detach() call. The function pointer
> detach_cb of sPAPRConnector was removed. detach() now uses a
> switch(drc->type) to execute the appropriate callback. To achieve this,
> spapr_core_release, spapr_lmb_release and spapr_phb_remove_pci_device_cb
> callbacks were made public to be visible inside detach().
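The type-based dispatch described above amounts to a switch that maps the connector type to its release callback, so nothing callback-related has to be stored in, or migrated with, the DRC. A standalone sketch with hypothetical stand-in types; the real callbacks would be spapr_core_release(), spapr_phb_remove_pci_device_cb() and spapr_lmb_release(), and returning NULL for other types (such as PHB) is an assumption:

```c
#include <stddef.h>

/* Hypothetical stand-ins; the real code uses sPAPRDRConnector and DeviceState. */
typedef struct DeviceStateSketch DeviceStateSketch;
typedef void (*ReleaseFn)(DeviceStateSketch *dev);

typedef enum {
    DRC_SKETCH_CPU,
    DRC_SKETCH_PCI,
    DRC_SKETCH_LMB,
    DRC_SKETCH_PHB,
} DrcTypeSketch;

/* No-op bodies; the real callbacks finish the qdev unplug. */
static void core_release(DeviceStateSketch *dev) { (void)dev; }
static void pci_release(DeviceStateSketch *dev) { (void)dev; }
static void lmb_release(DeviceStateSketch *dev) { (void)dev; }

/* Pick the release callback from the connector type, replacing the
 * stored drc->detach_cb pointer. NULL: no callback for this type. */
static ReleaseFn drc_release_cb(DrcTypeSketch type)
{
    switch (type) {
    case DRC_SKETCH_CPU:
        return core_release;
    case DRC_SKETCH_PCI:
        return pci_release;
    case DRC_SKETCH_LMB:
        return lmb_release;
    default:
        return NULL;
    }
}
```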
> 
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  hw/ppc/spapr.c  | 10 ++
>  hw/ppc/spapr_drc.c  | 36 
>  hw/ppc/spapr_pci.c  |  5 +++--
>  include/hw/pci-host/spapr.h |  3 +++
>  include/hw/ppc/spapr.h  |  4 
>  include/hw/ppc/spapr_drc.h  |  8 +---
>  6 files changed, 37 insertions(+), 29 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 346c827..e190eb9 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2610,7 +2610,8 @@ static uint64_t spapr_dimm_get_address(PCDIMMDevice *dimm)
>  return addr;
>  }
>  
> -static void spapr_lmb_release(DeviceState *dev, void *opaque)
> +/* Callback to be called during DRC release. */
> +void spapr_lmb_release(DeviceState *dev)
>  {
>  HotplugHandler *hotplug_ctrl;
>  
> @@ -2652,7 +2653,7 @@ static void spapr_del_lmbs(DeviceState *dev, uint64_t addr_start, uint64_t size,
>  g_assert(drc);
>  
>  drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> -drck->detach(drc, dev, spapr_lmb_release, NULL, errp);
> +drck->detach(drc, dev, errp);
>  addr += SPAPR_MEMORY_BLOCK_SIZE;
>  }
>  
> @@ -2728,7 +2729,8 @@ static void spapr_core_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
>  object_unparent(OBJECT(dev));
>  }
>  
> -static void spapr_core_release(DeviceState *dev, void *opaque)
> +/* Callback to be called during DRC release. */
> +void spapr_core_release(DeviceState *dev)
>  {
>  HotplugHandler *hotplug_ctrl;
>  
> @@ -2761,7 +2763,7 @@ void spapr_core_unplug_request(HotplugHandler *hotplug_dev, DeviceState *dev,
>  g_assert(drc);
>  
>  drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> -drck->detach(drc, dev, spapr_core_release, NULL, &local_err);
> +drck->detach(drc, dev, &local_err);
>  if (local_err) {
>  error_propagate(errp, local_err);
>  return;
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index a1cdc87..1c72160 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -20,6 +20,7 @@
>  #include "qapi/visitor.h"
>  #include "qemu/error-report.h"
>  #include "hw/ppc/spapr.h" /* for RTAS return codes */
> +#include "hw/pci-host/spapr.h" /* spapr_phb_remove_pci_device_cb callback */
>  #include "trace.h"
>  
>  #define DRC_CONTAINER_PATH "/dr-connector"
> @@ -99,8 +100,7 @@ static uint32_t set_isolation_state(sPAPRDRConnector *drc,
>  if (drc->awaiting_release) {
>  if (drc->configured) {
>  trace_spapr_drc_set_isolation_state_finalizing(get_index(drc));
> -drck->detach(drc, DEVICE(drc->dev), drc->detach_cb,
> - drc->detach_cb_opaque, NULL);
> +drck->detach(drc, DEVICE(drc->dev), NULL);
>  } else {
>  trace_spapr_drc_set_isolation_state_deferring(get_index(drc));
>  }
> @@ -153,8 +153,7 @@ static uint32_t set_allocation_state(sPAPRDRConnector *drc,
>  if (drc->awaiting_release &&
>  drc->allocation_state == SPAPR_DR_ALLOCATION_STATE_UNUSABLE) {
>  trace_spapr_drc_set_allocation_state_finalizing(get_index(drc));
> -drck->detach(drc, DEVICE(drc->dev), drc->detach_cb,
> - drc->detach_cb_opaque, NULL);
> +drck->detach(drc, DEVICE(drc->dev), NULL);
>  } else if (drc->allocation_state == SPAPR_DR_ALLOCATION_STATE_USABLE) {
>  drc->awaiting_allocation = false;
>  }
> @@ -404,15 +403,10 @@ static void attach(sPAPRDRConnector *drc, DeviceState *d, void *fdt,
>   NULL, 0, NULL);
>  }
>  
> -static void detach(sPAPRDRConnector *drc, DeviceState *d,
> -  

Re: [Qemu-devel] [PATCH v9 1/6] hw/ppc/spapr.c: adding pending_dimm_unplugs to sPAPRMachineState

2017-05-11 Thread David Gibson
On Fri, May 05, 2017 at 05:47:41PM -0300, Daniel Henrique Barboza wrote:
> The LMB DRC release callback, spapr_lmb_release(), uses an opaque
> parameter, a sPAPRDIMMState struct that stores the current LMBs that
> are allocated to a DIMM (nr_lmbs). After each call to this callback,
> the nr_lmbs is decremented by one and, when it reaches zero, the callback
> proceeds with the qdev calls to hot unplug the LMB.
> 
> Using drc->detach_cb_opaque is problematic because it can't be migrated in
> the future DRC migration work. This patch makes the following changes to
> eliminate the usage of this opaque callback inside spapr_lmb_release:
> 
> - sPAPRDIMMState was moved from spapr.c and added to spapr.h. A new
> attribute called 'addr' was added to it. This is used as a unique
> identifier to associate a sPAPRDIMMState to a PCDIMM element.
> 
> - sPAPRMachineState now hosts a new QTAILQ called 'pending_dimm_unplugs'.
> This queue of sPAPRDIMMState elements will store the DIMM state of DIMMs
> that are currently undergoing an unplug process.
> 
> - spapr_lmb_release() will now retrieve the nr_lmbs value by looking up the
> corresponding sPAPRDIMMState. A helper function called spapr_dimm_get_address
> was created to fetch the address of a PCDIMM device inside spapr_lmb_release.
> When nr_lmbs reaches zero and the callback proceeds with the qdev hot unplug
> calls, the sPAPRDIMMState struct is removed from spapr->pending_dimm_unplugs.
> 
> After these changes, the opaque argument for spapr_lmb_release is now
> unused and is passed as NULL inside spapr_del_lmbs. This and the other
> opaque arguments can now be safely removed from the code.
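The bookkeeping described above (add an entry keyed by DIMM address when the unplug starts, decrement it per released LMB, remove it at zero) can be sketched with the BSD/glibc <sys/queue.h> TAILQ macros, which QEMU's QTAILQ closely resembles. All names here are hypothetical, and nr_lmbs is assumed to be non-zero when an entry is added:

```c
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>   /* BSD/glibc TAILQ, standing in for QEMU's QTAILQ */

typedef struct DimmStateSketch {
    uint64_t addr;     /* DIMM base address: the lookup key */
    uint32_t nr_lmbs;  /* LMBs the guest has not released yet (> 0) */
    TAILQ_ENTRY(DimmStateSketch) next;
} DimmStateSketch;

TAILQ_HEAD(PendingUnplugs, DimmStateSketch);

static DimmStateSketch *pending_find(struct PendingUnplugs *q, uint64_t addr)
{
    DimmStateSketch *ds;

    TAILQ_FOREACH(ds, q, next) {
        if (ds->addr == addr) {
            return ds;
        }
    }
    return NULL;
}

/* Called once when the unplug starts (cf. spapr_del_lmbs()). */
static void pending_add(struct PendingUnplugs *q, uint64_t addr,
                        uint32_t nr_lmbs)
{
    DimmStateSketch *ds = calloc(1, sizeof(*ds));

    if (ds == NULL) {
        abort();
    }
    ds->addr = addr;
    ds->nr_lmbs = nr_lmbs;
    TAILQ_INSERT_TAIL(q, ds, next);
}

/* Called per released LMB (cf. spapr_lmb_release()). Returns 1 when the
 * last LMB is gone and the entry has been freed, 0 otherwise. */
static int pending_lmb_released(struct PendingUnplugs *q, uint64_t addr)
{
    DimmStateSketch *ds = pending_find(q, addr);

    if (ds == NULL || --ds->nr_lmbs > 0) {
        return 0;
    }
    TAILQ_REMOVE(q, ds, next);
    free(ds);
    return 1;
}
```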
> 
> Signed-off-by: Daniel Henrique Barboza 

Urgh.  Moving this into the machine is really ugly.  Unfortunately, I
can't quickly see a better way to accomplish what you need.  So I
guess this approach is ok, with the hope that we can find a better way
in future.

There are a few more superficial problems to address, though.

> ---
>  hw/ppc/spapr.c | 54 --
>  include/hw/ppc/spapr.h | 17 
>  2 files changed, 65 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 80d12d0..346c827 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2043,6 +2043,7 @@ static void ppc_spapr_init(MachineState *machine)
>  msi_nonbroken = true;
>  
>  QLIST_INIT(&spapr->phbs);
> +QTAILQ_INIT(&spapr->pending_dimm_unplugs);
>  
>  /* Allocate RMA if necessary */
>  rma_alloc_size = kvmppc_alloc_rma(&rma);
> @@ -2596,20 +2597,32 @@ out:
>  error_propagate(errp, local_err);
>  }
>  
> -typedef struct sPAPRDIMMState {
> -uint32_t nr_lmbs;
> -} sPAPRDIMMState;
> +static uint64_t spapr_dimm_get_address(PCDIMMDevice *dimm)
> +{
> +Error *local_err = NULL;
> +uint64_t addr;
> +addr = object_property_get_int(OBJECT(dimm), PC_DIMM_ADDR_PROP,
> +   &local_err);
> +if (local_err) {
> +error_propagate(&error_abort, local_err);
> +return 0;
> +}
> +return addr;
> +}
>  
>  static void spapr_lmb_release(DeviceState *dev, void *opaque)
>  {
> -sPAPRDIMMState *ds = (sPAPRDIMMState *)opaque;
>  HotplugHandler *hotplug_ctrl;
>  

No need for this blank line in the middle of declarations.

> +uint64_t addr = spapr_dimm_get_address(PC_DIMM(dev));
> +sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> +sPAPRDIMMState *ds = spapr_pending_dimm_unplugs_find(spapr, addr);
> +
>  if (--ds->nr_lmbs) {
>  return;
>  }
>  
> -g_free(ds);
> +spapr_pending_dimm_unplugs_remove(spapr, ds);
>  
>  /*
>   * Now that all the LMBs have been removed by the guest, call the
> @@ -2626,17 +2639,20 @@ static void spapr_del_lmbs(DeviceState *dev, uint64_t addr_start, uint64_t size,
>  sPAPRDRConnectorClass *drck;
>  uint32_t nr_lmbs = size / SPAPR_MEMORY_BLOCK_SIZE;
>  int i;
> +sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>  sPAPRDIMMState *ds = g_malloc0(sizeof(sPAPRDIMMState));
>  uint64_t addr = addr_start;
>  
>  ds->nr_lmbs = nr_lmbs;
> +ds->addr = addr_start;
> +spapr_pending_dimm_unplugs_add(spapr, ds);
>  for (i = 0; i < nr_lmbs; i++) {
>  drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
>  addr / SPAPR_MEMORY_BLOCK_SIZE);
>  g_assert(drc);
>  
>  drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> -drck->detach(drc, dev, spapr_lmb_release, ds, errp);
> +drck->detach(drc, dev, spapr_lmb_release, NULL, errp);
>  addr += SPAPR_MEMORY_BLOCK_SIZE;
>  }
>  
> @@ -3515,3 +3531,29 @@ static void spapr_machine_register_types(void)
>  }
>  
>  type_init(spapr_machine_register_types)
> +
> +sPAPRDIMMState *spapr_pending_dimm_unplugs_find(sPAPRMachineState *spapr,
> +uint64_t addr)
> +{
> +sPAPR

Re: [Qemu-devel] [PATCH v9 3/6] hw/ppc: migrating the DRC state of hotplugged devices

2017-05-11 Thread David Gibson
On Fri, May 05, 2017 at 05:47:43PM -0300, Daniel Henrique Barboza wrote:
> In pseries, a firmware abstraction called Dynamic Reconfiguration
> Connector (DRC) is used to assign a particular dynamic resource
> to the guest and provide an interface to manage configuration/removal
> of the resource associated with it. In other words, DRC is the
> 'plugged state' of a device.
> 
> Before this patch, DRC wasn't being migrated. This causes
> post-migration problems due to DRC state mismatch between source and
> target. The DRC state of a device X in the source might
> change, while in the target the DRC state of X is still fresh. When
> migrating the guest, X will not have the same hotplugged state as it
> did in the source. This means that we can't hot unplug X in the
> target after migration is completed because its DRC state is not consistent.
> https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1677552 is one
> bug that is caused by this DRC state mismatch between source and
> target.
> 
> To migrate the DRC state, we defined the VMStateDescription struct for
> spapr_drc to enable the transmission of spapr_drc state in migration.
> Not all the elements in the DRC state are migrated - only those
> that can be modified by guest actions or device add/remove
> operations:
> 
> - 'isolation_state', 'allocation_state' and 'indicator_state'
> are involved in the DR state transition diagram from
> PAPR+ 2.7, 13.4;
> 
> - 'configured', 'signalled', 'awaiting_release' and 'awaiting_allocation'
> are needed in attaching and detaching devices;
> 
> - 'indicator_state' provides users with hardware state information.
> 
> These are the DRC elements that are migrated.
> 
> In this patch the DRC state is migrated for PCI, LMB and CPU
> connector types. At this moment there is no support to migrate
> DRC for the PHB (PCI Host Bridge) type.
> 
> In the 'realize' function the DRC is registered using vmstate_register,
> similar to what hw/ppc/spapr_iommu.c does in 'spapr_tce_table_realize'.
> This approach works because  DRCs are bus-less and do not sit
> on a BusClass that implements bc->get_dev_path, so as a fallback the
> VMSD gets identified via "spapr_drc"/get_index(drc).
> 
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  hw/ppc/spapr_drc.c | 61 
> ++
>  1 file changed, 61 insertions(+)
> 
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 1c72160..926b945 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -519,6 +519,65 @@ static void reset(DeviceState *d)
>  }
>  }
>  
> +static bool spapr_drc_needed(void *opaque)
> +{
> +sPAPRDRConnector *drc = (sPAPRDRConnector *)opaque;
> +sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +bool rc = false;
> +sPAPRDREntitySense value;

Blank line after the declarations, please.

> +drck->entity_sense(drc, &value);
> +/* If no dev is plugged in there is no need to migrate the DRC state */
> +if (value != SPAPR_DR_ENTITY_SENSE_PRESENT) {
> +return false;
> +}
> +
> +/*
> + * If there is dev plugged in, we need to migrate the DRC state when
> + * it is different from cold-plugged state
> + */
> +switch (drc->type) {
> +

No blank line here please.

> +case SPAPR_DR_CONNECTOR_TYPE_PCI:
> +rc = !((drc->isolation_state == SPAPR_DR_ISOLATION_STATE_UNISOLATED) &&
> +   (drc->allocation_state == SPAPR_DR_ALLOCATION_STATE_USABLE) &&
> +   drc->configured && drc->signalled && !drc->awaiting_release);

You don't do any more manipulation of the rc value, so you might as
well just 'return' directly here.


> +break;
> +
> +case SPAPR_DR_CONNECTOR_TYPE_LMB:
> +rc = !((drc->isolation_state == SPAPR_DR_ISOLATION_STATE_ISOLATED) &&
> +   (drc->allocation_state == SPAPR_DR_ALLOCATION_STATE_UNUSABLE) &&
> +   drc->configured && drc->signalled && !drc->awaiting_release);
> +break;
> +
> +case SPAPR_DR_CONNECTOR_TYPE_CPU:
> +rc = !((drc->isolation_state == SPAPR_DR_ISOLATION_STATE_ISOLATED) &&
> +   (drc->allocation_state == SPAPR_DR_ALLOCATION_STATE_UNUSABLE) &&
> +drc->configured && drc->signalled && !drc->awaiting_release);
> +break;
> +
> +default:
> +;

This should probably assert().

> +}
> +return rc;
> +}
> +
> +static const VMStateDescription vmstate_spapr_drc = {
> +.name = "spapr_drc",
> +.version_id = 1,
> +.minimum_version_id = 1,
> +.needed = spapr_drc_needed,
> +.fields  = (VMStateField []) {
> +VMSTATE_UINT32(isolation_state, sPAPRDRConnector),
> +VMSTATE_UINT32(allocation_state, sPAPRDRConnector),
> +VMSTATE_UINT32(indicator_state, sPAPRDRConnector),
> +VMSTATE_BOOL(configured, sPAPRDRConnector),
> +VMSTATE_BOOL(awaiting_release, sPAPRDRConnector),
> +VMSTATE_BOOL(awaiting_allocation, sPAPRDRConnecto
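The review comments on spapr_drc_needed() above (blank line after the declarations, returning directly instead of going through rc, and asserting in the default case) can be sketched in standalone C. This is only an illustration under stated assumptions: the enum and struct names are hypothetical stand-ins for the QEMU types, a plain `present` flag replaces the entity_sense() call, and the identical LMB and CPU branches of the patch are merged into one case. Inside QEMU the default case would more likely use glib's g_assert_not_reached().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the QEMU enums and struct used by the patch;
 * only the fields that spapr_drc_needed() reads are modelled. */
typedef enum { DRC_TYPE_PCI, DRC_TYPE_LMB, DRC_TYPE_CPU } DrcType;
typedef enum { ISOLATION_ISOLATED, ISOLATION_UNISOLATED } IsolationState;
typedef enum { ALLOCATION_UNUSABLE, ALLOCATION_USABLE } AllocationState;

typedef struct {
    DrcType type;
    bool present;                      /* stands in for entity_sense() == PRESENT */
    IsolationState isolation_state;
    AllocationState allocation_state;
    bool configured;
    bool signalled;
    bool awaiting_release;
} Drc;

/* Returns true when the DRC state differs from the cold-plugged state
 * and therefore has to be migrated. */
static bool drc_needed(const Drc *drc)
{
    /* If no device is plugged in there is no need to migrate the DRC state */
    if (!drc->present) {
        return false;
    }

    switch (drc->type) {
    case DRC_TYPE_PCI:
        return !(drc->isolation_state == ISOLATION_UNISOLATED &&
                 drc->allocation_state == ALLOCATION_USABLE &&
                 drc->configured && drc->signalled && !drc->awaiting_release);
    case DRC_TYPE_LMB:
    case DRC_TYPE_CPU:
        /* The patch uses the same cold-plug condition for LMB and CPU */
        return !(drc->isolation_state == ISOLATION_ISOLATED &&
                 drc->allocation_state == ALLOCATION_UNUSABLE &&
                 drc->configured && drc->signalled && !drc->awaiting_release);
    default:
        /* An unknown connector type is a programming error */
        assert(0);
        return false;
    }
}
```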

Re: [Qemu-devel] About QEMU BQL and dirty log switch in Migration

2017-05-11 Thread Wanpeng Li
2017-05-11 22:18 GMT+08:00 Zhoujian (jay) :
> Hi Wanpeng,
>
>> 2017-05-11 21:43 GMT+08:00 Wanpeng Li :
>> > 2017-05-11 20:24 GMT+08:00 Paolo Bonzini :
>> >>
>> >>
>> >> On 11/05/2017 14:07, Zhoujian (jay) wrote:
>> >>> -* Scan sptes if dirty logging has been stopped, dropping those
>> >>> -* which can be collapsed into a single large-page spte. Later
>> >>> -* page faults will create the large-page sptes.
>> >>> +* Reset each vcpu's mmu, then page faults will create the large-page
>> >>> +* sptes later.
>> >>>  */
>> >>> if ((change != KVM_MR_DELETE) &&
>> >>> (old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
>> >>> -   !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
>> >>> -   kvm_mmu_zap_collapsible_sptes(kvm, new);
>> >
>> > This is an unlikely branch(unless guest live migration fails and
>> > continue to run on the source machine) instead of hot path, do you
>> > have any performance number for your real workloads?
>>
>> I find the original discussion by google.
>> https://lists.nongnu.org/archive/html/qemu-devel/2017-04/msg04143.html
>> You will not go to this branch if the guest live migration successfully.
>
>  In our tests, this branch is taken when living migration is successful.
>  AFAIK, the kmod does not know whether living migration successful or not
>  when dealing with KVM_SET_USER_MEMORY_REGION ioctl. Do I miss something?

Originally there was a bug where the memslot dirty log flag was not cleared
after a failed live migration, and a patch was submitted to fix it:
https://lists.nongnu.org/archive/html/qemu-devel/2015-04/msg00794.html.
I can't remember whether the dirty log flag was cleared after a successful
live migration at that time; probably not. Paolo replied to the patch that
he had a better method. I was then too busy to follow the QEMU fix any
further, and I just found that this commit has been merged:
http://git.qemu.org/?p=qemu.git;a=commit;h=6f6a5ef3e429f92f987678ea8c396aab4dc6aa19.
This commit clears the memslot dirty log flag after live migration, no
matter whether it succeeds or not.

Regards,
Wanpeng Li



Re: [Qemu-devel] [PATCH 04/17] qapi: merge QInt and QFloat in QNum

2017-05-11 Thread Markus Armbruster
Question for Luiz...

Marc-André Lureau  writes:

[...]
> diff --git a/tests/check-qnum.c b/tests/check-qnum.c
> new file mode 100644
> index 00..d08d35e85a
> --- /dev/null
> +++ b/tests/check-qnum.c
> @@ -0,0 +1,131 @@
> +/*
> + * QNum unit-tests.
> + *
> + * Copyright (C) 2009 Red Hat Inc.
> + *
> + * Authors:
> + *  Luiz Capitulino 
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2.1 or 
> later.
> + * See the COPYING.LIB file in the top-level directory.
> + */
> +#include "qemu/osdep.h"
> +
> +#include "qapi/qmp/qnum.h"
> +#include "qapi/error.h"
> +#include "qemu-common.h"
> +
> +/*
> + * Public Interface test-cases
> + *
> + * (with some violations to access 'private' data)
> + */
> +
> +static void qnum_from_int_test(void)
> +{
> +QNum *qi;
> +const int value = -42;
> +
> +qi = qnum_from_int(value);
> +g_assert(qi != NULL);
> +g_assert_cmpint(qi->u.i64, ==, value);
> +g_assert_cmpint(qi->base.refcnt, ==, 1);
> +g_assert_cmpint(qobject_type(QOBJECT(qi)), ==, QTYPE_QNUM);
> +
> +// destroy doesn't exit yet
> +g_free(qi);
> +}

The comment is enigmatic.  It was first written in commit 33837ba
"Introduce QInt unit-tests", and got copied around since.  In
check-qlist.c, it's spelled "exist yet".

What is "destroy", why doesn't it exit / exist now, but will exit /
exist later?  It can't be qnum_destroy_obj(), because that certainly
exists already, exits already in the sense of returning, and shouldn't
ever exit in the sense of terminating the program.

The comment applies to a g_free().  Why do we free directly instead of
decrementing the reference count?  Perhaps the comment tries to explain
that (if it does, it fails).

Luiz, any idea?

[...]



Re: [Qemu-devel] [Qemu-block] [PATCH v4 2/6] replication: add shared-disk and shared-disk-id options

2017-05-11 Thread Hailiang Zhang

On 2017/5/12 3:08, Stefan Hajnoczi wrote:

On Wed, Apr 12, 2017 at 10:05:17PM +0800, zhanghailiang wrote:

We use these two options to identify which disk is
shared

Signed-off-by: zhanghailiang 
Signed-off-by: Wen Congyang 
Signed-off-by: Zhang Chen 
---
v4:
- Add proper comment for primary_disk (Stefan)
v2:
- Move g_free(s->shared_disk_id) to the common fail process place (Stefan)
- Fix comments for these two options
---
  block/replication.c  | 43 +--
  qapi/block-core.json | 10 +-
  2 files changed, 50 insertions(+), 3 deletions(-)

Aside from the ongoing discussion about this patch...

Reviewed-by: Stefan Hajnoczi 


Thanks, I'll fix the related problems found by Changlong.




Re: [Qemu-devel] [PATCH v4 2/6] replication: add shared-disk and shared-disk-id options

2017-05-11 Thread Hailiang Zhang

On 2017/4/18 13:59, Xie Changlong wrote:


On 04/12/2017 10:05 PM, zhanghailiang wrote:

We use these two options to identify which disk is
shared

Signed-off-by: zhanghailiang 
Signed-off-by: Wen Congyang 
Signed-off-by: Zhang Chen 
---
v4:
- Add proper comment for primary_disk (Stefan)
v2:
- Move g_free(s->shared_disk_id) to the common fail process place (Stefan)
- Fix comments for these two options
---
   block/replication.c  | 43 +--
   qapi/block-core.json | 10 +-
   2 files changed, 50 insertions(+), 3 deletions(-)

diff --git a/block/replication.c b/block/replication.c
index bf3c395..418b81b 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -25,9 +25,12 @@
   typedef struct BDRVReplicationState {
   ReplicationMode mode;
   int replication_state;
+bool is_shared_disk;
+char *shared_disk_id;
   BdrvChild *active_disk;
   BdrvChild *hidden_disk;
   BdrvChild *secondary_disk;
+BdrvChild *primary_disk;
   char *top_id;
   ReplicationState *rs;
   Error *blocker;
@@ -53,6 +56,9 @@ static void replication_stop(ReplicationState *rs, bool failover,
   
   #define REPLICATION_MODE    "mode"
   #define REPLICATION_TOP_ID  "top-id"
+#define REPLICATION_SHARED_DISK "shared-disk"
+#define REPLICATION_SHARED_DISK_ID "shared-disk-id"
+
   static QemuOptsList replication_runtime_opts = {
   .name = "replication",
   .head = QTAILQ_HEAD_INITIALIZER(replication_runtime_opts.head),
@@ -65,6 +71,14 @@ static QemuOptsList replication_runtime_opts = {
   .name = REPLICATION_TOP_ID,
   .type = QEMU_OPT_STRING,
   },
+{
+.name = REPLICATION_SHARED_DISK_ID,
+.type = QEMU_OPT_STRING,
+},
+{
+.name = REPLICATION_SHARED_DISK,
+.type = QEMU_OPT_BOOL,
+},
   { /* end of list */ }
   },
   };
@@ -85,6 +99,9 @@ static int replication_open(BlockDriverState *bs, QDict *options,
   QemuOpts *opts = NULL;
   const char *mode;
   const char *top_id;
+const char *shared_disk_id;
+BlockBackend *blk;
+BlockDriverState *tmp_bs;
   
   bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file,
  false, errp);
@@ -125,12 +142,33 @@ static int replication_open(BlockDriverState *bs, QDict *options,
  "The option mode's value should be primary or secondary");
   goto fail;
   }
+s->is_shared_disk = qemu_opt_get_bool(opts, REPLICATION_SHARED_DISK,
+

What if the secondary side is supplied with 'REPLICATION_SHARED_DISK_ID'?
Please refer to f4f2539bc to perfect the logic.


Hmm, we should not configure it for the secondary side; I'll fix it in the next version.



  false);
+if (s->is_shared_disk && (s->mode == REPLICATION_MODE_PRIMARY)) {
+shared_disk_id = qemu_opt_get(opts, REPLICATION_SHARED_DISK_ID);
+if (!shared_disk_id) {
+error_setg(&local_err, "Missing shared disk blk option");
+goto fail;
+}
+s->shared_disk_id = g_strdup(shared_disk_id);
+blk = blk_by_name(s->shared_disk_id);
+if (!blk) {
+error_setg(&local_err, "There is no %s block", s->shared_disk_id);
+goto fail;
+}
+/* We have a BlockBackend for the primary disk but use BdrvChild for
+ * consistency - active_disk, secondary_disk, etc are also BdrvChild.
+ */
+tmp_bs = blk_bs(blk);
+s->primary_disk = QLIST_FIRST(&tmp_bs->parents);
+}
   
   s->rs = replication_new(bs, &replication_ops);
   
-ret = 0;

-
+qemu_opts_del(opts);
+return 0;
   fail:
+g_free(s->shared_disk_id);
   qemu_opts_del(opts);
   error_propagate(errp, local_err);
   
@@ -141,6 +179,7 @@ static void replication_close(BlockDriverState *bs)

   {
   BDRVReplicationState *s = bs->opaque;
   
+g_free(s->shared_disk_id);

   if (s->replication_state == BLOCK_REPLICATION_RUNNING) {
   replication_stop(s->rs, false, NULL);
   }
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 033457c..361c932 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2661,12 +2661,20 @@
   #  node who owns the replication node chain. Must not be given in
   #  primary mode.
   #
+# @shared-disk-id: Id of shared disk while is replication mode, if @shared-disk
+#  is true, this option is required (Since: 2.10)
+#

Further explanations:

@shared-disk-id must be given, and may only be given, when @shared-disk is
enabled on the primary side.


OK.

+# @shared-disk: To indicate whether or not a disk is shared by primary VM
+#   and secondary VM. (The default is false) (Since: 2.10)
+#

Further explanations:

@shared-disk must either be given on both sides at the same time or be
omitted on both sides.


OK, will fix it, thanks.
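The option rules agreed on in this review can be sketched as a small validation helper. This is illustrative only: check_shared_disk_opts() and its second error message are hypothetical, and only the "Missing shared disk blk option" text comes from the patch. The requirement that @shared-disk be given or omitted on both sides at once cannot be checked here, since each side only sees its own options.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of block/replication.c.
 * Returns NULL when the option combination is acceptable, otherwise an
 * error message:
 *  - shared-disk-id is required on the primary side when shared-disk is true;
 *  - shared-disk-id must not be given in any other case (in particular
 *    not on the secondary side). */
static const char *check_shared_disk_opts(bool is_primary, bool shared_disk,
                                          const char *shared_disk_id)
{
    if (is_primary && shared_disk) {
        if (shared_disk_id == NULL) {
            return "Missing shared disk blk option";
        }
        return NULL;
    }
    if (shared_disk_id != NULL) {
        return "shared-disk-id is only valid on the primary side "
               "with shared-disk enabled";
    }
    return NULL;
}
```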


   # Since: 2.9
  

Re: [Qemu-devel] [RFC PATCH v3 1/5] coccinelle: add a script to optimize tcg op using tcg_gen_extract()

2017-05-11 Thread SF Markus Elfring via Qemu-devel
>  create mode 100644 scripts/coccinelle/tcg_gen_extract.cocci

Would another subdirectory be more appropriate for this SmPL script?


> +// Coccinelle helpful issue:
> +// https://github.com/coccinelle/coccinelle/issues/86

I am curious whether such a reference will trigger further
software evolution.
What do you think about also mentioning the corresponding issue title
“Propagating values back from Python script to SmPL rule with other metavariable
type than “identifier””, in case the issue number turns out to be fragile?


> +@match@ // match shri*+andi* pattern, calls script verify_len
> +identifier ret, arg;
> +constant ofs, len;
> +identifier shr_fn =~ "^tcg_gen_shri_";
> +identifier and_fn =~ "^tcg_gen_andi_";
> +position shr_p;
> +position and_p;
> +@@
> +(
> +shr_fn@shr_p(ret, arg, ofs);
> +and_fn@and_p(ret, ret, len);
> +)

This specification also caught my attention a bit.
How much do you care about coding style there?

* Two SmPL key words are repeated, although the variable list functionality
  was used before.

* I wonder about the relevance of the parentheses.
  Did you try to express a disjunction in the semantic patch language
  besides the usage of two function (or macro) calls?


> +print "  candidate", "IS" if is_optimizable else "is NOT", "optimizable"

Would you like to move this information display into a separate function?

Do you care whether “print” is used as a function call or as a statement?
https://docs.python.org/3.0/whatsnew/3.0.html#print-is-a-function


> +-shr_fn@shr_p(ret, arg, ofs);
> +-and_fn@and_p(ret, ret, len);
> ++extract_fn(ret, arg, ofs, len);

Are there any more cases to consider for the shown function call replacement?

Regards,
Markus



Re: [Qemu-devel] [PATCHv6 0/5] HPT resizing for pseries guests (qemu part)

2017-05-11 Thread no-reply
Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 20170512050451.9979-1-da...@gibson.dropbear.id.au
Subject: [Qemu-devel] [PATCHv6 0/5] HPT resizing for pseries guests (qemu part)
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
bc86869 pseries: Allow HPT resizing with KVM
1b07d16 pseries: Use smaller default hash page tables when guest can resize
5b59386 pseries: Enable HPT resizing for 2.10
61cecbb pseries: Implement HPT resizing
fe4fe32 pseries: Stubs for HPT resizing

=== OUTPUT BEGIN ===
Checking PATCH 1/5: pseries: Stubs for HPT resizing...
WARNING: line over 80 characters
#278: FILE: target/ppc/kvm.c:2719:
+error_setg(errp, "Hash page table resizing not available with this KVM version");

total: 0 errors, 1 warnings, 239 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 2/5: pseries: Implement HPT resizing...
ERROR: spaces required around that '/' (ctx:VxV)
#237: FILE: hw/ppc/spapr_hcall.c:516:
+stq_p(addr + HASH_PTE_SIZE_64/2, pte1);
  ^

ERROR: spaces required around that '*' (ctx:WxV)
#339: FILE: hw/ppc/spapr_hcall.c:618:
+hwaddr ptex = pteg *HPTES_PER_GROUP;
^

total: 2 errors, 0 warnings, 401 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 3/5: pseries: Enable HPT resizing for 2.10...
Checking PATCH 4/5: pseries: Use smaller default hash page tables when guest can resize...
Checking PATCH 5/5: pseries: Allow HPT resizing with KVM...
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-de...@freelists.org

Re: [Qemu-devel] [RFC PATCH v3 1/5] coccinelle: add a script to optimize tcg op using tcg_gen_extract()

2017-05-11 Thread Philippe Mathieu-Daudé

Hi Julia,

Sorry, I planned to send you a separate mail, but sent this one to the QEMU list
first.



I don't think I have seen earlier versions of this script.  Are you
proposing it to be added to the kernel?  If so, it should be put in an
appropriate subdirectory of Coccinelle.


This script is specific to the QEMU codebase and won't benefit the Linux kernel.


Overall, could you explain at a high level what it is intended to do?  It
uses rather heavily regular expressions and python code, so I wonder if
this is the best way to do it.


In this patch 
http://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg01466.html 
Aurelien does:


-tcg_gen_shri_i32(cpu_sr_q, src, SR_Q);
-tcg_gen_andi_i32(cpu_sr_q, cpu_sr_q, 1);
+tcg_gen_extract_i32(cpu_sr_q, src, SR_Q, 1);

having:

#define SR_Q  8

I wanted to write a Coccinelle script to check for this pattern.
My first version was wrong: as Richard Henderson reminded me, this
pattern can only be applied when the len argument (here "1") has the
form 2^n - 1, i.e. only its n least significant bits are set (a Mersenne number).


The codebase also defines:

#if TARGET_LONG_BITS == 64
# define tcg_gen_andi_tl tcg_gen_andi_i64
# define tcg_gen_shri_tl tcg_gen_shri_i64
#else
# define tcg_gen_andi_tl tcg_gen_andi_i32
# define tcg_gen_shri_tl tcg_gen_shri_i32
#endif

The same pattern can be applied for i32/i64/tl uses.
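To make the rewrite concrete, here is a plain-C model of the values both sides compute. shr_and32() and extract_ref32() are hypothetical names, and this models the semantics of the generated ops, not the TCG API itself. Note that the andi argument is a mask while the extract's last argument is a field width, so the two agree only when mask == (1 << len) - 1; in the SR_Q case above, mask and width are both 1.

```c
#include <assert.h>
#include <stdint.h>

/* Value computed by the two-op sequence the script looks for:
 * shift right by ofs, then AND with mask. */
static uint32_t shr_and32(uint32_t arg, unsigned ofs, uint32_t mask)
{
    return (arg >> ofs) & mask;
}

/* Value computed by an extract: the len-bit field starting at bit ofs.
 * Assumes 0 < len and ofs + len <= 32 (len == 0 would shift by 32). */
static uint32_t extract_ref32(uint32_t arg, unsigned ofs, unsigned len)
{
    return (arg >> ofs) & (UINT32_MAX >> (32 - len));
}
```

For example, a 0xF mask corresponds to a width of 4, not 15.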


The following thread was helpful while writing this script:

https://github.com/coccinelle/coccinelle/issues/86

Signed-off-by: Philippe Mathieu-Daudé 
---
 scripts/coccinelle/tcg_gen_extract.cocci | 71 
 1 file changed, 71 insertions(+)
 create mode 100644 scripts/coccinelle/tcg_gen_extract.cocci

diff --git a/scripts/coccinelle/tcg_gen_extract.cocci 
b/scripts/coccinelle/tcg_gen_extract.cocci
new file mode 100644
index 00..4823073005
--- /dev/null
+++ b/scripts/coccinelle/tcg_gen_extract.cocci
@@ -0,0 +1,71 @@
+// optimize TCG using extract op
+//
+// Copyright: (C) 2017 Philippe Mathieu-Daudé. GPLv2+.
+// Confidence: High
+// Options: --macro-file scripts/cocci-macro-file.h
+//
+// Nikunj A Dadhania optimization:
+// http://lists.nongnu.org/archive/html/qemu-devel/2017-02/msg05211.html
+// Aurelien Jarno optimization:
+// http://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg01466.html
+// Coccinelle helpful issue:
+// https://github.com/coccinelle/coccinelle/issues/86
+
+@match@ // match shri*+andi* pattern, calls script verify_len
+identifier ret, arg;
+constant ofs, len;
+identifier shr_fn =~ "^tcg_gen_shri_";
+identifier and_fn =~ "^tcg_gen_andi_";
+position shr_p;
+position and_p;
+@@
+(
+shr_fn@shr_p(ret, arg, ofs);
+and_fn@and_p(ret, ret, len);
+)


First I want to match any of:
- tcg_gen_andi_i32/tcg_gen_shri_i32
- tcg_gen_andi_i64/tcg_gen_shri_i64
- tcg_gen_andi_tl/tcg_gen_shri_tl

Now I want to verify that "len" has the form 2^n - 1.


+@script:python verify_len@
+ret_s << match.ret;
+len_s << match.len;
+shr_s << match.shr_fn;
+and_s << match.and_fn;
+shr_p << match.shr_p;
+extract_fn;
+@@
+print "candidate at %s:%s" % (shr_p[0].file, shr_p[0].line)
+len_fn=len("tcg_gen_shri_")
+shr_sz=shr_s[len_fn:]
+and_sz=and_s[len_fn:]
+# TODO: op_size shr

shr_s/and_s are strings containing the function names.
Check that we matched an i32/i32, i64/i64 or tl/tl combination.

(I think i32/i64 and i32/tl combinations are also valid, but I'd like Richard's
confirmation; anyway, I doubt those combinations are used.)



+print "  op_size: %s/%s (%s)" % (shr_sz, and_sz, "same" if is_same_op_size else "DIFFERENT")
+is_optimizable = False
+if is_same_op_size:
+try: # only eval integer, no #define like 'SR_M' (cpp did this, else some headers are missing).
+len_v = long(len_s.strip("UL"), 0)


Here len_s is also a string.

Some "len" encountered:
[1, 0x, 0x1, 0x00FF00FF, 0xULL]

Now len_v is the value of len_s.


+low_bits = 0
+while (len_v & (1 << low_bits)):
+low_bits += 1


Naively count the run of set least significant bits.


+print "  low_bits:", low_bits, "(value: 0x%x)" % ((1 << low_bits) - 1)
+print "  len: 0x%x" % len_v
+is_optimizable = ((1 << low_bits) - 1) == len_v # check low_bits


Check whether the mask made of the "low_bits" least significant bits,
(1 << low_bits) - 1, is the same number as len_v, the function argument.


If yes, len_v has the form 2^n - 1 (a Mersenne number) and we can optimize.
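For reference, the same check written in C. This is a sketch that mirrors the Python loop; the function name is hypothetical. It differs in two deliberate ways: a mask of 0 is rejected explicitly (the Python loop as written would accept it, since (1 << 0) - 1 == 0), and the loop stops at 64 because shifting a 64-bit value by 64 is undefined in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Count the run of set bits starting at bit 0 ("low_bits"), then require
 * that the mask has no other bits set, i.e. mask == 2^low_bits - 1. */
static bool mask_has_low_bits_form(uint64_t mask)
{
    unsigned low_bits = 0;

    if (mask == 0) {
        return false;           /* no non-empty field to extract */
    }
    while (low_bits < 64 && (mask & (UINT64_C(1) << low_bits))) {
        low_bits++;
    }
    if (low_bits == 64) {
        return true;            /* all 64 bits set */
    }
    return ((UINT64_C(1) << low_bits) - 1) == mask;
}
```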


+print "  len_bits %s= low_bits" % ("=" if is_optimizable else "!")
+print "  candidate", "IS" if is_optimizable else "is NOT", "optimizable"
+coccinelle.extract_fn = "tcg_gen_extract_" + and_sz


Export the tcg_gen_extract_*() function name as an identifier into the Coccinelle
namespace, appending the i32, i64 or tl suffix we are handling.



+except:
+print "  ERROR (check included headers?)"
+cocci.include_match(is_optimizable)


If we cannot optimize, discard this environment.


+print
+
+@replacement depends on verify_len@
+identifier match.ret, match.arg;
+constant match.ofs, match.len;
+identifier match.

Re: [Qemu-devel] [PATCH v3 04/12] memory: fix address_space_get_iotlb_entry()

2017-05-11 Thread David Gibson
On Thu, May 11, 2017 at 05:36:03PM +0800, Peter Xu wrote:
> On Thu, May 11, 2017 at 11:56:38AM +1000, David Gibson wrote:
> > On Wed, May 10, 2017 at 04:01:47PM +0800, Peter Xu wrote:
> > > This function has an assumption that we will definitely call translate()
> > > once (or say, the addr will be located inside one IOMMU memory region),
> > > otherwise an empty IOTLB will be returned. Nevertheless, this is not
> > > what we want. When there is no IOMMU memory region, we should build up a
> > > static mapping for the caller, instead of an invalid IOTLB.
> > > 
> > > We won't trigger this path before VT-d passthrough mode. When
> > > passthrough mode for a vhost device is setup, VT-d is possible to
> > > disable the IOMMU region for that device. Without current patch, we'll
> > > get a vhost boot failure, and it'll be failed over to virtio userspace
> > > mode.
> > 
> > This doesn't look right to me.  You're assuming the target is
> > address_space_memory, which might not be the case - and you should be
> > able to check from the MR you do hit.  Furthermore it doesn't look
> > like you're accounting for the trivial translation if the section's
> > offset in the address space is different from its offset in the MR.
> 
> Do you mean this line?
> 
> addr = addr - section->offset_within_address_space
>+ section->offset_within_region;

Uh.. where is that line?  But.. wait, yes, I think I was mistaken.  I saw:
.translated_addr = iova,

and thought that meant you were assuming an identity mapping from iova
to translated addr.  But thinking more carefully, IIRC iova and
translated_addr are both relative to the MR, not the AS, so I think
that is correct after all.
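The "trivial translation" discussed here, turning an address in the address space into an offset inside the memory region of the section that was hit, is just the arithmetic from the quoted line. A minimal standalone sketch, where SectionOffsets is a hypothetical stand-in carrying only the two MemoryRegionSection fields that line uses:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t hwaddr;

/* Hypothetical stand-in: only the two offsets the quoted line reads. */
typedef struct {
    hwaddr offset_within_address_space;
    hwaddr offset_within_region;
} SectionOffsets;

/* Convert an address-space address into an offset relative to the
 * memory region backing the section. */
static hwaddr as_addr_to_mr_offset(const SectionOffsets *s, hwaddr addr)
{
    return addr - s->offset_within_address_space + s->offset_within_region;
}
```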

> I thought it was calculating the relative address against that memory
> region. That should only be useful if we want to do further
> translate(), right? For the path that this patch tries to handle (when
> there is no translate() call), then this "addr" is useless here?
> 
> Regarding to the address space assignment - do you mean, e.g., I
> should use section->address_space here instead of
> &system_address_space? If so, I can do the switch.

Yes, I think you should.

> But after all, for
> now address_space_get_iotlb_entry() is only used by vhost codes, and
> it only check against iotlb.target_as == NULL, so the address space
> didn't count too much here...

> Another reason I used &address_space_memory is that in
> vfio_iommu_map_notify() we have a check against it:
> 
> if (iotlb->target_as != &address_space_memory) {
> error_report("Wrong target AS \"%s\", only system memory is allowed",
>  iotlb->target_as->name ? iotlb->target_as->name : 
> "none");
> return;
> }
> 
> Or say, we have some assumptions (not only in this patch) that assumes
> this iotlb.target_as should be system_address_space.

Right, the vhost code can only handle some IOMMU setups - something
like nested IOMMUs would break it.  But this way if someone sets up a
machine with an IOMMU configuration that vhost can't handle, you'll
get an error message, rather than accesses to unexpected locations,
which could cause really hard to debug corruption.

In other words we make assumptions, but we should _test_ those
assumptions.

I also think it would make sense to use address_space_translate() if we
can, since it's an existing interface for a very similar operation.


> 
> Thanks,
> 
> > 
> > I think for the fallback path you're going to want something based on
> > address_space_translate() instead.
> > 
> > > CC: Paolo Bonzini 
> > > CC: Jason Wang 
> > > CC: Michael S. Tsirkin 
> > > Signed-off-by: Peter Xu 
> > > ---
> > >  exec.c | 20 +++-
> > >  1 file changed, 19 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/exec.c b/exec.c
> > > index 072de5d..5cfdacd 100644
> > > --- a/exec.c
> > > +++ b/exec.c
> > > @@ -463,12 +463,13 @@ 
> > > address_space_translate_internal(AddressSpaceDispatch *d, hwaddr addr, 
> > > hwaddr *x
> > >  }
> > >  
> > >  /* Called from RCU critical section */
> > > -IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr 
> > > addr,
> > > +IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr 
> > > iova,
> > >  bool is_write)
> > >  {
> > >  IOMMUTLBEntry iotlb = {0};
> > >  MemoryRegionSection *section;
> > >  MemoryRegion *mr;
> > > +hwaddr addr = iova, psize;
> > >  
> > >  for (;;) {
> > >  AddressSpaceDispatch *d = atomic_rcu_read(&as->dispatch);
> > > @@ -478,6 +479,23 @@ IOMMUTLBEntry 
> > > address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
> > >  mr = section->mr;
> > >  
> > >  if (!mr->iommu_ops) {
> > > +/*
> > > + * We didn't translate() but reached here. It possibly
> > > + * means it's a static mapping. If so (it should be RAM),
> > > + * we set the IOTLB up.
> > > + 

Re: [Qemu-devel] [Qemu-ppc] [PATCH 7/8] target/ppc: optimize various functions using extract op

2017-05-11 Thread David Gibson
On Thu, May 11, 2017 at 10:46:01AM +0200, Laurent Vivier wrote:
> On 11/05/2017 02:41, David Gibson wrote:
> > On Wed, May 10, 2017 at 05:05:34PM -0300, Philippe Mathieu-Daudé wrote:
> >> Applied using Coccinelle script.
> >>
> >> Signed-off-by: Philippe Mathieu-Daudé 
> > 
> > Reviewed-by: David Gibson 
> 
> David, look at Nikunj's comments: only the two first changes are correct.
> 
> 'andi' uses a mask, but extract uses a 'width'. Philippe's changes work
> only when mask = 1.

Oops, that was sloppy of me, sorry.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [PATCH 7/8] target/ppc: optimize various functions using extract op

2017-05-11 Thread David Gibson
On Thu, May 11, 2017 at 10:48:42PM -0300, Philippe Mathieu-Daudé wrote:
> Hi Nikunj,
> 
> On 05/11/2017 01:54 AM, Nikunj A Dadhania wrote:
> > Philippe Mathieu-Daudé  writes:
> > 
> > > Applied using Coccinelle script.
> > > 
> > > Signed-off-by: Philippe Mathieu-Daudé 
> > > ---
> > >  target/ppc/translate.c  |  9 +++--
> > >  target/ppc/translate/vsx-impl.inc.c | 21 +++--
> > >  2 files changed, 10 insertions(+), 20 deletions(-)
> > > 
> > > diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> > > index f40b5a1abf..64ab412bf3 100644
> > > --- a/target/ppc/translate.c
> > > +++ b/target/ppc/translate.c
> > > @@ -868,8 +868,7 @@ static inline void gen_op_arith_add(DisasContext 
> > > *ctx, TCGv ret, TCGv arg1,
> > >  }
> > >  tcg_gen_xor_tl(cpu_ca, t0, t1);/* bits changed w/ 
> > > carry */
> > >  tcg_temp_free(t1);
> > > -tcg_gen_shri_tl(cpu_ca, cpu_ca, 32);   /* extract bit 32 */
> > > -tcg_gen_andi_tl(cpu_ca, cpu_ca, 1);
> > > +tcg_gen_extract_tl(cpu_ca, cpu_ca, 32, 1);
> > >  if (is_isa300(ctx)) {
> > >  tcg_gen_mov_tl(cpu_ca32, cpu_ca);
> > >  }
> > > @@ -1399,8 +1398,7 @@ static inline void gen_op_arith_subf(DisasContext 
> > > *ctx, TCGv ret, TCGv arg1,
> > >  tcg_temp_free(inv1);
> > >  tcg_gen_xor_tl(cpu_ca, t0, t1); /* bits changes w/ 
> > > carry */
> > >  tcg_temp_free(t1);
> > > -tcg_gen_shri_tl(cpu_ca, cpu_ca, 32);/* extract bit 32 */
> > > -tcg_gen_andi_tl(cpu_ca, cpu_ca, 1);
> > > +tcg_gen_extract_tl(cpu_ca, cpu_ca, 32, 1);
> > >  if (is_isa300(ctx)) {
> > >  tcg_gen_mov_tl(cpu_ca32, cpu_ca);
> > >  }
> > 
> > Above changes are correct.
> > 
> > Rest of them are wrong as discussed above in the thread with Richard.
> >
> 
> I tried to correct the cocci script and ran it again (will post in few min
> as v3) and got:
> 
> $ docker run -it -v `pwd`:`pwd` -w `pwd` petersenna/coccinelle --sp-file
> scripts/coccinelle/tcg_gen_extract.cocci --macro-file
> scripts/cocci-macro-file.h --dir target/ppc
> init_defs_builtins: /usr/lib64/coccinelle/standard.h
> init_defs: scripts/cocci-macro-file.h
> HANDLING: target/ppc/mfrom_table_gen.c
> HANDLING: target/ppc/user_only_helper.c
> HANDLING: target/ppc/mmu-hash64.c
> HANDLING: target/ppc/timebase_helper.c
> HANDLING: target/ppc/gdbstub.c
> HANDLING: target/ppc/translate.c
> candidate at target/ppc/translate.c:5386
>   op_size: tl/tl (same)
>   low_bits: 4 (value: 0xf)
>   len: 0xf
>   len_bits == low_bits
>   candidate IS optimizable
> 
> candidate at target/ppc/translate.c:871
>   op_size: tl/tl (same)
>   low_bits: 1 (value: 0x1)
>   len: 0x1
>   len_bits == low_bits
>   candidate IS optimizable
> 
> candidate at target/ppc/translate.c:1402
>   op_size: tl/tl (same)
>   low_bits: 1 (value: 0x1)
>   len: 0x1
>   len_bits == low_bits
>   candidate IS optimizable
> 
> > > @@ -5383,8 +5381,7 @@ static void gen_mfsri(DisasContext *ctx)
> > >  CHK_SV;
> > >  t0 = tcg_temp_new();
> > >  gen_addr_reg_index(ctx, t0);
> > > -tcg_gen_shri_tl(t0, t0, 28);
> > > -tcg_gen_andi_tl(t0, t0, 0xF);
> > > +tcg_gen_extract_tl(t0, t0, 28, 0xF);
> > >  gen_helper_load_sr(cpu_gpr[rd], cpu_env, t0);
> > >  tcg_temp_free(t0);
> > >  if (ra != 0 && ra != rd)
> 
> 0xF = 0b1111 so this one seems correct too, right?

No, I don't think so.  AFAICT tcg_gen_extract_tl() takes a field
width, not a mask as the last parameter.  So this would need to be
tcg_gen_extract_tl(t0, t0, 28, 4);

Your script needs to do essentially a log-base-2 of the mask.  I don't
know if Coccinelle can do that..
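The log-base-2 step can be sketched in C as follows. mask_to_extract_len() is a hypothetical name; in the Coccinelle script the equivalent would live in the Python rule, which already computes low_bits. For the gen_mfsri hunk, a mask of 0xF gives a width of 4, matching the tcg_gen_extract_tl(t0, t0, 28, 4) call above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: turn the andi mask into the field width that
 * tcg_gen_extract_*() takes as its last argument.
 * Returns -1 when the mask is not of the form 2^n - 1 with n >= 1,
 * in which case the shri+andi pair must be left alone. */
static int mask_to_extract_len(uint64_t mask)
{
    int len = 0;

    /* m has the form 2^n - 1 exactly when adding 1 carries through all of
     * its set bits, so m & (m + 1) == 0 (UINT64_MAX passes via wrap-around). */
    if (mask == 0 || (mask & (mask + 1)) != 0) {
        return -1;
    }
    /* For such a mask the width is log2(mask + 1): count the shifts. */
    while (mask != 0) {
        mask >>= 1;
        len++;
    }
    return len;
}
```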



Re: [Qemu-devel] [Qemu-arm] [Qemu-devel PATCH 1/5] msf2: Add Smartfusion2 System timer

2017-05-11 Thread sundeep subbaraya
Hi Philippe,

On Fri, May 12, 2017 at 10:08 AM, Philippe Mathieu-Daudé 
wrote:

> On 05/10/2017 09:37 AM, sundeep subbaraya wrote:
>
>> Hi Phil,
>>
>> On Wed, May 10, 2017 at 3:11 PM, Philippe Mathieu-Daudé > > wrote:
>>
>>> Hi Subbaraya, nice work!
>>>
>>> The timer you are modeling is the mss_timer, which is in particular used
>>> in the smartfusion2, I'd rather name it mss_timer.c so it can be reused by
>>> other SoC models.
>>>
>>> Ok I will change all other file names also to mss. Do I need to change
>> type names
>> also to mss?
>>
>
> As you wish :) Actel/Microsemi keep changing how they name it, MSS, M2S...
> What I mean is this timer is valid for an Actel SmartFusion and for the
> MicroSemi SmartFusion2, naming it "msf2-timer" seems to restrict it to the
> SF2 only.
>
Hmm. OK, I will change to mss except for the SoC model file and the SOM file.

> I added few comments.
>>>
>>>
>>> On 05/09/2017 01:44 PM, Subbaraya Sundeep wrote:
>>>

 Modelled System Timer in Microsemi's Smartfusion2 Soc.
 Timer has two 32bit down counters and two interrupts.

 Signed-off-by: Subbaraya Sundeep >>>
>>> >
>>
>>> ---
  hw/timer/Makefile.objs|   1 +
  hw/timer/msf2-timer.c | 252
 ++
  include/hw/timer/msf2-timer.h |  85 ++
  3 files changed, 338 insertions(+)
  create mode 100644 hw/timer/msf2-timer.c
  create mode 100644 include/hw/timer/msf2-timer.h

 [...]
>
>> +if (addr < ARRAY_SIZE(st->regs)) {
 +st->regs[addr] = value;
 +} else {
 +qemu_log_mask(LOG_GUEST_ERROR,
 + "%s: Bad offset 0x%" HWADDR_PRIx "\n",

>>> __func__,
>>
>>> + addr * 4);
 +}
 +break;
 +}
 +timer_update_irq(st);

>>>
>>>
>>> Here if addr >= (NUM_TIMERS * R_TIM1_MAX) you still update Timer1 IRQ;
>>> while this is harmless right now, it is likely to break later.
>>>
>>> As long as Interrupt status register and Interrupt enable register are
>> not
>> modified calling timer_update_irq will not harm. Am I missing something
>> here?
>>
>
> Indeed, this is harmless. It just surprised me when I followed the control
> flow.

Ok I will change it.

>
>
> +}
 +
 +static const MemoryRegionOps timer_ops = {
 +.read = timer_read,
 +.write = timer_write,
 +.endianness = DEVICE_NATIVE_ENDIAN,
 +.valid = {
 +.min_access_size = 4,

>>>
>>>
>>> I believe min_access_size = 1 is valid for any APB device.
>>>
>>>
>>> Ok. I followed Xilinx soft IP models while writing this. I am really not
>> sure it is mandatory to put access_size. Can i remove it?
>>
>
> checking the datasheet "UG0331: SmartFusion2 Microcontroller Subsystem":
>
> '''
> CMSIS Data types:
>
> The [Cortex-M3] processor:
> * supports the following data types:
> - 32-bit words
> - 16-bit halfwords
> - 8-bit bytes.
> * manages all data memory accesses as little-endian or big-endian.
> Instruction memory and Private Peripheral Bus (PPB) accesses are always
> performed as little-endian. The Cortex-M3 processor configured for
> SmartFusion2 SoC FPGA MSS uses only little-endian.
> '''
>
> So Yes, ".min_access_size = 1" is correct for this Cortex-M3.
>
> If you remove it memory_region_access_valid() will do:
>
> access_size_min = mr->ops->valid.min_access_size;
> if (!mr->ops->valid.min_access_size) {
> access_size_min = 1;
> }
>
> So that is the same, personally I prefer it to be explicit (not removed).
>
Ok, will change to 1.

> +.max_access_size = 4
 +}
 +};
 +
 +static void timer_hit(void *opaque)
 +{
 +struct Msf2Timer *st = opaque;
 +
 +st->regs[R_TIM_RIS] |= TIMER_RIS_ACK;
 +
 +if (!(st->regs[R_TIM_CTRL] & TIMER_CTRL_ONESHOT)) {
 +timer_update(st);
 +}
 +timer_update_irq(st);
 +}

>>> [...]
>
>> +/*
 + * There are two 32-bit down counting timers.
 + * Timers 1 and 2 can be concatenated into a single 64-bit Timer
 + * that operates either in Periodic mode or in One-shot mode.
 + * Writing 1 to the TIM64_MODE register bit 0 sets the Timers in
 + * 64-bit mode.
 + * In 64-bit mode, writing to the 32-bit registers has no effect.
 + * Similarly, in 32-bit mode, writing to the 64-bit mode registers
 + * has no effect. Only two 32-bit timers are supported currently.
 + */
 +#define NUM_TIMERS        2
 +
 +#define MSF2_TIMER_FREQ   (83 * 1000000)

>>>
>>>
>>> I can not find this value, can you point me to the datasheet? It seems
>>> SoC
>>> specific to me.
>>>
>>> It is configured in Microsemi Libero. The SOM kit from Emcraft comes
>> with this default setting.
>> I guess this property should be set and passed from board file and not
>> from SoC.
>> A

[Qemu-devel] [PATCHv6 0/5] HPT resizing for pseries guests (qemu part)

2017-05-11 Thread David Gibson
This series implements the host side of the PAPR ACR to allow runtime
resizing of the Hashed Page Table (HPT) for pseries guests.
Exercising this feature requires a guest OS which is also aware of it.
Patches to implement the guest side in Linux are upstream as of v4.11.

Availability of the feature is controlled by a new 'resize-hpt'
machine option: it can be set to "disabled", "enabled" or "required".
The last means that qemu will refuse to boot a guest which is not
aware of the HPT resizing feature (it will instead quit during feature
negotiation).  This is potentially useful because guests which don't
support resizing will need an HPT sized for their maximum possible
RAM, which can be very wasteful of host resources.
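The option values map naturally onto an enum. Below is a minimal, table-driven sketch of parsing and printing them. It assumes a local enum that mirrors the SPAPR_RESIZE_HPT_* values from patch 1/5, which also accepts "default". The names here are hypothetical, and the patch itself uses a strcmp() chain and a switch instead.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical local mirror of the patch's SPAPR_RESIZE_HPT_* modes. */
enum resize_hpt_mode {
    RESIZE_HPT_DEFAULT,
    RESIZE_HPT_DISABLED,
    RESIZE_HPT_ENABLED,
    RESIZE_HPT_REQUIRED,
};

/* One table drives both directions, so the two cannot drift apart. */
static const char *const resize_hpt_names[] = {
    [RESIZE_HPT_DEFAULT]  = "default",
    [RESIZE_HPT_DISABLED] = "disabled",
    [RESIZE_HPT_ENABLED]  = "enabled",
    [RESIZE_HPT_REQUIRED] = "required",
};

/* Returns 0 and stores the mode, or -1 (leaving *mode untouched). */
static int parse_resize_hpt(const char *value, enum resize_hpt_mode *mode)
{
    size_t i;

    for (i = 0; i < sizeof(resize_hpt_names) / sizeof(resize_hpt_names[0]); i++) {
        if (strcmp(value, resize_hpt_names[i]) == 0) {
            *mode = (enum resize_hpt_mode)i;
            return 0;
        }
    }
    return -1;
}

static const char *resize_hpt_to_string(enum resize_hpt_mode mode)
{
    return resize_hpt_names[mode];
}
```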

This implementation will work for both TCG and KVM guests.  HV KVM
requires support in the host kernel which is also included in v4.11.

Changes since v5:
 * Rebase for qemu-2.10
 * Kernel patches now merged
 * Removed an unnecessary and possibly dangerous assert()
Changes since v4 (misposted as v3):
 * More minor revisions based on review comments
 * Altered to use new simpler encoding of kernel capability
Changes since v3:
 * Assorted minor revisions based on review
 * Added KVM support (PR and HV)
Changes since v2:
 * Some clearer comments based on review
 * Some minor cleanups based on review

David Gibson (5):
  pseries: Stubs for HPT resizing
  pseries: Implement HPT resizing
  pseries: Enable HPT resizing for 2.10
  pseries: Use smaller default hash page tables when guest can resize
  pseries: Allow HPT resizing with KVM

 hw/ppc/spapr.c  | 105 ++-
 hw/ppc/spapr_hcall.c| 430 
 hw/ppc/trace-events |   2 +
 include/hw/ppc/spapr.h  |  19 ++
 include/hw/ppc/spapr_ovec.h |   1 +
 target/ppc/kvm.c|  75 
 target/ppc/kvm_ppc.h|  26 +++
 target/ppc/mmu-hash64.h |   4 +
 8 files changed, 654 insertions(+), 8 deletions(-)

-- 
2.9.3




[Qemu-devel] [PATCHv6 5/5] pseries: Allow HPT resizing with KVM

2017-05-11 Thread David Gibson
So far, qemu implements the PAPR Hash Page Table (HPT) resizing extension
with TCG.  The same implementation will work with KVM PR, but we don't
currently allow that.  For KVM HV we can only implement resizing with the
assistance of the host kernel, which needs a new capability and ioctl()s.
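For KVM PR, the patch passes the new table to the kernel as an SDR1-style value, `htab | (htab_shift - 18)`. This only works because the HPT base is aligned to its own size (the preparation thread allocates it with qemu_memalign(size, size)), which leaves the low bits free for the size field. Here is a sketch of that encoding in plain C. `encode_sdr1()` is a hypothetical helper. Returning 0 for invalid input is a simplification in this example, since 0 is also a valid encoding of a 2^18 table at address 0.

```c
#include <stdint.h>

/* Smallest table size implied by the "- 18" in the patch: 2^18 bytes. */
#define HPT_MIN_SHIFT 18

/* Hypothetical helper modelling  htab | (htab_shift - 18).
 * Returns 0 if the shift is out of range or the base is misaligned. */
static uint64_t encode_sdr1(uint64_t htab_base, int htab_shift)
{
    uint64_t size;

    if (htab_shift < HPT_MIN_SHIFT || htab_shift > 63) {
        return 0;
    }
    size = 1ULL << htab_shift;
    if (htab_base & (size - 1)) {
        return 0;   /* a misaligned base would overlap the size field */
    }
    return htab_base | (uint64_t)(htab_shift - HPT_MIN_SHIFT);
}
```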

This patch adds support for testing the new KVM capability and implementing
the resize in terms of KVM facilities when necessary.  If we're running on
a kernel which doesn't have the new capability flag at all, we fall back to
testing for PR vs. HV KVM using the same hack that we already use in a
number of places for older kernels.
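resize_hpt_convert_rc() in the patch below has to test its "long busy" thresholds from the largest down. Otherwise, the small thresholds shadow all the larger ones. Below is a table-driven sketch of the same idea. It assumes, as the thresholds suggest, that a positive return from the KVM ioctl is a suggested retry delay in milliseconds. The enum is a hypothetical stand-in for the H_LONG_BUSY_ORDER_* hcall return codes, and their real numeric values are not reproduced here.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the H_LONG_BUSY_ORDER_* codes. */
enum busy_hint {
    BUSY_NONE = 0,   /* ret <= 0: not a "busy" result */
    BUSY_1_MSEC,
    BUSY_10_MSEC,
    BUSY_100_MSEC,
    BUSY_1_SEC,
    BUSY_10_SEC,
    BUSY_100_SEC,
};

/* Assumes a positive 'ret' is the suggested retry delay in milliseconds. */
static enum busy_hint delay_ms_to_hint(int ret)
{
    static const struct {
        int min_ms;
        enum busy_hint hint;
    } table[] = {
        /* Ordered largest first: the first match wins. */
        { 100000, BUSY_100_SEC },
        { 10000,  BUSY_10_SEC },
        { 1000,   BUSY_1_SEC },
        { 100,    BUSY_100_MSEC },
        { 10,     BUSY_10_MSEC },
        { 1,      BUSY_1_MSEC },
    };
    size_t i;

    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (ret >= table[i].min_ms) {
            return table[i].hint;
        }
    }
    return BUSY_NONE;
}
```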

Signed-off-by: David Gibson 
---
 hw/ppc/spapr_hcall.c | 66 +++
 target/ppc/kvm.c | 67 ++--
 target/ppc/kvm_ppc.h | 21 
 3 files changed, 152 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index f79db2d..38d51e0 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -426,6 +426,44 @@ static void cancel_hpt_prepare(sPAPRMachineState *spapr)
 free_pending_hpt(pending);
 }
 
+/* Convert a return code from the KVM ioctl()s implementing resize HPT
+ * into a PAPR hypercall return code */
+static target_ulong resize_hpt_convert_rc(int ret)
+{
+if (ret >= 100000) {
+return H_LONG_BUSY_ORDER_100_SEC;
+} else if (ret >= 10000) {
+return H_LONG_BUSY_ORDER_10_SEC;
+} else if (ret >= 1000) {
+return H_LONG_BUSY_ORDER_1_SEC;
+} else if (ret >= 100) {
+return H_LONG_BUSY_ORDER_100_MSEC;
+} else if (ret >= 10) {
+return H_LONG_BUSY_ORDER_10_MSEC;
+} else if (ret > 0) {
+return H_LONG_BUSY_ORDER_1_MSEC;
+}
+
+switch (ret) {
+case 0:
+return H_SUCCESS;
+case -EPERM:
+return H_AUTHORITY;
+case -EINVAL:
+return H_PARAMETER;
+case -ENXIO:
+return H_CLOSED;
+case -ENOSPC:
+return H_PTEG_FULL;
+case -EBUSY:
+return H_BUSY;
+case -ENOMEM:
+return H_NO_MEM;
+default:
+return H_HARDWARE;
+}
+}
+
 static target_ulong h_resize_hpt_prepare(PowerPCCPU *cpu,
  sPAPRMachineState *spapr,
  target_ulong opcode,
@@ -435,6 +473,7 @@ static target_ulong h_resize_hpt_prepare(PowerPCCPU *cpu,
 int shift = args[1];
 sPAPRPendingHPT *pending = spapr->pending_hpt;
 uint64_t current_ram_size = MACHINE(spapr)->ram_size;
+int rc;
 
 if (spapr->resize_hpt == SPAPR_RESIZE_HPT_DISABLED) {
 return H_AUTHORITY;
@@ -459,6 +498,11 @@ static target_ulong h_resize_hpt_prepare(PowerPCCPU *cpu,
 return H_RESOURCE;
 }
 
+rc = kvmppc_resize_hpt_prepare(cpu, flags, shift);
+if (rc != -ENOSYS) {
+return resize_hpt_convert_rc(rc);
+}
+
 if (pending) {
 /* something already in progress */
 if (pending->shift == shift) {
@@ -654,6 +698,11 @@ static target_ulong h_resize_hpt_commit(PowerPCCPU *cpu,
 
 trace_spapr_h_resize_hpt_commit(flags, shift);
 
+rc = kvmppc_resize_hpt_commit(cpu, flags, shift);
+if (rc != -ENOSYS) {
+return resize_hpt_convert_rc(rc);
+}
+
 if (flags != 0) {
 return H_PARAMETER;
 }
@@ -676,6 +725,13 @@ static target_ulong h_resize_hpt_commit(PowerPCCPU *cpu,
 spapr->htab = pending->hpt;
 spapr->htab_shift = pending->shift;
 
+if (kvm_enabled()) {
+/* For KVM PR, update the HPT pointer */
+target_ulong sdr1 = (target_ulong)(uintptr_t)spapr->htab
+| (spapr->htab_shift - 18);
+kvmppc_update_sdr1(sdr1);
+}
+
 pending->hpt = NULL; /* so it's not free()d */
 }
 
@@ -1472,11 +1528,21 @@ static target_ulong 
h_client_architecture_support(PowerPCCPU *cpu,
 }
 
 if (spapr->htab_shift < maxshift) {
+CPUState *cs;
+
 /* Guest doesn't know about HPT resizing, so we
  * pre-emptively resize for the maximum permitted RAM.  At
  * the point this is called, nothing should have been
  * entered into the existing HPT */
 spapr_reallocate_hpt(spapr, maxshift, &error_fatal);
+CPU_FOREACH(cs) {
+if (kvm_enabled()) {
+/* For KVM PR, update the HPT pointer */
+target_ulong sdr1 = (target_ulong)(uintptr_t)spapr->htab
+| (spapr->htab_shift - 18);
+kvmppc_update_sdr1(sdr1);
+}
+}
 }
 }
 
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 7815e01..480 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -89,6 +89,7 @@ static int cap_fixup_hcalls;
 static int cap_htm; /* Hardware transactional memory support */
 static int cap_mmu_radix;
 static int cap_mmu_hash_v3;
+

[Qemu-devel] [PATCHv6 1/5] pseries: Stubs for HPT resizing

2017-05-11 Thread David Gibson
This introduces stub implementations of the H_RESIZE_HPT_PREPARE and
H_RESIZE_HPT_COMMIT hypercalls which we hope to add in a PAPR
extension to allow run time resizing of a guest's hash page table.  It
also adds a new machine property for controlling whether this new
facility is available.

For now we only allow resizing with TCG; allowing it with KVM will require
kernel changes as well.

Finally, it adds a new string to the hypertas property in the device
tree, advertising to the guest the availability of the HPT resizing
hypercalls.  This is a tentative suggested value, and would need to be
standardized by PAPR before being merged.
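The resize-hpt handling in ppc_spapr_init() below reduces to a small decision. An unset option quietly falls back to something that works, and an explicit request the host cannot satisfy is fatal. Here is a sketch of that logic as a pure function. The enum and function names are hypothetical. `host_supports_resize` stands for kvmppc_check_papr_resize_hpt() not reporting an error, and `class_default` stands for smc->resize_hpt_default.

```c
#include <stdbool.h>

/* Hypothetical local mirror of the patch's SPAPR_RESIZE_HPT_* modes. */
enum resize_hpt_mode {
    RESIZE_HPT_DEFAULT,
    RESIZE_HPT_DISABLED,
    RESIZE_HPT_ENABLED,
    RESIZE_HPT_REQUIRED,
};

/* Returns the effective mode. Sets *fatal when the user explicitly asked
 * for resizing that the host cannot provide; the patch reports the error
 * and exits in that case. */
static enum resize_hpt_mode resolve_resize_hpt(enum resize_hpt_mode requested,
                                               bool host_supports_resize,
                                               enum resize_hpt_mode class_default,
                                               bool *fatal)
{
    *fatal = false;

    if (requested == RESIZE_HPT_DEFAULT) {
        /* Not set explicitly: pick something that works on this host. */
        return host_supports_resize ? class_default : RESIZE_HPT_DISABLED;
    }
    if (requested != RESIZE_HPT_DISABLED && !host_supports_resize) {
        *fatal = true;   /* explicit request we cannot honour */
    }
    return requested;
}
```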

Signed-off-by: David Gibson 
Reviewed-by: Suraj Jitindar Singh 
---
 hw/ppc/spapr.c | 75 ++
 hw/ppc/spapr_hcall.c   | 36 
 hw/ppc/trace-events|  2 ++
 include/hw/ppc/spapr.h | 11 
 target/ppc/kvm.c   | 12 
 target/ppc/kvm_ppc.h   |  5 
 6 files changed, 141 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 1b7cada..50beee0 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -816,6 +816,11 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, void 
*fdt)
 if (!kvm_enabled() || kvmppc_spapr_use_multitce()) {
 add_str(hypertas, "hcall-multi-tce");
 }
+
+if (spapr->resize_hpt != SPAPR_RESIZE_HPT_DISABLED) {
+add_str(hypertas, "hcall-hpt-resize");
+}
+
 _FDT(fdt_setprop(fdt, rtas, "ibm,hypertas-functions",
  hypertas->str, hypertas->len));
 g_string_free(hypertas, TRUE);
@@ -2046,11 +2051,40 @@ static void ppc_spapr_init(MachineState *machine)
 hwaddr node0_size = spapr_node0_size();
 long load_limit, fw_size;
 char *filename;
+Error *resize_hpt_err = NULL;
 
 msi_nonbroken = true;
 
 QLIST_INIT(&spapr->phbs);
 
+/* Check HPT resizing availability */
+kvmppc_check_papr_resize_hpt(&resize_hpt_err);
+if (spapr->resize_hpt == SPAPR_RESIZE_HPT_DEFAULT) {
+/*
+ * If the user explicitly requested a mode we should either
+ * supply it, or fail completely (which we do below).  But if
+ * it's not set explicitly, we reset our mode to something
+ * that works
+ */
+if (resize_hpt_err) {
+spapr->resize_hpt = SPAPR_RESIZE_HPT_DISABLED;
+error_free(resize_hpt_err);
+resize_hpt_err = NULL;
+} else {
+spapr->resize_hpt = smc->resize_hpt_default;
+}
+}
+
+assert(spapr->resize_hpt != SPAPR_RESIZE_HPT_DEFAULT);
+
+if ((spapr->resize_hpt != SPAPR_RESIZE_HPT_DISABLED) && resize_hpt_err) {
+/*
+ * User requested HPT resize, but this host can't supply it.  Bail out
+ */
+error_report_err(resize_hpt_err);
+exit(1);
+}
+
 /* Allocate RMA if necessary */
 rma_alloc_size = kvmppc_alloc_rma(&rma);
 
@@ -2467,6 +2501,40 @@ static void spapr_set_modern_hotplug_events(Object *obj, 
bool value,
 spapr->use_hotplug_event_source = value;
 }
 
+static char *spapr_get_resize_hpt(Object *obj, Error **errp)
+{
+sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
+
+switch (spapr->resize_hpt) {
+case SPAPR_RESIZE_HPT_DEFAULT:
+return g_strdup("default");
+case SPAPR_RESIZE_HPT_DISABLED:
+return g_strdup("disabled");
+case SPAPR_RESIZE_HPT_ENABLED:
+return g_strdup("enabled");
+case SPAPR_RESIZE_HPT_REQUIRED:
+return g_strdup("required");
+}
+assert(0);
+}
+
+static void spapr_set_resize_hpt(Object *obj, const char *value, Error **errp)
+{
+sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
+
+if (strcmp(value, "default") == 0) {
+spapr->resize_hpt = SPAPR_RESIZE_HPT_DEFAULT;
+} else if (strcmp(value, "disabled") == 0) {
+spapr->resize_hpt = SPAPR_RESIZE_HPT_DISABLED;
+} else if (strcmp(value, "enabled") == 0) {
+spapr->resize_hpt = SPAPR_RESIZE_HPT_ENABLED;
+} else if (strcmp(value, "required") == 0) {
+spapr->resize_hpt = SPAPR_RESIZE_HPT_REQUIRED;
+} else {
+error_setg(errp, "Bad value for \"resize-hpt\" property");
+}
+}
+
 static void spapr_machine_initfn(Object *obj)
 {
 sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
@@ -2487,6 +2555,12 @@ static void spapr_machine_initfn(Object *obj)
 " place of standard EPOW events when possible"
 " (required for memory hot-unplug support)",
 NULL);
+
+object_property_add_str(obj, "resize-hpt",
+spapr_get_resize_hpt, spapr_set_resize_hpt, NULL);
+object_property_set_description(obj, "resize-hpt",
+"Resizing of the Hash Page Table (enabled, disabled, required)",
+NULL);
 }
 
 static void spapr_machine_finalizefn(Object *obj)
@@ -3152,6 +3

[Qemu-devel] [PATCHv6 3/5] pseries: Enable HPT resizing for 2.10

2017-05-11 Thread David Gibson
We've now implemented a PAPR extension which allows PAPR guests (i.e.
"pseries" machine type) to resize their hash page table during runtime.

However, that extension is only enabled if explicitly chosen on the
command line.  This patch enables it by default for spapr-2.10, but leaves
it disabled (by default) for older machine types.

Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index b9b7733..a0f5139 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3224,7 +3224,7 @@ static void spapr_machine_class_init(ObjectClass *oc, 
void *data)
 smc->dr_lmb_enabled = true;
 smc->tcg_default_cpu = "POWER8";
 mc->has_hotpluggable_cpus = true;
-smc->resize_hpt_default = SPAPR_RESIZE_HPT_DISABLED;
+smc->resize_hpt_default = SPAPR_RESIZE_HPT_ENABLED;
 fwc->get_dev_path = spapr_get_fw_dev_path;
 nc->nmi_monitor_handler = spapr_nmi;
 smc->phb_placement = spapr_phb_placement;
@@ -3320,8 +3320,11 @@ static void 
spapr_machine_2_9_instance_options(MachineState *machine)
 
 static void spapr_machine_2_9_class_options(MachineClass *mc)
 {
+sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
+
 spapr_machine_2_10_class_options(mc);
 SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_2_9);
+smc->resize_hpt_default = SPAPR_RESIZE_HPT_DISABLED;
 }
 
 DEFINE_SPAPR_MACHINE(2_9, "2.9", false);
-- 
2.9.3




[Qemu-devel] [PATCHv6 4/5] pseries: Use smaller default hash page tables when guest can resize

2017-05-11 Thread David Gibson
We've now implemented a PAPR extension allowing PAPR guests to resize
their hash page table (HPT) during runtime.

This patch makes use of that facility to allocate smaller HPTs by default.
Specifically when a guest is aware of the HPT resize facility, qemu sizes
the HPT to the initial memory size, rather than the maximum memory size on
the assumption that the guest will resize its HPT if necessary for hot
plugged memory.

When the initial memory size is much smaller than the maximum memory size
(a common configuration with e.g. oVirt / RHEV) then this can save
significant memory on the HPT.
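To put numbers on the saving, here is a sketch of the sizing rule. It assumes an HPT of 1/128 of RAM, rounded up to a power of two, which matches the "~2GiB HPT for a 256GiB guest" figure in patch 2/5. It also assumes a minimum of 2^18 bytes, matching the "- 18" SDR1 offset in patch 5/5. The 2^46 upper bound is an assumption of this sketch. `hpt_shift_for_ramsize()` and `initial_hpt_shift()` are hypothetical models of spapr_hpt_shift_for_ramsize() and the choice in spapr_setup_hpt_and_vrma(), not copies of them.

```c
#include <stdbool.h>
#include <stdint.h>

#define HPT_MIN_SHIFT 18   /* from the "- 18" SDR1 offset */
#define HPT_MAX_SHIFT 46   /* assumption of this sketch */

/* HPT of 1/128 (2^7) of RAM, with RAM rounded up to a power of two. */
static int hpt_shift_for_ramsize(uint64_t ramsize)
{
    int shift = 0;

    /* shift = ceil(log2(ramsize)); the bound check avoids 1ULL << 64 */
    while (shift < 64 && (1ULL << shift) < ramsize) {
        shift++;
    }
    shift -= 7;
    if (shift < HPT_MIN_SHIFT) {
        shift = HPT_MIN_SHIFT;
    }
    if (shift > HPT_MAX_SHIFT) {
        shift = HPT_MAX_SHIFT;
    }
    return shift;
}

/* Size for maxram only when resizing is off, or when a CAS reboot showed
 * the guest lacks resize support; otherwise size for initial RAM. */
static int initial_hpt_shift(bool resize_disabled, bool guest_lacks_resize,
                             uint64_t ram_size, uint64_t maxram_size)
{
    if (resize_disabled || guest_lacks_resize) {
        return hpt_shift_for_ramsize(maxram_size);
    }
    return hpt_shift_for_ramsize(ram_size);
}
```

With ram_size = 4 GiB and maxram_size = 256 GiB, this model gives a 2^25 = 32 MiB HPT instead of a 2^31 = 2 GiB one.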

If the guest does *not* advertise HPT resize awareness when it makes the
ibm,client-architecture-support call, qemu resizes the HPT for maximum
memory size (unless it's been configured not to allow such guests at all).
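The ibm,client-architecture-support handling in this patch therefore has three outcomes. Here is a sketch of that decision as a pure function. The enum and function names are hypothetical. In the real code, `maxshift` comes from spapr_hpt_shift_for_ramsize(maxram_size), and "grow" means calling spapr_reallocate_hpt().

```c
#include <stdbool.h>

enum cas_hpt_action {
    CAS_HPT_KEEP,          /* leave the current HPT alone */
    CAS_HPT_GROW_TO_MAX,   /* reallocate the HPT for maxram_size */
    CAS_HPT_FAIL,          /* resize-hpt=required but guest is unaware */
};

static enum cas_hpt_action cas_hpt_decision(bool guest_supports_resize,
                                            bool resize_required,
                                            int current_shift, int maxshift)
{
    if (guest_supports_resize) {
        /* The guest will resize the HPT itself for hotplugged memory. */
        return CAS_HPT_KEEP;
    }
    if (resize_required) {
        return CAS_HPT_FAIL;
    }
    /* Pre-emptively grow; assumes nothing has been entered into the HPT yet. */
    return current_shift < maxshift ? CAS_HPT_GROW_TO_MAX : CAS_HPT_KEEP;
}
```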

For now we make that reallocation assuming the guest has not yet used the
HPT at all.  That's true in practice, but not, strictly, an architectural
or PAPR requirement.  If we need to in future we can fix this by having
the client-architecture-support call reboot the guest with the revised
HPT size (the client-architecture-support call is explicitly permitted to
trigger a reboot in this way).

Signed-off-by: David Gibson 
Reviewed-by: Suraj Jitindar Singh 
---
 hw/ppc/spapr.c  | 23 ++-
 hw/ppc/spapr_hcall.c| 28 
 include/hw/ppc/spapr.h  |  2 ++
 include/hw/ppc/spapr_ovec.h |  1 +
 4 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index a0f5139..785bfdb 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1225,8 +1225,8 @@ int spapr_hpt_shift_for_ramsize(uint64_t ramsize)
 return shift;
 }
 
-static void spapr_reallocate_hpt(sPAPRMachineState *spapr, int shift,
- Error **errp)
+void spapr_reallocate_hpt(sPAPRMachineState *spapr, int shift,
+  Error **errp)
 {
 long rc;
 
@@ -1277,9 +1277,17 @@ static void spapr_reallocate_hpt(sPAPRMachineState 
*spapr, int shift,
 
 void spapr_setup_hpt_and_vrma(sPAPRMachineState *spapr)
 {
-spapr_reallocate_hpt(spapr,
- spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size),
- &error_fatal);
+int hpt_shift;
+
+if ((spapr->resize_hpt == SPAPR_RESIZE_HPT_DISABLED)
+|| (spapr->cas_reboot
+&& !spapr_ovec_test(spapr->ov5_cas, OV5_HPT_RESIZE))) {
+hpt_shift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
+} else {
+hpt_shift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->ram_size);
+}
+spapr_reallocate_hpt(spapr, hpt_shift, &error_fatal);
+
 if (spapr->vrma_adjust) {
 spapr->rma_size = kvmppc_rma_size(spapr_node0_size(),
   spapr->htab_shift);
@@ -2151,6 +2159,11 @@ static void ppc_spapr_init(MachineState *machine)
 spapr_ovec_set(spapr->ov5, OV5_HP_EVT);
 }
 
+/* advertise support for HPT resizing */
+if (spapr->resize_hpt != SPAPR_RESIZE_HPT_DISABLED) {
+spapr_ovec_set(spapr->ov5, OV5_HPT_RESIZE);
+}
+
 /* init CPUs */
 if (machine->cpu_model == NULL) {
 machine->cpu_model = kvm_enabled() ? "host" : smc->tcg_default_cpu;
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 2979208..f79db2d 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1452,6 +1452,34 @@ static target_ulong 
h_client_architecture_support(PowerPCCPU *cpu,
 guest_radix = spapr_ovec_test(ov5_guest, OV5_MMU_RADIX_300);
 spapr_ovec_clear(ov5_guest, OV5_MMU_RADIX_300);
 
+/*
+ * HPT resizing is a bit of a special case, because when enabled
+ * we assume the guest will support it until it says it doesn't,
+ * instead of assuming it won't support it until it says it does.
+ * Strictly speaking that approach could break for guests which
+ * don't make a CAS call, but those are so old we don't care about
+ * them.  Without that assumption we'd have to make at least a
+ * temporary allocation of an HPT sized for max memory, which
+ * could be impossibly difficult under KVM HV if maxram is large.
+ */
+if (!spapr_ovec_test(ov5_guest, OV5_HPT_RESIZE)) {
+int maxshift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
+
+if (spapr->resize_hpt == SPAPR_RESIZE_HPT_REQUIRED) {
+error_report(
+"h_client_architecture_support: Guest doesn't support HPT resizing, but resize-hpt=required");
+exit(1);
+}
+
+if (spapr->htab_shift < maxshift) {
+/* Guest doesn't know about HPT resizing, so we
+ * pre-emptively resize for the maximum permitted RAM.  At
+ * the point this is called, nothing should have been
+ * entered into the existing HPT */
+spapr_reallocate_hpt(spapr, maxshift, &error_fatal);
+}
+}
+
 /* NO

[Qemu-devel] [PATCHv6 2/5] pseries: Implement HPT resizing

2017-05-11 Thread David Gibson
This patch implements hypercalls allowing a PAPR guest to resize its own
hash page table.  This will eventually allow for more flexible memory
hotplug.

The implementation is partially asynchronous, handled in a special thread
running the hpt_prepare_thread() function.  The state of a pending resize
is stored in SPAPR_MACHINE->pending_hpt.

The H_RESIZE_HPT_PREPARE hypercall will kick off creation of a new HPT, or,
if one is already in progress, monitor it for completion.  If there is an
existing HPT resize in progress that doesn't match the size specified in
the call, it will cancel it, replacing it with a new one matching the
given size.
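In the cancellation scheme implemented by hpt_prepare_thread() and cancel_hpt_prepare(), whoever touches a pending request last under the lock frees it. Below is a minimal pthread sketch of that ownership hand-off. It assumes a plain mutex stands in for the BQL and calloc() stands in for qemu_memalign(), so HPT alignment is not modelled. All names are hypothetical. Unlike the patch, start_prepare() always replaces an earlier request rather than comparing shifts. The joinable thread and the `releases` counter exist only so the example can check that each request is freed exactly once.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* A plain mutex stands in for the BQL. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

struct pending_hpt {
    int shift;      /* read-only after creation */
    bool complete;  /* protected by big_lock */
    int ret;        /* 0 on success, -1 on allocation failure */
    void *hpt;      /* private to the worker until complete */
};

static struct pending_hpt *current_pending;  /* like spapr->pending_hpt */
static int releases;  /* instrumentation: requests freed so far */

/* Must be called with big_lock held. */
static void free_pending(struct pending_hpt *p)
{
    free(p->hpt);
    free(p);
    releases++;
}

static void *prepare_worker(void *opaque)
{
    struct pending_hpt *p = opaque;

    /* The slow part (allocating and zeroing the table) runs unlocked. */
    p->hpt = calloc(1, (size_t)1 << p->shift);
    p->ret = p->hpt ? 0 : -1;

    pthread_mutex_lock(&big_lock);
    if (current_pending == p) {
        p->complete = true;  /* still wanted: publish the result */
    } else {
        free_pending(p);     /* cancelled meanwhile: clean up ourselves */
    }
    pthread_mutex_unlock(&big_lock);
    return NULL;
}

/* Must be called with big_lock held. */
static void cancel_prepare_locked(void)
{
    struct pending_hpt *p = current_pending;

    current_pending = NULL;  /* how the worker learns it was cancelled */
    if (p && p->complete) {
        free_pending(p);     /* worker has finished, so we own it */
    }
    /* otherwise the worker frees it when it takes the lock */
}

/* Replace any earlier request with a new one; returns 0 on success. */
static int start_prepare(int shift, pthread_t *thread)
{
    struct pending_hpt *p = calloc(1, sizeof(*p));

    if (!p) {
        return -1;
    }
    p->shift = shift;

    pthread_mutex_lock(&big_lock);
    cancel_prepare_locked();
    current_pending = p;
    if (pthread_create(thread, NULL, prepare_worker, p) != 0) {
        current_pending = NULL;
        free_pending(p);
        pthread_mutex_unlock(&big_lock);
        return -1;
    }
    pthread_mutex_unlock(&big_lock);
    return 0;
}
```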

The H_RESIZE_HPT_COMMIT hypercall completes the transition to a resized HPT, and can only
be called successfully once H_RESIZE_HPT_PREPARE has successfully
completed initialization of a new HPT.  The guest must ensure that there
are no concurrent accesses to the existing HPT while this is called (this
effectively means stop_machine() for Linux guests).

For now H_RESIZE_HPT_COMMIT goes through the whole old HPT, rehashing each
HPTE into the new HPT.  This can have quite high latency, but it seems to
be of the order of typical migration downtime latencies for HPTs of size
up to ~2GiB (which would be used in a 256GiB guest).

In future we probably want to move more of the rehashing to the "prepare"
phase, by having H_ENTER and other hcalls update both current and
pending HPTs.  That's a project for another day, but should be possible
without any changes to the guest interface.

Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c  |   4 +-
 hw/ppc/spapr_hcall.c| 306 +++-
 include/hw/ppc/spapr.h  |   6 +
 target/ppc/mmu-hash64.h |   4 +
 4 files changed, 314 insertions(+), 6 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 50beee0..b9b7733 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -95,8 +95,6 @@
 
 #define PHANDLE_XICP            0x00001111
 
-#define HTAB_SIZE(spapr)(1ULL << ((spapr)->htab_shift))
-
 static ICSState *spapr_ics_create(sPAPRMachineState *spapr,
   const char *type_ics,
   int nr_irqs, Error **errp)
@@ -1214,7 +1212,7 @@ static void spapr_store_hpte(PPCVirtualHypervisor *vhyp, 
hwaddr ptex,
 }
 }
 
-static int spapr_hpt_shift_for_ramsize(uint64_t ramsize)
+int spapr_hpt_shift_for_ramsize(uint64_t ramsize)
 {
 int shift;
 
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 29d549f..2979208 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -3,6 +3,7 @@
 #include "sysemu/hw_accel.h"
 #include "sysemu/sysemu.h"
 #include "qemu/log.h"
+#include "qemu/error-report.h"
 #include "cpu.h"
 #include "exec/exec-all.h"
 #include "helper_regs.h"
@@ -354,20 +355,286 @@ static target_ulong h_read(PowerPCCPU *cpu, 
sPAPRMachineState *spapr,
 return H_SUCCESS;
 }
 
+struct sPAPRPendingHPT {
+/* These fields are read-only after initialization */
+int shift;
+QemuThread thread;
+
+/* These fields are protected by the BQL */
+bool complete;
+
+/* These fields are private to the preparation thread if
+ * !complete, otherwise protected by the BQL */
+int ret;
+void *hpt;
+};
+
+static void free_pending_hpt(sPAPRPendingHPT *pending)
+{
+if (pending->hpt) {
+qemu_vfree(pending->hpt);
+}
+
+g_free(pending);
+}
+
+static void *hpt_prepare_thread(void *opaque)
+{
+sPAPRPendingHPT *pending = opaque;
+size_t size = 1ULL << pending->shift;
+
+pending->hpt = qemu_memalign(size, size);
+if (pending->hpt) {
+memset(pending->hpt, 0, size);
+pending->ret = H_SUCCESS;
+} else {
+pending->ret = H_NO_MEM;
+}
+
+qemu_mutex_lock_iothread();
+
+if (SPAPR_MACHINE(qdev_get_machine())->pending_hpt == pending) {
+/* Ready to go */
+pending->complete = true;
+} else {
+/* We've been cancelled, clean ourselves up */
+free_pending_hpt(pending);
+}
+
+qemu_mutex_unlock_iothread();
+return NULL;
+}
+
+/* Must be called with BQL held */
+static void cancel_hpt_prepare(sPAPRMachineState *spapr)
+{
+sPAPRPendingHPT *pending = spapr->pending_hpt;
+
+/* Let the thread know it's cancelled */
+spapr->pending_hpt = NULL;
+
+if (!pending) {
+/* Nothing to do */
+return;
+}
+
+if (!pending->complete) {
+/* thread will clean itself up */
+return;
+}
+
+free_pending_hpt(pending);
+}
+
 static target_ulong h_resize_hpt_prepare(PowerPCCPU *cpu,
  sPAPRMachineState *spapr,
  target_ulong opcode,
  target_ulong *args)
 {
 target_ulong flags = args[0];
-target_ulong shift = args[1];
+int shift = args[1];
+sPAPRPendingHPT *pending = spapr->pending_hpt;
+uint64_t current_ram_size = MACHINE(spapr)->r

Re: [Qemu-devel] [Qemu-devel PATCH 4/5] msf2: Add Smartfusion2 SoC.

2017-05-11 Thread Philippe Mathieu-Daudé

On 05/12/2017 12:17 AM, sundeep subbaraya wrote:

Hi Philippe,

On Wed, May 10, 2017 at 5:20 PM, Philippe Mathieu-Daudé 
wrote:


Hi Subbaraya,


On 05/09/2017 01:44 PM, Subbaraya Sundeep wrote:


Smartfusion2 SoC has hardened Microcontroller subsystem
and flash based FPGA fabric. This patch adds support for
Microcontroller subsystem in the SoC.

Signed-off-by: Subbaraya Sundeep 
---
 default-configs/arm-softmmu.mak |   1 +
 hw/arm/Makefile.objs|   2 +-
 hw/arm/msf2-soc.c   | 188 ++
++
 include/hw/arm/msf2-soc.h   |  60 +
 4 files changed, 250 insertions(+), 1 deletion(-)
 create mode 100644 hw/arm/msf2-soc.c
 create mode 100644 include/hw/arm/msf2-soc.h

diff --git a/default-configs/arm-softmmu.mak
b/default-configs/arm-softmmu.mak
index 78d7af0..7062512 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -122,3 +122,4 @@ CONFIG_ACPI=y
 CONFIG_SMBIOS=y
 CONFIG_ASPEED_SOC=y
 CONFIG_GPIO_KEY=y
+CONFIG_MSF2=y
diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs
index 4c5c4ee..ae5e4a3 100644
--- a/hw/arm/Makefile.objs
+++ b/hw/arm/Makefile.objs
@@ -1,7 +1,7 @@
 obj-y += boot.o collie.o exynos4_boards.o gumstix.o highbank.o
 obj-$(CONFIG_DIGIC) += digic_boards.o
 obj-y += integratorcp.o mainstone.o musicpal.o nseries.o
-obj-y += omap_sx1.o palm.o realview.o spitz.o stellaris.o
+obj-y += omap_sx1.o palm.o realview.o spitz.o stellaris.o msf2-soc.o



Not a big deal, but since you added CONFIG_MSF2, why not use it here and in
the Makefiles you touched (misc/ssi/timer)?

obj-$(CONFIG_MSF2) += msf2-soc.o

  OK. Will change it.




 obj-y += tosa.o versatilepb.o vexpress.o virt.o xilinx_zynq.o z2.o

 obj-$(CONFIG_ACPI) += virt-acpi-build.o
 obj-y += netduino2.o

[...]

+MemoryRegion *system_memory = get_system_memory();
+MemoryRegion *nvm = g_new(MemoryRegion, 1);
+MemoryRegion *nvm_alias = g_new(MemoryRegion, 1);
+MemoryRegion *sram = g_new(MemoryRegion, 1);
+MemoryRegion *ddr = g_new(MemoryRegion, 1);
+
+memory_region_init_ram(nvm, NULL, "MSF2.envm", ENVM_SIZE,
+   &error_fatal);



Maybe you can name it "eNVM" to match the documentation.

Also envm_size should be a per-model property.



Ok.



+memory_region_init_alias(nvm_alias, NULL, "MSF2.flash.alias",

+ nvm, 0, ENVM_SIZE);



Hmmm well this would be the "Cache Matrix Remap" which happens to be
mapped by default to eNVM on cold reset.
Naming it "MSF2.flash.alias" is pretty confusing.



Exactly, it is the Cache Matrix Remap.
AFAIK we currently cannot remap memory at runtime in QEMU,
so I handled the default remap with an alias.
Please suggest a name. Does MSF2.eNVM.alias sound fine?


Hmm Peter, Francis?

Personally I prefer "bus_remap.alias" which is explicit.

"eNVM.alias" is only true on Cold Reset.



+vmstate_register_ram_global(nvm);

+
+memory_region_set_readonly(nvm, true);
+memory_region_set_readonly(nvm_alias, true);
+
+memory_region_add_subregion(system_memory, ENVM_BASE_ADDRESS, nvm);
+memory_region_add_subregion(system_memory, 0, nvm_alias);
+
+memory_region_init_ram(ddr, NULL, "MSF2.ddr", DDR_SIZE,
+   &error_fatal);



Wrong, there is no DDR on this SoC.


DDR controller is there in Smartfusion2 (different from Smartfusion). As
you said below this
should be in board file.


There IS a DDRC in this SoC, but here you are registering a DDR 'ram' 
memory region, not a controller. This SoC can be used without any DDR,
using only the embedded eNVM and eSRAM.


Now it happens your SoM board provides a DDR chip connected to this SoC.



+vmstate_register_ram_global(ddr);

+memory_region_add_subregion(system_memory, DDR_BASE_ADDRESS, ddr);
+
+memory_region_init_ram(sram, NULL, "MSF2.sram", SRAM_SIZE,
+   &error_fatal);



I'd rather see it named "eSRAM" somehow, so there is no possible confusion
with the external SRAM a SoM/board can map at 0x60000000.

Same comment as for envm_size: sram_size should be a per-model property.

OK



+vmstate_register_ram_global(sram);

+memory_region_add_subregion(system_memory, SRAM_BASE_ADDRESS, sram);
+
+armv7m = DEVICE(&s->armv7m);
+qdev_prop_set_uint32(armv7m, "num-irq", 96);



Can you point me to your datasheet? I thought the SF2 had 240 IRQs.



Please go to:
https://www.microsemi.com/document-portal/search_form
and search for the keyword "UG0331". You can then download the spec.
It has 81 IRQs. I remember that when I gave 81, QEMU complained it was not
a multiple of 4, but I checked again with 81 and it is fine. I will change
it to 81.


Ok :)


+qdev_prop_set_string(armv7m, "cpu-model", "cortex-m3");

+object_property_set_link(OBJECT(&s->armv7m),
OBJECT(get_system_memory()),
+ "memory", &error_abort);

[...]

+#define MSF2_NUM_SPIS 2
+#define MSF2_NUM_UARTS 2
+
+#define ENVM_BASE_

Re: [Qemu-devel] [Qemu-arm] [Qemu-devel PATCH 1/5] msf2: Add Smartfusion2 System timer

2017-05-11 Thread Philippe Mathieu-Daudé

On 05/10/2017 09:37 AM, sundeep subbaraya wrote:

Hi Phil,

On Wed, May 10, 2017 at 3:11 PM, Philippe Mathieu-Daudé mailto:f4...@amsat.org>> wrote:

Hi Subbaraya, nice work!

The timer you are modeling is the mss_timer, which is used in particular
in the smartfusion2; I'd rather name it mss_timer.c so it can be reused by
other SoC models.


Ok I will change all other file names also to mss. Do I need to change
type names also to mss?


As you wish :) Actel/Microsemi keep changing how they name it, MSS,
M2S... What I mean is that this timer is valid for an Actel SmartFusion and
for the MicroSemi SmartFusion2; naming it "msf2-timer" seems to restrict
it to the SF2 only.



I added few comments.


On 05/09/2017 01:44 PM, Subbaraya Sundeep wrote:


Modelled System Timer in Microsemi's Smartfusion2 Soc.
Timer has two 32bit down counters and two interrupts.

Signed-off-by: Subbaraya Sundeep 
>

---
 hw/timer/Makefile.objs|   1 +
 hw/timer/msf2-timer.c | 252
++
 include/hw/timer/msf2-timer.h |  85 ++
 3 files changed, 338 insertions(+)
 create mode 100644 hw/timer/msf2-timer.c
 create mode 100644 include/hw/timer/msf2-timer.h


[...]

+if (addr < ARRAY_SIZE(st->regs)) {
+st->regs[addr] = value;
+} else {
+qemu_log_mask(LOG_GUEST_ERROR,
+ "%s: Bad offset 0x%" HWADDR_PRIx "\n",

__func__,

+ addr * 4);
+}
+break;
+}
+timer_update_irq(st);



Here if addr >= (NUM_TIMERS * R_TIM1_MAX) you still update Timer1 IRQ;
while this is harmless right now, it is likely to break later.


As long as the interrupt status register and the interrupt enable register
are not modified, calling timer_update_irq() does no harm. Am I missing
something here?


Indeed, this is harmless. It just surprised me when I followed the
control flow.



+}
+
+static const MemoryRegionOps timer_ops = {
+.read = timer_read,
+.write = timer_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
+.valid = {
+.min_access_size = 4,



I believe min_access_size = 1 is valid for any APB device.



OK. I followed the Xilinx soft IP models while writing this. I am really not
sure it is mandatory to set the access size. Can I remove it?


Checking the datasheet "UG0331: SmartFusion2 Microcontroller Subsystem":

'''
CMSIS Data types:

The [Cortex-M3] processor:
* supports the following data types:
- 32-bit words
- 16-bit halfwords
- 8-bit bytes.
* manages all data memory accesses as little-endian or big-endian. 
Instruction memory and Private Peripheral Bus (PPB) accesses are always 
performed as little-endian. The Cortex-M3 processor configured for 
SmartFusion2 SoC FPGA MSS uses only little-endian.

'''

So yes, ".min_access_size = 1" is correct for this Cortex-M3.

If you remove it, memory_region_access_valid() will do:

    access_size_min = mr->ops->valid.min_access_size;
    if (!mr->ops->valid.min_access_size) {
        access_size_min = 1;
    }

So the result is the same; personally I prefer it to be explicit (not removed).
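The defaulting described above can be illustrated outside QEMU with the following minimal sketch. `AccessLimits` and `access_size_valid()` are hypothetical stand-ins for the `.valid` member of `MemoryRegionOps` and the size check in `memory_region_access_valid()`; the fallback of 1 for the minimum follows the snippet quoted above, while the fallback of 4 for the maximum is an assumption of this sketch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the .valid member of MemoryRegionOps. */
typedef struct {
    unsigned min_access_size; /* 0 means "not set" */
    unsigned max_access_size; /* 0 means "not set" */
} AccessLimits;

/* Return true if an access of 'size' bytes passes the size check. */
static bool access_size_valid(const AccessLimits *valid, unsigned size)
{
    /* An unset minimum falls back to 1, as in the snippet quoted above. */
    unsigned min = valid->min_access_size ? valid->min_access_size : 1;
    /* An unset maximum falls back to 4 (assumption for this sketch). */
    unsigned max = valid->max_access_size ? valid->max_access_size : 4;

    return size >= min && size <= max;
}
```

With `{ .min_access_size = 1, .max_access_size = 4 }` and `{ .max_access_size = 4 }`, byte, halfword and word accesses are accepted in exactly the same way, so writing the 1 explicitly only documents intent.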


+.max_access_size = 4
+}
+};
+
+static void timer_hit(void *opaque)
+{
+struct Msf2Timer *st = opaque;
+
+st->regs[R_TIM_RIS] |= TIMER_RIS_ACK;
+
+if (!(st->regs[R_TIM_CTRL] & TIMER_CTRL_ONESHOT)) {
+timer_update(st);
+}
+timer_update_irq(st);
+}

[...]

+/*
+ * There are two 32-bit down counting timers.
+ * Timers 1 and 2 can be concatenated into a single 64-bit Timer
+ * that operates either in Periodic mode or in One-shot mode.
+ * Writing 1 to the TIM64_MODE register bit 0 sets the Timers in 64-bit mode.
+ * In 64-bit mode, writing to the 32-bit registers has no effect.
+ * Similarly, in 32-bit mode, writing to the 64-bit mode registers
+ * has no effect. Only two 32-bit timers are supported currently.
+ */
+#define NUM_TIMERS 2
+
+#define MSF2_TIMER_FREQ   (83 * 100)



I cannot find this value; can you point me to the datasheet? It seems
SoC-specific to me.


It is configured in Microsemi Libero. The SOM kit from Emcraft comes
with this default setting.
I guess this property should be set and passed from the board file and not
from the SoC. Am I correct?


It seems to be an option configurable in Libero before synthesis, so it
would be SoM/bitfile-specific?


What I mean here is that I don't think this is a fixed value for an
mss_timer, and I'd rather have it configurable (but it's fine to default to
83 MHz for your SoM).
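Following the suggestion to make the frequency configurable, here is a hedged, standalone sketch of how a per-instance clock frequency could feed the tick-to-time conversion instead of a hard-coded `MSF2_TIMER_FREQ`. `TimerClockConfig`, `timer_ticks_to_ns()` and `DEFAULT_TIMER_FREQ_HZ` are hypothetical names; in QEMU the value would more likely be a device property (for example one declared with `DEFINE_PROP_UINT32`) that the board code sets.

```c
#include <stdint.h>

/* 83 MHz: the default on the Emcraft SOM kit discussed in this thread. */
#define DEFAULT_TIMER_FREQ_HZ 83000000u

/* Hypothetical per-instance setting, e.g. filled in by the board code. */
typedef struct {
    uint32_t freq_hz; /* 0 means "use DEFAULT_TIMER_FREQ_HZ" */
} TimerClockConfig;

/* Nanoseconds needed for a 32-bit down counter to consume 'ticks' cycles. */
static uint64_t timer_ticks_to_ns(const TimerClockConfig *cfg, uint32_t ticks)
{
    uint32_t freq = cfg->freq_hz ? cfg->freq_hz : DEFAULT_TIMER_FREQ_HZ;

    /* ticks < 2^32 and 10^9 < 2^30, so the product cannot overflow 64 bits. */
    return (uint64_t)ticks * 1000000000ull / freq;
}
```

With the default, 83 ticks correspond to 1000 ns; a board built with a different Libero clock setting would only need to set `freq_hz`.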


> Can I attach the datasheet to this thread?

Isn't this datasheet publicly available?

Could you perhaps upload a binary (like your Linux patches) somewhere?
That would make it easier to test this patchset.



Thank you,
Sundeep


Good luck!

Phil.



[Qemu-devel] [PATCH v8 3/3] ramblock: add new hmp command "info ramblock"

2017-05-11 Thread Peter Xu
Add a command to dump information about ramblocks. The output looks like:

(qemu) info ramblock
              Block Name    PSize  Offset      Used    Total
            /objects/mem    2 MiB  0x          0x8000  0x8000
                vga.vram    4 KiB  0x8006      0x0100  0x0100
    /rom@etc/acpi/tables    4 KiB  0x810b      0x0002  0x0020
                 pc.bios    4 KiB  0x8000      0x0004  0x0004
      :00:03.0/e1000.rom    4 KiB  0x8107      0x0004  0x0004
                  pc.rom    4 KiB  0x8004      0x0002  0x0002
        :00:02.0/vga.rom    4 KiB  0x8106      0x0001  0x0001
   /rom@etc/table-loader    4 KiB  0x812b      0x1000  0x1000
      /rom@etc/acpi/rsdp    4 KiB  0x812b1000  0x1000  0x1000

RAMBlock is an internal detail of the QEMU implementation, and this
command is mostly intended for QEMU developers working on RAM-related
code. It is not suitable for the QMP interface, so only an HMP command is
provided.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
---
 exec.c | 22 ++
 hmp-commands-info.hx   | 14 ++
 hmp.c  |  6 ++
 hmp.h  |  1 +
 include/exec/ramlist.h |  1 +
 5 files changed, 44 insertions(+)

diff --git a/exec.c b/exec.c
index 50519ae..821bef3 100644
--- a/exec.c
+++ b/exec.c
@@ -71,6 +71,8 @@
 #include "qemu/mmap-alloc.h"
 #endif
 
+#include "monitor/monitor.h"
+
 //#define DEBUG_SUBPAGE
 
 #if !defined(CONFIG_USER_ONLY)
@@ -1333,6 +1335,26 @@ void qemu_mutex_unlock_ramlist(void)
 qemu_mutex_unlock(&ram_list.mutex);
 }
 
+void ram_block_dump(Monitor *mon)
+{
+RAMBlock *block;
+char *psize;
+
+rcu_read_lock();
+monitor_printf(mon, "%24s %8s  %18s %18s %18s\n",
+   "Block Name", "PSize", "Offset", "Used", "Total");
+RAMBLOCK_FOREACH(block) {
+psize = size_to_str(block->page_size);
+monitor_printf(mon, "%24s %8s  0x%016" PRIx64 " 0x%016" PRIx64
+   " 0x%016" PRIx64 "\n", block->idstr, psize,
+   (uint64_t)block->offset,
+   (uint64_t)block->used_length,
+   (uint64_t)block->max_length);
+g_free(psize);
+}
+rcu_read_unlock();
+}
+
 #ifdef __linux__
 /*
  * FIXME TOCTTOU: this iterates over memory backends' mem-path, which
diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index a53f105..ae16901 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -788,6 +788,20 @@ Display the latest dump status.
 ETEXI
 
 {
+.name   = "ramblock",
+.args_type  = "",
+.params = "",
+.help   = "Display system ramblock information",
+.cmd= hmp_info_ramblock,
+},
+
+STEXI
+@item info ramblock
+@findex ramblock
+Dump all the ramblocks of the system.
+ETEXI
+
+{
 .name   = "hotpluggable-cpus",
 .args_type  = "",
 .params = "",
diff --git a/hmp.c b/hmp.c
index 524e589..e0ba13c 100644
--- a/hmp.c
+++ b/hmp.c
@@ -39,6 +39,7 @@
 #include "qemu-io.h"
 #include "qemu/cutils.h"
 #include "qemu/error-report.h"
+#include "exec/ramlist.h"
 #include "hw/intc/intc.h"
 
 #ifdef CONFIG_SPICE
@@ -2738,6 +2739,11 @@ void hmp_info_dump(Monitor *mon, const QDict *qdict)
 qapi_free_DumpQueryResult(result);
 }
 
+void hmp_info_ramblock(Monitor *mon, const QDict *qdict)
+{
+ram_block_dump(mon);
+}
+
 void hmp_hotpluggable_cpus(Monitor *mon, const QDict *qdict)
 {
 Error *err = NULL;
diff --git a/hmp.h b/hmp.h
index 37bb65a..d8b94ce 100644
--- a/hmp.h
+++ b/hmp.h
@@ -140,6 +140,7 @@ void hmp_rocker_ports(Monitor *mon, const QDict *qdict);
 void hmp_rocker_of_dpa_flows(Monitor *mon, const QDict *qdict);
 void hmp_rocker_of_dpa_groups(Monitor *mon, const QDict *qdict);
 void hmp_info_dump(Monitor *mon, const QDict *qdict);
+void hmp_info_ramblock(Monitor *mon, const QDict *qdict);
 void hmp_hotpluggable_cpus(Monitor *mon, const QDict *qdict);
 void hmp_info_vm_generation_id(Monitor *mon, const QDict *qdict);
 
diff --git a/include/exec/ramlist.h b/include/exec/ramlist.h
index f1c6b45..2e2ac6c 100644
--- a/include/exec/ramlist.h
+++ b/include/exec/ramlist.h
@@ -73,5 +73,6 @@ void ram_block_notifier_remove(RAMBlockNotifier *n);
 void ram_block_notify_add(void *host, size_t size);
 void ram_block_notify_remove(void *host, size_t size);
 
+void ram_block_dump(Monitor *mon);
 
 #endif /* RAMLIST_H */
-- 
2.7.4
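`ram_block_dump()` above walks the list with `RAMBLOCK_FOREACH()`, which patch 1/3 defines as a one-line wrapper around `QLIST_FOREACH_RCU(block, &ram_list.blocks, next)`. The sketch below shows the same wrapper idea on a hypothetical plain singly linked list (`Block`, `block_list` and `BLOCK_FOREACH` are illustrative names only). Unlike the real macro it does nothing about RCU, so in QEMU callers would still need the RCU read lock or `ram_list.mutex`, as the comment added to `ramlist.h` says.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified block; the real RAMBlock has many more fields. */
typedef struct Block {
    const char *idstr;
    uint64_t used_length;
    struct Block *next;
} Block;

/* Hypothetical global list, standing in for ram_list.blocks. */
static struct {
    Block *head;
} block_list;

/* Hide the list head and the link field behind a one-argument macro. */
#define BLOCK_FOREACH(block) \
    for ((block) = block_list.head; (block) != NULL; (block) = (block)->next)

/* Sum the used length of every block on the list. */
static uint64_t blocks_total_used(void)
{
    Block *block;
    uint64_t sum = 0;

    BLOCK_FOREACH(block) {
        sum += block->used_length;
    }
    return sum;
}
```

Call sites only name the cursor variable, which is what lets patch 1/3 replace every `QLIST_FOREACH_RCU(block, &ram_list.blocks, next)` with `RAMBLOCK_FOREACH(block)`.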




[Qemu-devel] [PATCH v8 1/3] ramblock: add RAMBLOCK_FOREACH()

2017-05-11 Thread Peter Xu
So that it can simplify the iterators.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
---
 exec.c | 22 +++---
 include/exec/ramlist.h |  5 +
 migration/ram.c| 13 +++--
 3 files changed, 23 insertions(+), 17 deletions(-)

diff --git a/exec.c b/exec.c
index eac6085..50519ae 100644
--- a/exec.c
+++ b/exec.c
@@ -978,7 +978,7 @@ static RAMBlock *qemu_get_ram_block(ram_addr_t addr)
 if (block && addr - block->offset < block->max_length) {
 return block;
 }
-QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+RAMBLOCK_FOREACH(block) {
 if (addr - block->offset < block->max_length) {
 goto found;
 }
@@ -1578,12 +1578,12 @@ static ram_addr_t find_ram_offset(ram_addr_t size)
 return 0;
 }
 
-QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+RAMBLOCK_FOREACH(block) {
 ram_addr_t end, next = RAM_ADDR_MAX;
 
 end = block->offset + block->max_length;
 
-QLIST_FOREACH_RCU(next_block, &ram_list.blocks, next) {
+RAMBLOCK_FOREACH(next_block) {
 if (next_block->offset >= end) {
 next = MIN(next, next_block->offset);
 }
@@ -1609,7 +1609,7 @@ unsigned long last_ram_page(void)
 ram_addr_t last = 0;
 
 rcu_read_lock();
-QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+RAMBLOCK_FOREACH(block) {
 last = MAX(last, block->offset + block->max_length);
 }
 rcu_read_unlock();
@@ -1659,7 +1659,7 @@ void qemu_ram_set_idstr(RAMBlock *new_block, const char *name, DeviceState *dev)
 pstrcat(new_block->idstr, sizeof(new_block->idstr), name);
 
 rcu_read_lock();
-QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+RAMBLOCK_FOREACH(block) {
 if (block != new_block &&
 !strcmp(block->idstr, new_block->idstr)) {
 fprintf(stderr, "RAMBlock \"%s\" already registered, abort!\n",
@@ -1693,7 +1693,7 @@ size_t qemu_ram_pagesize_largest(void)
 RAMBlock *block;
 size_t largest = 0;
 
-QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+RAMBLOCK_FOREACH(block) {
 largest = MAX(largest, qemu_ram_pagesize(block));
 }
 
@@ -1839,7 +1839,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
  * QLIST (which has an RCU-friendly variant) does not have insertion at
  * tail, so save the last element in last_block.
  */
-QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+RAMBLOCK_FOREACH(block) {
 last_block = block;
 if (block->max_length < new_block->max_length) {
 break;
@@ -2021,7 +2021,7 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
 int flags;
 void *area, *vaddr;
 
-QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+RAMBLOCK_FOREACH(block) {
 offset = addr - block->offset;
 if (offset < block->max_length) {
 vaddr = ramblock_ptr(block, offset);
@@ -2167,7 +2167,7 @@ RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
 goto found;
 }
 
-QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+RAMBLOCK_FOREACH(block) {
 /* This case append when the block is not mapped. */
 if (block->host == NULL) {
 continue;
@@ -2200,7 +2200,7 @@ RAMBlock *qemu_ram_block_by_name(const char *name)
 {
 RAMBlock *block;
 
-QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+RAMBLOCK_FOREACH(block) {
 if (!strcmp(name, block->idstr)) {
 return block;
 }
@@ -3424,7 +3424,7 @@ int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
 int ret = 0;
 
 rcu_read_lock();
-QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+RAMBLOCK_FOREACH(block) {
 ret = func(block->idstr, block->host, block->offset,
block->used_length, opaque);
 if (ret) {
diff --git a/include/exec/ramlist.h b/include/exec/ramlist.h
index c59880d..f1c6b45 100644
--- a/include/exec/ramlist.h
+++ b/include/exec/ramlist.h
@@ -4,6 +4,7 @@
 #include "qemu/queue.h"
 #include "qemu/thread.h"
 #include "qemu/rcu.h"
+#include "qemu/rcu_queue.h"
 
 typedef struct RAMBlockNotifier RAMBlockNotifier;
 
@@ -54,6 +55,10 @@ typedef struct RAMList {
 } RAMList;
 extern RAMList ram_list;
 
+/* Should be holding either ram_list.mutex, or the RCU lock. */
+#define  RAMBLOCK_FOREACH(block)  \
+QLIST_FOREACH_RCU(block, &ram_list.blocks, next)
+
 void qemu_mutex_lock_ramlist(void);
 void qemu_mutex_unlock_ramlist(void);
 
diff --git a/migration/ram.c b/migration/ram.c
index 293d27c..d88afea 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -648,7 +648,7 @@ uint64_t ram_pagesize_summary(void)
 RAMBlock *block;
 uint64_t summary = 0;
 
-QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+RAMBLOCK_FOREACH(block) {
 summary |= block->page_size;
 }
 
@@ -676,7 +676,7 @@ static void migration_bitmap_sync(RAMState 

[Qemu-devel] [PATCH v8 2/3] utils: provide size_to_str()

2017-05-11 Thread Peter Xu
Move the algorithm from print_type_size() into size_to_str() so that
other components can also leverage it, and refactor print_type_size()
to use it.

The assert() in that logic is removed, since even UINT64_MAX (16 EiB)
cannot run past the end of the suffix table.

Signed-off-by: Peter Xu 
---
 include/qemu-common.h|  1 +
 qapi/string-output-visitor.c | 22 ++
 util/cutils.c| 25 +
 3 files changed, 32 insertions(+), 16 deletions(-)

diff --git a/include/qemu-common.h b/include/qemu-common.h
index d218821..387ef52 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -145,6 +145,7 @@ void qemu_hexdump(const char *buf, FILE *fp, const char *prefix, size_t size);
 int parse_debug_env(const char *name, int max, int initial);
 
 const char *qemu_ether_ntoa(const MACAddr *mac);
+char *size_to_str(uint64_t val);
 void page_size_init(void);
 
 /* returns non-zero if dump is in progress, otherwise zero is
diff --git a/qapi/string-output-visitor.c b/qapi/string-output-visitor.c
index 94ac821..53c2175 100644
--- a/qapi/string-output-visitor.c
+++ b/qapi/string-output-visitor.c
@@ -211,10 +211,8 @@ static void print_type_size(Visitor *v, const char *name, uint64_t *obj,
 Error **errp)
 {
 StringOutputVisitor *sov = to_sov(v);
-static const char suffixes[] = { 'B', 'K', 'M', 'G', 'T', 'P', 'E' };
-uint64_t div, val;
-char *out;
-int i;
+uint64_t val;
+char *out, *psize;
 
 if (!sov->human) {
 out = g_strdup_printf("%"PRIu64, *obj);
@@ -223,19 +221,11 @@ static void print_type_size(Visitor *v, const char *name, uint64_t *obj,
 }
 
 val = *obj;
-
-/* The exponent (returned in i) minus one gives us
- * floor(log2(val * 1024 / 1000).  The correction makes us
- * switch to the higher power when the integer part is >= 1000.
- */
-frexp(val / (1000.0 / 1024.0), &i);
-i = (i - 1) / 10;
-assert(i < ARRAY_SIZE(suffixes));
-div = 1ULL << (i * 10);
-
-out = g_strdup_printf("%"PRIu64" (%0.3g %c%s)", val,
-  (double)val/div, suffixes[i], i ? "iB" : "");
+psize = size_to_str(val);
+out = g_strdup_printf("%"PRIu64" (%s)", val, psize);
 string_output_set(sov, out);
+
+g_free(psize);
 }
 
 static void print_type_bool(Visitor *v, const char *name, bool *obj,
diff --git a/util/cutils.c b/util/cutils.c
index 50ad179..1534682 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -619,3 +619,28 @@ const char *qemu_ether_ntoa(const MACAddr *mac)
 
 return ret;
 }
+
+/*
+ * Return human readable string for size @val.
+ * @val can be anything that uint64_t allows (no more than "16 EiB").
+ * Use IEC binary units like KiB, MiB, and so forth.
+ * Caller is responsible for passing it to g_free().
+ */
+char *size_to_str(uint64_t val)
+{
+static const char *suffixes[] = { "", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei" };
+unsigned long div;
+int i;
+
+/*
+ * The exponent (returned in i) minus one gives us
+ * floor(log2(val * 1024 / 1000).  The correction makes us
+ * switch to the higher power when the integer part is >= 1000.
+ * (see e41b509d68afb1f for more info)
+ */
+frexp(val / (1000.0 / 1024.0), &i);
+i = (i - 1) / 10;
+div = 1ULL << (i * 10);
+
+return g_strdup_printf("%0.3g %sB", (double)val / div, suffixes[i]);
+}
-- 
2.7.4
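For readers who want to try the conversion outside QEMU, here is a standalone sketch of the same algorithm. `size_to_str_sketch()` is a hypothetical name, and it uses `malloc()`/`snprintf()` instead of `g_strdup_printf()`. One deliberate difference from the patch: the divisor is declared `uint64_t`, because the `unsigned long div` above is only 32 bits wide on some hosts (32-bit Linux, 64-bit Windows), where `1ULL << 40` and larger shifts would be truncated.

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Standalone variant of size_to_str(): same frexp()-based unit selection,
 * but the result is malloc()ed (release it with free()) instead of being
 * allocated by GLib.
 */
static char *size_to_str_sketch(uint64_t val)
{
    static const char *suffixes[] = { "", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei" };
    uint64_t div; /* 64 bits wide, so 1ULL << 60 is never truncated */
    int i, len;
    char *buf;

    /*
     * The frexp() exponent minus one is floor(log2(val * 1024 / 1000)),
     * so the unit switches once the integer part reaches 1000.
     */
    frexp(val / (1000.0 / 1024.0), &i);
    i = (i - 1) / 10;
    div = 1ULL << (i * 10);

    len = snprintf(NULL, 0, "%0.3g %sB", (double)val / div, suffixes[i]);
    buf = malloc(len + 1);
    if (buf) {
        snprintf(buf, len + 1, "%0.3g %sB", (double)val / div, suffixes[i]);
    }
    return buf;
}
```

For example, 1000 is printed as "0.977 KiB", 4096 as "4 KiB", and `UINT64_MAX` as "16 EiB".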




[Qemu-devel] [PATCH v8 0/3] ramblock: add hmp command "info ramblock"

2017-05-11 Thread Peter Xu
v8:
- patch 1: add r-b for Dave
- patch 2: use "uint64_t" for size_to_str() parameter, remove assert()
  since it's useless now [Dave]
- drop patch 4

v7:
- patch 1: removed Dave's r-b since the patch conflicted during rebase
- patch 2: add r-b for Markus, with the nice function comment he
  provided [Markus]
- patch 3: add r-b for Dave
- patch 4 (new): added a new patch to remove the assert in size_to_str(),
  assuming that would be better.

v6
- patch 2: instead of creating a new size_to_str(), abstract the logic
  out of print_type_size() and refactor it, making sure
  print_type_size() dumps exactly the same thing as before (a simple
  test with info qtree was done)
- let suffixes be an array of strings [Markus]

v5
- add r-b for Dave on the first patch (which I forgot in v4, so I got it
  again)
- add one more patch to introduce size_to_str() as patch 2 [Dave]
- let the last patch use the new interface

v4:
- move page_size_to_str() into util/cutils.c [Dave]

v3:
- cast the three PRIx64 addresses using (uint64_t) [Fam]
- add more comments in patch 2 to emphasize that this command is only
  suitable for HMP, not QMP [Markus]

v2:
- replace "lx" with "PRIx64" in three places

Sometimes I would like to know the ramblock info for a VM, and this
command provides a way to dump it. Currently the list is sorted by size
(the default order), which I think is good enough.

Please review, thanks.

Peter Xu (3):
  ramblock: add RAMBLOCK_FOREACH()
  utils: provide size_to_str()
  ramblock: add new hmp command "info ramblock"

 exec.c   | 44 +---
 hmp-commands-info.hx | 14 ++
 hmp.c|  6 ++
 hmp.h|  1 +
 include/exec/ramlist.h   |  6 ++
 include/qemu-common.h|  1 +
 migration/ram.c  | 13 +++--
 qapi/string-output-visitor.c | 22 ++
 util/cutils.c| 25 +
 9 files changed, 99 insertions(+), 33 deletions(-)

-- 
2.7.4




Re: [Qemu-devel] [Qemu-devel PATCH 2/5] msf2: Microsemi Smartfusion2 System Register block.

2017-05-11 Thread Philippe Mathieu-Daudé

On 05/12/2017 12:48 AM, sundeep subbaraya wrote:

Hi Phiilippe,

On Wed, May 10, 2017 at 4:04 PM, Philippe Mathieu-Daudé 
wrote:


Hi Subbaraya,


On 05/09/2017 01:44 PM, Subbaraya Sundeep wrote:


Added the System Register block of SmartFusion2.
This block has PLL registers which are accessed by the guest.

Signed-off-by: Subbaraya Sundeep 
---
 hw/misc/Makefile.objs |   1 +
 hw/misc/msf2-sysreg.c | 131 ++
 include/hw/misc/msf2-sysreg.h |  80 ++
 3 files changed, 212 insertions(+)
 create mode 100644 hw/misc/msf2-sysreg.c
 create mode 100644 include/hw/misc/msf2-sysreg.h

diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index c8b4893..0f52354 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -56,3 +56,4 @@ obj-$(CONFIG_EDU) += edu.o
 obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
 obj-$(CONFIG_AUX) += auxbus.o
 obj-$(CONFIG_ASPEED_SOC) += aspeed_scu.o aspeed_sdmc.o
+obj-$(CONFIG_MSF2) += msf2-sysreg.o
diff --git a/hw/misc/msf2-sysreg.c b/hw/misc/msf2-sysreg.c
new file mode 100644
index 000..53e9cba
--- /dev/null
+++ b/hw/misc/msf2-sysreg.c
@@ -0,0 +1,131 @@
+/*
+ * System Register block model of Microsemi SmartFusion2.
+ *
+ * Copyright (c) 2017 Subbaraya Sundeep 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ */
+
+#include "hw/misc/msf2-sysreg.h"
+
+#ifndef MSF2_SYSREG_ERR_DEBUG
+#define MSF2_SYSREG_ERR_DEBUG  0
+#endif
+
+#define DB_PRINT_L(lvl, fmt, args...) do { \
+if (MSF2_SYSREG_ERR_DEBUG >= lvl) { \
+qemu_log("%s: " fmt, __func__, ## args); \
+} \
+} while (0);
+
+#define DB_PRINT(fmt, args...) DB_PRINT_L(1, fmt, ## args)
+
+static void msf2_sysreg_reset(DeviceState *d)
+{
+MSF2SysregState *s = MSF2_SYSREG(d);
+
+DB_PRINT("RESET\n");
+
+s->regs[MSSDDR_PLL_STATUS_LOW_CR] = 0x02420041;
+s->regs[MSSDDR_FACC1_CR] = 0x0A482124;
+s->regs[MSSDDR_PLL_STATUS] = 0x3;
+}
+
+static uint64_t msf2_sysreg_read(void *opaque, hwaddr offset,
+unsigned size)
+{
+MSF2SysregState *s = opaque;
+offset /= 4;
+uint32_t ret = 0;
+
+if (offset < ARRAY_SIZE(s->regs)) {
+ret = s->regs[offset];
+DB_PRINT("addr: 0x%08" HWADDR_PRIx " data: 0x%08" PRIx32 "\n",
+offset * 4, ret);
+} else {
+qemu_log_mask(LOG_GUEST_ERROR,
+"%s: Bad offset 0x%08" HWADDR_PRIx "\n", __func__,
+offset * 4);
+}
+
+return ret;
+}
+
+static void msf2_sysreg_write(void *opaque, hwaddr offset,
+  uint64_t val, unsigned size)
+{
+MSF2SysregState *s = (MSF2SysregState *)opaque;
+offset /= 4;
+
+DB_PRINT("addr: 0x%08" HWADDR_PRIx " data: 0x%08" PRIx64 "\n",
+offset * 4, val);
+
+switch (offset) {
+case MSSDDR_PLL_STATUS:
+break;
+
+default:
+if (offset < ARRAY_SIZE(s->regs)) {
+s->regs[offset] = val;



I think it is pretty unsafe for the guest to continue if those registers
are accessed with your current implementation.

I'd at least somehow abort for a few of them (RESET, REMAP*), what do you
think?



OK. We can abort for REMAP. All the peripherals need to be released from
reset by writing to Sysreg.
Can we handle the case where the guest writes to Sysreg to reset the SPI
(say), so that we reset the SPI model from here? I am very new here and I
did not come across this case in the models I have referred to.
Please let me know if that is possible. If not, I would simply log
messages like "released from reset"/"put in reset".


You are lucky if you didn't have to use the remap register :)
This can be implemented later if needed, don't get lost there now!

On cold reset, the SF2 starts with the Cache Matrix mapping the CM3 code
region (anything below 0x2000.) to the eNVM (which is physically
mapped at 0x6000.). If you write to the REMAP register, you can
instead map the eSRAM or the DDR space to the CM3 code region, and the eNVM
will then only be accessible at 0x6000.. For the CPU, execution
depends on the code stored in the mapped region.


If there is enough space to run your code in the eNVM, don't worry about
modeling a remap. But now anyone can try your model, and if their
firmware does a remap, the code flow will be completely wrong.
I think just reporting a GUEST_ERROR here is not enough, since letting the
code flow continue would make the failure pretty hard to understand/debug.
That's why I suggest reporting an error and aborting for now.
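A minimal sketch of the "report an error and abort" policy suggested above, separating the decision from the side effect so the policy is easy to read. The register indices and all names (`sysreg_write_action()`, `sysreg_write()`, `REG_REMAP`, ...) are hypothetical and do not match the real SmartFusion2 register map, and `fprintf()`/`abort()` stand in for QEMU's own reporting; inside QEMU the message would more naturally go through `qemu_log_mask(LOG_UNIMP, ...)` or `error_report()` before aborting.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical register indices (offset / 4), not the real SmartFusion2 map. */
enum {
    REG_PLL_STATUS = 2,
    REG_REMAP = 3,
    NUM_REGS = 16,
};

typedef enum {
    WRITE_STORE,      /* plain storage register */
    WRITE_IGNORE,     /* read-only register: drop the write */
    WRITE_UNIMPL,     /* behaviour not modelled: stop rather than mislead */
    WRITE_BAD_OFFSET, /* outside the register file */
} WriteAction;

/* Decide how a guest write to register 'index' should be handled. */
static WriteAction sysreg_write_action(uint32_t index)
{
    switch (index) {
    case REG_PLL_STATUS:
        return WRITE_IGNORE;
    case REG_REMAP:
        return WRITE_UNIMPL;
    default:
        return index < NUM_REGS ? WRITE_STORE : WRITE_BAD_OFFSET;
    }
}

static void sysreg_write(uint32_t *regs, uint32_t index, uint32_t val)
{
    switch (sysreg_write_action(index)) {
    case WRITE_STORE:
        regs[index] = val;
        break;
    case WRITE_IGNORE:
        break;
    case WRITE_UNIMPL:
        /* A remap changes what the CPU fetches; continuing would mislead. */
        fprintf(stderr, "sysreg: unimplemented write 0x%08x to register %u\n",
                (unsigned)val, (unsigned)index);
        abort();
    case WRITE_BAD_OFFSET:
        fprintf(stderr, "sysreg: bad offset 0x%08x\n", (unsigned)(index * 4));
        break;
    }
}
```

Keeping the policy in one small function makes it easy to turn an abort into a real model later, once remap or peripheral reset handling is implemented.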




Thanks,
Sundeep




+} else {

+qemu_log_mask(LOG_GUEST_ERROR,
+"%s: Bad offset 0x%08" HWADDR_PRIx "\n"

Re: [Qemu-devel] [PATCH v7 2/4] utils: provide size_to_str()

2017-05-11 Thread Peter Xu
On Thu, May 11, 2017 at 02:05:26PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (pet...@redhat.com) wrote:
> > Moving the algorithm from print_type_size() into size_to_str() so that
> > other components can also leverage it. With that, refactor
> > print_type_size().
> > 
> > Reviewed-by: Markus Armbruster 
> > Signed-off-by: Peter Xu 
> > ---
> >  include/qemu-common.h|  1 +
> >  qapi/string-output-visitor.c | 22 ++
> >  util/cutils.c| 26 ++
> >  3 files changed, 33 insertions(+), 16 deletions(-)
> > 
> > diff --git a/include/qemu-common.h b/include/qemu-common.h
> > index d218821..d7d0448 100644
> > --- a/include/qemu-common.h
> > +++ b/include/qemu-common.h
> > @@ -145,6 +145,7 @@ void qemu_hexdump(const char *buf, FILE *fp, const char *prefix, size_t size);
> >  int parse_debug_env(const char *name, int max, int initial);
> >  
> >  const char *qemu_ether_ntoa(const MACAddr *mac);
> > +char *size_to_str(double val);
> >  void page_size_init(void);
> >  
> >  /* returns non-zero if dump is in progress, otherwise zero is
> > diff --git a/qapi/string-output-visitor.c b/qapi/string-output-visitor.c
> > index 94ac821..53c2175 100644
> > --- a/qapi/string-output-visitor.c
> > +++ b/qapi/string-output-visitor.c
> > @@ -211,10 +211,8 @@ static void print_type_size(Visitor *v, const char *name, uint64_t *obj,
> >  Error **errp)
> >  {
> >  StringOutputVisitor *sov = to_sov(v);
> > -static const char suffixes[] = { 'B', 'K', 'M', 'G', 'T', 'P', 'E' };
> > -uint64_t div, val;
> > -char *out;
> > -int i;
> > +uint64_t val;
> > +char *out, *psize;
> >  
> >  if (!sov->human) {
> >  out = g_strdup_printf("%"PRIu64, *obj);
> > @@ -223,19 +221,11 @@ static void print_type_size(Visitor *v, const char *name, uint64_t *obj,
> >  }
> >  
> >  val = *obj;
> > -
> > -/* The exponent (returned in i) minus one gives us
> > - * floor(log2(val * 1024 / 1000).  The correction makes us
> > - * switch to the higher power when the integer part is >= 1000.
> > - */
> > -frexp(val / (1000.0 / 1024.0), &i);
> > -i = (i - 1) / 10;
> > -assert(i < ARRAY_SIZE(suffixes));
> > -div = 1ULL << (i * 10);
> > -
> > -out = g_strdup_printf("%"PRIu64" (%0.3g %c%s)", val,
> > -  (double)val/div, suffixes[i], i ? "iB" : "");
> > +psize = size_to_str(val);
> > +out = g_strdup_printf("%"PRIu64" (%s)", val, psize);
> >  string_output_set(sov, out);
> > +
> > +g_free(psize);
> >  }
> >  
> >  static void print_type_bool(Visitor *v, const char *name, bool *obj,
> > diff --git a/util/cutils.c b/util/cutils.c
> > index 50ad179..fa5ddec 100644
> > --- a/util/cutils.c
> > +++ b/util/cutils.c
> > @@ -619,3 +619,29 @@ const char *qemu_ether_ntoa(const MACAddr *mac)
> >  
> >  return ret;
> >  }
> > +
> > +/*
> > + * Return human readable string for size @val.
> > + * @val must be between (-1000EiB, 1000EiB), exclusively.
> > + * Use IEC binary units like KiB, MiB, and so forth.
> > + * Caller is responsible for passing it to g_free().
> > + */
> > +char *size_to_str(double val)
> > +{
> > +static const char *suffixes[] = { "", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei" };
> > +unsigned long div;
> > +int i;
> > +
> > +/*
> > + * The exponent (returned in i) minus one gives us
> > + * floor(log2(val * 1024 / 1000).  The correction makes us
> > + * switch to the higher power when the integer part is >= 1000.
> > + * (see e41b509d68afb1f for more info)
> > + */
> > +frexp(val / (1000.0 / 1024.0), &i);
> > +i = (i - 1) / 10;
> > +assert(i < ARRAY_SIZE(suffixes));
> 
> Because your code takes a double, whereas Paolo's only takes uints,
> I think you have to be careful to check that i >= 0 as well; for
> example, what would happen if someone did size_to_str(1.0/100.0)?

Yes, you are right.

Let me stick to uint64_t then; after all, double helps little when
dumping sizes... /me chose an unwise parameter type.

And then it's safe to remove the assert() directly in this patch,
because even UINT64_MAX is only 16 EiB. Then I can also drop the
last patch...

I'll resend. Thanks,

-- 
Peter Xu
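To make the range concern concrete, the sketch below isolates the index computation of the v7 `size_to_str(double)` in a hypothetical helper, `v7_suffix_index()`. Because C integer division truncates toward zero, a small fraction such as 0.01 still maps to index 0, but sufficiently small values (1e-6 gives -2) produce a negative index, and values of 1000 EiB or more give 7, one past the end of the seven-entry `suffixes[]` table. With a `uint64_t` parameter the index stays within 0 to 6, which is why the `assert()` could be dropped in v8.

```c
#include <math.h>

/*
 * Suffix-table index as computed by the v7 size_to_str(double val):
 * frexp() exponent of val * 1024 / 1000, minus one, divided by 10.
 * C integer division truncates toward zero.
 */
static int v7_suffix_index(double val)
{
    int i;

    frexp(val / (1000.0 / 1024.0), &i);
    return (i - 1) / 10;
}
```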



Re: [Qemu-devel] [RFC PATCH v3 1/5] coccinelle: add a script to optimize tcg op using tcg_gen_extract()

2017-05-11 Thread Julia Lawall
Hello,

I don't think I have seen earlier versions of this script.  Are you
proposing it to be added to the kernel?  If so, it should be put in an
appropriate subdirectory of Coccinelle.

Overall, could you explain at a high level what it is intended to do?  It
relies rather heavily on regular expressions and Python code, so I wonder
if this is the best way to do it.

thanks,
julia

On Fri, 12 May 2017, Philippe Mathieu-Daudé wrote:

> If you have coccinelle installed you can apply this script using:
>
> $ spatch \
> --sp-file scripts/coccinelle/tcg_gen_extract.cocci \
> --macro-file scripts/cocci-macro-file.h \
> --dir target --in-place
>
> You can also directly use Peter Senna Tschudin's Docker image (easier):
>
> $ docker run -v `pwd`:`pwd` -w `pwd` petersenna/coccinelle \
> --sp-file scripts/coccinelle/tcg_gen_extract.cocci \
> --macro-file scripts/cocci-macro-file.h \
> --dir target --in-place
>
> I then verified that no manual touch-ups are required.
>
> The following thread was helpful while writing this script:
>
> https://github.com/coccinelle/coccinelle/issues/86
>
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  scripts/coccinelle/tcg_gen_extract.cocci | 71
>  1 file changed, 71 insertions(+)
>  create mode 100644 scripts/coccinelle/tcg_gen_extract.cocci
>
> diff --git a/scripts/coccinelle/tcg_gen_extract.cocci b/scripts/coccinelle/tcg_gen_extract.cocci
> new file mode 100644
> index 00..4823073005
> --- /dev/null
> +++ b/scripts/coccinelle/tcg_gen_extract.cocci
> @@ -0,0 +1,71 @@
> +// optimize TCG using extract op
> +//
> +// Copyright: (C) 2017 Philippe Mathieu-Daudé. GPLv2+.
> +// Confidence: High
> +// Options: --macro-file scripts/cocci-macro-file.h
> +//
> +// Nikunj A Dadhania optimization:
> +// http://lists.nongnu.org/archive/html/qemu-devel/2017-02/msg05211.html
> +// Aurelien Jarno optimization:
> +// http://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg01466.html
> +// Coccinelle helpful issue:
> +// https://github.com/coccinelle/coccinelle/issues/86
> +
> +@match@ // match shri*+andi* pattern, calls script verify_len
> +identifier ret, arg;
> +constant ofs, len;
> +identifier shr_fn =~ "^tcg_gen_shri_";
> +identifier and_fn =~ "^tcg_gen_andi_";
> +position shr_p;
> +position and_p;
> +@@
> +(
> +shr_fn@shr_p(ret, arg, ofs);
> +and_fn@and_p(ret, ret, len);
> +)
> +
> +@script:python verify_len@
> +ret_s << match.ret;
> +len_s << match.len;
> +shr_s << match.shr_fn;
> +and_s << match.and_fn;
> +shr_p << match.shr_p;
> +extract_fn;
> +@@
> +print "candidate at %s:%s" % (shr_p[0].file, shr_p[0].line)
> +len_fn=len("tcg_gen_shri_")
> +shr_sz=shr_s[len_fn:]
> +and_sz=and_s[len_fn:]
> +# TODO: op_size shr
> +is_same_op_size = shr_sz == and_sz
> +print "  op_size: %s/%s (%s)" % (shr_sz, and_sz, "same" if is_same_op_size else "DIFFERENT")
> +is_optimizable = False
> +if is_same_op_size:
> +    try: # only eval integer, no #define like 'SR_M' (cpp did this, else some headers are missing).
> +        len_v = long(len_s.strip("UL"), 0)
> +        low_bits = 0
> +        while (len_v & (1 << low_bits)):
> +            low_bits += 1
> +        print "  low_bits:", low_bits, "(value: 0x%x)" % ((1 << low_bits) - 1)
> +        print "  len: 0x%x" % len_v
> +        is_optimizable = ((1 << low_bits) - 1) == len_v # check low_bits
> +        print "  len_bits %s= low_bits" % ("=" if is_optimizable else "!")
> +        print "  candidate", "IS" if is_optimizable else "is NOT", "optimizable"
> +        coccinelle.extract_fn = "tcg_gen_extract_" + and_sz
> +    except:
> +        print "  ERROR (check included headers?)"
> +cocci.include_match(is_optimizable)
> +print
> +
> +@replacement depends on verify_len@
> +identifier match.ret, match.arg;
> +constant match.ofs, match.len;
> +identifier match.shr_fn;
> +identifier match.and_fn;
> +position match.shr_p;
> +position match.and_p;
> +identifier verify_len.extract_fn;
> +@@
> +-shr_fn@shr_p(ret, arg, ofs);
> +-and_fn@and_p(ret, ret, len);
> ++extract_fn(ret, arg, ofs, len);
> --
> 2.11.0
>
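At a high level, the rule looks for a `tcg_gen_shri_*` immediately followed by a `tcg_gen_andi_*` on the same destination and, when the AND constant is a contiguous run of low bits, replaces the pair with one `tcg_gen_extract_*` call. The standalone C sketch below (all names hypothetical) restates the check performed by the `verify_len` Python rule and the equivalence the rewrite relies on. It also points at a detail worth double-checking in the `replacement` rule: as far as I can tell, the last operand of `tcg_gen_extract_i32()`/`tcg_gen_extract_i64()` is a field width in bits, so the value to pass is the computed bit count (`low_bits`), not the AND constant matched as `len`.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * If 'mask' is a contiguous run of low bits, i.e. (1 << n) - 1 with n >= 1,
 * store n in *len and return true. This mirrors the low_bits check in the
 * verify_len rule.
 */
static bool mask_to_len(uint64_t mask, unsigned *len)
{
    unsigned n = 0;

    while (n < 64 && (mask & (1ULL << n))) {
        n++;
    }
    if (n == 0 || (n < 64 && mask != (1ULL << n) - 1)) {
        return false;
    }
    *len = n;
    return true;
}

/* Reference semantics of the matched pair: shift right, then AND. */
static uint32_t shri_then_andi(uint32_t arg, unsigned ofs, uint32_t mask)
{
    return (arg >> ofs) & mask;
}

/* Reference semantics of extracting a 'len'-bit field starting at 'ofs'. */
static uint32_t extract_field(uint32_t arg, unsigned ofs, unsigned len)
{
    uint32_t field_mask = len >= 32 ? UINT32_MAX : (1u << len) - 1;

    return (arg >> ofs) & field_mask;
}
```

For example, a mask of 0xff yields a width of 8, and shifting 0x12345678 right by 8 then masking with 0xff gives the same 0x56 as extracting 8 bits at offset 8; a mask such as 0xf0 or 0x5 is rejected.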


Re: [Qemu-devel] [Qemu-devel PATCH 3/5] msf2: Add Smartfusion2 SPI controller

2017-05-11 Thread Philippe Mathieu-Daudé

Hi Subbaraya,

On 05/12/2017 12:31 AM, sundeep subbaraya wrote:

Hi Philippe,

On Wed, May 10, 2017 at 5:42 PM, Philippe Mathieu-Daudé wrote:


Hi Subbaraya,

Like my comment for the timer model, I'd name this model "mss_spi".
The only difference I see in the SF2 is the STAT8 register.
No need to register both devices now but maybe you can add a comment about
it?



OK, I will register only SPI0 in the SoC file for now and will add a comment.



What I mean is the difference I know between mss_spi from SmartFusion 
versus mss_spi from the SmartFusion2 is the STAT8 register.

Your file is SmartFusion2-specific just because of this unused register.

Now, since you have the mss_spi modeled and you want to model the SF2
SoC, you should register both SPI_0 and SPI_1; my comment was not about
that.


Well, forget about this comment about the STAT8 difference :) This is not a
problem and can be improved in the future.




On 05/09/2017 01:44 PM, Subbaraya Sundeep wrote:


Modelled Microsemi's Smartfusion2 SPI controller.

Signed-off-by: Subbaraya Sundeep 
---
 hw/ssi/Makefile.objs  |   1 +
 hw/ssi/msf2-spi.c | 378 ++
 include/hw/ssi/msf2-spi.h | 105 +
 3 files changed, 484 insertions(+)
 create mode 100644 hw/ssi/msf2-spi.c
 create mode 100644 include/hw/ssi/msf2-spi.h

diff --git a/hw/ssi/Makefile.objs b/hw/ssi/Makefile.objs
index 487add2..3105c4b 100644
--- a/hw/ssi/Makefile.objs
+++ b/hw/ssi/Makefile.objs
@@ -4,6 +4,7 @@ common-obj-$(CONFIG_XILINX_SPI) += xilinx_spi.o
 common-obj-$(CONFIG_XILINX_SPIPS) += xilinx_spips.o
 common-obj-$(CONFIG_ASPEED_SOC) += aspeed_smc.o
 common-obj-$(CONFIG_STM32F2XX_SPI) += stm32f2xx_spi.o
+common-obj-$(CONFIG_MSF2) += msf2-spi.o



Not a big deal, but this define only appears after applying the next patch.


Do I need to reorder the patches?



It is probably not important, but for what it's worth: if you check out
your tree at this patch 3/5 and build, msf2-spi.o is not compiled.


Although I'm not sure it could cause trouble when bisecting an issue in your code.




 obj-$(CONFIG_OMAP) += omap_spi.o
 obj-$(CONFIG_IMX) += imx_spi.o
diff --git a/hw/ssi/msf2-spi.c b/hw/ssi/msf2-spi.c
new file mode 100644
index 000..2059ed9
--- /dev/null
+++ b/hw/ssi/msf2-spi.c
@@ -0,0 +1,378 @@
+/*
+ * SPI controller model of Microsemi SmartFusion2.
+ *
+ * Copyright (C) 2017 Subbaraya Sundeep 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "hw/ssi/msf2-spi.h"
+
+#ifndef MSF2_SPI_ERR_DEBUG
+#define MSF2_SPI_ERR_DEBUG   0
+#endif
+
+#define DB_PRINT_L(lvl, fmt, args...) do { \
+if (MSF2_SPI_ERR_DEBUG >= lvl) { \
+qemu_log("%s: " fmt, __func__, ## args); \
+} \
+} while (0);
+
+#define DB_PRINT(fmt, args...) DB_PRINT_L(1, fmt, ## args)
+
+static void txfifo_reset(MSF2SpiState *s)
+{
+fifo32_reset(&s->tx_fifo);
+
+s->regs[R_SPI_STATUS] &= ~S_TXFIFOFUL;
+s->regs[R_SPI_STATUS] |= S_TXFIFOEMP;
+}
+
+static void rxfifo_reset(MSF2SpiState *s)
+{
+fifo32_reset(&s->rx_fifo);
+
+s->regs[R_SPI_STATUS] &= ~S_RXFIFOFUL;
+s->regs[R_SPI_STATUS] |= S_RXFIFOEMP;
+}
+
+static void set_fifodepth(MSF2SpiState *s)
+{
+int size = s->regs[R_SPI_DFSIZE] & FRAMESZ_MASK;
+
+if (0 <= size && size <= 8) {
+s->fifo_depth = 32;
+}
+if (9 <= size && size <= 16) {
+s->fifo_depth = 16;
+}
+if (17 <= size && size <= 32) {
+s->fifo_depth = 8;
+}
+}
+
+static void msf2_spi_do_reset(MSF2SpiState *s)
+{
+memset(s->regs, 0, sizeof s->regs);
+s->regs[R_SPI_CONTROL] = 0x8102;
+s->regs[R_SPI_DFSIZE] = 0x4;
+s->regs[R_SPI_STATUS] = 0x2440;
+s->regs[R_SPI_CLKGEN] = 0x7;
+s->regs[R_SPI_STAT8] = 0x7;
+s->regs[R_SPI_RIS] = 0x0;
+
+s->fifo_depth = 4;
+s->frame_count = 1;
+s->enabled = false;
+
+rxfifo_reset(s);
+txfifo_reset(s);
+}
+
+static void update_mis(MSF2SpiState *s)
+{
+uint32_t reg = s->reg
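
The if-chain in set_fifodepth() above can be read as a pure mapping from frame size (in bits) to FIFO depth (in frames). A minimal sketch of that mapping as an else-if function that is easy to unit-test; the function name is hypothetical, and masking DFSIZE with FRAMESZ_MASK is left to the caller. In every band, depth times the largest frame size comes to 256 bits (32*8, 16*16, 8*32).

```c
#include <stdint.h>

/*
 * Frame size (bits) -> FIFO depth (frames), following set_fifodepth():
 * narrower frames give a deeper FIFO. Sizes above 32 leave the current
 * depth unchanged, as the patch's if-chain does.
 */
static uint32_t spi_fifo_depth_for_frame_size(uint32_t frame_size,
                                              uint32_t current_depth)
{
    if (frame_size <= 8) {
        return 32;
    } else if (frame_size <= 16) {
        return 16;
    } else if (frame_size <= 32) {
        return 8;
    }
    return current_depth;
}
```

Using an unsigned size also removes the always-true `0 <= size` comparison from the first band.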

Re: [Qemu-devel] [Qemu-devel PATCH 2/5] msf2: Microsemi Smartfusion2 System Register block.

2017-05-11 Thread sundeep subbaraya
Hi Philippe,

On Wed, May 10, 2017 at 4:04 PM, Philippe Mathieu-Daudé wrote:

> Hi Subbaraya,
>
>
> On 05/09/2017 01:44 PM, Subbaraya Sundeep wrote:
>
>> Added System register block of Smartfusion2.
>> This block has PLL registers which are accessed by guest.
>>
>> Signed-off-by: Subbaraya Sundeep 
>> ---
>>  hw/misc/Makefile.objs |   1 +
>>  hw/misc/msf2-sysreg.c | 131 ++
>> 
>>  include/hw/misc/msf2-sysreg.h |  80 ++
>>  3 files changed, 212 insertions(+)
>>  create mode 100644 hw/misc/msf2-sysreg.c
>>  create mode 100644 include/hw/misc/msf2-sysreg.h
>>
>> diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
>> index c8b4893..0f52354 100644
>> --- a/hw/misc/Makefile.objs
>> +++ b/hw/misc/Makefile.objs
>> @@ -56,3 +56,4 @@ obj-$(CONFIG_EDU) += edu.o
>>  obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
>>  obj-$(CONFIG_AUX) += auxbus.o
>>  obj-$(CONFIG_ASPEED_SOC) += aspeed_scu.o aspeed_sdmc.o
>> +obj-$(CONFIG_MSF2) += msf2-sysreg.o
>> diff --git a/hw/misc/msf2-sysreg.c b/hw/misc/msf2-sysreg.c
>> new file mode 100644
>> index 000..53e9cba
>> --- /dev/null
>> +++ b/hw/misc/msf2-sysreg.c
>> @@ -0,0 +1,131 @@
>> +/*
>> + * System Register block model of Microsemi SmartFusion2.
>> + *
>> + * Copyright (c) 2017 Subbaraya Sundeep 
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public License
>> + * as published by the Free Software Foundation; either version
>> + * 2 of the License, or (at your option) any later version.
>> + *
>> + * You should have received a copy of the GNU General Public License along
>> + * with this program; if not, see .
>> + */
>> +
>> +#include "hw/misc/msf2-sysreg.h"
>> +
>> +#ifndef MSF2_SYSREG_ERR_DEBUG
>> +#define MSF2_SYSREG_ERR_DEBUG  0
>> +#endif
>> +
>> +#define DB_PRINT_L(lvl, fmt, args...) do { \
>> +if (MSF2_SYSREG_ERR_DEBUG >= lvl) { \
>> +qemu_log("%s: " fmt, __func__, ## args); \
>> +} \
>> +} while (0);
>> +
>> +#define DB_PRINT(fmt, args...) DB_PRINT_L(1, fmt, ## args)
>> +
>> +static void msf2_sysreg_reset(DeviceState *d)
>> +{
>> +MSF2SysregState *s = MSF2_SYSREG(d);
>> +
>> +DB_PRINT("RESET\n");
>> +
>> +s->regs[MSSDDR_PLL_STATUS_LOW_CR] = 0x02420041;
>> +s->regs[MSSDDR_FACC1_CR] = 0x0A482124;
>> +s->regs[MSSDDR_PLL_STATUS] = 0x3;
>> +}
>> +
>> +static uint64_t msf2_sysreg_read(void *opaque, hwaddr offset,
>> +unsigned size)
>> +{
>> +MSF2SysregState *s = opaque;
>> +offset /= 4;
>> +uint32_t ret = 0;
>> +
>> +if (offset < ARRAY_SIZE(s->regs)) {
>> +ret = s->regs[offset];
>> +DB_PRINT("addr: 0x%08" HWADDR_PRIx " data: 0x%08" PRIx32 "\n",
>> +offset * 4, ret);
>> +} else {
>> +qemu_log_mask(LOG_GUEST_ERROR,
>> +"%s: Bad offset 0x%08" HWADDR_PRIx "\n", __func__,
>> +offset * 4);
>> +}
>> +
>> +return ret;
>> +}
>> +
>> +static void msf2_sysreg_write(void *opaque, hwaddr offset,
>> +  uint64_t val, unsigned size)
>> +{
>> +MSF2SysregState *s = (MSF2SysregState *)opaque;
>> +offset /= 4;
>> +
>> +DB_PRINT("addr: 0x%08" HWADDR_PRIx " data: 0x%08" PRIx64 "\n",
>> +offset * 4, val);
>> +
>> +switch (offset) {
>> +case MSSDDR_PLL_STATUS:
>> +break;
>> +
>> +default:
>> +if (offset < ARRAY_SIZE(s->regs)) {
>> +s->regs[offset] = val;
>>
>
> I think it is pretty unsafe for the guest to continue if those registers
> are accessed with your current implementation.
>
> I'd at least abort for a few of them (RESET, REMAP*). What do you
> think?


OK. We can abort for REMAP. All the peripherals need to be released from
reset by writing to Sysreg.
Can we handle the case where the guest writes to Sysreg to reset, say, SPI,
by resetting the SPI model from here? I am very new here and did not come
across this case in the models I have referred to.
Please let me know if that is possible. If it is not, I will simply log
messages such as "released from reset" / "put in reset".

Thanks,
Sundeep
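
A minimal sketch of the reset fan-out asked about above, in plain C: the system register block remembers the last soft-reset value and notifies each peripheral whose bit changed. The bit assignment, the polarity (1 = held in reset) and all names are assumptions for illustration; in QEMU itself such signals are commonly wired between devices as GPIO/IRQ lines rather than through a callback table like this one.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NUM_PERIPHS 2   /* e.g. bit 0 = SPI_0, bit 1 = SPI_1 (assumed) */

/* Called with held_in_reset = 1 when the bit is set, 0 when released. */
typedef void (*SketchResetFn)(void *opaque, int held_in_reset);

typedef struct {
    SketchResetFn fn[SKETCH_NUM_PERIPHS];
    void *opaque[SKETCH_NUM_PERIPHS];
    uint32_t soft_reset;   /* last value the guest wrote */
} SysregSketch;

/* Guest write to a hypothetical soft-reset register. */
static void sysreg_sketch_write_soft_reset(SysregSketch *s, uint32_t val)
{
    uint32_t changed = s->soft_reset ^ val;

    s->soft_reset = val;
    for (int i = 0; i < SKETCH_NUM_PERIPHS; i++) {
        /* Only peripherals whose reset bit flipped are notified. */
        if ((changed & (1u << i)) && s->fn[i] != NULL) {
            s->fn[i](s->opaque[i], (int)((val >> i) & 1));
        }
    }
}

/* Stand-in for a peripheral model such as the SPI controller. */
typedef struct {
    int in_reset;
    int transitions;
} PeriphSketch;

static void periph_sketch_reset_line(void *opaque, int held_in_reset)
{
    PeriphSketch *p = opaque;

    p->in_reset = held_in_reset;
    p->transitions++;
}
```

Notifying only on a change means rewriting the same value is harmless, which matters if guest firmware writes the whole register to toggle a single bit.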

>
>
>> +} else {
>> +qemu_log_mask(LOG_GUEST_ERROR,
>> +"%s: Bad offset 0x%08" HWADDR_PRIx "\n", __func__,
>> +offset * 4);
>> +}
>> +break;
>> +}
>> +}
>> +
>> +static const MemoryRegionOps sysreg_ops = {
>> +.read = msf2_sysreg_read,
>> +.write = msf2_sysreg_write,
>> +.endianness = DEVICE_NATIVE_ENDIAN,
>> +};
>> +
>> +static void msf2_sysreg_init(Object *obj)
>> +{
>> +MSF2SysregState *s = MSF2_SYSREG(obj);
>> +
>> +memory_region_init_io(&s->iomem, obj, &sysreg_ops, s, TYPE_MSF2_SYSREG,
>> +  MSF2_SYSREG_MMIO_SIZE);
>> +sysbus_init_mmio(SYS_BUS_DEVICE(obj), &s->iomem);
>> +}
>> +
>> +static const VMStateDescription vmstate_m

Re: [Qemu-devel] [PATCH 2/3] migration: Remove use of old MigrationParams

2017-05-11 Thread Peter Xu
On Thu, May 11, 2017 at 06:32:27PM +0200, Juan Quintela wrote:
> The previous patch changed this to use migration capabilities.  Note
> that we continue using the old command-line flags of the migrate
> command for the time being.  Remove the set_params method, as it is
> now empty.
> 
> Signed-off-by: Juan Quintela 
> ---
>  include/migration/migration.h |  3 +--
>  migration/block.c | 17 ++---
>  migration/colo.c  |  5 +++--
>  migration/migration.c |  8 +---
>  migration/savevm.c|  2 --
>  5 files changed, 11 insertions(+), 24 deletions(-)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 30c2913..2d5525c 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -39,8 +39,7 @@
>  #define QEMU_VM_SECTION_FOOTER   0x7e
>  
>  struct MigrationParams {
> -bool blk;
> -bool shared;
> +bool unused; /* C doesn't allow empty structs */
>  };
>  
>  /* Messages sent on the return path from destination to source */
> diff --git a/migration/block.c b/migration/block.c
> index 060087f..fcfa823 100644
> --- a/migration/block.c
> +++ b/migration/block.c
> @@ -94,9 +94,6 @@ typedef struct BlkMigBlock {
>  } BlkMigBlock;
>  
>  typedef struct BlkMigState {
> -/* Written during setup phase.  Can be read without a lock.  */
> -int blk_enable;
> -int shared_base;
>  QSIMPLEQ_HEAD(bmds_list, BlkMigDevState) bmds_list;
>  int64_t total_sector_sum;
>  bool zero_blocks;
> @@ -425,7 +422,7 @@ static int init_blk_migration(QEMUFile *f)
>  bmds->bulk_completed = 0;
>  bmds->total_sectors = sectors;
>  bmds->completed_sectors = 0;
> -bmds->shared_base = block_mig_state.shared_base;
> +bmds->shared_base = migrate_use_block_shared();
>  
>  assert(i < num_bs);
>  bmds_bs[i].bmds = bmds;
> @@ -994,22 +991,12 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
>  return 0;
>  }
>  
> -static void block_set_params(const MigrationParams *params, void *opaque)
> -{
> -block_mig_state.blk_enable = params->blk;
> -block_mig_state.shared_base = params->shared;
> -
> -/* shared base means that blk_enable = 1 */
> -block_mig_state.blk_enable |= params->shared;
> -}
> -
>  static bool block_is_active(void *opaque)
>  {
> -return block_mig_state.blk_enable == 1;
> +return migrate_use_block_enabled();
>  }
>  
>  static SaveVMHandlers savevm_block_handlers = {
> -.set_params = block_set_params,
>  .save_live_setup = block_save_setup,
>  .save_live_iterate = block_save_iterate,
>  .save_live_complete_precopy = block_save_complete,
> diff --git a/migration/colo.c b/migration/colo.c
> index 963c802..e772384 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -14,6 +14,7 @@
>  #include "qemu/timer.h"
>  #include "sysemu/sysemu.h"
>  #include "migration/colo.h"
> +#include "migration/block.h"
>  #include "io/channel-buffer.h"
>  #include "trace.h"
>  #include "qemu/error-report.h"
> @@ -345,8 +346,8 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
>  }
>  
>  /* Disable block migration */
> -s->params.blk = 0;
> -s->params.shared = 0;
> +migrate_set_block_enabled(s, false);
> +migrate_set_block_shared(s, false);
>  qemu_savevm_state_header(fb);
>  qemu_savevm_state_begin(fb, &s->params);
>  qemu_mutex_lock_iothread();
> diff --git a/migration/migration.c b/migration/migration.c
> index 2f981aa..8a3bf89 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -787,6 +787,10 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
>  s->enabled_capabilities[cap->value->capability] = cap->value->state;
>  }
>  
> +if (s->enabled_capabilities[MIGRATION_CAPABILITY_BLOCK_SHARED]) {
> +s->enabled_capabilities[MIGRATION_CAPABILITY_BLOCK_ENABLED] = true;
> +}
> +

[1]

>  if (migrate_postcopy_ram()) {
>  if (migrate_use_compression()) {
>  /* The decompression threads asynchronously write into RAM
> @@ -1214,9 +1218,6 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
>  MigrationParams params;
>  const char *p;
>  
> -params.blk = has_blk && blk;
> -params.shared = has_inc && inc;
> -
>  if (migration_is_setup_or_active(s->state) ||
>  s->state == MIGRATION_STATUS_CANCELLING ||
>  s->state == MIGRATION_STATUS_COLO) {
> @@ -1239,6 +1240,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
>  }
>  
>  if (has_inc && inc) {
> +migrate_set_block_enabled(s, true);
>  migrate_set_block_shared(s, true);

[2]

IIUC, [1] and [2] solve the same problem: "shared" depends on the
"enabled" bit. Would it be better to unify this dependency in one
place? E.g., by changing migrate_set_block_shared() into:
void migrate_set_block_shared(Migr
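
One way to keep the "shared implies enabled" rule from [1] and [2] in a single place, sketched in plain C. The enum, struct and setter are hypothetical stand-ins, not QEMU's MigrationState or the real migrate_set_block_shared(). The removed block_set_params() encoded the same rule ("shared base means that blk_enable = 1").

```c
#include <stdbool.h>

/* Hypothetical stand-ins for MIGRATION_CAPABILITY_BLOCK_ENABLED/_SHARED. */
enum { SKETCH_CAP_BLOCK_ENABLED, SKETCH_CAP_BLOCK_SHARED, SKETCH_CAP_MAX };

typedef struct {
    bool enabled_capabilities[SKETCH_CAP_MAX];
} MigrationSketch;

/*
 * Setting "shared" also sets "enabled", so the dependency lives here
 * instead of being repeated at each call site.
 */
static void sketch_set_block_shared(MigrationSketch *s, bool value)
{
    s->enabled_capabilities[SKETCH_CAP_BLOCK_SHARED] = value;
    if (value) {
        s->enabled_capabilities[SKETCH_CAP_BLOCK_ENABLED] = true;
    }
}
```

Whether clearing "shared" should also clear "enabled" is a design choice; this sketch leaves "enabled" untouched.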

[Qemu-devel] [PATCH v3 3/5] target/m68k: optimize bcd_flags() using extract op

2017-05-11 Thread Philippe Mathieu-Daudé
Patch created mechanically using Coccinelle script via:

$ spatch --macro-file scripts/cocci-macro-file.h --in-place \
--sp-file scripts/coccinelle/tcg_gen_extract.cocci --dir target

Signed-off-by: Philippe Mathieu-Daudé 
Acked-by: Laurent Vivier 
---
 target/m68k/translate.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/target/m68k/translate.c b/target/m68k/translate.c
index 9f60fbc0db..babb9e2c5b 100644
--- a/target/m68k/translate.c
+++ b/target/m68k/translate.c
@@ -1463,8 +1463,7 @@ static void bcd_flags(TCGv val)
 tcg_gen_andi_i32(QREG_CC_C, val, 0x0ff);
 tcg_gen_or_i32(QREG_CC_Z, QREG_CC_Z, QREG_CC_C);
 
-tcg_gen_shri_i32(QREG_CC_C, val, 8);
-tcg_gen_andi_i32(QREG_CC_C, QREG_CC_C, 1);
+tcg_gen_extract_i32(QREG_CC_C, val, 8, 1);
 
 tcg_gen_mov_i32(QREG_CC_X, QREG_CC_C);
 }
-- 
2.11.0
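
The bcd_flags() hunk only changes how the carry bit is obtained. A plain-C model of the flag updates visible in the hunk, useful for checking that extract(val, 8, 1) yields the same carry as the old shift-and-mask. The struct is hypothetical, and the convention that Z reads as set while CC_Z == 0 is an assumption about QEMU's m68k flag representation.

```c
#include <stdint.h>

/* Hypothetical mirror of the three flag values bcd_flags() writes. */
typedef struct {
    uint32_t cc_z;   /* Z is taken as set only while this is 0 (assumed) */
    uint32_t cc_c;
    uint32_t cc_x;
} BcdFlagsSketch;

static void bcd_flags_sketch(BcdFlagsSketch *f, uint32_t val)
{
    /* A nonzero low byte makes cc_z nonzero (Z clear); a zero byte
     * leaves it unchanged, so Z can only be cleared here. */
    f->cc_z |= val & 0xff;
    /* Carry is bit 8 of the result: extract(val, 8, 1). */
    f->cc_c = (val >> 8) & 1;
    f->cc_x = f->cc_c;
}
```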




[Qemu-devel] [PATCH v3 5/5] target/sparc: optimize various functions using extract op

2017-05-11 Thread Philippe Mathieu-Daudé
Patch created mechanically using Coccinelle script via:

$ spatch --macro-file scripts/cocci-macro-file.h --in-place \
--sp-file scripts/coccinelle/tcg_gen_extract.cocci --dir target

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/sparc/translate.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index aa6734d54e..a92b5c425c 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -380,29 +380,25 @@ static inline void gen_goto_tb(DisasContext *s, int tb_num,
 static inline void gen_mov_reg_N(TCGv reg, TCGv_i32 src)
 {
 tcg_gen_extu_i32_tl(reg, src);
-tcg_gen_shri_tl(reg, reg, PSR_NEG_SHIFT);
-tcg_gen_andi_tl(reg, reg, 0x1);
+tcg_gen_extract_tl(reg, reg, PSR_NEG_SHIFT, 0x1);
 }
 
 static inline void gen_mov_reg_Z(TCGv reg, TCGv_i32 src)
 {
 tcg_gen_extu_i32_tl(reg, src);
-tcg_gen_shri_tl(reg, reg, PSR_ZERO_SHIFT);
-tcg_gen_andi_tl(reg, reg, 0x1);
+tcg_gen_extract_tl(reg, reg, PSR_ZERO_SHIFT, 0x1);
 }
 
 static inline void gen_mov_reg_V(TCGv reg, TCGv_i32 src)
 {
 tcg_gen_extu_i32_tl(reg, src);
-tcg_gen_shri_tl(reg, reg, PSR_OVF_SHIFT);
-tcg_gen_andi_tl(reg, reg, 0x1);
+tcg_gen_extract_tl(reg, reg, PSR_OVF_SHIFT, 0x1);
 }
 
 static inline void gen_mov_reg_C(TCGv reg, TCGv_i32 src)
 {
 tcg_gen_extu_i32_tl(reg, src);
-tcg_gen_shri_tl(reg, reg, PSR_CARRY_SHIFT);
-tcg_gen_andi_tl(reg, reg, 0x1);
+tcg_gen_extract_tl(reg, reg, PSR_CARRY_SHIFT, 0x1);
 }
 
 static inline void gen_op_add_cc(TCGv dst, TCGv src1, TCGv src2)
@@ -638,8 +634,7 @@ static inline void gen_op_mulscc(TCGv dst, TCGv src1, TCGv src2)
 // env->y = (b2 << 31) | (env->y >> 1);
 tcg_gen_andi_tl(r_temp, cpu_cc_src, 0x1);
 tcg_gen_shli_tl(r_temp, r_temp, 31);
-tcg_gen_shri_tl(t0, cpu_y, 1);
-tcg_gen_andi_tl(t0, t0, 0x7fff);
+tcg_gen_extract_tl(t0, cpu_y, 1, 0x7fff);
 tcg_gen_or_tl(t0, t0, r_temp);
 tcg_gen_andi_tl(cpu_y, t0, 0x);
 
-- 
2.11.0
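
The sparc helpers each read one PSR condition-code bit, so each field is one bit wide and the matching extract length is 1. A plain-C model of those four reads. The bit numbers (N=23, Z=22, V=21, C=20) are the SPARC V8 icc layout, restated here as an assumption rather than copied from QEMU's PSR_*_SHIFT definitions, and the names are hypothetical. Because the fourth argument of tcg_gen_extract_*() is a bit count rather than a mask, a shift by 1 followed by a 31-bit mask (what the gen_op_mulscc() comment describes, assuming a 32-bit Y) would correspond to a length of 31.

```c
#include <stdint.h>

/* SPARC V8 icc bit positions in the PSR (assumed, see lead-in). */
enum {
    SK_PSR_CARRY_SHIFT = 20,
    SK_PSR_OVF_SHIFT = 21,
    SK_PSR_ZERO_SHIFT = 22,
    SK_PSR_NEG_SHIFT = 23,
};

typedef struct { unsigned n, z, v, c; } IccSketch;

/* Each flag is a one-bit field: (psr >> shift) & 1 == extract(psr, shift, 1). */
static IccSketch psr_to_icc(uint32_t psr)
{
    IccSketch f = {
        .n = (psr >> SK_PSR_NEG_SHIFT) & 1,
        .z = (psr >> SK_PSR_ZERO_SHIFT) & 1,
        .v = (psr >> SK_PSR_OVF_SHIFT) & 1,
        .c = (psr >> SK_PSR_CARRY_SHIFT) & 1,
    };
    return f;
}
```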




[Qemu-devel] [PATCH v3 2/5] target/arm: optimize rev16() using extract op

2017-05-11 Thread Philippe Mathieu-Daudé
Patch created mechanically using Coccinelle script via:

$ spatch --macro-file scripts/cocci-macro-file.h --in-place \
--sp-file scripts/coccinelle/tcg_gen_extract.cocci --dir target

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/arm/translate-a64.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 24de30d92c..7ea130107e 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -4038,14 +4038,12 @@ static void handle_rev16(DisasContext *s, unsigned int sf,
 tcg_gen_andi_i64(tcg_tmp, tcg_rn, 0x);
 tcg_gen_bswap16_i64(tcg_rd, tcg_tmp);
 
-tcg_gen_shri_i64(tcg_tmp, tcg_rn, 16);
-tcg_gen_andi_i64(tcg_tmp, tcg_tmp, 0x);
+tcg_gen_extract_i64(tcg_tmp, tcg_rn, 16, 0x);
 tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
 tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 16, 16);
 
 if (sf) {
-tcg_gen_shri_i64(tcg_tmp, tcg_rn, 32);
-tcg_gen_andi_i64(tcg_tmp, tcg_tmp, 0x);
+tcg_gen_extract_i64(tcg_tmp, tcg_rn, 32, 0x);
 tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
 tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 32, 16);
 
-- 
2.11.0
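
handle_rev16() byte-swaps each 16-bit halfword of the source: extract a halfword, bswap16 it, deposit it back at the same offset. A plain-C model of that sequence, with hypothetical helper names, which makes the expected results easy to check. A halfword field corresponds to an extract length of 16, the bit count of a 16-bit mask.

```c
#include <stdint.h>

static uint16_t bswap16_sketch(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Replace bits [ofs, ofs + len) of dst with the low len bits of src. */
static uint64_t deposit64_sketch(uint64_t dst, unsigned ofs, unsigned len,
                                 uint64_t src)
{
    uint64_t mask = (~0ull >> (64 - len)) << ofs;
    return (dst & ~mask) | ((src << ofs) & mask);
}

/* REV16: byte-swap each halfword; sf selects 64-bit (4) or 32-bit (2). */
static uint64_t rev16_model(uint64_t rn, int sf)
{
    unsigned halves = sf ? 4 : 2;
    uint64_t rd = 0;

    for (unsigned i = 0; i < halves; i++) {
        /* extract(rn, 16 * i, 16) */
        uint16_t h = (uint16_t)((rn >> (16 * i)) & 0xffff);
        rd = deposit64_sketch(rd, 16 * i, 16, bswap16_sketch(h));
    }
    return rd;
}
```

For sf = 0 only the low two halfwords are processed and the upper 32 bits of the result stay zero, matching a 32-bit W-register result.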




[Qemu-devel] [PATCH v3 4/5] target/ppc: optimize various functions using extract op

2017-05-11 Thread Philippe Mathieu-Daudé
Patch created mechanically using Coccinelle script via:

$ spatch --macro-file scripts/cocci-macro-file.h --in-place \
--sp-file scripts/coccinelle/tcg_gen_extract.cocci --dir target

Signed-off-by: Philippe Mathieu-Daudé 
---

David, I did not add your Reviewed-by, as suggested by Laurent Vivier after
Nikunj A Dadhania's review.

 target/ppc/translate.c  |  9 +++--
 target/ppc/translate/vsx-impl.inc.c | 15 +--
 2 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index f40b5a1abf..64ab412bf3 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -868,8 +868,7 @@ static inline void gen_op_arith_add(DisasContext *ctx, TCGv ret, TCGv arg1,
 }
 tcg_gen_xor_tl(cpu_ca, t0, t1);/* bits changed w/ carry */
 tcg_temp_free(t1);
-tcg_gen_shri_tl(cpu_ca, cpu_ca, 32);   /* extract bit 32 */
-tcg_gen_andi_tl(cpu_ca, cpu_ca, 1);
+tcg_gen_extract_tl(cpu_ca, cpu_ca, 32, 1);
 if (is_isa300(ctx)) {
 tcg_gen_mov_tl(cpu_ca32, cpu_ca);
 }
@@ -1399,8 +1398,7 @@ static inline void gen_op_arith_subf(DisasContext *ctx, TCGv ret, TCGv arg1,
 tcg_temp_free(inv1);
 tcg_gen_xor_tl(cpu_ca, t0, t1); /* bits changes w/ carry */
 tcg_temp_free(t1);
-tcg_gen_shri_tl(cpu_ca, cpu_ca, 32);/* extract bit 32 */
-tcg_gen_andi_tl(cpu_ca, cpu_ca, 1);
+tcg_gen_extract_tl(cpu_ca, cpu_ca, 32, 1);
 if (is_isa300(ctx)) {
 tcg_gen_mov_tl(cpu_ca32, cpu_ca);
 }
@@ -5383,8 +5381,7 @@ static void gen_mfsri(DisasContext *ctx)
 CHK_SV;
 t0 = tcg_temp_new();
 gen_addr_reg_index(ctx, t0);
-tcg_gen_shri_tl(t0, t0, 28);
-tcg_gen_andi_tl(t0, t0, 0xF);
+tcg_gen_extract_tl(t0, t0, 28, 0xF);
 gen_helper_load_sr(cpu_gpr[rd], cpu_env, t0);
 tcg_temp_free(t0);
 if (ra != 0 && ra != rd)
diff --git a/target/ppc/translate/vsx-impl.inc.c b/target/ppc/translate/vsx-impl.inc.c
index 7f12908029..9faffd2ddc 100644
--- a/target/ppc/translate/vsx-impl.inc.c
+++ b/target/ppc/translate/vsx-impl.inc.c
@@ -1262,8 +1262,7 @@ static void gen_xsxexpqp(DisasContext *ctx)
 gen_exception(ctx, POWERPC_EXCP_VSXU);
 return;
 }
-tcg_gen_shri_i64(xth, xbh, 48);
-tcg_gen_andi_i64(xth, xth, 0x7FFF);
+tcg_gen_extract_i64(xth, xbh, 48, 0x7FFF);
 tcg_gen_movi_i64(xtl, 0);
 }
 
@@ -1448,10 +1447,8 @@ static void gen_xvxexpdp(DisasContext *ctx)
 gen_exception(ctx, POWERPC_EXCP_VSXU);
 return;
 }
-tcg_gen_shri_i64(xth, xbh, 52);
-tcg_gen_andi_i64(xth, xth, 0x7FF);
-tcg_gen_shri_i64(xtl, xbl, 52);
-tcg_gen_andi_i64(xtl, xtl, 0x7FF);
+tcg_gen_extract_i64(xth, xbh, 52, 0x7FF);
+tcg_gen_extract_i64(xtl, xbl, 52, 0x7FF);
 }
 
 GEN_VSX_HELPER_2(xvxsigsp, 0x00, 0x04, 0, PPC2_ISA300)
@@ -1474,16 +1471,14 @@ static void gen_xvxsigdp(DisasContext *ctx)
 zr = tcg_const_i64(0);
 nan = tcg_const_i64(2047);
 
-tcg_gen_shri_i64(exp, xbh, 52);
-tcg_gen_andi_i64(exp, exp, 0x7FF);
+tcg_gen_extract_i64(exp, xbh, 52, 0x7FF);
 tcg_gen_movi_i64(t0, 0x0010);
 tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, zr, zr, t0);
 tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, nan, zr, t0);
 tcg_gen_andi_i64(xth, xbh, 0x000F);
 tcg_gen_or_i64(xth, xth, t0);
 
-tcg_gen_shri_i64(exp, xbl, 52);
-tcg_gen_andi_i64(exp, exp, 0x7FF);
+tcg_gen_extract_i64(exp, xbl, 52, 0x7FF);
 tcg_gen_movi_i64(t0, 0x0010);
 tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, zr, zr, t0);
 tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, nan, zr, t0);
-- 
2.11.0
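
In the ppc hunks the masks 0xF, 0x7FF and 0x7FFF select 4-, 11- and 15-bit fields (the segment-register number, the binary64 exponent and the binary128 exponent in the high doubleword), so the matching extract lengths are 4, 11 and 15 bits. A plain-C sketch of two of those fields, assuming IEEE 754 binary64 doubles on the host and a 32-bit effective address for the segment number; the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Biased exponent of a binary64 double: 11 bits starting at bit 52. */
static uint64_t double_biased_exponent(double d)
{
    uint64_t bits;

    memcpy(&bits, &d, sizeof bits);   /* well-defined type punning */
    return (bits >> 52) & 0x7ff;      /* == extract64(bits, 52, 11) */
}

/* Segment-register number: the top 4 bits of a 32-bit effective address. */
static uint32_t sr_index(uint32_t ea)
{
    return (ea >> 28) & 0xf;          /* == extract32(ea, 28, 4) */
}
```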




[Qemu-devel] [RFC PATCH v3 1/5] coccinelle: add a script to optimize tcg op using tcg_gen_extract()

2017-05-11 Thread Philippe Mathieu-Daudé
If you have Coccinelle installed, you can apply this script using:

$ spatch \
--sp-file scripts/coccinelle/tcg_gen_extract.cocci \
--macro-file scripts/cocci-macro-file.h \
--dir target --in-place

You can also use Peter Senna Tschudin's Docker image directly (easier):

$ docker run -v `pwd`:`pwd` -w `pwd` petersenna/coccinelle \
--sp-file scripts/coccinelle/tcg_gen_extract.cocci \
--macro-file scripts/cocci-macro-file.h \
--dir target --in-place

I then verified that no manual touch-ups were required.

The following thread was helpful while writing this script:

https://github.com/coccinelle/coccinelle/issues/86

Signed-off-by: Philippe Mathieu-Daudé 
---
 scripts/coccinelle/tcg_gen_extract.cocci | 71 
 1 file changed, 71 insertions(+)
 create mode 100644 scripts/coccinelle/tcg_gen_extract.cocci

diff --git a/scripts/coccinelle/tcg_gen_extract.cocci b/scripts/coccinelle/tcg_gen_extract.cocci
new file mode 100644
index 00..4823073005
--- /dev/null
+++ b/scripts/coccinelle/tcg_gen_extract.cocci
@@ -0,0 +1,71 @@
+// optimize TCG using extract op
+//
+// Copyright: (C) 2017 Philippe Mathieu-Daudé. GPLv2+.
+// Confidence: High
+// Options: --macro-file scripts/cocci-macro-file.h
+//
+// Nikunj A Dadhania optimization:
+// http://lists.nongnu.org/archive/html/qemu-devel/2017-02/msg05211.html
+// Aurelien Jarno optimization:
+// http://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg01466.html
+// Coccinelle helpful issue:
+// https://github.com/coccinelle/coccinelle/issues/86
+
+@match@ // match shri*+andi* pattern, calls script verify_len
+identifier ret, arg;
+constant ofs, len;
+identifier shr_fn =~ "^tcg_gen_shri_";
+identifier and_fn =~ "^tcg_gen_andi_";
+position shr_p;
+position and_p;
+@@
+(
+shr_fn@shr_p(ret, arg, ofs);
+and_fn@and_p(ret, ret, len);
+)
+
+@script:python verify_len@
+ret_s << match.ret;
+len_s << match.len;
+shr_s << match.shr_fn;
+and_s << match.and_fn;
+shr_p << match.shr_p;
+extract_fn;
+@@
+print "candidate at %s:%s" % (shr_p[0].file, shr_p[0].line)
+len_fn=len("tcg_gen_shri_")
+shr_sz=shr_s[len_fn:]
+and_sz=and_s[len_fn:]
+# TODO: op_size shr
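
The verify_len rule is cut off above, so this is not a reconstruction of it. It is a C sketch of the two checks such a rewrite needs, given that tcg_gen_extract_*(ret, arg, ofs, len) extracts len bits starting at bit ofs: the AND mask must be a run of low ones (2^n - 1), which becomes len = n, and ofs + len must fit in the operand width. Function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * If mask is 2^n - 1 with n >= 1 (a contiguous run of ones starting at
 * bit 0), store n in *len and return true; otherwise return false.
 */
static bool mask_to_extract_len(uint64_t mask, unsigned *len)
{
    /* mask & (mask + 1) is zero exactly for low-bit runs (and for 0). */
    if (mask == 0 || (mask & (mask + 1)) != 0) {
        return false;
    }
    unsigned n = 0;
    while (mask) {
        n++;
        mask >>= 1;
    }
    *len = n;
    return true;
}

/* The field [ofs, ofs + len) must lie inside a width-bit operand. */
static bool extract_fits(unsigned ofs, unsigned len, unsigned width)
{
    return len >= 1 && ofs < width && len <= width - ofs;
}
```

With these rules a mask of 0xF becomes a length of 4 and 0x7FF becomes 11; passing the mask value itself as the length would generally request a different (often out-of-range) field.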

[Qemu-devel] [PATCH v3 0/5] optimize various tcg_gen() functions using extract op

2017-05-11 Thread Philippe Mathieu-Daudé
Changes from v1:

In my first attempt I misunderstood the semantics of tcg_gen_extract(), and
Richard Henderson pointed that out.
In this patchset the cocci script is corrected and clarified; it also prints
how arguments are checked while running.
Also:
- incorrect patches have been removed. (Richard Henderson, Nikunj A Dadhania)
- Coccinelle script licensed GPLv2+ (Eric Blake)
- comment in each commit about how to apply the patch (Eric Blake)
- added Acked-by for m68k (Laurent Vivier)
- Cc: Coccinelle developers.

[v1]

While reviewing a commit from Aurelien Jarno where he optimized a TCG generator
for SH-4 [1], I found the same optimization done on PPC by Nikunj A Dadhania a
few months ago [2].
After asking on the ML about a cocci script [3], I thought it would be easier to
learn about Coccinelle.

Quoting Aurelien Jarno:
This doesn't change the generated code on x86, but optimizes it on most
RISC architectures and makes the code simpler to read.
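
The identity behind the whole series is that a right shift by ofs followed by an AND with 2^len - 1 equals a single extract of len bits at offset ofs. A plain-C check of that identity: extract32_sketch() is written the way QEMU's extract32() in include/qemu/bitops.h is, to my understanding, defined, and shri_andi() is a hypothetical name for the two-op sequence. The last assertion shows that passing the mask itself as the length does not give the same result.

```c
#include <assert.h>
#include <stdint.h>

/* len bits of value starting at bit ofs (same contract as extract32()). */
static uint32_t extract32_sketch(uint32_t value, unsigned ofs, unsigned len)
{
    assert(ofs < 32 && len > 0 && len <= 32 - ofs);
    return (value >> ofs) & (~0u >> (32 - len));
}

/* The two-op sequence the script looks for: shift right, then AND. */
static uint32_t shri_andi(uint32_t value, unsigned ofs, uint32_t mask)
{
    return (value >> ofs) & mask;
}
```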

I actually applied the script using the following command:

$ docker run -v `pwd`:`pwd` -w `pwd` petersenna/coccinelle \
--sp-file scripts/coccinelle/tcg_gen_extract.cocci \
--macro-file scripts/cocci-macro-file.h \
--dir target \
--in-place

Please review, thanks.

[1] http://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg01466.html
[2] http://lists.nongnu.org/archive/html/qemu-devel/2017-02/msg05211.html
[3] http://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg01499.html

Philippe Mathieu-Daudé (5):
  coccinelle: add a script to optimize tcg op using tcg_gen_extract()
  target/arm: optimize rev16() using extract op
  target/m68k: optimize bcd_flags() using extract op
  target/ppc: optimize various functions using extract op
  target/sparc: optimize various functions using extract op

 scripts/coccinelle/tcg_gen_extract.cocci | 69 
 target/arm/translate-a64.c   |  6 +--
 target/m68k/translate.c  |  3 +-
 target/ppc/translate.c   |  9 ++---
 target/ppc/translate/vsx-impl.inc.c  | 15 +++
 target/sparc/translate.c | 15 +++
 6 files changed, 85 insertions(+), 32 deletions(-)
 create mode 100644 scripts/coccinelle/tcg_gen_extract.cocci

-- 
2.11.0




Re: [Qemu-devel] [Qemu-devel PATCH 3/5] msf2: Add Smartfusion2 SPI controller

2017-05-11 Thread sundeep subbaraya
Hi Philippe,

On Wed, May 10, 2017 at 5:42 PM, Philippe Mathieu-Daudé wrote:

> Hi Subbaraya,
>
> As with my comment on the timer model, I'd name this model "mss_spi".
> The only difference I see in the SF2 is the STAT8 register.
> No need to register both devices now, but maybe you can add a comment about
> it?
>

OK, I will register only SPI0 in the SoC file for now and will add a comment.

>
> On 05/09/2017 01:44 PM, Subbaraya Sundeep wrote:
>
>> Modelled Microsemi's Smartfusion2 SPI controller.
>>
>> Signed-off-by: Subbaraya Sundeep 
>> ---
>>  hw/ssi/Makefile.objs  |   1 +
>>  hw/ssi/msf2-spi.c | 378 ++
>> 
>>  include/hw/ssi/msf2-spi.h | 105 +
>>  3 files changed, 484 insertions(+)
>>  create mode 100644 hw/ssi/msf2-spi.c
>>  create mode 100644 include/hw/ssi/msf2-spi.h
>>
>> diff --git a/hw/ssi/Makefile.objs b/hw/ssi/Makefile.objs
>> index 487add2..3105c4b 100644
>> --- a/hw/ssi/Makefile.objs
>> +++ b/hw/ssi/Makefile.objs
>> @@ -4,6 +4,7 @@ common-obj-$(CONFIG_XILINX_SPI) += xilinx_spi.o
>>  common-obj-$(CONFIG_XILINX_SPIPS) += xilinx_spips.o
>>  common-obj-$(CONFIG_ASPEED_SOC) += aspeed_smc.o
>>  common-obj-$(CONFIG_STM32F2XX_SPI) += stm32f2xx_spi.o
>> +common-obj-$(CONFIG_MSF2) += msf2-spi.o
>>
>
> Not a big deal, but this define only appears after applying the next patch.

Do I need to reorder the patches?

>
>>  obj-$(CONFIG_OMAP) += omap_spi.o
>>  obj-$(CONFIG_IMX) += imx_spi.o
>> diff --git a/hw/ssi/msf2-spi.c b/hw/ssi/msf2-spi.c
>> new file mode 100644
>> index 000..2059ed9
>> --- /dev/null
>> +++ b/hw/ssi/msf2-spi.c
>> @@ -0,0 +1,378 @@
>> +/*
>> + * SPI controller model of Microsemi SmartFusion2.
>> + *
>> + * Copyright (C) 2017 Subbaraya Sundeep 
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a copy
>> + * of this software and associated documentation files (the "Software"), to deal
>> + * in the Software without restriction, including without limitation the rights
>> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
>> + * copies of the Software, and to permit persons to whom the Software is
>> + * furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
>> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
>> + * THE SOFTWARE.
>> + */
>> +
>> +#include "hw/ssi/msf2-spi.h"
>> +
>> +#ifndef MSF2_SPI_ERR_DEBUG
>> +#define MSF2_SPI_ERR_DEBUG   0
>> +#endif
>> +
>> +#define DB_PRINT_L(lvl, fmt, args...) do { \
>> +if (MSF2_SPI_ERR_DEBUG >= lvl) { \
>> +qemu_log("%s: " fmt, __func__, ## args); \
>> +} \
>> +} while (0);
>> +
>> +#define DB_PRINT(fmt, args...) DB_PRINT_L(1, fmt, ## args)
>> +
>> +static void txfifo_reset(MSF2SpiState *s)
>> +{
>> +fifo32_reset(&s->tx_fifo);
>> +
>> +s->regs[R_SPI_STATUS] &= ~S_TXFIFOFUL;
>> +s->regs[R_SPI_STATUS] |= S_TXFIFOEMP;
>> +}
>> +
>> +static void rxfifo_reset(MSF2SpiState *s)
>> +{
>> +fifo32_reset(&s->rx_fifo);
>> +
>> +s->regs[R_SPI_STATUS] &= ~S_RXFIFOFUL;
>> +s->regs[R_SPI_STATUS] |= S_RXFIFOEMP;
>> +}
>> +
>> +static void set_fifodepth(MSF2SpiState *s)
>> +{
>> +int size = s->regs[R_SPI_DFSIZE] & FRAMESZ_MASK;
>> +
>> +if (0 <= size && size <= 8) {
>> +s->fifo_depth = 32;
>> +}
>> +if (9 <= size && size <= 16) {
>> +s->fifo_depth = 16;
>> +}
>> +if (17 <= size && size <= 32) {
>> +s->fifo_depth = 8;
>> +}
>> +}
>> +
>> +static void msf2_spi_do_reset(MSF2SpiState *s)
>> +{
>> +memset(s->regs, 0, sizeof s->regs);
>> +s->regs[R_SPI_CONTROL] = 0x8102;
>> +s->regs[R_SPI_DFSIZE] = 0x4;
>> +s->regs[R_SPI_STATUS] = 0x2440;
>> +s->regs[R_SPI_CLKGEN] = 0x7;
>> +s->regs[R_SPI_STAT8] = 0x7;
>> +s->regs[R_SPI_RIS] = 0x0;
>> +
>> +s->fifo_depth = 4;
>> +s->frame_count = 1;
>> +s->enabled = false;
>> +
>> +rxfifo_reset(s);
>> +txfifo_reset(s);
>> +}
>> +
>> +static void update_mis(MSF2SpiState *s)
>> +{
>> +uint32_t reg = s->regs[R_SPI_CONTROL];
>> +uint32_t tmp;
>> +
>> +/*
>> + * form the Control register interrupt enable bits
>> + * same as RIS, MIS and Interrupt clear registers for simplicity
>> + */
>> +tmp = ((reg & C_INTRXOVRFLO) >> 4) | ((reg & C_INTRXDATA) >> 3) |
>> +   ((reg & C_INTTXDATA) >> 5);
>> +s->regs[R_SPI_MIS] |= tmp & s->reg

Re: [Qemu-devel] [Qemu-devel PATCH 4/5] msf2: Add Smartfusion2 SoC.

2017-05-11 Thread sundeep subbaraya
Hi Philippe,

On Wed, May 10, 2017 at 5:20 PM, Philippe Mathieu-Daudé wrote:

> Hi Subbaraya,
>
>
> On 05/09/2017 01:44 PM, Subbaraya Sundeep wrote:
>
>> Smartfusion2 SoC has hardened Microcontroller subsystem
>> and flash based FPGA fabric. This patch adds support for
>> Microcontroller subsystem in the SoC.
>>
>> Signed-off-by: Subbaraya Sundeep 
>> ---
>>  default-configs/arm-softmmu.mak |   1 +
>>  hw/arm/Makefile.objs|   2 +-
>>  hw/arm/msf2-soc.c   | 188 ++
>> ++
>>  include/hw/arm/msf2-soc.h   |  60 +
>>  4 files changed, 250 insertions(+), 1 deletion(-)
>>  create mode 100644 hw/arm/msf2-soc.c
>>  create mode 100644 include/hw/arm/msf2-soc.h
>>
>> diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
>> index 78d7af0..7062512 100644
>> --- a/default-configs/arm-softmmu.mak
>> +++ b/default-configs/arm-softmmu.mak
>> @@ -122,3 +122,4 @@ CONFIG_ACPI=y
>>  CONFIG_SMBIOS=y
>>  CONFIG_ASPEED_SOC=y
>>  CONFIG_GPIO_KEY=y
>> +CONFIG_MSF2=y
>> diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs
>> index 4c5c4ee..ae5e4a3 100644
>> --- a/hw/arm/Makefile.objs
>> +++ b/hw/arm/Makefile.objs
>> @@ -1,7 +1,7 @@
>>  obj-y += boot.o collie.o exynos4_boards.o gumstix.o highbank.o
>>  obj-$(CONFIG_DIGIC) += digic_boards.o
>>  obj-y += integratorcp.o mainstone.o musicpal.o nseries.o
>> -obj-y += omap_sx1.o palm.o realview.o spitz.o stellaris.o
>> +obj-y += omap_sx1.o palm.o realview.o spitz.o stellaris.o msf2-soc.o
>>
>
> Not a big deal, but since you added CONFIG_MSF2, why not use it here and in
> the Makefiles you touched (misc/ssi/timer)?
>
> obj-$(CONFIG_MSF2) += msf2-soc.o
>

OK. Will change it.

>
>>  obj-y += tosa.o versatilepb.o vexpress.o virt.o xilinx_zynq.o z2.o
>>  obj-$(CONFIG_ACPI) += virt-acpi-build.o
>>  obj-y += netduino2.o
>> diff --git a/hw/arm/msf2-soc.c b/hw/arm/msf2-soc.c
>> new file mode 100644
>> index 000..d6341a2
>> --- /dev/null
>> +++ b/hw/arm/msf2-soc.c
>> @@ -0,0 +1,188 @@
>> +/*
>> + * SmartFusion2 SoC emulation.
>> + *
>> + * Copyright (c) 2017 Subbaraya Sundeep 
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a copy
>> + * of this software and associated documentation files (the "Software"), to deal
>> + * in the Software without restriction, including without limitation the rights
>> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
>> + * copies of the Software, and to permit persons to whom the Software is
>> + * furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
>> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
>> + * THE SOFTWARE.
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "qapi/error.h"
>> +#include "qemu-common.h"
>> +#include "hw/arm/arm.h"
>> +#include "exec/address-spaces.h"
>> +#include "hw/char/serial.h"
>> +#include "hw/boards.h"
>> +#include "sysemu/block-backend.h"
>> +#include "hw/arm/msf2-soc.h"
>> +
>> +#define MSF2_TIMER_BASE   0x40004000
>> +#define MSF2_SYSREG_BASE  0x40038000
>> +
>> +#define MSF2_TIMER_IRQ0   14
>> +#define MSF2_TIMER_IRQ1   15
>> +
>> +static const uint32_t spi_addr[MSF2_NUM_SPIS] = { 0x40001000 , 0x40011000 };
>> +static const uint32_t uart_addr[MSF2_NUM_UARTS] = { 0x4000 , 0x4001 };
>> +
>> +static const int spi_irq[MSF2_NUM_SPIS] = { 2, 3 };
>> +static const int uart_irq[MSF2_NUM_UARTS] = { 10, 11 };
>> +
>> +static void msf2_soc_initfn(Object *obj)
>> +{
>> +MSF2State *s = MSF2_SOC(obj);
>> +int i;
>> +
>> +object_initialize(&s->armv7m, sizeof(s->armv7m), TYPE_ARMV7M);
>> +qdev_set_parent_bus(DEVICE(&s->armv7m), sysbus_get_default());
>> +
>> +object_initialize(&s->sysreg, sizeof(s->sysreg), TYPE_MSF2_SYSREG);
>> +qdev_set_parent_bus(DEVICE(&s->sysreg), sysbus_get_default());
>> +
>> +object_initialize(&s->timer, sizeof(s->timer), TYPE_MSF2_TIMER);
>> +qdev_set_parent_bus(DEVICE(&s->timer), sysbus_get_default());
>> +
>> +for (i = 0; i < MSF2_NUM_SPIS; i++) {
>> +object_initialize(&s->spi[i], sizeof(s->spi[i]),
>> +  TYPE_MSF2_SPI);
>> +qdev_set_parent_bus(DEVICE(&s->spi[i]), sysbus_get_default());
>> +}
>> +}
>> +
>> +static void msf2_soc_realize(DeviceState *dev_soc, Error **errp)
>> +{
>> +MSF2State *s = MSF2_SOC(dev_soc)

Re: [Qemu-devel] [PATCH] ram: Rename RAM_SAVE_FLAG_COMPRESS to RAM_SAVE_FLAG_ZERO

2017-05-11 Thread Peter Xu
On Thu, May 11, 2017 at 05:50:28PM +0200, Juan Quintela wrote:
> Better reflects what it does now, and avoids confusion with
> RAM_SAVE_FLAG_COMPRESS_PAGE.
> 
> Signed-off-by: Juan Quintela 

Reviewed-by: Peter Xu 

> 
> ---
> 
> Hi
> 
> I always forget the difference between COMPRESS and COMPRESS_PAGE.
> This patch makes it clear which is which.
> 
> Please, comment.
> 
> Later, Juan.
> 
> 
> ---
>  migration/ram.c | 18 --
>  1 file changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 995d1fc..76c118c 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -48,8 +48,14 @@
>  /***/
>  /* ram save/restore */
>  
> +/* RAM_SAVE_FLAG_ZERO used to be named RAM_SAVE_FLAG_COMPRESS; it
> + * worked for pages that were filled with the same char.  We switched
> + * it to only search for the zero value, so to avoid confusion with
> + * RAM_SAVE_FLAG_COMPRESS_PAGE we just rename it.
> + */
> +
>  #define RAM_SAVE_FLAG_FULL 0x01 /* Obsolete, not used anymore */
> -#define RAM_SAVE_FLAG_COMPRESS 0x02
> +#define RAM_SAVE_FLAG_ZERO 0x02
>  #define RAM_SAVE_FLAG_MEM_SIZE 0x04
>  #define RAM_SAVE_FLAG_PAGE 0x08
>  #define RAM_SAVE_FLAG_EOS  0x10
> @@ -746,7 +752,7 @@ static int save_zero_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
>  if (is_zero_range(p, TARGET_PAGE_SIZE)) {
>  rs->zero_pages++;
>  rs->bytes_transferred +=
> -save_page_header(rs, rs->f, block, offset | RAM_SAVE_FLAG_COMPRESS);
> +save_page_header(rs, rs->f, block, offset | RAM_SAVE_FLAG_ZERO);
>  qemu_put_byte(rs->f, 0);
>  rs->bytes_transferred += 1;
>  pages = 1;
> @@ -2406,7 +2412,7 @@ static int ram_load_postcopy(QEMUFile *f)
>  
>  trace_ram_load_postcopy_loop((uint64_t)addr, flags);
>  place_needed = false;
> -if (flags & (RAM_SAVE_FLAG_COMPRESS | RAM_SAVE_FLAG_PAGE)) {
> +if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE)) {
>  block = ram_block_from_stream(f, flags);
>  
>  host = host_from_ram_block_offset(block, addr);
> @@ -2453,7 +2459,7 @@ static int ram_load_postcopy(QEMUFile *f)
>  last_host = host;
>  
>  switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
> -case RAM_SAVE_FLAG_COMPRESS:
> +case RAM_SAVE_FLAG_ZERO:
>  ch = qemu_get_byte(f);
>  memset(page_buffer, ch, TARGET_PAGE_SIZE);
>  if (ch) {
> @@ -2542,7 +2548,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>  flags = addr & ~TARGET_PAGE_MASK;
>  addr &= TARGET_PAGE_MASK;
>  
> -if (flags & (RAM_SAVE_FLAG_COMPRESS | RAM_SAVE_FLAG_PAGE |
> +if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
>   RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
>  RAMBlock *block = ram_block_from_stream(f, flags);
>  
> @@ -2604,7 +2610,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>  }
>  break;
>  
> -case RAM_SAVE_FLAG_COMPRESS:
> +case RAM_SAVE_FLAG_ZERO:
>  ch = qemu_get_byte(f);
>  ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
>  break;
> -- 
> 2.9.3
> 

-- 
Peter Xu
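For readers new to the stream format touched by this rename: the flags travel in the low bits of a page-aligned offset (the sender ORs RAM_SAVE_FLAG_ZERO into offset, and ram_load() splits them back apart with TARGET_PAGE_MASK). The following is a minimal standalone sketch of that encoding, assuming a 4 KiB page; SKETCH_PAGE_SIZE, encode_header(), decode_header() and page_is_zero() are hypothetical names, and page_is_zero() is a naive byte loop standing in for QEMU's is_zero_range().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values as they appear in the patch above. */
#define RAM_SAVE_FLAG_ZERO 0x02
#define RAM_SAVE_FLAG_PAGE 0x08

/* Assumption: a 4 KiB target page; QEMU derives the real size per target. */
#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Sender side: offsets are page-aligned, so the low bits are free to
 * carry flags, as in "offset | RAM_SAVE_FLAG_ZERO". */
static uint64_t encode_header(uint64_t offset, uint64_t flags)
{
    assert((offset & ~SKETCH_PAGE_MASK) == 0); /* offset is page-aligned */
    assert((flags & SKETCH_PAGE_MASK) == 0);   /* flags fit in low bits */
    return offset | flags;
}

/* Receiver side: mirrors "flags = addr & ~TARGET_PAGE_MASK;
 * addr &= TARGET_PAGE_MASK;" in ram_load(). */
static void decode_header(uint64_t header, uint64_t *offset, uint64_t *flags)
{
    *flags = header & ~SKETCH_PAGE_MASK;
    *offset = header & SKETCH_PAGE_MASK;
}

/* A page qualifies for RAM_SAVE_FLAG_ZERO only if every byte is zero. */
static bool page_is_zero(const uint8_t *p, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            return false;
        }
    }
    return true;
}
```

With the flag set, the page body shrinks to the single byte written by qemu_put_byte(rs->f, 0), which is why "ZERO" describes the flag better than "COMPRESS".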



Re: [Qemu-devel] [PATCH v2] migration: Pass Error ** argument to {save, load}_vmstate

2017-05-11 Thread Peter Xu
On Thu, May 11, 2017 at 05:34:40PM +0200, Juan Quintela wrote:
> This way we use the "normal" way of printing errors for hmp commands.
> 
> Signed-off-by: Juan Quintela 
> Suggested-by: Paolo Bonzini 

Reviewed-by: Peter Xu 

-- 
Peter Xu



Re: [Qemu-devel] [PATCH 2/8] target/arm: optimize smul_dual() and neon_trn_u8() using extract op

2017-05-11 Thread Philippe Mathieu-Daudé

Hi Eric,

On 05/10/2017 05:15 PM, Eric Blake wrote:

On 05/10/2017 03:05 PM, Philippe Mathieu-Daudé wrote:

Applied using Coccinelle script.


Thinking forward a year - if I want to reproduce this (to see if other
instances have crept in), I have to dig up a mail archive to learn the
formula you used.  Better is to list your coccinelle command line
directly in the commit message, so that reproducing the fix involves
less effort.  Same comment applies throughout the series.

For reference, see how I did it in commit de6e7951.


Ok!

In the cover letter I ran spatch directly through an unofficial docker image,
which may not be around in a year...
I'll keep the docker example in the cover letter and add the spatch command
to the commit messages.


Regards,

Phil.



Re: [Qemu-devel] [PATCH 0/8] optimize various tcg_gen() functions using extract op

2017-05-11 Thread Philippe Mathieu-Daudé

Hi,

The patch set Patchew intended to compile is incorrect, but this error
worried me:


On 05/10/2017 05:20 PM, no-re...@patchew.org wrote:

This series failed build test on s390x host. Please find the details below.

[...]

  CC  mips64-softmmu/target/mips/translate.o
/var/tmp/patchew-tester-tmp-f7svi4g9/src/target/mips/translate.c: In function ‘gen_bshfl’:
/var/tmp/patchew-tester-tmp-f7svi4g9/src/target/mips/translate.c:4595:43: error: large integer implicitly truncated to unsigned type [-Werror=overflow]
 tcg_gen_extract_tl(t1, t0, 8, 0x00FF00FF00FF00FFULL);
   ^
/var/tmp/patchew-tester-tmp-f7svi4g9/src/target/mips/translate.c:4606:44: error: large integer implicitly truncated to unsigned type [-Werror=overflow]
 tcg_gen_extract_tl(t1, t0, 16, 0xULL);
^
cc1: all warnings being treated as errors
/var/tmp/patchew-tester-tmp-f7svi4g9/src/rules.mak:69: recipe for target 'target/mips/translate.o' failed
make[1]: *** [target/mips/translate.o] Error 1
Makefile:327: recipe for target 'subdir-mips64el-softmmu' failed


Now I tried to use this code on mips64el-softmmu target:

tcg_gen_extract_tl(t1, t0, 5, 0x7ff);

And got:

error: large integer implicitly truncated to unsigned type [-Werror=overflow]

 tcg_gen_extract_tl(t1, t0, 5, 0x7ff);
   ^

There is no need for such an operation, but it seems legitimate.

I think tcg-op.h would be clearer if a few 'unsigned'/'unsigned int'
parameters were replaced by 'tcg_target_long', like:


 void tcg_gen_extract_i64(TCGv_i64 ret, TCGv_i64 arg,
- unsigned int ofs, unsigned int len);
+ unsigned int ofs, tcg_target_long len);

What do you think, Richard?

Regards,

Phil.



Re: [Qemu-devel] [PATCH 6/8] target/mips: optimize bshfl() using extract op

2017-05-11 Thread Philippe Mathieu-Daudé

Hi,

As noticed by Richard in another patch, this one is also WRONG:

$ docker run -it -v `pwd`:`pwd` -w `pwd` petersenna/coccinelle \
--sp-file scripts/coccinelle/tcg_gen_extract.cocci \
--macro-file scripts/cocci-macro-file.h target/mips/translate.c

init_defs_builtins: /usr/lib64/coccinelle/standard.h
init_defs: scripts/cocci-macro-file.h
HANDLING: target/mips/translate.c
candidate at target/mips/translate.c:4576
  op_size: tl/tl (same)
  low_bits: 8 (value: 0xff)
  len: 0xff00ff
  len_bits != low_bits
  candidate is NOT optimizable

candidate at target/mips/translate.c:4596
  op_size: tl/tl (same)
  low_bits: 8 (value: 0xff)
  len: 0xff00ff00ff00ff
  len_bits != low_bits
  candidate is NOT optimizable

candidate at target/mips/translate.c:4608
  op_size: tl/tl (same)
  low_bits: 16 (value: 0x)
  len: 0x
  len_bits != low_bits
  candidate is NOT optimizable
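The "len_bits != low_bits" verdicts above come down to one question: is the andi mask a contiguous run of ones starting at bit 0, i.e. (1 << n) - 1? Only then is "shift right by ofs, then AND with mask" the same as extracting an n-bit field at ofs. A mask such as 0x00FF00FF selects several separate bytes, so no single extract can replace it. A minimal C sketch of that check; mask_to_extract_len() is a hypothetical helper, not part of QEMU or of the Coccinelle script.

```c
#include <assert.h>
#include <stdint.h>

/* Returns n if mask == (1 << n) - 1, i.e. a contiguous run of ones
 * starting at bit 0; otherwise returns -1. Only such masks allow
 * "shri by ofs, then andi with mask" to become extract(ofs, n). */
static int mask_to_extract_len(uint64_t mask)
{
    if (mask == 0) {
        return -1;
    }
    if (mask == UINT64_MAX) {
        return 64;
    }
    /* mask + 1 is a power of two exactly when mask is low contiguous ones. */
    if ((mask & (mask + 1)) != 0) {
        return -1;
    }
    int n = 0;
    while (mask) {
        n++;
        mask >>= 1;
    }
    return n;
}
```

Note that the returned value is a bit count: a mask of 0xff corresponds to a length of 8, not 0xff.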

On 05/10/2017 05:05 PM, Philippe Mathieu-Daudé wrote:

Applied using Coccinelle script.

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/mips/translate.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/target/mips/translate.c b/target/mips/translate.c
index 3022f349cb..96177da9ae 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -4573,8 +4573,7 @@ static void gen_bshfl (DisasContext *ctx, uint32_t op2, int rt, int rd)
 {
 TCGv t1 = tcg_temp_new();

-tcg_gen_shri_tl(t1, t0, 8);
-tcg_gen_andi_tl(t1, t1, 0x00FF00FF);
+tcg_gen_extract_tl(t1, t0, 8, 0x00FF00FF);
 tcg_gen_shli_tl(t0, t0, 8);
 tcg_gen_andi_tl(t0, t0, ~0x00FF00FF);
 tcg_gen_or_tl(t0, t0, t1);
@@ -4593,8 +4592,7 @@ static void gen_bshfl (DisasContext *ctx, uint32_t op2, int rt, int rd)
 {
 TCGv t1 = tcg_temp_new();

-tcg_gen_shri_tl(t1, t0, 8);
-tcg_gen_andi_tl(t1, t1, 0x00FF00FF00FF00FFULL);
+tcg_gen_extract_tl(t1, t0, 8, 0x00FF00FF00FF00FFULL);
 tcg_gen_shli_tl(t0, t0, 8);
 tcg_gen_andi_tl(t0, t0, ~0x00FF00FF00FF00FFULL);
 tcg_gen_or_tl(cpu_gpr[rd], t0, t1);
@@ -4605,8 +4603,7 @@ static void gen_bshfl (DisasContext *ctx, uint32_t op2, int rt, int rd)
 {
 TCGv t1 = tcg_temp_new();

-tcg_gen_shri_tl(t1, t0, 16);
-tcg_gen_andi_tl(t1, t1, 0xULL);
+tcg_gen_extract_tl(t1, t0, 16, 0xULL);
 tcg_gen_shli_tl(t0, t0, 16);
 tcg_gen_andi_tl(t0, t0, ~0xULL);
 tcg_gen_or_tl(t0, t0, t1);





Re: [Qemu-devel] [RFC PATCH v2] coccinelle: add a script to optimize tcg op using tcg_gen_extract()

2017-05-11 Thread Philippe Mathieu-Daudé
>> Is this script likely to be rerun in the future?  If yes, keeping it in
>> scripts/coccinelle/ is a good idea.  If no, I recommend storing it in
>> the commit message instead.
>
>
> It is unlikely to be rerun in the future, at least for this specific pattern. 
> But it can be easily adapted for another TCG optimization.
>
> I could not find much documentation about how to write such a script in
> Python, except in a thread [1]. If it is documented enough, I think it is
> worth keeping.
>
> As for putting it in each commit message: the script is now 3 times bigger
> than the patch it generates!
>
> Regards,
>
> Phil.

I missed this thread ref:

[1] https://github.com/coccinelle/coccinelle/issues/86



Re: [Qemu-devel] [PATCH v2 0/3] Remove old MigrationParams

2017-05-11 Thread Hailiang Zhang

On 2017/5/12 0:32, Juan Quintela wrote:

Hi

Changes from v1:

- make migrate_block_set_* take a boolean
- disable block migration in colo to maintain semantics.

Please review, Juan.

[v1]
Once upon a time there were MigrationParms (only used for block migration)
and then MigrationParams used for everything else.  This series:

- create migration capabilities for block parameters
- make the migrate command line parameters use capabilities
- remove MigrationParams completely

Please, review.



Looks good to me; this makes the code more graceful.

Reviewed-by: zhanghailiang




Juan Quintela (3):
   migration: Create block capabilities for shared and enable
   migration: Remove use of old MigrationParams
   migration: Remove old MigrationParams

  include/migration/block.h |  3 +++
  include/migration/migration.h | 14 +---
  include/migration/vmstate.h   |  1 -
  include/qemu/typedefs.h   |  1 -
  include/sysemu/sysemu.h   |  3 +--
  migration/block.c | 17 ++
  migration/colo.c  |  7 +++---
  migration/migration.c | 52 ---
  migration/savevm.c| 18 +++
  qapi-schema.json  |  7 +-
  10 files changed, 68 insertions(+), 55 deletions(-)





Re: [Qemu-devel] [PATCH 2/3] migration: Remove use of old MigrationParams

2017-05-11 Thread Hailiang Zhang

On 2017/5/12 0:33, Juan Quintela wrote:

Hailiang Zhang  wrote:

Hi,


Hmm you don't seem to have replaced this with anything.
I think that's a behavioural change; the trick COLO did (I'm not sure if this
is still the way it works) is that they initiate the first migration
with block migration enabled so that the two hosts (with non-shared storage)
get sync'd storage, and then at the completion of that first migration
they then switch into the checkpointing mode where they're only
doing updates - that's why it gets switched off at this point
prior to the 1st checkpoint.

Weird, really.

I didn't catch that.

Will investigate.

Yes, Dave is right. For non-shared disks, we need to enable block
migration for the first cycle to sync the disks of the two sides.
After that, qemu goes into the COLO state, in which we need to
disable block migration.

v2 posted.

My understanding is that it maintains the semantics; please test/comment.


Yes, it is right now; I have reviewed it, thanks.


Thanks, Juan.








Re: [Qemu-devel] [RFC PATCH v2] coccinelle: add a script to optimize tcg op using tcg_gen_extract()

2017-05-11 Thread Philippe Mathieu-Daudé

Hi Markus,

On 05/11/2017 06:03 AM, Markus Armbruster wrote:

Philippe Mathieu-Daudé  writes:


Ok I just understood Richard explanation, so this patch is WRONG and I
need to get some real rest :(


Ha!  Get some sleep; we'll still be around in the morning ;)


On 05/10/2017 08:52 PM, Philippe Mathieu-Daudé wrote:

Apply this script using:

$ docker run -v `pwd`:`pwd` -w `pwd` petersenna/coccinelle \
--sp-file scripts/coccinelle/tcg_gen_extract.cocci \
--macro-file scripts/cocci-macro-file.h \
--dir target \
--in-place

Signed-off-by: Philippe Mathieu-Daudé 
---

This is a new version of the coccinelle script addressing Richard's comments and
trying to do it correctly. Also changed license to GPLv2+.

The first rule matches and calls a python2 script that basically checks that
the target_ulong is not overflowed: (msk << ofs) >> sizeof(target_ulong) == 0


WRONG

[...]

Is this script likely to be rerun in the future?  If yes, keeping it in
scripts/coccinelle/ is a good idea.  If no, I recommend storing it in
the commit message instead.


It is unlikely to be rerun in the future, at least for this specific 
pattern. But it can be easily adapted for another TCG optimization.


I could not find much documentation about how to write such a script in
Python, except in a thread [1]. If it is documented enough, I think it is
worth keeping.


As for putting it in each commit message: the script is now 3 times bigger
than the patch it generates!


Regards,

Phil.



Re: [Qemu-devel] [PATCH 5/8] target/m68k: optimize bcd_flags() using extract op

2017-05-11 Thread Philippe Mathieu-Daudé

This patch is correct:

$ docker run -it -v `pwd`:`pwd` -w `pwd` petersenna/coccinelle \
--sp-file scripts/coccinelle/tcg_gen_extract.cocci \
--macro-file scripts/cocci-macro-file.h --dir target/m68k

init_defs_builtins: /usr/lib64/coccinelle/standard.h
init_defs: scripts/cocci-macro-file.h
HANDLING: target/m68k/helper.c
HANDLING: target/m68k/gdbstub.c
HANDLING: target/m68k/translate.c
candidate at target/m68k/translate.c:1466
  op_size: i32/i32 (same)
  low_bits: 1 (value: 0x1)
  len: 0x1
  len_bits == low_bits
  candidate IS optimizable

On 05/11/2017 05:41 AM, Laurent Vivier wrote:

Le 10/05/2017 à 22:05, Philippe Mathieu-Daudé a écrit :

Applied using Coccinelle script.

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/m68k/translate.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/target/m68k/translate.c b/target/m68k/translate.c
index 9f60fbc0db..babb9e2c5b 100644
--- a/target/m68k/translate.c
+++ b/target/m68k/translate.c
@@ -1463,8 +1463,7 @@ static void bcd_flags(TCGv val)
 tcg_gen_andi_i32(QREG_CC_C, val, 0x0ff);
 tcg_gen_or_i32(QREG_CC_Z, QREG_CC_Z, QREG_CC_C);

-tcg_gen_shri_i32(QREG_CC_C, val, 8);
-tcg_gen_andi_i32(QREG_CC_C, QREG_CC_C, 1);
+tcg_gen_extract_i32(QREG_CC_C, val, 8, 1);

 tcg_gen_mov_i32(QREG_CC_X, QREG_CC_C);
 }



Acked-by: Laurent Vivier 


Thanks!
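The m68k case is safe because the mask 0x1 and the field length 1 happen to coincide. In general the last argument of tcg_gen_extract_i32() is a length in bits, not a mask, so a mask such as 0xffff would have to become 16. The sketch below models both forms on plain integers to show the equivalence; shri_andi_i32() and extract_i32() are hypothetical reference models, and the range assertion only approximates the kind of argument checks TCG performs.

```c
#include <assert.h>
#include <stdint.h>

/* Reference model of the old two-op sequence: shift right, then AND. */
static uint32_t shri_andi_i32(uint32_t val, unsigned shift, uint32_t mask)
{
    return (val >> shift) & mask;
}

/* Reference model of extract: take 'len' bits starting at bit 'ofs'.
 * 'len' is a bit count, not a mask. */
static uint32_t extract_i32(uint32_t val, unsigned ofs, unsigned len)
{
    assert(len > 0 && len <= 32 && ofs + len <= 32);
    uint32_t field_mask = (len == 32) ? UINT32_MAX : ((1u << len) - 1);
    return (val >> ofs) & field_mask;
}
```

For bcd_flags(), shri_andi_i32(val, 8, 1) and extract_i32(val, 8, 1) agree for every val; for a 16-bit field the equivalent call is extract_i32(val, ofs, 16), not extract_i32(val, ofs, 0xffff).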



Re: [Qemu-devel] [PATCH v3 12/12] vhost: iommu: cache static mapping if there is

2017-05-11 Thread Jason Wang



On 2017年05月11日 16:59, Peter Xu wrote:

On Thu, May 11, 2017 at 04:35:21PM +0800, Jason Wang wrote:


On 2017年05月10日 16:01, Peter Xu wrote:

This patch pre-heats the vhost IOTLB cache when passthrough mode is enabled.

Sometimes, even if user specified iommu_platform for vhost devices,
IOMMU might still be disabled. One case is passthrough mode in VT-d
implementation. We can detect this by observing iommu_list. If it's
empty, it means IOMMU translation is disabled, then we can actually
pre-heat the translation (it'll be static mapping then) by first
invalidating all IOTLB, then cache existing memory ranges into vhost
backend iotlb using 1:1 mapping.

Signed-off-by: Peter Xu 
---
  hw/virtio/trace-events |  4 
  hw/virtio/vhost.c  | 49 +
  2 files changed, 53 insertions(+)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 1f7a7c1..54dcbb3 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -24,3 +24,7 @@ virtio_balloon_handle_output(const char *name, uint64_t gpa) "section name: %s g
  virtio_balloon_get_config(uint32_t num_pages, uint32_t actual) "num_pages: %d actual: %d"
  virtio_balloon_set_config(uint32_t actual, uint32_t oldactual) "actual: %d oldactual: %d"
  virtio_balloon_to_target(uint64_t target, uint32_t num_pages) "balloon target: %"PRIx64" num_pages: %d"
+
+# hw/virtio/vhost.c
+vhost_iommu_commit(void) ""
+vhost_iommu_static_preheat(void) ""
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 0001e60..1c92e62 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -27,6 +27,7 @@
  #include "hw/virtio/virtio-access.h"
  #include "migration/migration.h"
  #include "sysemu/dma.h"
+#include "trace.h"
  /* enabled until disconnected backend stabilizes */
  #define _VHOST_DEBUG 1
@@ -730,6 +731,11 @@ static void vhost_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
  }
  }
+static bool vhost_iommu_mr_enabled(struct vhost_dev *dev)
+{
+return !QLIST_EMPTY(&dev->iommu_list);
+}
+
  static void vhost_iommu_region_add(MemoryListener *listener,
 MemoryRegionSection *section)
  {
@@ -782,6 +788,48 @@ static void vhost_iommu_region_del(MemoryListener *listener,
  }
  }
+static void vhost_iommu_commit(MemoryListener *listener)
+{
+struct vhost_dev *dev = container_of(listener, struct vhost_dev,
+ iommu_listener);
+struct vhost_memory_region *r;
+int i;
+
+trace_vhost_iommu_commit();
+
+if (!vhost_iommu_mr_enabled(dev)) {
+/*
+* This means iommu_platform is enabled, however iommu memory
+* region is disabled, e.g., when device passthrough is set up.
+* Then, no translation is needed any more.
+*
+* Let's first invalidate the whole IOTLB, then pre-heat the
+* static mapping by looping over vhost memory ranges.
+*/
+
+if (dev->vhost_ops->vhost_invalidate_device_iotlb(dev, 0,
+  UINT64_MAX-1)) {
+error_report("%s: flush existing IOTLB failed", __func__);
+return;
+}
+
+for (i = 0; i < dev->mem->nregions; i++) {
+r = &dev->mem->regions[i];
+/* Vhost regions are writable RAM, so IOMMU_RW suits. */
+if (dev->vhost_ops->vhost_update_device_iotlb(dev,
+  r->guest_phys_addr,
+  r->userspace_addr,
+  r->memory_size,
+  IOMMU_RW)) {
+error_report("%s: pre-heat static mapping failed", __func__);
+return;
+}
+}
+
+trace_vhost_iommu_static_preheat();
+}
+}

Looks like vfio does the map in region_add(). If we can have different types
of memory regions (e.g. some under an IOMMU but others not), do we
need to switch to doing this in vhost_iommu_region_add()?

Currently this only pre-heats the cache when the IOMMU is totally
disabled (!vhost_iommu_mr_enabled(dev) means no IOMMU memory regions).
The pre-heat is not activated unless this condition holds, so in the cases
(non-x86 platforms) where there are some IOMMU regions, it is simply
disabled automatically. Also, I'm not really sure whether we
should cache non-IOMMU regions when there are some IOMMU regions... So
IMHO we can keep this as is until we really want to support some
non-x86 platforms for vhost-dmar, and then build on top of it. Thanks,



Right, so let's keep this as is and do optimization on top.

Thanks
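The pre-heat described in the commit message can be pictured with a toy IOTLB: when no IOMMU memory region exists, an IOVA is just a guest physical address, so after the whole table is invalidated, each vhost memory region becomes one entry mapping guest_phys_addr to userspace_addr. The structs and functions below are hypothetical simplifications for illustration only; they do not model the real vhost_ops callbacks or permissions such as IOMMU_RW.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a vhost memory region. */
struct sketch_region {
    uint64_t guest_phys_addr;
    uint64_t userspace_addr;
    uint64_t memory_size;
};

/* Hypothetical IOTLB entry: [iova, iova + size) -> uaddr. */
struct sketch_iotlb_entry {
    uint64_t iova;
    uint64_t uaddr;
    uint64_t size;
};

#define SKETCH_IOTLB_MAX 16

struct sketch_iotlb {
    struct sketch_iotlb_entry e[SKETCH_IOTLB_MAX];
    size_t n;
};

/* Invalidate everything, then add one 1:1 (IOVA == GPA) entry per region. */
static bool preheat(struct sketch_iotlb *tlb,
                    const struct sketch_region *r, size_t nregions)
{
    tlb->n = 0; /* "invalidate the whole IOTLB" */
    for (size_t i = 0; i < nregions; i++) {
        if (tlb->n == SKETCH_IOTLB_MAX) {
            return false; /* table full: pre-heat failed */
        }
        tlb->e[tlb->n++] = (struct sketch_iotlb_entry){
            .iova = r[i].guest_phys_addr,
            .uaddr = r[i].userspace_addr,
            .size = r[i].memory_size,
        };
    }
    return true;
}

/* Translate an IOVA to a host virtual address; returns false on a miss. */
static bool translate(const struct sketch_iotlb *tlb, uint64_t iova,
                      uint64_t *uaddr)
{
    for (size_t i = 0; i < tlb->n; i++) {
        const struct sketch_iotlb_entry *en = &tlb->e[i];
        if (iova >= en->iova && iova - en->iova < en->size) {
            *uaddr = en->uaddr + (iova - en->iova);
            return true;
        }
    }
    return false;
}
```

Because every guest RAM region is loaded up front, the backend never has to ask QEMU for a translation while the IOMMU stays in passthrough mode; once IOMMU regions appear, the condition checked in vhost_iommu_commit() no longer holds and this shortcut is skipped.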



Re: [Qemu-devel] [PATCH 3/8] target/arm: optimize rev16() using extract op

2017-05-11 Thread Philippe Mathieu-Daudé

Hi,

I'll resend as v3, just to confirm this patch is OK:

$ docker run -it -v `pwd`:`pwd` -w `pwd` petersenna/coccinelle \
--sp-file scripts/coccinelle/tcg_gen_extract.cocci \
--macro-file scripts/cocci-macro-file.h target/arm/translate-a64.c

init_defs_builtins: /usr/lib64/coccinelle/standard.h
init_defs: scripts/cocci-macro-file.h
HANDLING: target/arm/translate-a64.c
candidate at target/arm/translate-a64.c:4041
  op_size: i64/i64 (same)
  low_bits: 16 (value: 0x)
  len: 0x
  len_bits == low_bits
  candidate IS optimizable

candidate at target/arm/translate-a64.c:4047
  op_size: i64/i64 (same)
  low_bits: 16 (value: 0x)
  len: 0x
  len_bits == low_bits
  candidate IS optimizable

On 05/10/2017 05:05 PM, Philippe Mathieu-Daudé wrote:

Applied using Coccinelle script.

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/arm/translate-a64.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 24de30d92c..7ea130107e 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -4038,14 +4038,12 @@ static void handle_rev16(DisasContext *s, unsigned int sf,
 tcg_gen_andi_i64(tcg_tmp, tcg_rn, 0x);
 tcg_gen_bswap16_i64(tcg_rd, tcg_tmp);

-tcg_gen_shri_i64(tcg_tmp, tcg_rn, 16);
-tcg_gen_andi_i64(tcg_tmp, tcg_tmp, 0x);
+tcg_gen_extract_i64(tcg_tmp, tcg_rn, 16, 0x);
 tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
 tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 16, 16);

 if (sf) {
-tcg_gen_shri_i64(tcg_tmp, tcg_rn, 32);
-tcg_gen_andi_i64(tcg_tmp, tcg_tmp, 0x);
+tcg_gen_extract_i64(tcg_tmp, tcg_rn, 32, 0x);
 tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
 tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 32, 16);






[Qemu-devel] [PATCH V4 12/12] net/filter-rewriter.c: Make filter-rewriter support vnet_hdr_len

2017-05-11 Thread Zhang Chen
Get the vnet_hdr_len from NetClientState so that we can
parse net packets correctly.

Signed-off-by: Zhang Chen 
---
 net/filter-rewriter.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index bc6d12a..be129c7 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -17,6 +17,7 @@
 #include "qemu-common.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qerror.h"
+#include "qemu/error-report.h"
 #include "qapi-visit.h"
 #include "qom/object.h"
 #include "qemu/main-loop.h"
@@ -156,10 +157,24 @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
 ConnectionKey key;
 Packet *pkt;
 ssize_t size = iov_size(iov, iovcnt);
+ssize_t vnet_hdr_len = 0;
 char *buf = g_malloc0(size);
 
 iov_to_buf(iov, iovcnt, 0, buf, size);
-pkt = packet_new(buf, size, 0);
+
+if (s->vnet_hdr) {
+if (nf->netdev->using_vnet_hdr) {
+vnet_hdr_len = nf->netdev->vnet_hdr_len;
+} else if (nf->netdev->peer->using_vnet_hdr) {
+vnet_hdr_len = nf->netdev->peer->vnet_hdr_len;
+} else {
+error_report("filter-rewriter get vnet_hdr_len failed");
+/* On error, drop the packet */
+return 1;
+}
+}
+
+pkt = packet_new(buf, size, vnet_hdr_len);
 g_free(buf);
 
 /*
-- 
2.7.4
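The header-length lookup added to colo_rewriter_receive_iov() can be read as a small decision function: prefer the filter's own netdev, fall back to its peer, and fail if neither uses a vnet header. Below is a sketch with a hypothetical, trimmed-down sketch_nc struct standing in for NetClientState; unlike the patch, it also checks that peer is non-NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the NetClientState fields used by the patch. */
struct sketch_nc {
    bool using_vnet_hdr;
    int vnet_hdr_len;
    struct sketch_nc *peer;
};

/* Returns 0 when vnet_hdr support is off, the header length when either
 * the netdev or its peer uses a vnet header, and -1 on failure. */
static int pick_vnet_hdr_len(bool vnet_hdr_enabled,
                             const struct sketch_nc *netdev)
{
    if (!vnet_hdr_enabled) {
        return 0;
    }
    if (netdev->using_vnet_hdr) {
        return netdev->vnet_hdr_len;
    }
    if (netdev->peer && netdev->peer->using_vnet_hdr) {
        return netdev->peer->vnet_hdr_len;
    }
    return -1;
}
```

In the patch as posted, the error branch returns before g_free(buf) runs, which looks like it would leak the buffer; a version built around a helper like this could free the buffer before bailing out.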






[Qemu-devel] [PATCH V4 10/12] net/colo-compare.c: Add vnet packet's tcp/udp/icmp compare

2017-05-11 Thread Zhang Chen
COLO-Proxy only focuses on the packet payload, so we skip the vnet header.

Signed-off-by: Zhang Chen 
---
 net/colo-compare.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index cb0b04e..bf565f3 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -188,6 +188,8 @@ static int packet_enqueue(CompareState *s, int mode)
  */
 static int colo_packet_compare_common(Packet *ppkt, Packet *spkt, int offset)
 {
+int offset_all;
+
 if (trace_event_get_state(TRACE_COLO_COMPARE_MISCOMPARE)) {
 char pri_ip_src[20], pri_ip_dst[20], sec_ip_src[20], sec_ip_dst[20];
 
@@ -201,9 +203,12 @@ static int colo_packet_compare_common(Packet *ppkt, Packet *spkt, int offset)
sec_ip_src, sec_ip_dst);
 }
 
+offset_all = ppkt->vnet_hdr_len + offset;
+
 if (ppkt->size == spkt->size) {
-return memcmp(ppkt->data + offset, spkt->data + offset,
-  spkt->size - offset);
+return memcmp(ppkt->data + offset_all,
+  spkt->data + offset_all,
+  spkt->size - offset_all);
 } else {
 trace_colo_compare_main("Net packet size are not the same");
 return -1;
@@ -261,8 +266,9 @@ static int colo_packet_compare_tcp(Packet *spkt, Packet *ppkt)
  */
 if (ptcp->th_off > 5) {
 ptrdiff_t tcp_offset;
+
 tcp_offset = ppkt->transport_header - (uint8_t *)ppkt->data
- + (ptcp->th_off * 4);
+ + (ptcp->th_off * 4) - ppkt->vnet_hdr_len;
 res = colo_packet_compare_common(ppkt, spkt, tcp_offset);
 } else if (ptcp->th_sum == stcp->th_sum) {
 res = colo_packet_compare_common(ppkt, spkt, ETH_HLEN);
-- 
2.7.4
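The offset arithmetic in this patch can be checked in isolation. The comparison starts vnet_hdr_len + offset bytes into the raw buffer. The TCP path subtracts vnet_hdr_len first because transport_header is measured from the start of data, which presumably already includes the vnet header, so the common helper ends up at the same byte. Below is a minimal sketch with a hypothetical sketch_packet type; like the patch, it uses the primary packet's header length for both sides, and the bounds check on offset_all is an addition of this sketch, not something the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical minimal packet: raw buffer that may start with a vnet header. */
struct sketch_packet {
    const uint8_t *data;
    size_t size;
    size_t vnet_hdr_len;
};

/* Compare two packets starting 'offset' bytes into the Ethernet frame,
 * i.e. vnet_hdr_len + offset bytes into the raw buffer, like offset_all
 * in colo_packet_compare_common(). Returns 0 when the tails are equal. */
static int compare_skip_vnet(const struct sketch_packet *p,
                             const struct sketch_packet *s, size_t offset)
{
    size_t offset_all = p->vnet_hdr_len + offset;

    if (p->size != s->size || offset_all > s->size) {
        return -1;
    }
    return memcmp(p->data + offset_all, s->data + offset_all,
                  s->size - offset_all);
}
```

Differences inside the vnet header bytes are ignored, which matches the commit message's intent that COLO only compares the payload.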






[Qemu-devel] [PATCH V4 11/12] net/filter-rewriter.c: Add new option to enable vnet support for filter-rewriter

2017-05-11 Thread Zhang Chen
Add the vnet_hdr option for filter-rewriter; it is disabled by default.
If you use the virtio-net-pci net driver, please enable it.
For example:
-object filter-rewriter,id=rew0,netdev=hn0,queue=all,vnet_hdr=on

Signed-off-by: Zhang Chen 
---
 net/filter-rewriter.c | 38 ++
 qemu-options.hx   |  4 ++--
 2 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index 63256c7..bc6d12a 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -33,6 +33,7 @@ typedef struct RewriterState {
 NetQueue *incoming_queue;
 /* hashtable to save connection */
 GHashTable *connection_track_table;
+bool vnet_hdr;
 } RewriterState;
 
 static void filter_rewriter_flush(NetFilterState *nf)
@@ -237,6 +238,42 @@ static void colo_rewriter_setup(NetFilterState *nf, Error **errp)
 s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
 }
 
+static char *filter_rewriter_get_vnet_hdr(Object *obj, Error **errp)
+{
+RewriterState *s = FILTER_COLO_REWRITER(obj);
+
+return s->vnet_hdr ? g_strdup("on") : g_strdup("off");
+}
+
+static void filter_rewriter_set_vnet_hdr(Object *obj,
+ const char *value,
+ Error **errp)
+{
+RewriterState *s = FILTER_COLO_REWRITER(obj);
+
+if (strcmp(value, "on") && strcmp(value, "off")) {
+error_setg(errp, "Invalid value for filter-rewriter vnet_hdr, "
+ "should be 'on' or 'off'");
+return;
+}
+
+s->vnet_hdr = !strcmp(value, "on");
+}
+
+static void filter_rewriter_init(Object *obj)
+{
+RewriterState *s = FILTER_COLO_REWRITER(obj);
+
+/*
+ * vnet_hdr is disabled by default. If you want to enable
+ * this option, you must also enable it on all related modules
+ * (like other filters or colo-compare).
+ */
+s->vnet_hdr = false;
+object_property_add_str(obj, "vnet_hdr", filter_rewriter_get_vnet_hdr,
+filter_rewriter_set_vnet_hdr, NULL);
+}
+
 static void colo_rewriter_class_init(ObjectClass *oc, void *data)
 {
 NetFilterClass *nfc = NETFILTER_CLASS(oc);
@@ -250,6 +287,7 @@ static const TypeInfo colo_rewriter_info = {
 .name = TYPE_FILTER_REWRITER,
 .parent = TYPE_NETFILTER,
 .class_init = colo_rewriter_class_init,
+.instance_init = filter_rewriter_init,
 .instance_size = sizeof(RewriterState),
 };
 
diff --git a/qemu-options.hx b/qemu-options.hx
index 115b83f..d191050 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4040,12 +4040,12 @@ Create a filter-redirector we need to differ outdev id from indev id, id can not
 be the same. we can just use indev or outdev, but at least one of indev or outdev
 need to be specified.
 
-@item -object filter-rewriter,id=@var{id},netdev=@var{netdevid},rewriter-mode=@var{mode}[,queue=@var{all|rx|tx}]
+@item -object filter-rewriter,id=@var{id},netdev=@var{netdevid},rewriter-mode=@var{mode},vnet_hdr=@var{on|off}[,queue=@var{all|rx|tx}]
 
 Filter-rewriter is a part of COLO project.It will rewrite tcp packet to
 secondary from primary to keep secondary tcp connection,and rewrite
 tcp packet to primary from secondary make tcp packet can be handled by
-client.
+client. If vnet_hdr=on, we can parse packets that carry a vnet header.
 
 usage:
 colo secondary:
-- 
2.7.4






Re: [Qemu-devel] [PATCH 8/8] target/sparc: optimize various functions using extract op

2017-05-11 Thread Philippe Mathieu-Daudé

This patch seems correct:

$ docker run -it -v `pwd`:`pwd` -w `pwd` petersenna/coccinelle \
--sp-file scripts/coccinelle/tcg_gen_extract.cocci \
--macro-file scripts/cocci-macro-file.h --dir target/sparc

init_defs_builtins: /usr/lib64/coccinelle/standard.h
init_defs: scripts/cocci-macro-file.h
HANDLING: target/sparc/helper.c
HANDLING: target/sparc/ldst_helper.c
HANDLING: target/sparc/gdbstub.c
HANDLING: target/sparc/translate.c
candidate at target/sparc/translate.c:404
  op_size: tl/tl (same)
  low_bits: 1 (value: 0x1)
  len: 0x1
  len_bits == low_bits
  candidate IS optimizable

candidate at target/sparc/translate.c:383
  op_size: tl/tl (same)
  low_bits: 1 (value: 0x1)
  len: 0x1
  len_bits == low_bits
  candidate IS optimizable

candidate at target/sparc/translate.c:397
  op_size: tl/tl (same)
  low_bits: 1 (value: 0x1)
  len: 0x1
  len_bits == low_bits
  candidate IS optimizable

candidate at target/sparc/translate.c:390
  op_size: tl/tl (same)
  low_bits: 1 (value: 0x1)
  len: 0x1
  len_bits == low_bits
  candidate IS optimizable

candidate at target/sparc/translate.c:641
  op_size: tl/tl (same)
  low_bits: 31 (value: 0x7fff)
  len: 0x7fff
  len_bits == low_bits
  candidate IS optimizable

On 05/10/2017 05:05 PM, Philippe Mathieu-Daudé wrote:

Applied using Coccinelle script.

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/sparc/translate.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index aa6734d54e..a92b5c425c 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -380,29 +380,25 @@ static inline void gen_goto_tb(DisasContext *s, int tb_num,
 static inline void gen_mov_reg_N(TCGv reg, TCGv_i32 src)
 {
 tcg_gen_extu_i32_tl(reg, src);
-tcg_gen_shri_tl(reg, reg, PSR_NEG_SHIFT);
-tcg_gen_andi_tl(reg, reg, 0x1);
+tcg_gen_extract_tl(reg, reg, PSR_NEG_SHIFT, 0x1);
 }

 static inline void gen_mov_reg_Z(TCGv reg, TCGv_i32 src)
 {
 tcg_gen_extu_i32_tl(reg, src);
-tcg_gen_shri_tl(reg, reg, PSR_ZERO_SHIFT);
-tcg_gen_andi_tl(reg, reg, 0x1);
+tcg_gen_extract_tl(reg, reg, PSR_ZERO_SHIFT, 0x1);
 }

 static inline void gen_mov_reg_V(TCGv reg, TCGv_i32 src)
 {
 tcg_gen_extu_i32_tl(reg, src);
-tcg_gen_shri_tl(reg, reg, PSR_OVF_SHIFT);
-tcg_gen_andi_tl(reg, reg, 0x1);
+tcg_gen_extract_tl(reg, reg, PSR_OVF_SHIFT, 0x1);
 }

 static inline void gen_mov_reg_C(TCGv reg, TCGv_i32 src)
 {
 tcg_gen_extu_i32_tl(reg, src);
-tcg_gen_shri_tl(reg, reg, PSR_CARRY_SHIFT);
-tcg_gen_andi_tl(reg, reg, 0x1);
+tcg_gen_extract_tl(reg, reg, PSR_CARRY_SHIFT, 0x1);
 }

 static inline void gen_op_add_cc(TCGv dst, TCGv src1, TCGv src2)
@@ -638,8 +634,7 @@ static inline void gen_op_mulscc(TCGv dst, TCGv src1, TCGv src2)
 // env->y = (b2 << 31) | (env->y >> 1);
 tcg_gen_andi_tl(r_temp, cpu_cc_src, 0x1);
 tcg_gen_shli_tl(r_temp, r_temp, 31);
-tcg_gen_shri_tl(t0, cpu_y, 1);
-tcg_gen_andi_tl(t0, t0, 0x7fff);
+tcg_gen_extract_tl(t0, cpu_y, 1, 0x7fff);
 tcg_gen_or_tl(t0, t0, r_temp);
 tcg_gen_andi_tl(cpu_y, t0, 0x);






[Qemu-devel] [PATCH V4 07/12] net/colo.c: Make vnet_hdr_len as packet property

2017-05-11 Thread Zhang Chen
We can use this property to flush and send packets with vnet_hdr_len.

Signed-off-by: Zhang Chen 
---
 net/colo-compare.c    | 8 ++++++--
 net/colo.c            | 3 ++-
 net/colo.h            | 4 +++-
 net/filter-rewriter.c | 2 +-
 4 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 99a6912..87a9529 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -122,9 +122,13 @@ static int packet_enqueue(CompareState *s, int mode)
 Connection *conn;
 
 if (mode == PRIMARY_IN) {
-pkt = packet_new(s->pri_rs.buf, s->pri_rs.packet_len);
+pkt = packet_new(s->pri_rs.buf,
+ s->pri_rs.packet_len,
+ s->pri_rs.vnet_hdr_len);
 } else {
-pkt = packet_new(s->sec_rs.buf, s->sec_rs.packet_len);
+pkt = packet_new(s->sec_rs.buf,
+ s->sec_rs.packet_len,
+ s->sec_rs.vnet_hdr_len);
 }
 
 if (parse_packet_early(pkt)) {
diff --git a/net/colo.c b/net/colo.c
index 8cc166b..180eaed 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -153,13 +153,14 @@ void connection_destroy(void *opaque)
 g_slice_free(Connection, conn);
 }
 
-Packet *packet_new(const void *data, int size)
+Packet *packet_new(const void *data, int size, int vnet_hdr_len)
 {
 Packet *pkt = g_slice_new(Packet);
 
 pkt->data = g_memdup(data, size);
 pkt->size = size;
 pkt->creation_ms = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+pkt->vnet_hdr_len = vnet_hdr_len;
 
 return pkt;
 }
diff --git a/net/colo.h b/net/colo.h
index 7c524f3..caedb0d 100644
--- a/net/colo.h
+++ b/net/colo.h
@@ -43,6 +43,8 @@ typedef struct Packet {
 int size;
 /* Time of packet creation, in wall clock ms */
 int64_t creation_ms;
+/* Get vnet_hdr_len from filter */
+uint32_t vnet_hdr_len;
 } Packet;
 
 typedef struct ConnectionKey {
@@ -82,7 +84,7 @@ Connection *connection_get(GHashTable *connection_track_table,
ConnectionKey *key,
GQueue *conn_list);
 void connection_hashtable_reset(GHashTable *connection_track_table);
-Packet *packet_new(const void *data, int size);
+Packet *packet_new(const void *data, int size, int vnet_hdr_len);
 void packet_destroy(void *opaque, void *user_data);
 
 #endif /* QEMU_COLO_PROXY_H */
diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index afa06e8..63256c7 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -158,7 +158,7 @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
 char *buf = g_malloc0(size);
 
 iov_to_buf(iov, iovcnt, 0, buf, size);
-pkt = packet_new(buf, size);
+pkt = packet_new(buf, size, 0);
 g_free(buf);
 
 /*
-- 
2.7.4






[Qemu-devel] [PATCH V4 09/12] net/colo.c: Add vnet packet parse feature in colo-proxy

2017-05-11 Thread Zhang Chen
Make colo-compare and filter-rewriter able to parse vnet packets.
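With a vnet header in front, the Ethernet header starts `vnet_hdr_len` bytes into `pkt->data`, so both the data pointer and the minimum-size check shift by that amount, as the hunks below do. A standalone sketch of that offset logic follows; the helper name and `ETH_HLEN_MODEL` (14, the usual Ethernet header size) are assumptions for illustration, not QEMU code:

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN_MODEL 14   /* dst MAC (6) + src MAC (6) + EtherType (2) */

/*
 * Hypothetical model of a vnet-aware early parse: the Ethernet frame
 * starts vnet_hdr_len bytes into the buffer, so the data pointer and the
 * minimum-size check are both shifted by that amount.
 * Returns the EtherType, or -1 if the buffer is too short.
 */
static int32_t eth_type_after_vnet_hdr(const uint8_t *data, size_t size,
                                       uint32_t vnet_hdr_len)
{
    const uint8_t *eth;

    if (size < (size_t)ETH_HLEN_MODEL + vnet_hdr_len) {
        return -1;
    }
    eth = data + vnet_hdr_len;              /* skip the virtio-net header */
    return (int32_t)((eth[12] << 8) | eth[13]);
}
```

For a plain 10-byte `struct virtio_net_hdr`, the EtherType is read from bytes 22 and 23 of the buffer instead of bytes 12 and 13.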

Signed-off-by: Zhang Chen 
---
 net/colo.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/colo.c b/net/colo.c
index 180eaed..28ce7c8 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -43,11 +43,11 @@ int parse_packet_early(Packet *pkt)
 {
 int network_length;
 static const uint8_t vlan[] = {0x81, 0x00};
-uint8_t *data = pkt->data;
+uint8_t *data = pkt->data + pkt->vnet_hdr_len;
 uint16_t l3_proto;
 ssize_t l2hdr_len = eth_get_l2_hdr_length(data);
 
-if (pkt->size < ETH_HLEN) {
+if (pkt->size < ETH_HLEN + pkt->vnet_hdr_len) {
 trace_colo_proxy_main("pkt->size < ETH_HLEN");
 return 1;
 }
@@ -73,7 +73,7 @@ int parse_packet_early(Packet *pkt)
 }
 
 network_length = pkt->ip->ip_hl * 4;
-if (pkt->size < l2hdr_len + network_length) {
+if (pkt->size < l2hdr_len + network_length + pkt->vnet_hdr_len) {
 trace_colo_proxy_main("pkt->size < network_header + network_length");
 return 1;
 }
-- 
2.7.4






[Qemu-devel] [PATCH V4 02/12] net/filter-mirror.c: Add new option to enable vnet support for filter-mirror

2017-05-11 Thread Zhang Chen
We add the vnet_hdr option for filter-mirror; it is disabled by default.
If you use the virtio-net-pci net driver, please enable it.
For example:
-object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0,vnet_hdr=on
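The setter added below accepts only the strings "on" and "off". A plain-C sketch of that validation, with a hypothetical helper name and without the QOM or Error plumbing:

```c
#include <stdbool.h>
#include <string.h>

/*
 * Sketch of the "on"/"off" handling done by the vnet_hdr setters:
 * accept exactly "on" or "off" and reject everything else.
 * Returns true and stores the flag on success; on invalid input returns
 * false and leaves *out untouched (the patch reports it via error_setg()).
 */
static bool parse_on_off(const char *value, bool *out)
{
    if (strcmp(value, "on") == 0) {
        *out = true;
        return true;
    }
    if (strcmp(value, "off") == 0) {
        *out = false;
        return true;
    }
    return false;
}
```

Any other value, e.g. `vnet_hdr=yes`, leaves the flag unchanged and is reported as an error.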

Signed-off-by: Zhang Chen 
---
 net/filter-mirror.c | 34 ++++++++++++++++++++++++++++++++++
 qemu-options.hx     |  5 +++--
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/net/filter-mirror.c b/net/filter-mirror.c
index 72fa7c2..3766414 100644
--- a/net/filter-mirror.c
+++ b/net/filter-mirror.c
@@ -38,6 +38,7 @@ typedef struct MirrorState {
 NetFilterState parent_obj;
 char *indev;
 char *outdev;
+bool vnet_hdr;
 CharBackend chr_in;
 CharBackend chr_out;
 SocketReadState rs;
@@ -308,6 +309,13 @@ static char *filter_mirror_get_outdev(Object *obj, Error **errp)
 return g_strdup(s->outdev);
 }
 
+static char *filter_mirror_get_vnet_hdr(Object *obj, Error **errp)
+{
+MirrorState *s = FILTER_MIRROR(obj);
+
+return s->vnet_hdr ? g_strdup("on") : g_strdup("off");
+}
+
 static void
 filter_mirror_set_outdev(Object *obj, const char *value, Error **errp)
 {
@@ -322,6 +330,21 @@ filter_mirror_set_outdev(Object *obj, const char *value, Error **errp)
 }
 }
 
+static void filter_mirror_set_vnet_hdr(Object *obj,
+   const char *value,
+   Error **errp)
+{
+MirrorState *s = FILTER_MIRROR(obj);
+
+if (strcmp(value, "on") && strcmp(value, "off")) {
+error_setg(errp, "Invalid value for filter-mirror vnet_hdr, "
+ "should be 'on' or 'off'");
+return;
+}
+
+s->vnet_hdr = !strcmp(value, "on");
+}
+
 static char *filter_redirector_get_outdev(Object *obj, Error **errp)
 {
 MirrorState *s = FILTER_REDIRECTOR(obj);
@@ -340,8 +363,19 @@ filter_redirector_set_outdev(Object *obj, const char *value, Error **errp)
 
 static void filter_mirror_init(Object *obj)
 {
+MirrorState *s = FILTER_MIRROR(obj);
+
 object_property_add_str(obj, "outdev", filter_mirror_get_outdev,
 filter_mirror_set_outdev, NULL);
+
+/*
+ * The vnet_hdr is disabled by default, if you want to enable
+ * this option, you must enable all the option on related modules
+ * (like other filter or colo-compare).
+ */
+s->vnet_hdr = false;
+object_property_add_str(obj, "vnet_hdr", filter_mirror_get_vnet_hdr,
+ filter_mirror_set_vnet_hdr, NULL);
 }
 
 static void filter_redirector_init(Object *obj)
diff --git a/qemu-options.hx b/qemu-options.hx
index 70c0ded..1e08481 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4024,10 +4024,11 @@ queue @var{all|rx|tx} is an option that can be applied to any netfilter.
 @option{tx}: the filter is attached to the transmit queue of the netdev,
  where it will receive packets sent by the netdev.
 
-@item -object filter-mirror,id=@var{id},netdev=@var{netdevid},outdev=@var{chardevid}[,queue=@var{all|rx|tx}]
+@item -object filter-mirror,id=@var{id},netdev=@var{netdevid},outdev=@var{chardevid},vnet_hdr=@var{on|off}[,queue=@var{all|rx|tx}]
 
 filter-mirror on netdev @var{netdevid},mirror net packet to chardev
-@var{chardevid}
+@var{chardevid}, if vnet_hdr = on, filter-mirror will mirror packet
+with vnet_hdr_len.
 
 @item -object filter-redirector,id=@var{id},netdev=@var{netdevid},indev=@var{chardevid},
 outdev=@var{chardevid}[,queue=@var{all|rx|tx}]
-- 
2.7.4






Re: [Qemu-devel] [PATCH 7/8] target/ppc: optimize various functions using extract op

2017-05-11 Thread Philippe Mathieu-Daudé

Hi Nikunj,

On 05/11/2017 01:54 AM, Nikunj A Dadhania wrote:

Philippe Mathieu-Daudé  writes:


Applied using Coccinelle script.

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/ppc/translate.c              |  9 +++------
 target/ppc/translate/vsx-impl.inc.c | 21 +++++++--------------
 2 files changed, 10 insertions(+), 20 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index f40b5a1abf..64ab412bf3 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -868,8 +868,7 @@ static inline void gen_op_arith_add(DisasContext *ctx, TCGv ret, TCGv arg1,
 }
 tcg_gen_xor_tl(cpu_ca, t0, t1);/* bits changed w/ carry */
 tcg_temp_free(t1);
-tcg_gen_shri_tl(cpu_ca, cpu_ca, 32);   /* extract bit 32 */
-tcg_gen_andi_tl(cpu_ca, cpu_ca, 1);
+tcg_gen_extract_tl(cpu_ca, cpu_ca, 32, 1);
 if (is_isa300(ctx)) {
 tcg_gen_mov_tl(cpu_ca32, cpu_ca);
 }
@@ -1399,8 +1398,7 @@ static inline void gen_op_arith_subf(DisasContext *ctx, TCGv ret, TCGv arg1,
 tcg_temp_free(inv1);
 tcg_gen_xor_tl(cpu_ca, t0, t1); /* bits changes w/ carry */
 tcg_temp_free(t1);
-tcg_gen_shri_tl(cpu_ca, cpu_ca, 32);/* extract bit 32 */
-tcg_gen_andi_tl(cpu_ca, cpu_ca, 1);
+tcg_gen_extract_tl(cpu_ca, cpu_ca, 32, 1);
 if (is_isa300(ctx)) {
 tcg_gen_mov_tl(cpu_ca32, cpu_ca);
 }


The above changes are correct.

The rest of them are wrong, as discussed above in the thread with Richard.


I tried to correct the cocci script and ran it again (I will post it as v3
in a few minutes) and got:


$ docker run -it -v `pwd`:`pwd` -w `pwd` petersenna/coccinelle --sp-file scripts/coccinelle/tcg_gen_extract.cocci --macro-file scripts/cocci-macro-file.h --dir target/ppc

init_defs_builtins: /usr/lib64/coccinelle/standard.h
init_defs: scripts/cocci-macro-file.h
HANDLING: target/ppc/mfrom_table_gen.c
HANDLING: target/ppc/user_only_helper.c
HANDLING: target/ppc/mmu-hash64.c
HANDLING: target/ppc/timebase_helper.c
HANDLING: target/ppc/gdbstub.c
HANDLING: target/ppc/translate.c
candidate at target/ppc/translate.c:5386
  op_size: tl/tl (same)
  low_bits: 4 (value: 0xf)
  len: 0xf
  len_bits == low_bits
  candidate IS optimizable

candidate at target/ppc/translate.c:871
  op_size: tl/tl (same)
  low_bits: 1 (value: 0x1)
  len: 0x1
  len_bits == low_bits
  candidate IS optimizable

candidate at target/ppc/translate.c:1402
  op_size: tl/tl (same)
  low_bits: 1 (value: 0x1)
  len: 0x1
  len_bits == low_bits
  candidate IS optimizable
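The `len_bits == low_bits` decision in the output above can be read as asking whether the AND mask is a contiguous run of ones starting at bit 0, and if so, how many. A minimal sketch of that check in C follows; the helper name is hypothetical and this does not reproduce the actual Coccinelle script:

```c
#include <stdint.h>

/*
 * Hypothetical helper: if 'mask' has the form (1 << n) - 1 with n >= 1,
 * return n (the length an extract op needs); otherwise return 0, meaning
 * the shift+AND pair cannot be expressed as a single extract.
 */
static unsigned mask_to_extract_len(uint64_t mask)
{
    unsigned n = 0;

    /* mask & (mask + 1) is zero only for runs of ones starting at bit 0 */
    if (mask == 0 || (mask & (mask + 1)) != 0) {
        return 0;
    }
    while (mask) {      /* count the ones */
        n++;
        mask >>= 1;
    }
    return n;
}
```

With this helper, 0xF gives 4, 0x7FFF gives 15 and 0x7ff gives 11, while 0xff00ff gives 0. The value to pass as the extract length is the returned count, not the mask itself.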


@@ -5383,8 +5381,7 @@ static void gen_mfsri(DisasContext *ctx)
 CHK_SV;
 t0 = tcg_temp_new();
 gen_addr_reg_index(ctx, t0);
-tcg_gen_shri_tl(t0, t0, 28);
-tcg_gen_andi_tl(t0, t0, 0xF);
+tcg_gen_extract_tl(t0, t0, 28, 0xF);
 gen_helper_load_sr(cpu_gpr[rd], cpu_env, t0);
 tcg_temp_free(t0);
 if (ra != 0 && ra != rd)


0xF = 0b1111, so this one seems correct too, right?

Then I got:

candidate at target/ppc/translate/vsx-impl.inc.c:1265
  op_size: i64/i64 (same)
  low_bits: 15 (value: 0x7fff)
  len: 0x7fff
  len_bits == low_bits
  candidate IS optimizable

candidate at target/ppc/translate/vsx-impl.inc.c:1451
  op_size: i64/i64 (same)
  low_bits: 11 (value: 0x7ff)
  len: 0x7ff
  len_bits == low_bits
  candidate IS optimizable

candidate at target/ppc/translate/vsx-impl.inc.c:1453
  op_size: i64/i64 (same)
  low_bits: 11 (value: 0x7ff)
  len: 0x7ff
  len_bits == low_bits
  candidate IS optimizable

candidate at target/ppc/translate/vsx-impl.inc.c:1434
  op_size: i64/i64 (same)
  low_bits: 8 (value: 0xff)
  len: 0xff00ff
  len_bits != low_bits
  candidate is NOT optimizable

candidate at target/ppc/translate/vsx-impl.inc.c:1436
  op_size: i64/i64 (same)
  low_bits: 8 (value: 0xff)
  len: 0xff00ff
  len_bits != low_bits
  candidate is NOT optimizable

candidate at target/ppc/translate/vsx-impl.inc.c:1477
  op_size: i64/i64 (same)
  low_bits: 11 (value: 0x7ff)
  len: 0x7ff
  len_bits == low_bits
  candidate IS optimizable

candidate at target/ppc/translate/vsx-impl.inc.c:1485
  op_size: i64/i64 (same)
  low_bits: 11 (value: 0x7ff)
  len: 0x7ff
  len_bits == low_bits
  candidate IS optimizable


diff --git a/target/ppc/translate/vsx-impl.inc.c b/target/ppc/translate/vsx-impl.inc.c
index 7f12908029..354a6b113a 100644
--- a/target/ppc/translate/vsx-impl.inc.c
+++ b/target/ppc/translate/vsx-impl.inc.c
@@ -1262,8 +1262,7 @@ static void gen_xsxexpqp(DisasContext *ctx)
 gen_exception(ctx, POWERPC_EXCP_VSXU);
 return;
 }
-tcg_gen_shri_i64(xth, xbh, 48);
-tcg_gen_andi_i64(xth, xth, 0x7FFF);
+tcg_gen_extract_i64(xth, xbh, 48, 0x7FFF);
 tcg_gen_movi_i64(xtl, 0);
 }


0x7FFF = 0b111111111111111 (15 bits set)

This one is correct too?



@@ -1431,10 +1430,8 @@ static void gen_xvxexpsp(DisasContext *ctx)
 gen_exception(ctx, 

[Qemu-devel] [PATCH V4 06/12] net/colo-compare.c: Add new option to enable vnet support for colo-compare

2017-05-11 Thread Zhang Chen
We add the vnet_hdr option for colo-compare; it is disabled by default.
If you use the virtio-net-pci net driver, please enable it.
For example:
-object colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,outdev=compare_out0,vnet_hdr=on

Signed-off-by: Zhang Chen 
---
 net/colo-compare.c | 34 +++++++++++++++++++++++++++++++++-
 qemu-options.hx    |  3 ++-
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 332f57e..99a6912 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -73,6 +73,7 @@ typedef struct CompareState {
 CharBackend chr_out;
 SocketReadState pri_rs;
 SocketReadState sec_rs;
+bool vnet_hdr;
 
 /* connection list: the connections belonged to this NIC could be found
  * in this list.
@@ -642,6 +643,28 @@ static void compare_set_outdev(Object *obj, const char *value, Error **errp)
 s->outdev = g_strdup(value);
 }
 
+static char *compare_get_vnet_hdr(Object *obj, Error **errp)
+{
+CompareState *s = COLO_COMPARE(obj);
+
+return s->vnet_hdr ? g_strdup("on") : g_strdup("off");
+}
+
+static void compare_set_vnet_hdr(Object *obj,
+ const char *value,
+ Error **errp)
+{
+CompareState *s = COLO_COMPARE(obj);
+
+if (strcmp(value, "on") && strcmp(value, "off")) {
+error_setg(errp, "Invalid value for colo-compare vnet_hdr, "
+ "should be 'on' or 'off'");
+return;
+}
+
+s->vnet_hdr = !strcmp(value, "on");
+}
+
 static void compare_pri_rs_finalize(SocketReadState *pri_rs)
 {
 CompareState *s = container_of(pri_rs, CompareState, pri_rs);
@@ -667,7 +690,6 @@ static void compare_sec_rs_finalize(SocketReadState *sec_rs)
 }
 }
 
-
 /*
  * Return 0 is success.
  * Return 1 is failed.
@@ -775,6 +797,8 @@ static void colo_compare_class_init(ObjectClass *oc, void *data)
 
 static void colo_compare_init(Object *obj)
 {
+CompareState *s = COLO_COMPARE(obj);
+
 object_property_add_str(obj, "primary_in",
 compare_get_pri_indev, compare_set_pri_indev,
 NULL);
@@ -784,6 +808,14 @@ static void colo_compare_init(Object *obj)
 object_property_add_str(obj, "outdev",
 compare_get_outdev, compare_set_outdev,
 NULL);
+/*
+ * The vnet_hdr is disabled by default, if you want to enable
+ * this option, you must enable all the option on related modules
+ * (like other filter or colo-compare).
+ */
+s->vnet_hdr = false;
+object_property_add_str(obj, "vnet_hdr", compare_get_vnet_hdr,
+compare_set_vnet_hdr, NULL);
 }
 
 static void colo_compare_finalize(Object *obj)
diff --git a/qemu-options.hx b/qemu-options.hx
index 0f81c22..115b83f 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4061,12 +4061,13 @@ The file format is libpcap, so it can be analyzed with tools such as tcpdump
 or Wireshark.
 
 @item -object colo-compare,id=@var{id},primary_in=@var{chardevid},secondary_in=@var{chardevid},
-outdev=@var{chardevid}
+outdev=@var{chardevid},vnet_hdr=@var{on|off}
 
 Colo-compare gets packet from primary_in@var{chardevid} and secondary_in@var{chardevid}, than compare primary packet with
 secondary packet. If the packets are same, we will output primary
 packet to outdev@var{chardevid}, else we will notify colo-frame
 do checkpoint and send primary packet to outdev@var{chardevid}.
+if vnet_hdr = on, colo compare will send/recv packet with vnet_hdr_len.
 
 we must use it with the help of filter-mirror and filter-redirector.
 
-- 
2.7.4






[Qemu-devel] [PATCH V4 05/12] net/net.c: Add vnet_hdr support in SocketReadState

2017-05-11 Thread Zhang Chen
Address Jason Wang's comments: add the vnet header length to SocketReadState.
We add a flag to decide whether net_fill_rstate() reads
struct {int size; int vnet_hdr_len; const uint8_t buf[];} or not.
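The three-state reader described above (length, then the optional vnet header length, then data) can be modelled outside QEMU. This is a simplified sketch under stated assumptions: the names `FrameReader`, `frame_reader_feed` and `FRAME_BUF_MAX` are hypothetical, completed frames are counted instead of calling a finalize callback, and oversized frames are rejected up front as a choice of this sketch. It is not `net_fill_rstate()` itself:

```c
#include <arpa/inet.h>   /* ntohl() */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_BUF_MAX 2048   /* hypothetical cap, stands in for NET_BUFSIZE */

typedef struct FrameReader {
    int state;               /* 0 = length, 1 = vnet header length, 2 = data */
    uint32_t index;          /* bytes collected for the current field */
    uint32_t packet_len;
    uint32_t vnet_hdr_len;
    uint8_t buf[FRAME_BUF_MAX];
    int frames_done;         /* counts completed frames instead of a callback */
} FrameReader;

/* Feed 'size' bytes; returns 0 on success, -1 if a frame is too large. */
static int frame_reader_feed(FrameReader *r, const uint8_t *p, size_t size,
                             bool vnet_hdr)
{
    while (size > 0) {
        uint32_t target = (r->state == 2) ? r->packet_len : 4;
        uint32_t l = target - r->index;
        uint32_t field;

        if (l > size) {
            l = (uint32_t)size;
        }
        memcpy(r->buf + r->index, p, l);
        p += l;
        size -= l;
        r->index += l;
        if (r->index < target) {
            break;                      /* need more input */
        }
        r->index = 0;
        switch (r->state) {
        case 0:                         /* got the packet length */
            memcpy(&field, r->buf, 4);
            r->packet_len = ntohl(field);
            if (r->packet_len > FRAME_BUF_MAX) {
                return -1;
            }
            if (vnet_hdr) {
                r->state = 1;
            } else {
                r->vnet_hdr_len = 0;
                r->state = 2;
            }
            break;
        case 1:                         /* got the vnet header length */
            memcpy(&field, r->buf, 4);
            r->vnet_hdr_len = ntohl(field);
            r->state = 2;
            break;
        default:                        /* got the whole packet */
            r->frames_done++;
            r->state = 0;
            break;
        }
    }
    return 0;
}
```

Loading the 32-bit fields with `memcpy()` avoids the type-punned `*(uint32_t *)` access; the state numbering follows the patch.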

Signed-off-by: Zhang Chen 
---
 include/net/net.h   |  9 +++++++--
 net/colo-compare.c  |  4 ++--
 net/filter-mirror.c |  2 +-
 net/net.c           | 36 ++++++++++++++++++++++++++++++++----
 net/socket.c        |  2 +-
 5 files changed, 43 insertions(+), 10 deletions(-)

diff --git a/include/net/net.h b/include/net/net.h
index 70edfc0..0763636 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -113,14 +113,19 @@ typedef struct NICState {
 } NICState;
 
 struct SocketReadState {
-int state; /* 0 = getting length, 1 = getting data */
+/* 0 = getting length, 1 = getting vnet header length, 2 = getting data */
+int state;
 uint32_t index;
 uint32_t packet_len;
+uint32_t vnet_hdr_len;
 uint8_t buf[NET_BUFSIZE];
 SocketReadStateFinalize *finalize;
 };
 
-int net_fill_rstate(SocketReadState *rs, const uint8_t *buf, int size);
+int net_fill_rstate(SocketReadState *rs,
+const uint8_t *buf,
+int size,
+bool vnet_hdr);
 char *qemu_mac_strdup_printf(const uint8_t *macaddr);
 NetClientState *qemu_find_netdev(const char *id);
 int qemu_find_net_clients_except(const char *id, NetClientState **ncs,
diff --git a/net/colo-compare.c b/net/colo-compare.c
index 4ab80b1..332f57e 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -530,7 +530,7 @@ static void compare_pri_chr_in(void *opaque, const uint8_t *buf, int size)
 CompareState *s = COLO_COMPARE(opaque);
 int ret;
 
-ret = net_fill_rstate(&s->pri_rs, buf, size);
+ret = net_fill_rstate(&s->pri_rs, buf, size, false);
 if (ret == -1) {
 qemu_chr_fe_set_handlers(&s->chr_pri_in, NULL, NULL, NULL,
  NULL, NULL, true);
@@ -547,7 +547,7 @@ static void compare_sec_chr_in(void *opaque, const uint8_t *buf, int size)
 CompareState *s = COLO_COMPARE(opaque);
 int ret;
 
-ret = net_fill_rstate(&s->sec_rs, buf, size);
+ret = net_fill_rstate(&s->sec_rs, buf, size, false);
 if (ret == -1) {
 qemu_chr_fe_set_handlers(&s->chr_sec_in, NULL, NULL, NULL,
  NULL, NULL, true);
diff --git a/net/filter-mirror.c b/net/filter-mirror.c
index a65853c..4649416 100644
--- a/net/filter-mirror.c
+++ b/net/filter-mirror.c
@@ -134,7 +134,7 @@ static void redirector_chr_read(void *opaque, const uint8_t *buf, int size)
 MirrorState *s = FILTER_REDIRECTOR(nf);
 int ret;
 
-ret = net_fill_rstate(&s->rs, buf, size);
+ret = net_fill_rstate(&s->rs, buf, size, s->vnet_hdr);
 
 if (ret == -1) {
 qemu_chr_fe_set_handlers(&s->chr_in, NULL, NULL, NULL,
diff --git a/net/net.c b/net/net.c
index a00a0c9..a9c97cf 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1618,13 +1618,20 @@ void net_socket_rs_init(SocketReadState *rs,
  * 0: success
  * -1: error occurs
  */
-int net_fill_rstate(SocketReadState *rs, const uint8_t *buf, int size)
+int net_fill_rstate(SocketReadState *rs,
+const uint8_t *buf,
+int size,
+bool vnet_hdr)
 {
 unsigned int l;
 
 while (size > 0) {
-/* reassemble a packet from the network */
-switch (rs->state) { /* 0 = getting length, 1 = getting data */
+/* Reassemble a packet from the network.
+ * 0 = getting length.
+ * 1 = getting vnet header length.
+ * 2 = getting data.
+ */
+switch (rs->state) {
 case 0:
 l = 4 - rs->index;
 if (l > size) {
@@ -1638,10 +1645,31 @@ int net_fill_rstate(SocketReadState *rs, const uint8_t *buf, int size)
 /* got length */
 rs->packet_len = ntohl(*(uint32_t *)rs->buf);
 rs->index = 0;
-rs->state = 1;
+if (vnet_hdr) {
+rs->state = 1;
+} else {
+rs->state = 2;
+rs->vnet_hdr_len = 0;
+}
 }
 break;
 case 1:
+l = 4 - rs->index;
+if (l > size) {
+l = size;
+}
+memcpy(rs->buf + rs->index, buf, l);
+buf += l;
+size -= l;
+rs->index += l;
+if (rs->index == 4) {
+/* got vnet header length */
+rs->vnet_hdr_len = ntohl(*(uint32_t *)rs->buf);
+rs->index = 0;
+rs->state = 2;
+}
+break;
+case 2:
 l = rs->packet_len - rs->index;
 if (l > size) {
 l = size;
diff --git a/net/socket.c b/net/socket.c
index b8c931e..4e58eff 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -182,7 +182,7 @@ static void net_socket_send(void *opaque)
 }
 bu

[Qemu-devel] [PATCH V4 04/12] net/filter-mirror.c: Add new option to enable vnet support for filter-redirector

2017-05-11 Thread Zhang Chen
We add the vnet_hdr option for filter-redirector; it is disabled by default.
If you use the virtio-net-pci net driver, please enable it.
For example:
-object filter-redirector,id=r0,netdev=hn0,queue=tx,outdev=red0,vnet_hdr=on

Signed-off-by: Zhang Chen 
---
 net/filter-mirror.c | 33 +++++++++++++++++++++++++++++++++
 qemu-options.hx     |  5 +++--
 2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/net/filter-mirror.c b/net/filter-mirror.c
index 64323fc..a65853c 100644
--- a/net/filter-mirror.c
+++ b/net/filter-mirror.c
@@ -377,6 +377,13 @@ static char *filter_redirector_get_outdev(Object *obj, Error **errp)
 return g_strdup(s->outdev);
 }
 
+static char *filter_redirector_get_vnet_hdr(Object *obj, Error **errp)
+{
+MirrorState *s = FILTER_REDIRECTOR(obj);
+
+return s->vnet_hdr ? g_strdup("on") : g_strdup("off");
+}
+
 static void
 filter_redirector_set_outdev(Object *obj, const char *value, Error **errp)
 {
@@ -386,6 +393,21 @@ filter_redirector_set_outdev(Object *obj, const char *value, Error **errp)
 s->outdev = g_strdup(value);
 }
 
+static void filter_redirector_set_vnet_hdr(Object *obj,
+   const char *value,
+   Error **errp)
+{
+MirrorState *s = FILTER_REDIRECTOR(obj);
+
+if (strcmp(value, "on") && strcmp(value, "off")) {
+error_setg(errp, "Invalid value for filter-redirector vnet_hdr, "
+ "should be 'on' or 'off'");
+return;
+}
+
+s->vnet_hdr = !strcmp(value, "on");
+}
+
 static void filter_mirror_init(Object *obj)
 {
 MirrorState *s = FILTER_MIRROR(obj);
@@ -405,10 +427,21 @@ static void filter_mirror_init(Object *obj)
 
 static void filter_redirector_init(Object *obj)
 {
+MirrorState *s = FILTER_REDIRECTOR(obj);
+
 object_property_add_str(obj, "indev", filter_redirector_get_indev,
 filter_redirector_set_indev, NULL);
 object_property_add_str(obj, "outdev", filter_redirector_get_outdev,
 filter_redirector_set_outdev, NULL);
+
+/*
+ * The vnet_hdr is disabled by default, if you want to enable
+ * this option, you must enable all the option on related modules
+ * (like other filter or colo-compare).
+ */
+s->vnet_hdr = false;
+object_property_add_str(obj, "vnet_hdr", filter_redirector_get_vnet_hdr,
+filter_redirector_set_vnet_hdr, NULL);
 }
 
 static void filter_mirror_fini(Object *obj)
diff --git a/qemu-options.hx b/qemu-options.hx
index 1e08481..0f81c22 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4031,10 +4031,11 @@ filter-mirror on netdev @var{netdevid},mirror net packet to chardev
 with vnet_hdr_len.
 
 @item -object filter-redirector,id=@var{id},netdev=@var{netdevid},indev=@var{chardevid},
-outdev=@var{chardevid}[,queue=@var{all|rx|tx}]
+outdev=@var{chardevid},vnet_hdr=@var{on|off}[,queue=@var{all|rx|tx}]
 
 filter-redirector on netdev @var{netdevid},redirect filter's net packet to chardev
-@var{chardevid},and redirect indev's packet to filter.
+@var{chardevid},and redirect indev's packet to filter.if vnet_hdr = on,
+filter-redirector will redirect packet with vnet_hdr_len.
 Create a filter-redirector we need to differ outdev id from indev id, id can not
 be the same. we can just use indev or outdev, but at least one of indev or outdev
 need to be specified.
-- 
2.7.4






[Qemu-devel] [PATCH V4 08/12] net/colo-compare.c: Make colo-compare support vnet_hdr_len

2017-05-11 Thread Zhang Chen
COLO-compare can get the vnet header length from the filter.
Add vnet_hdr_len to struct Packet and output packets with
the vnet_hdr_len.

Signed-off-by: Zhang Chen 
---
 net/colo-compare.c | 45 ++++++++++++++++++++++++++++++++++-----------
 1 file changed, 34 insertions(+), 11 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 87a9529..cb0b04e 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -98,9 +98,10 @@ enum {
 SECONDARY_IN,
 };
 
-static int compare_chr_send(CharBackend *out,
+static int compare_chr_send(CompareState *s,
 const uint8_t *buf,
-uint32_t size);
+uint32_t size,
+uint32_t vnet_hdr_len);
 
 static gint seq_sorter(Packet *a, Packet *b, gpointer data)
 {
@@ -473,7 +474,10 @@ static void colo_compare_connection(void *opaque, void *user_data)
 }
 
 if (result) {
-ret = compare_chr_send(&s->chr_out, pkt->data, pkt->size);
+ret = compare_chr_send(s,
+   pkt->data,
+   pkt->size,
+   pkt->vnet_hdr_len);
 if (ret < 0) {
 error_report("colo_send_primary_packet failed");
 }
@@ -494,9 +498,10 @@ static void colo_compare_connection(void *opaque, void *user_data)
 }
 }
 
-static int compare_chr_send(CharBackend *out,
+static int compare_chr_send(CompareState *s,
 const uint8_t *buf,
-uint32_t size)
+uint32_t size,
+uint32_t vnet_hdr_len)
 {
 int ret = 0;
 uint32_t len = htonl(size);
@@ -505,12 +510,24 @@ static int compare_chr_send(CharBackend *out,
 return 0;
 }
 
-ret = qemu_chr_fe_write_all(out, (uint8_t *)&len, sizeof(len));
+ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)&len, sizeof(len));
 if (ret != sizeof(len)) {
 goto err;
 }
 
-ret = qemu_chr_fe_write_all(out, (uint8_t *)buf, size);
+if (s->vnet_hdr) {
+/*
+ * We send vnet header len make other module(like filter-redirector)
+ * know how to parse net packet correctly.
+ */
+len = htonl(vnet_hdr_len);
+ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)&len, sizeof(len));
+if (ret != sizeof(len)) {
+goto err;
+}
+}
+
+ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)buf, size);
 if (ret != size) {
 goto err;
 }
@@ -535,7 +552,7 @@ static void compare_pri_chr_in(void *opaque, const uint8_t *buf, int size)
 CompareState *s = COLO_COMPARE(opaque);
 int ret;
 
-ret = net_fill_rstate(&s->pri_rs, buf, size, false);
+ret = net_fill_rstate(&s->pri_rs, buf, size, s->vnet_hdr);
 if (ret == -1) {
 qemu_chr_fe_set_handlers(&s->chr_pri_in, NULL, NULL, NULL,
  NULL, NULL, true);
@@ -552,7 +569,7 @@ static void compare_sec_chr_in(void *opaque, const uint8_t *buf, int size)
 CompareState *s = COLO_COMPARE(opaque);
 int ret;
 
-ret = net_fill_rstate(&s->sec_rs, buf, size, false);
+ret = net_fill_rstate(&s->sec_rs, buf, size, s->vnet_hdr);
 if (ret == -1) {
 qemu_chr_fe_set_handlers(&s->chr_sec_in, NULL, NULL, NULL,
  NULL, NULL, true);
@@ -675,7 +692,10 @@ static void compare_pri_rs_finalize(SocketReadState *pri_rs)
 
 if (packet_enqueue(s, PRIMARY_IN)) {
 trace_colo_compare_main("primary: unsupported packet in");
-compare_chr_send(&s->chr_out, pri_rs->buf, pri_rs->packet_len);
+compare_chr_send(s,
+ pri_rs->buf,
+ pri_rs->packet_len,
+ pri_rs->vnet_hdr_len);
 } else {
 /* compare connection */
 g_queue_foreach(&s->conn_list, colo_compare_connection, s);
@@ -783,7 +803,10 @@ static void colo_flush_packets(void *opaque, void *user_data)
 
 while (!g_queue_is_empty(&conn->primary_list)) {
 pkt = g_queue_pop_head(&conn->primary_list);
-compare_chr_send(&s->chr_out, pkt->data, pkt->size);
+compare_chr_send(s,
+ pkt->data,
+ pkt->size,
+ pkt->vnet_hdr_len);
 packet_destroy(pkt, NULL);
 }
 while (!g_queue_is_empty(&conn->secondary_list)) {
-- 
2.7.4






[Qemu-devel] [PATCH V4 03/12] net/filter-mirror.c: Make filter_mirror_send support vnet support.

2017-05-11 Thread Zhang Chen
In this patch, if vnet_hdr=on, we change the send packet format from
struct {int size; const uint8_t buf[];} to
{int size; int vnet_hdr_len; const uint8_t buf[];},
so that other modules (like colo-compare) know how to parse net packets correctly.
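On the sending side the new format is just two big-endian 32-bit fields ahead of the packet bytes, with the second one present only when vnet_hdr=on. A minimal buffer-based sketch, assuming a hypothetical `frame_encode()` that writes into memory instead of a CharBackend:

```c
#include <arpa/inet.h>   /* htonl() */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical encoder for the mirror/compare stream format:
 *   be32 size | be32 vnet_hdr_len (only if vnet_hdr) | 'size' packet bytes
 * Writes into 'out' (capacity 'cap'); returns bytes written, or 0 if
 * 'out' is too small.
 */
static size_t frame_encode(uint8_t *out, size_t cap, const uint8_t *pkt,
                           uint32_t size, bool vnet_hdr, uint32_t vnet_hdr_len)
{
    size_t need = 4 + (vnet_hdr ? 4 : 0) + (size_t)size;
    size_t off = 0;
    uint32_t be;

    if (cap < need) {
        return 0;
    }
    be = htonl(size);                   /* packet length, network order */
    memcpy(out + off, &be, 4);
    off += 4;
    if (vnet_hdr) {
        be = htonl(vnet_hdr_len);       /* vnet header length field */
        memcpy(out + off, &be, 4);
        off += 4;
    }
    memcpy(out + off, pkt, size);       /* packet, including its header */
    return off + size;
}
```

A 3-byte packet with a header length of 1 encodes to 11 bytes (00 00 00 03 00 00 00 01 followed by the packet), or to 7 bytes when vnet_hdr is off.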

Signed-off-by: Zhang Chen 
---
 net/filter-mirror.c | 35 ++++++++++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/net/filter-mirror.c b/net/filter-mirror.c
index 3766414..64323fc 100644
--- a/net/filter-mirror.c
+++ b/net/filter-mirror.c
@@ -44,10 +44,11 @@ typedef struct MirrorState {
 SocketReadState rs;
 } MirrorState;
 
-static int filter_mirror_send(CharBackend *chr_out,
+static int filter_mirror_send(MirrorState *s,
   const struct iovec *iov,
   int iovcnt)
 {
+NetFilterState *nf = NETFILTER(s);
 int ret = 0;
 ssize_t size = 0;
 uint32_t len = 0;
@@ -59,14 +60,38 @@ static int filter_mirror_send(CharBackend *chr_out,
 }
 
 len = htonl(size);
-ret = qemu_chr_fe_write_all(chr_out, (uint8_t *)&len, sizeof(len));
+ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)&len, sizeof(len));
 if (ret != sizeof(len)) {
 goto err;
 }
 
+if (s->vnet_hdr) {
+/*
+ * If vnet_hdr = on, we send vnet header len to make other
+ * module(like colo-compare) know how to parse net
+ * packet correctly.
+ */
+ssize_t vnet_hdr_len;
+
+if (nf->netdev->using_vnet_hdr) {
+vnet_hdr_len = nf->netdev->vnet_hdr_len;
+} else if (nf->netdev->peer->using_vnet_hdr) {
+vnet_hdr_len = nf->netdev->peer->vnet_hdr_len;
+} else {
+error_report("filter get vnet_hdr_len failed");
+goto err;
+}
+
+len = htonl(vnet_hdr_len);
+ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)&len, sizeof(len));
+if (ret != sizeof(len)) {
+goto err;
+}
+}
+
 buf = g_malloc(size);
 iov_to_buf(iov, iovcnt, 0, buf, size);
-ret = qemu_chr_fe_write_all(chr_out, (uint8_t *)buf, size);
+ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)buf, size);
 g_free(buf);
 if (ret != size) {
 goto err;
@@ -142,7 +167,7 @@ static ssize_t filter_mirror_receive_iov(NetFilterState *nf,
 MirrorState *s = FILTER_MIRROR(nf);
 int ret;
 
-ret = filter_mirror_send(&s->chr_out, iov, iovcnt);
+ret = filter_mirror_send(s, iov, iovcnt);
 if (ret) {
 error_report("filter_mirror_send failed(%s)", strerror(-ret));
 }
@@ -165,7 +190,7 @@ static ssize_t filter_redirector_receive_iov(NetFilterState *nf,
 int ret;
 
 if (qemu_chr_fe_get_driver(&s->chr_out)) {
-ret = filter_mirror_send(&s->chr_out, iov, iovcnt);
+ret = filter_mirror_send(s, iov, iovcnt);
 if (ret) {
 error_report("filter_mirror_send failed(%s)", strerror(-ret));
 }
-- 
2.7.4






[Qemu-devel] [PATCH V4 01/12] net: Add vnet_hdr_len related arguments in NetClientState

2017-05-11 Thread Zhang Chen
Add vnet_hdr_len and using_vnet_hdr fields to NetClientState
so that other modules can easily get the real vnet_hdr_len.

Signed-off-by: Zhang Chen 
---
 include/net/net.h | 2 ++
 net/net.c         | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/include/net/net.h b/include/net/net.h
index 99b28d5..70edfc0 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -100,6 +100,8 @@ struct NetClientState {
 unsigned int queue_index;
 unsigned rxfilter_notify_enabled:1;
 int vring_enable;
+bool using_vnet_hdr;
+int vnet_hdr_len;
 QTAILQ_HEAD(NetFilterHead, NetFilterState) filters;
 };
 
diff --git a/net/net.c b/net/net.c
index 0ac3b9e..a00a0c9 100644
--- a/net/net.c
+++ b/net/net.c
@@ -472,6 +472,7 @@ void qemu_using_vnet_hdr(NetClientState *nc, bool enable)
 return;
 }
 
+nc->using_vnet_hdr = enable;
 nc->info->using_vnet_hdr(nc, enable);
 }
 
@@ -491,6 +492,7 @@ void qemu_set_vnet_hdr_len(NetClientState *nc, int len)
 return;
 }
 
+nc->vnet_hdr_len = len;
 nc->info->set_vnet_hdr_len(nc, len);
 }
 
-- 
2.7.4






[Qemu-devel] [PATCH V4 00/12] Add COLO-proxy virtio-net support

2017-05-11 Thread Zhang Chen
If the user uses -device virtio-net-pci, the virtio-net driver adds a header
to each raw net packet, which colo-proxy can't handle. COLO-proxy only
focuses on the packet payload, so we skip the virtio-net header when comparing
the packets sent by the primary guest with those sent by the secondary guest.
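Skipping the header before comparing could look like the sketch below. It assumes a hypothetical `payload_equal()` helper and lets the two sides carry headers of different lengths (for example 10 versus 12 bytes). It is an illustration, not the compare code from this series:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: compare two packets while ignoring their leading
 * virtio-net headers, whose lengths may differ between the two sides.
 */
static bool payload_equal(const uint8_t *a, size_t a_size, size_t a_hdr,
                          const uint8_t *b, size_t b_size, size_t b_hdr)
{
    if (a_hdr > a_size || b_hdr > b_size) {
        return false;               /* malformed: header longer than packet */
    }
    if (a_size - a_hdr != b_size - b_hdr) {
        return false;               /* payload lengths differ */
    }
    return memcmp(a + a_hdr, b + b_hdr, a_size - a_hdr) == 0;
}
```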

V4:
 - Add vnet_hdr option for filter-mirror, filter-redirector,
   filter-rewriter, colo-compare.
 - Use a new design to implement virtio-net support for colo-proxy.
 - Fix codestyle.
 - Remove unused option for filter-rewriter.
 - Add filter-rewriter virtio-net support.
 - Address other comments.


Zhang Chen (12):
  net: Add vnet_hdr_len related arguments in NetClientState
  net/filter-mirror.c: Add new option to enable vnet support for
filter-mirror
  net/filter-mirror.c: Make filter_mirror_send support vnet support.
  net/filter-mirror.c: Add new option to enable vnet support for
filter-redirector
  net/net.c: Add vnet_hdr support in SocketReadState
  net/colo-compare.c: Add new option to enable vnet support for
colo-compare
  net/colo.c: Make vnet_hdr_len as packet property
  net/colo-compare.c: Make colo-compare support vnet_hdr_len
  net/colo.c: Add vnet packet parse feature in colo-proxy
  net/colo-compare.c: Add vnet packet's tcp/udp/icmp compare
  net/filter-rewriter.c: Add new option to enable vnet support for
filter-rewriter
  net/filter-rewriter.c: Make filter-rewriter support vnet_hdr_len

 include/net/net.h |  11 +-
 net/colo-compare.c|  99 ++-
 net/colo.c|   9 +++--
 net/colo.h|   4 +-
 net/filter-mirror.c   | 104 +++---
 net/filter-rewriter.c |  55 +-
 net/net.c |  38 --
 net/socket.c  |   2 +-
 qemu-options.hx   |  17 +
 9 files changed, 296 insertions(+), 43 deletions(-)

-- 
2.7.4






[Qemu-devel] [PATCH 2/3] net/filter-mirror.c: Rename filter_mirror_send() and fix codestyle

2017-05-11 Thread Zhang Chen
Because filter_mirror_receive_iov() and filter_redirector_receive_iov()
both use filter_mirror_send() to send packets, rename
filter_mirror_send() to filter_send(), which is more generic.
Also fix some code style issues.

Signed-off-by: Zhang Chen 
---
 net/filter-mirror.c | 29 -
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/net/filter-mirror.c b/net/filter-mirror.c
index fd0322f..8b1b069 100644
--- a/net/filter-mirror.c
+++ b/net/filter-mirror.c
@@ -43,9 +43,9 @@ typedef struct MirrorState {
 SocketReadState rs;
 } MirrorState;
 
-static int filter_mirror_send(CharBackend *chr_out,
-  const struct iovec *iov,
-  int iovcnt)
+static int filter_send(CharBackend *chr_out,
+   const struct iovec *iov,
+   int iovcnt)
 {
 int ret = 0;
 ssize_t size = 0;
@@ -141,9 +141,9 @@ static ssize_t filter_mirror_receive_iov(NetFilterState *nf,
 MirrorState *s = FILTER_MIRROR(nf);
 int ret;
 
-ret = filter_mirror_send(&s->chr_out, iov, iovcnt);
+ret = filter_send(&s->chr_out, iov, iovcnt);
 if (ret) {
-error_report("filter_mirror_send failed(%s)", strerror(-ret));
+error_report("filter mirror send failed(%s)", strerror(-ret));
 }
 
 /*
@@ -164,9 +164,9 @@ static ssize_t filter_redirector_receive_iov(NetFilterState *nf,
 int ret;
 
 if (qemu_chr_fe_get_driver(&s->chr_out)) {
-ret = filter_mirror_send(&s->chr_out, iov, iovcnt);
+ret = filter_send(&s->chr_out, iov, iovcnt);
 if (ret) {
-error_report("filter_mirror_send failed(%s)", strerror(-ret));
+error_report("filter redirector send failed(%s)", strerror(-ret));
 }
 return iov_size(iov, iovcnt);
 } else {
@@ -286,8 +286,9 @@ static char *filter_redirector_get_indev(Object *obj, Error **errp)
 return g_strdup(s->indev);
 }
 
-static void
-filter_redirector_set_indev(Object *obj, const char *value, Error **errp)
+static void filter_redirector_set_indev(Object *obj,
+const char *value,
+Error **errp)
 {
 MirrorState *s = FILTER_REDIRECTOR(obj);
 
@@ -302,8 +303,9 @@ static char *filter_mirror_get_outdev(Object *obj, Error **errp)
 return g_strdup(s->outdev);
 }
 
-static void
-filter_mirror_set_outdev(Object *obj, const char *value, Error **errp)
+static void filter_mirror_set_outdev(Object *obj,
+ const char *value,
+ Error **errp)
 {
 MirrorState *s = FILTER_MIRROR(obj);
 
@@ -323,8 +325,9 @@ static char *filter_redirector_get_outdev(Object *obj, Error **errp)
 return g_strdup(s->outdev);
 }
 
-static void
-filter_redirector_set_outdev(Object *obj, const char *value, Error **errp)
+static void filter_redirector_set_outdev(Object *obj,
+ const char *value,
+ Error **errp)
 {
 MirrorState *s = FILTER_REDIRECTOR(obj);
 
-- 
2.7.4






[Qemu-devel] [PATCH 1/3] net/filter-mirror.c: Remove duplicate check code.

2017-05-11 Thread Zhang Chen
s->outdev is already checked in filter_mirror_set_outdev().

Signed-off-by: Zhang Chen 
---
 net/filter-mirror.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/net/filter-mirror.c b/net/filter-mirror.c
index 72fa7c2..fd0322f 100644
--- a/net/filter-mirror.c
+++ b/net/filter-mirror.c
@@ -194,12 +194,6 @@ static void filter_mirror_setup(NetFilterState *nf, Error **errp)
 MirrorState *s = FILTER_MIRROR(nf);
 Chardev *chr;
 
-if (!s->outdev) {
-error_setg(errp, "filter mirror needs 'outdev' "
-   "property set");
-return;
-}
-
 chr = qemu_chr_find(s->outdev);
 if (chr == NULL) {
 error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
-- 
2.7.4






[Qemu-devel] [PATCH 3/3] net/filter-rewriter: Remove unused option in filter-rewriter

2017-05-11 Thread Zhang Chen
Signed-off-by: Zhang Chen 
---
 qemu-options.hx | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 70c0ded..f5e088e 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4038,7 +4038,8 @@ Create a filter-redirector we need to differ outdev id from indev id, id can not
 be the same. we can just use indev or outdev, but at least one of indev or outdev
 need to be specified.
 
-@item -object filter-rewriter,id=@var{id},netdev=@var{netdevid},rewriter-mode=@var{mode}[,queue=@var{all|rx|tx}]
+@item -object filter-rewriter,id=@var{id},netdev=@var{netdevid},
+[,queue=@var{all|rx|tx}]
 
 Filter-rewriter is a part of COLO project.It will rewrite tcp packet to
 secondary from primary to keep secondary tcp connection,and rewrite
-- 
2.7.4






[Qemu-devel] [PATCH 0/3] Optimize filter-mirror and filter-rewriter

2017-05-11 Thread Zhang Chen
Remove some duplicate code and some unused code.

Zhang Chen (3):
  net/filter-mirror.c: Remove duplicate check code.
  net/filter-mirror.c: Rename filter_mirror_send() and fix codestyle
  net/filter-rewriter: Remove unused option in filter-rewriter

 net/filter-mirror.c | 35 ---
 qemu-options.hx |  3 ++-
 2 files changed, 18 insertions(+), 20 deletions(-)

-- 
2.7.4






Re: [Qemu-devel] [PATCH 2/8] target/arm: optimize smul_dual() and neon_trn_u8() using extract op

2017-05-11 Thread Philippe Mathieu-Daudé

Hi Richard,

On 05/10/2017 05:32 PM, Philippe Mathieu-Daudé wrote:

On 05/10/2017 05:20 PM, Richard Henderson wrote:

On 05/10/2017 01:05 PM, Philippe Mathieu-Daudé wrote:

-tcg_gen_shri_i32(t1, t1, 8);
-tcg_gen_andi_i32(t1, t1, 0x00ff00ff);
+tcg_gen_extract_i32(t1, t1, 8, 0x00ff00ff);


This is very wrong.  See my previous comment.


Indeed, after correcting the script:

$ docker run -it -v `pwd`:`pwd` -w `pwd` petersenna/coccinelle \
    --sp-file scripts/coccinelle/tcg_gen_extract.cocci \
    --macro-file scripts/cocci-macro-file.h target/arm/translate.c --in-place

init_defs_builtins: /usr/lib64/coccinelle/standard.h
init_defs: scripts/cocci-macro-file.h
HANDLING: target/arm/translate.c
candidate at target/arm/translate.c:4703
  op_size: i32/i32 (same)
  low_bits: 8 (value: 0xff)
  len: 0xff00ff
  len_bits != low_bits
  candidate is NOT optimizable

candidate at target/arm/translate.c:342
  op_size: i32/i32 (same)
  low_bits: 8 (value: 0xff)
  len: 0xff00ff
  len_bits != low_bits
  candidate is NOT optimizable
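To see why the script rejects these candidates: an extract of `len` bits at offset `ofs` equals a right shift followed by an AND only when the AND mask is a contiguous run of low bits, i.e. 2^len - 1. The patched call instead passed the mask 0x00ff00ff where a bit length is expected. The C sketch below models that rule on plain values; `extract32_model()` and `shr_and_as_extract_len()` are illustrative names, not TCG APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Value model of a field extract: 'len' bits starting at bit 'ofs'. */
static uint32_t extract32_model(uint32_t value, unsigned ofs, unsigned len)
{
    assert(len >= 1 && len <= 32 && ofs + len <= 32);
    uint32_t mask = (len == 32) ? UINT32_MAX : ((UINT32_C(1) << len) - 1);
    return (value >> ofs) & mask;
}

/*
 * (x >> ofs) & mask can be rewritten as extract(x, ofs, len) only if
 * 'mask' is a non-empty run of low bits (2^len - 1). Returns that len,
 * or 0 when the rewrite is not valid.
 */
static unsigned shr_and_as_extract_len(uint32_t mask)
{
    if (mask == 0 || (mask & (mask + 1)) != 0) {
        return 0;  /* e.g. 0x00ff00ff: two separate byte lanes */
    }
    unsigned len = 0;
    while (mask) {
        len++;
        mask >>= 1;
    }
    return len;
}
```

With mask 0xff the shift/AND pair collapses to an 8-bit extract; with 0x00ff00ff it keeps two byte lanes, so no single extract can replace it.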



Arghhh, I see. I checked manually and thought I had it right...

I'll first check whether the cocci script can handle this better,
then review the series manually before bothering you again.

Thinking about it, this should be quite easy to unit-test somehow...
Not sure I want to go down that path, though.

Sorry for the noise, and thanks a lot for the review!

Phil.




Re: [Qemu-devel] [PATCH 4/8] target/cris: optimize gen_swapb() using extract op

2017-05-11 Thread Philippe Mathieu-Daudé

Hi,

As Richard pointed out in his review, this patch is WRONG, so it needs further review :)

http://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg02551.html

On 05/10/2017 05:05 PM, Philippe Mathieu-Daudé wrote:

Applied using Coccinelle script.

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/cris/translate.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/target/cris/translate.c b/target/cris/translate.c
index 0ee05ca02d..c03403ac62 100644
--- a/target/cris/translate.c
+++ b/target/cris/translate.c
@@ -442,8 +442,7 @@ static inline void t_gen_swapb(TCGv d, TCGv s)
 tcg_gen_mov_tl(org_s, s);
 tcg_gen_shli_tl(t, org_s, 8);
 tcg_gen_andi_tl(d, t, 0xff00ff00);
-tcg_gen_shri_tl(t, org_s, 8);
-tcg_gen_andi_tl(t, t, 0x00ff00ff);
+tcg_gen_extract_tl(t, org_s, 8, 0x00ff00ff);
 tcg_gen_or_tl(d, d, t);
 tcg_temp_free(t);
 tcg_temp_free(org_s);



The corrected Coccinelle script displays:

candidate at target/cris/translate.c:445
  op_size: tl/tl (same)
  low_bits: 8 (value: 0xff)
  len: 0xff00ff
  len_bits != low_bits
  candidate is NOT optimizable
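For reference, a plain-C model of what the original shift/AND/OR sequence in `t_gen_swapb()` computes: a byte swap within each 16-bit half. Each shifted term keeps two non-adjacent bytes, which is why neither half maps onto a single extract. `swapb_model()` is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/* Models the original shli/andi/shri/andi/or sequence of t_gen_swapb(). */
static uint32_t swapb_model(uint32_t s)
{
    uint32_t hi = (s << 8) & 0xff00ff00u;  /* bytes 0 and 2 move up   */
    uint32_t lo = (s >> 8) & 0x00ff00ffu;  /* bytes 1 and 3 move down */
    return hi | lo;
}
```

For example, 0x11223344 becomes 0x22114433, and applying the swap twice returns the original value.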




Re: [Qemu-devel] [RFC v1 8/9] virtio-crypto: add host feature bits support

2017-05-11 Thread Gonglei (Arei)
>
> From: Cornelia Huck [mailto:cornelia.h...@de.ibm.com]
> Sent: Thursday, May 11, 2017 11:05 PM
> Subject: Re: [RFC v1 8/9] virtio-crypto: add host feature bits support
> 
> On Mon, 8 May 2017 19:38:23 +0800
> Gonglei  wrote:
> 
> > We enable all feature bits by default.
> >
> > Signed-off-by: Gonglei 
> > ---
> >  hw/virtio/virtio-crypto.c | 15 +++
> >  include/hw/virtio/virtio-crypto.h |  1 +
> >  2 files changed, 16 insertions(+)
> >
> > diff --git a/hw/virtio/virtio-crypto.c b/hw/virtio/virtio-crypto.c
> > index 5422f25..3dc0ff2 100644
> > --- a/hw/virtio/virtio-crypto.c
> > +++ b/hw/virtio/virtio-crypto.c
> > @@ -1034,6 +1034,11 @@ static uint64_t virtio_crypto_get_features(VirtIODevice *vdev,
> > uint64_t features,
> > Error **errp)
> >  {
> > +VirtIOCrypto *vcrypto = VIRTIO_CRYPTO(vdev);
> > +
> > +/* Firstly sync all virtio-crypto possible supported features */
> > +features |= vcrypto->host_features;
> > +
> >  return features;
> >  }
> >
> > @@ -1144,6 +1149,16 @@ static const VMStateDescription vmstate_virtio_crypto = {
> >  };
> >
> >  static Property virtio_crypto_properties[] = {
> > +DEFINE_PROP_BIT("mux_mode", VirtIOCrypto, host_features,
> > +VIRTIO_CRYPTO_F_MUX_MODE, true),
> > +DEFINE_PROP_BIT("cipher_stateless_mode", VirtIOCrypto, host_features,
> > +VIRTIO_CRYPTO_F_CIPHER_STATELESS_MODE, true),
> > +DEFINE_PROP_BIT("hash_stateless_mode", VirtIOCrypto, host_features,
> > +VIRTIO_CRYPTO_F_HASH_STATELESS_MODE, true),
> > +DEFINE_PROP_BIT("mac_stateless_mode", VirtIOCrypto, host_features,
> > +VIRTIO_CRYPTO_F_MAC_STATELESS_MODE, true),
> > +DEFINE_PROP_BIT("aead_stateless_mode", VirtIOCrypto, host_features,
> > +VIRTIO_CRYPTO_F_AEAD_STATELESS_MODE, true),
> >  DEFINE_PROP_END_OF_LIST(),
> >  };
> >
> > diff --git a/include/hw/virtio/virtio-crypto.h b/include/hw/virtio/virtio-crypto.h
> > index 465ad20..30ea51d 100644
> > --- a/include/hw/virtio/virtio-crypto.h
> > +++ b/include/hw/virtio/virtio-crypto.h
> > @@ -97,6 +97,7 @@ typedef struct VirtIOCrypto {
> >  int multiqueue;
> >  uint32_t curr_queues;
> >  size_t config_size;
> > +uint32_t host_features;
> 
> I'd just make that 64 bits from the start.
> 
Yes, that's better.
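A small illustration of why the 64-bit field is the safer choice: virtio feature bits go up to 63 (VIRTIO_F_VERSION_1, for example, is bit 32), and a `uint32_t` cannot hold any bit at or above 32. The `DEMO_F_*` bit numbers below are placeholders, not the real VIRTIO_CRYPTO_F_* values, and the helpers only approximate what a bit property defaulting to true plus the `features |= host_features` line do.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit numbers; the real VIRTIO_CRYPTO_F_* values differ. */
enum { DEMO_F_MUX_MODE = 0, DEMO_F_HIGH_BIT = 32 };

/* Set one feature bit in a 64-bit mask (1ULL-style shift, not 1 << bit). */
static uint64_t feature_set(uint64_t features, unsigned bit)
{
    assert(bit < 64);
    return features | (UINT64_C(1) << bit);
}

/* Offer the device's host features on top of what is already offered. */
static uint64_t get_features(uint64_t offered, uint64_t host_features)
{
    return offered | host_features;
}
```

Truncating such a mask to 32 bits silently drops bit 32, which is the failure mode a 64-bit `host_features` avoids.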

> >  } VirtIOCrypto;
> >
> >  #endif /* _QEMU_VIRTIO_CRYPTO_H */
> 
> Don't you need some kind of compat handling?

I did that in patch 6, based on the result of the feature bit negotiation.
Patch 9 tests both session mode and stateless mode, and both work. :)

Thanks,
-Gonglei



Re: [Qemu-devel] [PATCH] target/i386: enable A20 automatically in system management mode

2017-05-11 Thread Xu, Anthony
Hi Paolo,

In KVM mode, A20 seems to be ignored.
Do you see any potential issue here?


Anthony 


> -----Original Message-----
> From: Kevin O'Connor [mailto:ke...@koconnor.net]
> Sent: Thursday, May 11, 2017 9:35 AM
> To: Paolo Bonzini 
> Cc: qemu-devel@nongnu.org; Xu, Anthony 
> Subject: Re: [PATCH] target/i386: enable A20 automatically in system management mode
> 
> On Thu, May 11, 2017 at 05:32:47PM +0200, Paolo Bonzini wrote:
> > On 11/05/2017 16:53, Kevin O'Connor wrote:
> > > On Thu, May 11, 2017 at 01:35:28PM +0200, Paolo Bonzini wrote:
> > >> Ignore env->a20_mask when running in system management mode.
> > >
> > > Thanks Paolo.  I don't think this patch will help SeaBIOS though.  The
> > > SeaBIOS SMM handler doesn't do much - it doesn't even access ram above
> > > 1MiB.  See SeaBIOS' code in src/fw/smm.c:handle_smi().
> > >
> > > Instead, the SeaBIOS code does a cpu state backup/restore to switch
> > > into 32bit mode.  I thought the A20 state would be part of that cpu
> > > backup/restore.  However, looking at the Intel SDM docs now, it's not
> > > really clear to me how the processor "inhibits" A20 when in SMM mode -
> > > does it save/restore that state on SMI/RSM or does it have special
> > > logic to ignore A20 while in SMM mode?
> >
> > There isn't any documented place for A20 in the state save map (I checked
> > AMD's BIOS/Kernel Developer Guide which is pretty comprehensive), so I
> > think the latter is more plausible.  What I'm doing in this patch is
> > ignoring A20 while in SMM mode.
> 
> Okay.
> 
> > Then you would have to add an A20 save/restore in handle_smi; since
> > CALL32SMM_ENTERID should not nest, I think you can just do this:
> 
> Yes, that should be fine.
> 
> > --- a/src/fw/smm.c
> > +++ b/src/fw/smm.c
> > @@ -54,7 +54,8 @@ struct smm_layout {
> >  struct smm_state backup2;
> >  u8 stack[0x7c00];
> >  u64 codeentry;
> > -u8 pad_8008[0x7df8];
> > +u8 a20;
> > +u8 pad_8009[0x7df7];
> >  struct smm_state cpu;
> >  };
> 
> In order to avoid mixing code and data in the same cache line we could
> do this instead:
> 
>  struct smm_layout {
>  struct smm_state backup1;
>  struct smm_state backup2;
> -u8 stack[0x7c00];
> +u32 backup_a20;
> +u8 stack[0x8000 - sizeof(struct smm_state)*2 - sizeof(u32)];
>  u64 codeentry;
>  u8 pad_8008[0x7df8];
>  struct smm_state cpu;
> 
> Thanks,
> -Kevin
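The offset arithmetic behind Kevin's variant can be checked at compile time. This sketch assumes, as the original `u8 stack[0x7c00]` implies, that `struct smm_state` is 0x200 bytes (2 * 0x200 + 0x7c00 = 0x8000); the stand-in type below is only a placeholder for SeaBIOS's real definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t u8;
typedef uint32_t u32;
typedef uint64_t u64;

/* Placeholder with the size implied by the original layout. */
struct smm_state { u8 raw[0x200]; };

/* Kevin's proposed layout: backup_a20 takes 4 bytes from the stack. */
struct smm_layout {
    struct smm_state backup1;
    struct smm_state backup2;
    u32 backup_a20;
    u8 stack[0x8000 - sizeof(struct smm_state)*2 - sizeof(u32)];
    u64 codeentry;
    u8 pad_8008[0x7df8];
    struct smm_state cpu;
};

/* codeentry must stay at 0x8000 and the CPU save area at 0xfe00. */
_Static_assert(offsetof(struct smm_layout, codeentry) == 0x8000,
               "codeentry moved");
_Static_assert(offsetof(struct smm_layout, cpu) == 0xfe00,
               "CPU save state moved");
```

Paolo's variant keeps the same invariants differently: a one-byte `a20` at 0x8008 followed by `pad_8009[0x7df7]` also ends at 0xfe00, but places data right after `codeentry`, which is what Kevin's version avoids.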



Re: [Qemu-devel] KVM "fake DAX" device flushing

2017-05-11 Thread Dan Williams
[ adding nvdimm mailing list ]

On Wed, May 10, 2017 at 8:56 AM, Pankaj Gupta  wrote:
> We are sharing an initial proposal for the
> 'KVM "fake DAX" device flushing' project, for feedback.
> The idea came up during a discussion with Rik van Riel.
>
> We would also appreciate answers to the 'Questions' section.
>
> Abstract :
> --
> The project idea is to use fake persistent memory with direct
> access (DAX) in virtual machines. The overall goal of the project
> is to increase the number of virtual machines that can be
> run on a physical machine, in order to increase the density
> of customer virtual machines.
>
> The idea is to avoid the guest page cache, and minimize the
> memory footprint of virtual machines. By presenting a disk
> image as an nvdimm direct access (DAX) memory region in a
> virtual machine, the guest OS can avoid using page cache
> memory for most file accesses.
>
> Problem Statement :
> --
> * The guest uses its in-memory page cache to serve disk
>   reads/writes quickly. This results in a large guest memory
>   footprint, while the host knows few details about that guest
>   memory.
>
> * If guests use direct access (DAX) with fake persistent
>   storage, the host manages the page cache for guests,
>   allowing the host to easily reclaim/evict less frequently
>   used page cache pages without requiring guest cooperation,
>   like ballooning would.
>
> * The host manages the guest cache as an mmap()ed disk image area
>   in the QEMU address space. This region is passed to the guest as
>   a fake persistent memory range. We need a new flushing interface
>   to flush this cache to secondary storage to persist guest
>   writes.
>
> * A new asynchronous flushing interface will allow guests to
>   cause the host to flush dirty data to the backing storage file.
>   Systems with pmem storage use the CLFLUSH instruction to flush
>   a single cache line to persistent storage, and that takes care
>   of flushing. With fake persistent storage in the guest, we cannot
>   depend on the CLFLUSH instruction to flush the entire dirty cache
>   to backing storage. Even if we trap and emulate the CLFLUSH
>   instruction, the guest vCPU has to wait until we flush all the
>   dirty memory. Instead, we need to implement a new asynchronous
>   guest flushing interface, which allows the guest to specify a
>   larger range to be flushed at once, and allows the vCPU to run
>   something else while the data is being synced to disk.
>
> * The new flushing interface will consist of a paravirt driver for
>   a new fake-nvdimm-like device, which will process guest flushing
>   requests such as fsync/msync instead of pmem library calls such
>   as clflush. The corresponding device on the host side will be
>   responsible for flushing the guest's dirty pages. The guest can
>   put the current task to sleep, and the vCPU can run any other
>   task while host-side flushing of guest pages is in progress.
>
> Host controlled fake nvdimm DAX to avoid guest page cache :
> -
> * Bypass the guest page cache by using fake persistent storage
>   such as nvdimm & DAX. Guest reads/writes are done directly on
>   the fake persistent storage without involving the guest kernel
>   in caching data.
>
> * The fake nvdimm device passed to the guest is backed by a regular
>   file on the host, stored in secondary storage.
>
> * QEMU already implements a fake NVDIMM/DAX device. Use this
>   capability to pass a regular host file (disk) as an nvdimm device
>   to the guest.
>
> * NVDIMM with DAX works with ext4/xfs filesystems; the filesystem
>   used must be DAX-compatible.
>
> * As we are using the guest disk as a fake DAX/NVDIMM device, we
>   need a mechanism to persist the data backed by the regular
>   host storage file.
>
> * For the live migration use case, if the host-side backing file is on
>   shared storage, we need to flush the page cache for the disk
>   image at the destination (new fadvise interface, FADV_INVALIDATE_CACHE?)
>   before starting execution of the guest on the destination host.
>
> Design :
> -
> * To avoid having a page cache inside the guest, QEMU would:
>
>  1) mmap the guest's disk image and present that disk image to
> the guest as a persistent memory range.
>
>  2) Present information to the guest telling it that the persistent
> memory range is not physical persistent memory.
>
>  3) Present an additional paravirt device alongside the persistent
> memory range, that can be used to sync (ranges of) data to disk.
>
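As a rough illustration of step 3, the host side of such a flush request could reduce to `msync()` on the affected range of the mmap()ed image. This is a synchronous sketch under assumptions: the request fields (`offset`, `length`) and the helper `flush_image_range()` are hypothetical, and the proposal itself asks for an asynchronous interface so the vCPU is not blocked while this runs.

```c
#define _DEFAULT_SOURCE  /* expose MAP_ANONYMOUS and friends on glibc */
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical host-side handler: write back [offset, offset + length)
 * of a disk image mapped at 'base' (page-aligned, as returned by mmap())
 * with 'map_len' bytes. Returns 0 on success or a negative errno value.
 */
static int flush_image_range(void *base, size_t map_len,
                             uint64_t offset, uint64_t length)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    /* Reject ranges outside the mapping (written to avoid overflow). */
    if (offset > map_len || length > map_len - offset) {
        return -EINVAL;
    }

    /* msync() requires a page-aligned start address. */
    uint64_t start = offset - (offset % (uint64_t)page);
    uint64_t end = offset + length;

    if (msync((char *)base + start, end - start, MS_SYNC) != 0) {
        return -errno;
    }
    return 0;
}
```

Batching a whole range into one call is the point of the proposed interface: one request can cover many dirty pages, instead of a trapped CLFLUSH per cache line.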
> * Guest would use the disk image mostly like a persistent memory
>   device, with two exceptions:
>
>   1) It would not tell userspace that the files on that device are
>  persistent memory. This is done so that userspace knows to call
>  fsync/msync, instead of the pmem clflush library call.

There are no (safe) pmem applications today that can get by without
calling fsync/msync after an mmap write to a file on ext4 or xfs.
We're trying to fix that, more details below.

>   2) When userspace calls fsync/msync on files on 

Re: [Qemu-devel] [PATCH] alpha-user: wire epoll_create, epoll_ctl, epoll_wait

2017-05-11 Thread Sergei Trofimovich
On Sat,  8 Apr 2017 20:33:22 +0100
Sergei Trofimovich  wrote:

> Noticed when running GHC on alpha:
> $ qemu-alpha -L /usr/alpha-unknown-linux-gnu/ /tmp/a
> qemu: Unsupported syscall: 407
> 
> linux-user/syscall.c does have 'epoll_create' wiring,
> but under the non-deprecated name.
> 
> Instead of defining both
> TARGET_NR_sys_epoll_create
> and
> TARGET_NR_epoll_create
> I've renamed the former to the latter, as the old name is not used
> anywhere else in QEMU.
> 
> After this change GHC works fine under qemu-alpha:
> $ ./alpha-linux-user/qemu-alpha -L /usr/alpha-unknown-linux-gnu/ /tmp/a
> ...
> 
> Cc: Peter Maydell 
> Cc: Riku Voipio 
> Cc: qemu-devel@nongnu.org
> Signed-off-by: Sergei Trofimovich 
> ---
>  linux-user/alpha/syscall_nr.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/linux-user/alpha/syscall_nr.h b/linux-user/alpha/syscall_nr.h
> index 00e14bb6b3..e848154663 100644
> --- a/linux-user/alpha/syscall_nr.h
> +++ b/linux-user/alpha/syscall_nr.h
> @@ -343,9 +343,9 @@
>  #define TARGET_NR_io_cancel  402
>  #define TARGET_NR_exit_group 405
>  #define TARGET_NR_lookup_dcookie 406
> -#define TARGET_NR_sys_epoll_create   407
> -#define TARGET_NR_sys_epoll_ctl  408
> -#define TARGET_NR_sys_epoll_wait 409
> +#define TARGET_NR_epoll_create   407
> +#define TARGET_NR_epoll_ctl  408
> +#define TARGET_NR_epoll_wait 409
>  #define TARGET_NR_remap_file_pages   410
>  #define TARGET_NR_set_tid_address411
>  #define TARGET_NR_restart_syscall412
> -- 
> 2.12.2
> 
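Why the rename is enough: linux-user/syscall.c compiles a syscall case only when the target defines the matching `TARGET_NR_*` macro, so a number defined under a different name is never dispatched. The sketch below mimics that preprocessor pattern; `dispatch()` is an illustrative stand-in for QEMU's syscall dispatcher, not QEMU code.

```c
#include <assert.h>
#include <string.h>

/* After the patch, the Alpha numbers use the generic names. */
#define TARGET_NR_epoll_create 407
#define TARGET_NR_epoll_ctl    408
#define TARGET_NR_epoll_wait   409

/* Returns the handler name, or NULL for an unsupported syscall. */
static const char *dispatch(long num)
{
    switch (num) {
#ifdef TARGET_NR_epoll_create
    case TARGET_NR_epoll_create:
        return "epoll_create";
#endif
#ifdef TARGET_NR_epoll_ctl
    case TARGET_NR_epoll_ctl:
        return "epoll_ctl";
#endif
#ifdef TARGET_NR_epoll_wait
    case TARGET_NR_epoll_wait:
        return "epoll_wait";
#endif
    default:
        return NULL;  /* QEMU reports "Unsupported syscall: N" here */
    }
}
```

Had the macros kept the `sys_` prefix, none of the `#ifdef` blocks would be compiled in, and syscall 407 would fall through to the default case, matching the error in the report.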

Ping.

-- 

  Sergei




Re: [Qemu-devel] [PATCH v7 13/13] MAINTAINERS: Add vfio-ccw maintainer

2017-05-11 Thread Alex Williamson
On Fri,  5 May 2017 04:03:52 +0200
Dong Jia Shi  wrote:

> Add Cornelia Huck as the vfio-ccw maintainer.
> 
> Signed-off-by: Dong Jia Shi 
> ---


Acked-by: Alex Williamson 


>  MAINTAINERS | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index cae3b09..c1f9917 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -999,6 +999,11 @@ S: Supported
>  F: hw/vfio/*
>  F: include/hw/vfio/
>  
> +vfio-ccw
> +M: Cornelia Huck 
> +S: Supported
> +F: hw/vfio/ccw.c
> +
>  vhost
>  M: Michael S. Tsirkin 
>  S: Supported




Re: [Qemu-devel] [PATCH v7 12/13] vfio/ccw: update sense data if a unit check is pending

2017-05-11 Thread Alex Williamson
On Fri,  5 May 2017 04:03:51 +0200
Dong Jia Shi  wrote:

> Concurrent-sense data is currently not delivered. This patch stores
> the concurrent-sense data to the subchannel if a unit check is pending
> and the concurrent-sense bit is enabled, so that a TSCH can retrieve
> the right IRB data for the guest.
> 
> Signed-off-by: Dong Jia Shi 
> ---


Acked-by: Alex Williamson 
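The condition this patch adds can be modeled as a pure predicate. The mask values below are placeholders standing in for `SCSW_DSTAT_UNIT_CHECK` and `PMCW_CHARS_MASK_CSENSE`, not necessarily QEMU's definitions, and `should_store_sense()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder masks standing in for QEMU's definitions. */
#define DEMO_DSTAT_UNIT_CHECK   0x02u
#define DEMO_CHARS_MASK_CSENSE  0x01u

/* Store sense data only if a unit check is pending AND concurrent sense is enabled. */
static bool should_store_sense(uint8_t dstat, uint32_t chars)
{
    return (dstat & DEMO_DSTAT_UNIT_CHECK) && (chars & DEMO_CHARS_MASK_CSENSE);
}
```

Both bits are required: a unit check without concurrent sense enabled leaves the sense data to be fetched separately.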


>  hw/vfio/ccw.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
> index 3c8b518..73f326f 100644
> --- a/hw/vfio/ccw.c
> +++ b/hw/vfio/ccw.c
> @@ -94,6 +94,7 @@ static void vfio_ccw_io_notifier_handler(void *opaque)
>  CcwDevice *ccw_dev = CCW_DEVICE(cdev);
>  SubchDev *sch = ccw_dev->sch;
>  SCSW *s = &sch->curr_status.scsw;
> +PMCW *p = &sch->curr_status.pmcw;
>  IRB irb;
>  int size;
>  
> @@ -143,6 +144,12 @@ static void vfio_ccw_io_notifier_handler(void *opaque)
>  /* Update control block via irb. */
>  copy_scsw_to_guest(s, &irb.scsw);
>  
> +/* If a unit check is pending, copy sense data. */
> +if ((s->dstat & SCSW_DSTAT_UNIT_CHECK) &&
> +(p->chars & PMCW_CHARS_MASK_CSENSE)) {
> +memcpy(sch->sense_data, irb.ecw, sizeof(irb.ecw));
> +}
> +
>  read_err:
>  css_inject_io_interrupt(sch);
>  }




Re: [Qemu-devel] [PATCH v7 10/13] s390x/css: introduce and realize ccw-request callback

2017-05-11 Thread Alex Williamson
On Fri,  5 May 2017 04:03:49 +0200
Dong Jia Shi  wrote:

> From: Xiao Feng Ren 
> 
> Introduce a new callback on the subchannel to handle ccw-requests.
> Realize the callback in the vfio-ccw device. In addition, use the
> event notifier handler to handle the ccw-request results:
> 1. Pread the I/O results via the MMIO region.
> 2. Update the SCSW info for the guest.
> 3. Inject an I/O interrupt to notify the guest of the I/O result.
> Signed-off-by: Xiao Feng Ren 
> Signed-off-by: Dong Jia Shi 
> ---


Acked-by: Alex Williamson 
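A sketch of the write-and-retry step in `vfio_ccw_handle_request()` as a standalone helper, assuming the same policy of retrying on `EAGAIN` (this version also retries on `EINTR`). One detail worth noting: `errno` is only meaningful when the call returned -1, so a short write is checked separately instead of by inspecting `errno`. `pwrite_region()` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write 'size' bytes of 'region' at 'offset' on 'fd', retrying while the
 * kernel reports EAGAIN or EINTR. Returns 0 on success, -errno on an
 * error, or -EIO on a short write.
 */
static int pwrite_region(int fd, const void *region, size_t size, off_t offset)
{
    ssize_t ret;

    do {
        ret = pwrite(fd, region, size, offset);
    } while (ret < 0 && (errno == EAGAIN || errno == EINTR));

    if (ret < 0) {
        int err = errno;
        fprintf(stderr, "write I/O region failed with errno=%d\n", err);
        return -err;
    }
    if ((size_t)ret != size) {
        return -EIO;  /* short write: errno is not set in this case */
    }
    return 0;
}
```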


>  hw/s390x/css.c |  4 +--
>  hw/s390x/s390-ccw.h|  1 +
>  hw/vfio/ccw.c  | 85 ++
>  include/hw/s390x/css.h |  2 ++
>  4 files changed, 90 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/s390x/css.c b/hw/s390x/css.c
> index 1052eea..507c60f 100644
> --- a/hw/s390x/css.c
> +++ b/hw/s390x/css.c
> @@ -259,7 +259,7 @@ uint16_t css_build_subchannel_id(SubchDev *sch)
>  return css_do_build_subchannel_id(sch->cssid, sch->ssid);
>  }
>  
> -static void css_inject_io_interrupt(SubchDev *sch)
> +void css_inject_io_interrupt(SubchDev *sch)
>  {
>  uint8_t isc = (sch->curr_status.pmcw.flags & PMCW_FLAGS_MASK_ISC) >> 11;
>  
> @@ -668,7 +668,7 @@ static void copy_pmcw_to_guest(PMCW *dest, const PMCW *src)
>  dest->chars = cpu_to_be32(src->chars);
>  }
>  
> -static void copy_scsw_to_guest(SCSW *dest, const SCSW *src)
> +void copy_scsw_to_guest(SCSW *dest, const SCSW *src)
>  {
>  dest->flags = cpu_to_be16(src->flags);
>  dest->ctrl = cpu_to_be16(src->ctrl);
> diff --git a/hw/s390x/s390-ccw.h b/hw/s390x/s390-ccw.h
> index b58d8e9..9f45cf1 100644
> --- a/hw/s390x/s390-ccw.h
> +++ b/hw/s390x/s390-ccw.h
> @@ -33,6 +33,7 @@ typedef struct S390CCWDeviceClass {
>  CCWDeviceClass parent_class;
>  void (*realize)(S390CCWDevice *dev, char *sysfsdev, Error **errp);
>  void (*unrealize)(S390CCWDevice *dev, Error **errp);
> +int (*handle_request) (ORB *, SCSW *, void *);
>  } S390CCWDeviceClass;
>  
>  #endif
> diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
> index 6760cee..3c8b518 100644
> --- a/hw/vfio/ccw.c
> +++ b/hw/vfio/ccw.c
> @@ -47,6 +47,36 @@ struct VFIODeviceOps vfio_ccw_ops = {
>  .vfio_compute_needs_reset = vfio_ccw_compute_needs_reset,
>  };
>  
> +static int vfio_ccw_handle_request(ORB *orb, SCSW *scsw, void *data)
> +{
> +S390CCWDevice *cdev = data;
> +VFIOCCWDevice *vcdev = DO_UPCAST(VFIOCCWDevice, cdev, cdev);
> +struct ccw_io_region *region = vcdev->io_region;
> +int ret;
> +
> +QEMU_BUILD_BUG_ON(sizeof(region->orb_area) != sizeof(ORB));
> +QEMU_BUILD_BUG_ON(sizeof(region->scsw_area) != sizeof(SCSW));
> +QEMU_BUILD_BUG_ON(sizeof(region->irb_area) != sizeof(IRB));
> +
> +memset(region, 0, sizeof(*region));
> +
> +memcpy(region->orb_area, orb, sizeof(ORB));
> +memcpy(region->scsw_area, scsw, sizeof(SCSW));
> +
> +again:
> +ret = pwrite(vcdev->vdev.fd, region,
> + vcdev->io_region_size, vcdev->io_region_offset);
> +if (ret != vcdev->io_region_size) {
> +if (errno == EAGAIN) {
> +goto again;
> +}
> +error_report("vfio-ccw: write I/O region failed with errno=%d", errno);
> +return -errno;
> +}
> +
> +return region->ret_code;
> +}
> +
>  static void vfio_ccw_reset(DeviceState *dev)
>  {
>  CcwDevice *ccw_dev = DO_UPCAST(CcwDevice, parent_obj, dev);
> @@ -59,10 +89,62 @@ static void vfio_ccw_reset(DeviceState *dev)
>  static void vfio_ccw_io_notifier_handler(void *opaque)
>  {
>  VFIOCCWDevice *vcdev = opaque;
> +struct ccw_io_region *region = vcdev->io_region;
> +S390CCWDevice *cdev = S390_CCW_DEVICE(vcdev);
> +CcwDevice *ccw_dev = CCW_DEVICE(cdev);
> +SubchDev *sch = ccw_dev->sch;
> +SCSW *s = &sch->curr_status.scsw;
> +IRB irb;
> +int size;
>  
>  if (!event_notifier_test_and_clear(&vcdev->io_notifier)) {
>  return;
>  }
> +
> +size = pread(vcdev->vdev.fd, region, vcdev->io_region_size,
> + vcdev->io_region_offset);
> +if (size == -1) {
> +switch (errno) {
> +case ENODEV:
> +/* Generate a deferred cc 3 condition. */
> +s->flags |= SCSW_FLAGS_MASK_CC;
> +s->ctrl &= ~SCSW_CTRL_MASK_STCTL;
> +s->ctrl |= (SCSW_STCTL_ALERT | SCSW_STCTL_STATUS_PEND);
> +goto read_err;
> +case EFAULT:
> +/* Memory problem, generate channel data check. */
> +s->ctrl &= ~SCSW_ACTL_START_PEND;
> +s->cstat = SCSW_CSTAT_DATA_CHECK;
> +s->ctrl &= ~SCSW_CTRL_MASK_STCTL;
> +s->ctrl |= SCSW_STCTL_PRIMARY | SCSW_STCTL_SECONDARY |
> +   SCSW_STCTL_ALERT | SCSW_STCTL_STATUS_PEND;
> +goto read_err;
> +default:
> +/* Error, generate channel program check. */
> +s->ctrl &= ~SCSW_ACTL_START_PEND;
> +s->cstat = SCSW_CSTAT_PROG_CHE

Re: [Qemu-devel] [PATCH v7 07/13] vfio/ccw: vfio based subchannel passthrough driver

2017-05-11 Thread Alex Williamson
On Fri,  5 May 2017 04:03:46 +0200
Dong Jia Shi  wrote:

> From: Xiao Feng Ren 
> 
> We use VFIO's IOMMU_TYPE1 backend to realize subchannel
> passthrough, and implement a VFIO-based subchannel passthrough
> driver called "vfio-ccw".
> 
> Support qemu parameters in the style of:
> "-device vfio-ccw,sysfsdev=$mdev_file_path,devno=xx.x.'
> 
> Signed-off-by: Xiao Feng Ren 
> Signed-off-by: Dong Jia Shi 
> ---


Acked-by: Alex Williamson 
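The group lookup in `vfio_ccw_get_group()` boils down to taking the last path component of the `iommu_group` symlink target and parsing it as an integer. A self-contained sketch of that parsing step follows; it uses `strrchr()` instead of `basename()` (the POSIX version may modify its argument) and `strtol()` with an end-pointer check, which rejects trailing garbage that `sscanf("%d")` would accept. `parse_iommu_group_id()` is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse the group number from a link target such as
 * "../../../kernel/iommu_groups/7". Returns -1 if malformed.
 */
static int parse_iommu_group_id(const char *link_target)
{
    const char *slash = strrchr(link_target, '/');
    const char *name = slash ? slash + 1 : link_target;
    char *end;
    long id;

    if (*name == '\0') {
        return -1;
    }
    errno = 0;
    id = strtol(name, &end, 10);
    if (errno != 0 || *end != '\0' || id < 0 || id > INT_MAX) {
        return -1;
    }
    return (int)id;
}
```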


>  default-configs/s390x-softmmu.mak |   1 +
>  hw/vfio/Makefile.objs |   1 +
>  hw/vfio/ccw.c | 187 ++
>  include/hw/vfio/vfio-common.h |   1 +
>  4 files changed, 190 insertions(+)
>  create mode 100644 hw/vfio/ccw.c
> 
> diff --git a/default-configs/s390x-softmmu.mak b/default-configs/s390x-softmmu.mak
> index 36e15de..5576b0a 100644
> --- a/default-configs/s390x-softmmu.mak
> +++ b/default-configs/s390x-softmmu.mak
> @@ -4,4 +4,5 @@ CONFIG_VIRTIO=y
>  CONFIG_SCLPCONSOLE=y
>  CONFIG_S390_FLIC=y
>  CONFIG_S390_FLIC_KVM=$(CONFIG_KVM)
> +CONFIG_VFIO_CCW=$(CONFIG_LINUX)
>  CONFIG_WDT_DIAG288=y
> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
> index 05e7fbb..c3ab909 100644
> --- a/hw/vfio/Makefile.objs
> +++ b/hw/vfio/Makefile.objs
> @@ -1,6 +1,7 @@
>  ifeq ($(CONFIG_LINUX), y)
>  obj-$(CONFIG_SOFTMMU) += common.o
>  obj-$(CONFIG_PCI) += pci.o pci-quirks.o
> +obj-$(CONFIG_VFIO_CCW) += ccw.o
>  obj-$(CONFIG_SOFTMMU) += platform.o
>  obj-$(CONFIG_VFIO_XGMAC) += calxeda-xgmac.o
>  obj-$(CONFIG_VFIO_AMD_XGBE) += amd-xgbe.o
> diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
> new file mode 100644
> index 000..7d2497c
> --- /dev/null
> +++ b/hw/vfio/ccw.c
> @@ -0,0 +1,187 @@
> +/*
> + * vfio based subchannel assignment support
> + *
> + * Copyright 2017 IBM Corp.
> + * Author(s): Dong Jia Shi 
> + *Xiao Feng Ren 
> + *Pierre Morel 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or (at
> + * your option) any version. See the COPYING file in the top-level
> + * directory.
> + */
> +
> +#include 
> +#include 
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "hw/sysbus.h"
> +#include "hw/vfio/vfio.h"
> +#include "hw/vfio/vfio-common.h"
> +#include "hw/s390x/s390-ccw.h"
> +#include "hw/s390x/ccw-device.h"
> +
> +#define TYPE_VFIO_CCW "vfio-ccw"
> +typedef struct VFIOCCWDevice {
> +S390CCWDevice cdev;
> +VFIODevice vdev;
> +} VFIOCCWDevice;
> +
> +static void vfio_ccw_compute_needs_reset(VFIODevice *vdev)
> +{
> +vdev->needs_reset = false;
> +}
> +
> +/*
> + * We don't need vfio_hot_reset_multi and vfio_eoi operations for
> + * vfio_ccw device now.
> + */
> +struct VFIODeviceOps vfio_ccw_ops = {
> +.vfio_compute_needs_reset = vfio_ccw_compute_needs_reset,
> +};
> +
> +static void vfio_ccw_reset(DeviceState *dev)
> +{
> +CcwDevice *ccw_dev = DO_UPCAST(CcwDevice, parent_obj, dev);
> +S390CCWDevice *cdev = DO_UPCAST(S390CCWDevice, parent_obj, ccw_dev);
> +VFIOCCWDevice *vcdev = DO_UPCAST(VFIOCCWDevice, cdev, cdev);
> +
> +ioctl(vcdev->vdev.fd, VFIO_DEVICE_RESET);
> +}
> +
> +static void vfio_put_device(VFIOCCWDevice *vcdev)
> +{
> +g_free(vcdev->vdev.name);
> +vfio_put_base_device(&vcdev->vdev);
> +}
> +
> +static VFIOGroup *vfio_ccw_get_group(S390CCWDevice *cdev, Error **errp)
> +{
> +char *tmp, group_path[PATH_MAX];
> +ssize_t len;
> +int groupid;
> +
> +tmp = g_strdup_printf("/sys/bus/css/devices/%x.%x.%04x/%s/iommu_group",
> +  cdev->hostid.cssid, cdev->hostid.ssid,
> +  cdev->hostid.devid, cdev->mdevid);
> +len = readlink(tmp, group_path, sizeof(group_path));
> +g_free(tmp);
> +
> +if (len <= 0 || len >= sizeof(group_path)) {
> +error_setg(errp, "vfio: no iommu_group found");
> +return NULL;
> +}
> +
> +group_path[len] = 0;
> +
> +if (sscanf(basename(group_path), "%d", &groupid) != 1) {
> +error_setg(errp, "vfio: failed to read %s", group_path);
> +return NULL;
> +}
> +
> +return vfio_get_group(groupid, &address_space_memory, errp);
> +}
> +
> +static void vfio_ccw_realize(DeviceState *dev, Error **errp)
> +{
> +VFIODevice *vbasedev;
> +VFIOGroup *group;
> +CcwDevice *ccw_dev = DO_UPCAST(CcwDevice, parent_obj, dev);
> +S390CCWDevice *cdev = DO_UPCAST(S390CCWDevice, parent_obj, ccw_dev);
> +VFIOCCWDevice *vcdev = DO_UPCAST(VFIOCCWDevice, cdev, cdev);
> +S390CCWDeviceClass *cdc = S390_CCW_DEVICE_GET_CLASS(cdev);
> +Error *err = NULL;
> +
> +/* Call the class init function for subchannel. */
> +if (cdc->realize) {
> +cdc->realize(cdev, vcdev->vdev.sysfsdev, &err);
> +if (err) {
> +goto out_err_propagate;
> +}
> +}
> +
> +group = vfio_ccw_get_group(cdev, &err);
> +if (!group) {
> +goto out_group_err;
> +}
> +
> +vcdev->vdev.ops = &vfio_ccw_ops;
> +vcdev->vdev.t

Re: [Qemu-devel] [PATCH v7 09/13] vfio/ccw: get irqs info and set the eventfd fd

2017-05-11 Thread Alex Williamson
On Fri,  5 May 2017 04:03:48 +0200
Dong Jia Shi  wrote:

> vfio-ccw uses the eventfd mechanism to communicate with userspace.
> We fetch the IRQ info via the VFIO_DEVICE_GET_IRQ_INFO ioctl and
> register an event notifier whose eventfd is passed to the kernel
> via the VFIO_DEVICE_SET_IRQS ioctl; we can then perform the read
> operation once the kernel signals the eventfd.
> 
> Signed-off-by: Dong Jia Shi 
> ---
>  hw/vfio/ccw.c | 103 ++
>  1 file changed, 103 insertions(+)
> 
> diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
> index 7ddcfd7..6760cee 100644
> --- a/hw/vfio/ccw.c
> +++ b/hw/vfio/ccw.c
> @@ -22,6 +22,7 @@
>  #include "hw/vfio/vfio-common.h"
>  #include "hw/s390x/s390-ccw.h"
>  #include "hw/s390x/ccw-device.h"
> +#include "qemu/error-report.h"
>  
>  #define TYPE_VFIO_CCW "vfio-ccw"
>  typedef struct VFIOCCWDevice {
> @@ -30,6 +31,7 @@ typedef struct VFIOCCWDevice {
>  uint64_t io_region_size;
>  uint64_t io_region_offset;
>  struct ccw_io_region *io_region;
> +EventNotifier io_notifier;
>  } VFIOCCWDevice;
>  
>  static void vfio_ccw_compute_needs_reset(VFIODevice *vdev)
> @@ -54,6 +56,99 @@ static void vfio_ccw_reset(DeviceState *dev)
>  ioctl(vcdev->vdev.fd, VFIO_DEVICE_RESET);
>  }
>  
> +static void vfio_ccw_io_notifier_handler(void *opaque)
> +{
> +VFIOCCWDevice *vcdev = opaque;
> +
> +if (!event_notifier_test_and_clear(&vcdev->io_notifier)) {
> +return;
> +}
> +}
> +
> +static void vfio_ccw_register_io_notifier(VFIOCCWDevice *vcdev, Error **errp)
> +{
> +VFIODevice *vdev = &vcdev->vdev;
> +struct vfio_irq_info *irq_info;
> +struct vfio_irq_set *irq_set;
> +size_t argsz;
> +int32_t *pfd;
> +
> +if (vdev->num_irqs < VFIO_CCW_IO_IRQ_INDEX + 1) {
> +error_setg(errp, "vfio: unexpected number of io irqs %u",
> +   vdev->num_irqs);
> +return;
> +}
> +
> +argsz = sizeof(*irq_set);
> +irq_info = g_malloc0(argsz);
> +irq_info->index = VFIO_CCW_IO_IRQ_INDEX;
> +irq_info->argsz = argsz;
> +if (ioctl(vdev->fd, VFIO_DEVICE_GET_IRQ_INFO,
> +  irq_info) < 0 || irq_info->count < 1) {
> +error_setg_errno(errp, errno, "vfio: Error getting irq info");
> +goto get_error;
> +}
> +
> +if (event_notifier_init(&vcdev->io_notifier, 0)) {
> +error_setg_errno(errp, errno,
> + "vfio: Unable to init event notifier for IO");
> +goto get_error;
> +}
> +
> +argsz = sizeof(*irq_set) + sizeof(*pfd);
> +irq_set = g_malloc0(argsz);
> +irq_set->argsz = argsz;
> +irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
> + VFIO_IRQ_SET_ACTION_TRIGGER;
> +irq_set->index = VFIO_CCW_IO_IRQ_INDEX;
> +irq_set->start = 0;
> +irq_set->count = 1;
> +pfd = (int32_t *) &irq_set->data;
> +
> +*pfd = event_notifier_get_fd(&vcdev->io_notifier);
> +qemu_set_fd_handler(*pfd, vfio_ccw_io_notifier_handler, NULL, vcdev);
> +if (ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set)) {
> +error_setg(errp, "vfio: Failed to set up io notification");
> +qemu_set_fd_handler(*pfd, NULL, NULL, vcdev);
> +event_notifier_cleanup(&vcdev->io_notifier);
> +goto set_error;


nit, unnecessary goto here.  set_error label is unused if removed.

Otherwise,

Acked-by: Alex Williamson 
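For readers less familiar with the label-based cleanup style used here, the following is a minimal, self-contained sketch of what Alex's nit points at. Once the failing branch has undone its own partial setup, control can simply fall through to the shared cleanup code, so a `goto` to the immediately following label is redundant. The function `setup_two_buffers()` and its flags are hypothetical stand-ins for the ioctl steps in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical example shaped like vfio_ccw_register_io_notifier():
 * two allocations, with a possible failure after the second one.
 * Returns 0 on success, -1 on failure.
 */
static int setup_two_buffers(bool fail_first, bool fail_second)
{
    int ret = -1;
    char *set;
    char *info = malloc(16);          /* plays the role of irq_info */

    if (!info) {
        return -1;
    }
    if (fail_first) {
        goto get_error;               /* second buffer never allocated */
    }

    set = malloc(32);                 /* plays the role of irq_set */
    if (!set) {
        goto get_error;
    }

    if (fail_second) {
        fprintf(stderr, "second step failed\n");
        /* No "goto set_error" needed: the cleanup is the next statement. */
    } else {
        ret = 0;
    }
    free(set);                        /* what the set_error label guarded */

get_error:
    free(info);
    return ret;
}
```

Both the success and the failure path run the same `free(set)`, which is why the label becomes unused once the `goto` is dropped.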


> +}
> +
> +set_error:
> +g_free(irq_set);
> +
> +get_error:
> +g_free(irq_info);
> +}
> +
> +static void vfio_ccw_unregister_io_notifier(VFIOCCWDevice *vcdev)
> +{
> +struct vfio_irq_set *irq_set;
> +size_t argsz;
> +int32_t *pfd;
> +
> +argsz = sizeof(*irq_set) + sizeof(*pfd);
> +irq_set = g_malloc0(argsz);
> +irq_set->argsz = argsz;
> +irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
> + VFIO_IRQ_SET_ACTION_TRIGGER;
> +irq_set->index = VFIO_CCW_IO_IRQ_INDEX;
> +irq_set->start = 0;
> +irq_set->count = 1;
> +pfd = (int32_t *) &irq_set->data;
> +*pfd = -1;
> +
> +if (ioctl(vcdev->vdev.fd, VFIO_DEVICE_SET_IRQS, irq_set)) {
> +error_report("vfio: Failed to de-assign device io fd: %m");
> +}
> +
> +qemu_set_fd_handler(event_notifier_get_fd(&vcdev->io_notifier),
> +NULL, NULL, vcdev);
> +event_notifier_cleanup(&vcdev->io_notifier);
> +
> +g_free(irq_set);
> +}
> +
>  static void vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error **errp)
>  {
>  VFIODevice *vdev = &vcdev->vdev;
> @@ -173,8 +268,15 @@ static void vfio_ccw_realize(DeviceState *dev, Error **errp)
>  goto out_region_err;
>  }
>  
> +vfio_ccw_register_io_notifier(vcdev, &err);
> +if (err) {
> +goto out_notifier_err;
> +}
> +
>  return;
>  
> +out_notifier_err:
> +vfio_ccw_put_region(vcdev);
>  out_region_err:
>  vfio_put_device(vcdev);
>  out_device_err:
> @@ -195,6 +297,7 @@ static void vfio_ccw_unrealize(DeviceState *de
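The commit message describes the usual VFIO interrupt flow: QEMU creates an eventfd, hands its fd to the kernel with VFIO_DEVICE_SET_IRQS, and the kernel signals it when I/O completes. As a hedged illustration of the userspace half only, the sketch below uses the Linux eventfd(2) API directly to show what a "test and clear" on the notifier amounts to. The kernel side is simulated by writing to the fd ourselves, and `notifier_signal()`/`notifier_test_and_clear()` are hypothetical helpers, not QEMU's EventNotifier functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Signal the eventfd, as the kernel would when an interrupt fires. */
static void notifier_signal(int fd)
{
    uint64_t one = 1;
    ssize_t n = write(fd, &one, sizeof(one));

    assert(n == (ssize_t)sizeof(one));
}

/*
 * Return true if the eventfd was signalled, resetting its counter to 0.
 * The fd is expected to be non-blocking, so read() fails with EAGAIN
 * when nothing is pending.
 */
static bool notifier_test_and_clear(int fd)
{
    uint64_t value;

    return read(fd, &value, sizeof(value)) == (ssize_t)sizeof(value);
}
```

Several signals that arrive before the handler runs collapse into a single wakeup, because one read() returns and clears the accumulated counter.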

Re: [Qemu-devel] [PATCH v7 08/13] vfio/ccw: get io region info

2017-05-11 Thread Alex Williamson
On Fri,  5 May 2017 04:03:47 +0200
Dong Jia Shi  wrote:

> vfio-ccw provides an MMIO region for I/O operations. We fetch its
> information via ioctls here, then we can use it performing I/O
> instructions and retrieving I/O results later on.
> 
> Signed-off-by: Dong Jia Shi 
> ---


Acked-by: Alex Williamson 


>  hw/vfio/ccw.c | 54 ++
>  1 file changed, 54 insertions(+)
> 
> diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
> index 7d2497c..7ddcfd7 100644
> --- a/hw/vfio/ccw.c
> +++ b/hw/vfio/ccw.c
> @@ -12,6 +12,7 @@
>   */
>  
>  #include 
> +#include 
>  #include 
>  
>  #include "qemu/osdep.h"
> @@ -26,6 +27,9 @@
>  typedef struct VFIOCCWDevice {
>  S390CCWDevice cdev;
>  VFIODevice vdev;
> +uint64_t io_region_size;
> +uint64_t io_region_offset;
> +struct ccw_io_region *io_region;
>  } VFIOCCWDevice;
>  
>  static void vfio_ccw_compute_needs_reset(VFIODevice *vdev)
> @@ -50,6 +54,48 @@ static void vfio_ccw_reset(DeviceState *dev)
>  ioctl(vcdev->vdev.fd, VFIO_DEVICE_RESET);
>  }
>  
> +static void vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error **errp)
> +{
> +VFIODevice *vdev = &vcdev->vdev;
> +struct vfio_region_info *info;
> +int ret;
> +
> +/* Sanity check device */
> +if (!(vdev->flags & VFIO_DEVICE_FLAGS_CCW)) {
> +error_setg(errp, "vfio: Um, this isn't a vfio-ccw device");
> +return;
> +}
> +
> +if (vdev->num_regions < VFIO_CCW_CONFIG_REGION_INDEX + 1) {
> +error_setg(errp, "vfio: Unexpected number of the I/O region %u",
> +   vdev->num_regions);
> +return;
> +}
> +
> +ret = vfio_get_region_info(vdev, VFIO_CCW_CONFIG_REGION_INDEX, &info);
> +if (ret) {
> +error_setg_errno(errp, -ret, "vfio: Error getting config info");
> +return;
> +}
> +
> +vcdev->io_region_size = info->size;
> +if (sizeof(*vcdev->io_region) != vcdev->io_region_size) {
> +error_setg(errp, "vfio: Unexpected size of the I/O region");
> +g_free(info);
> +return;
> +}
> +
> +vcdev->io_region_offset = info->offset;
> +vcdev->io_region = g_malloc0(info->size);
> +
> +g_free(info);
> +}
> +
> +static void vfio_ccw_put_region(VFIOCCWDevice *vcdev)
> +{
> +g_free(vcdev->io_region);
> +}
> +
>  static void vfio_put_device(VFIOCCWDevice *vcdev)
>  {
>  g_free(vcdev->vdev.name);
> @@ -122,8 +168,15 @@ static void vfio_ccw_realize(DeviceState *dev, Error **errp)
>  goto out_device_err;
>  }
>  
> +vfio_ccw_get_region(vcdev, &err);
> +if (err) {
> +goto out_region_err;
> +}
> +
>  return;
>  
> +out_region_err:
> +vfio_put_device(vcdev);
>  out_device_err:
>  vfio_put_group(group);
>  out_group_err:
> @@ -142,6 +195,7 @@ static void vfio_ccw_unrealize(DeviceState *dev, Error **errp)
>  S390CCWDeviceClass *cdc = S390_CCW_DEVICE_GET_CLASS(cdev);
>  VFIOGroup *group = vcdev->vdev.group;
>  
> +vfio_ccw_put_region(vcdev);
>  vfio_put_device(vcdev);
>  vfio_put_group(group);
>  
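Once vfio_ccw_get_region() has recorded io_region_offset and io_region_size, later I/O is expected to go through pread()/pwrite() on the device fd at that offset, which is how VFIO exposes regions that support read/write access. The sketch below mimics that flow with an ordinary temporary file standing in for the vfio device fd. `DemoIoRegion`, its fields, the offset 0x1000 and `region_roundtrip()` are invented for illustration and do not describe the real struct ccw_io_region layout.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical region layout; not the real struct ccw_io_region. */
typedef struct {
    uint8_t orb[12];
    uint32_t ret_code;
} DemoIoRegion;

/*
 * Write a request into the region at @offset, then read it back.
 * The patch insists that the region size equals the struct size;
 * here the struct size is simply used as the transfer length.
 */
static int region_roundtrip(int fd, off_t offset, const DemoIoRegion *req,
                            DemoIoRegion *resp)
{
    const size_t size = sizeof(*req);

    if (pwrite(fd, req, size, offset) != (ssize_t)size) {
        return -1;
    }
    if (pread(fd, resp, size, offset) != (ssize_t)size) {
        return -1;
    }
    return 0;
}
```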




Re: [Qemu-devel] KVM "fake DAX" device flushing

2017-05-11 Thread Rik van Riel
On Thu, 2017-05-11 at 14:17 -0400, Stefan Hajnoczi wrote:
> On Wed, May 10, 2017 at 09:26:00PM +0530, Pankaj Gupta wrote:
> > * For live migration use case, if host side backing file is
> >   shared storage, we need to flush the page cache for the disk
> >   image at the destination (new fadvise interface,
> >   FADV_INVALIDATE_CACHE?) before starting execution of the guest
> >   on the destination host.
> 
> Good point.  QEMU currently only supports live migration with O_DIRECT.
> I think the problem was that userspace cannot guarantee consistency in
> the general case.  If you find a solution to this problem for fake
> NVDIMM then maybe the QEMU block layer can also begin supporting live
> migration with buffered I/O.

I'll be happy to work with you on that, independently
of Pankaj's project.

It looks like the fadvise system call could be extended
pretty easily with an FADV_INVALIDATE_CACHE command, the
other side of which can simply hook into the existing
page cache invalidation code in the kernel.

Qemu will need to know whether the invalidation succeeded,
but that is something we can test for pretty easily before
returning to userspace.
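FADV_INVALIDATE_CACHE does not exist today; it is the proposal under discussion. The closest existing call is posix_fadvise() with POSIX_FADV_DONTNEED, sketched below on a temporary file. The sketch also shows the gap Rik describes: on Linux this advice is best-effort, and a zero return value does not tell the caller whether every cached page was actually dropped. The file path template and `write_then_drop_cache()` are illustrative only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Write some data, flush it, then ask the kernel to drop the file's
 * cached pages. Returns posix_fadvise()'s result (0 on success, an
 * errno value otherwise), or -1 if the setup itself failed.
 */
static int write_then_drop_cache(void)
{
    char path[] = "/tmp/fadv-demo-XXXXXX";
    const char data[] = "disk image block";
    int fd = mkstemp(path);
    int ret;

    if (fd < 0) {
        return -1;
    }
    unlink(path);                       /* file disappears on close */

    /* Only clean pages can be dropped, so flush dirty data first. */
    if (write(fd, data, sizeof(data)) != (ssize_t)sizeof(data) ||
        fsync(fd) != 0) {
        close(fd);
        return -1;
    }

    /* offset 0 and len 0 mean "the whole file". */
    ret = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    close(fd);
    return ret;
}
```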

-- 
All rights reversed

signature.asc
Description: This is a digitally signed message part


Re: [Qemu-devel] [PATCH V2] migration: expose qemu_announce_self() via qmp

2017-05-11 Thread Vlad Yasevich
On 02/20/2017 07:16 PM, Germano Veit Michel wrote:
> qemu_announce_self() is triggered by qemu at the end of migrations
> to update the network regarding the path to the guest l2addr.
> 
> however it is also useful when there is a network change such as
> an active bond slave swap. Essentially, it's the same as a migration
> from a network perspective - the guest moves to a different point
> in the network topology.
> 
> this exposes the function via qmp.
> 
> Signed-off-by: Germano Veit Michel 
> ---
>  include/migration/vmstate.h |  5 +
>  migration/savevm.c  | 30 +++---
>  qapi-schema.json| 18 ++
>  3 files changed, 42 insertions(+), 11 deletions(-)
> 
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index 63e7b02..a08715c 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -1042,6 +1042,11 @@ int64_t self_announce_delay(int round)
>  return 50 + (SELF_ANNOUNCE_ROUNDS - round - 1) * 100;
>  }
> 
> +struct AnnounceRound {
> +QEMUTimer *timer;
> +int count;
> +};
> +
>  void dump_vmstate_json_to_file(FILE *out_fp);
> 
>  #endif
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 5ecd264..44e196b 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -118,29 +118,37 @@ static void qemu_announce_self_iter(NICState *nic, void *opaque)
>  qemu_send_packet_raw(qemu_get_queue(nic), buf, len);
>  }
> 
> -
>  static void qemu_announce_self_once(void *opaque)
>  {
> -static int count = SELF_ANNOUNCE_ROUNDS;
> -QEMUTimer *timer = *(QEMUTimer **)opaque;
> +struct AnnounceRound *round = opaque;
> 
>  qemu_foreach_nic(qemu_announce_self_iter, NULL);
> 
> -if (--count) {
> +round->count--;
> +if (round->count) {
>  /* delay 50ms, 150ms, 250ms, ... */
> -timer_mod(timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
> -  self_announce_delay(count));
> +timer_mod(round->timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
> +  self_announce_delay(round->count));
>  } else {
> -timer_del(timer);
> -timer_free(timer);
> +timer_del(round->timer);
> +timer_free(round->timer);
> +g_free(round);
>  }
>  }
> 
>  void qemu_announce_self(void)
>  {
> -static QEMUTimer *timer;
> -timer = timer_new_ms(QEMU_CLOCK_REALTIME, qemu_announce_self_once, &timer);
> -qemu_announce_self_once(&timer);
> +struct AnnounceRound *round = g_malloc(sizeof(struct AnnounceRound));
> +if (!round)
> +return;
> +round->count = SELF_ANNOUNCE_ROUNDS;
> +round->timer = timer_new_ms(QEMU_CLOCK_REALTIME, qemu_announce_self_once, round);
> +qemu_announce_self_once(round);
> +}

So, I've been looking at this code and have been playing with it and with
David's patches and my patches to include virtio self announcements as well.
What I've discovered is what I think is a possible packet amplification issue
here.

This creates a new timer every time we do an announce_self.  With just
migration, this is not an issue since you only migrate once at a time, so
there is only 1 timer.  With exposing this as an API, a user can potentially
call it in a tight loop and now you have a ton of timers being created.  Add in
David's patches allowing timeouts and retries to be configurable, and you may
now have a ton of long-lived timers.  Add in the patches I am working on to let
virtio do self announcements too (to really fix bonding issues), and now you
add in a possibility of a lot of packets being sent for each timeout (RARP,
GARP, NA, IGMPv4 Reports, IGMPv6 Reports [even worse if MLD1 is used]).

As you can see, this can get rather ugly...

I think we need a timer per user here, with migration and QMP being two users
to begin with.  Each one would get a single timer to play with.  If a given
user already has a timer running, we could return an error or just not do
anything.

-vlad

> +
> +void qmp_announce_self(Error **errp)
> +{
> +qemu_announce_self();
>  }
> 
>  /***/
> diff --git a/qapi-schema.json b/qapi-schema.json
> index baa0d26..0d9bffd 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -6080,3 +6080,21 @@
>  #
>  ##
>  { 'command': 'query-hotpluggable-cpus', 'returns': ['HotpluggableCPU'] }
> +
> +##
> +# @announce-self:
> +#
> +# Trigger generation of broadcast RARP frames to update network switches.
> +# This can be useful when network bonds fail-over the active slave.
> +#
> +# Arguments: None.
> +#
> +# Example:
> +#
> +# -> { "execute": "announce-self" }
> +# <- { "return": {} }
> +#
> +# Since: 2.9
> +##
> +{ 'command': 'announce-self' }
> +
> 
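Vlad's suggestion of one timer per announcement user can be sketched without QEMU's timer API. In the hypothetical model below, each user (migration, QMP) owns a single AnnounceState; starting an announcement while that user's rounds are still pending is rejected instead of creating another timer. The delay schedule reproduces self_announce_delay() from the patch (50 ms, 150 ms, 250 ms, ...). SELF_ANNOUNCE_ROUNDS is set to 5 purely for illustration, and `announce_start()`/`announce_tick()` are invented names.

```c
#include <assert.h>
#include <stdbool.h>

#define SELF_ANNOUNCE_ROUNDS 5   /* illustrative value */

typedef enum {
    ANNOUNCE_USER_MIGRATION,
    ANNOUNCE_USER_QMP,
    ANNOUNCE_USER_MAX
} AnnounceUser;

typedef struct {
    bool active;       /* a round sequence (i.e. a timer) is pending */
    int count;         /* rounds left, as in struct AnnounceRound */
    int packets_sent;  /* stands in for qemu_foreach_nic(...) */
} AnnounceState;

static AnnounceState announce_states[ANNOUNCE_USER_MAX];

/* Same formula as self_announce_delay() in the patch. */
static int self_announce_delay(int round)
{
    return 50 + (SELF_ANNOUNCE_ROUNDS - round - 1) * 100;
}

/*
 * Run one announcement round for @user. Returns the delay in ms until
 * the next round, or 0 when the sequence is finished (the point where
 * the patch frees the timer).
 */
static int announce_tick(AnnounceUser user)
{
    AnnounceState *s = &announce_states[user];

    s->packets_sent++;
    if (--s->count) {
        return self_announce_delay(s->count);
    }
    s->active = false;
    return 0;
}

/* Returns false, and changes nothing, if @user already has rounds pending. */
static bool announce_start(AnnounceUser user)
{
    AnnounceState *s = &announce_states[user];

    if (s->active) {
        return false;
    }
    s->active = true;
    s->count = SELF_ANNOUNCE_ROUNDS;
    return true;
}
```

Whether a second request should fail or be silently ignored is the open choice Vlad raises; the sketch returns false and leaves that decision to the caller, so a tight loop of requests costs at most one pending sequence per user.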




Re: [Qemu-devel] KVM "fake DAX" device flushing

2017-05-11 Thread Rik van Riel
On Thu, 2017-05-11 at 12:15 -0700, Dan Williams wrote:
> On Thu, May 11, 2017 at 11:17 AM, Stefan Hajnoczi wrote:
> > On Wed, May 10, 2017 at 09:26:00PM +0530, Pankaj Gupta wrote:
> > > We are sharing initial project proposal for
> > > 'KVM "fake DAX" device flushing' project for feedback.
> > > Got the idea during discussion with 'Rik van Riel'.
> > 
> > CCing NVDIMM folks.
> > 
> > > 
> > > Also, request answers to 'Questions' section.
> > > 
> > > Abstract :
> > > --
> > > Project idea is to use fake persistent memory with direct
> > > access(DAX) in virtual machines. Overall goal of project
> > > is to increase the number of virtual machines that can be
> > > run on a physical machine, in order to increase the density
> > > of customer virtual machines.
> > > 
> > > The idea is to avoid the guest page cache, and minimize the
> > > memory footprint of virtual machines. By presenting a disk
> > > image as a nvdimm direct access (DAX) memory region in a
> > > virtual machine, the guest OS can avoid using page cache
> > > memory for most file accesses.
> 
> How is this different than the solution that Clear Containers came up
> with?
> 
> https://lwn.net/Articles/644675/

Clear Containers uses MAP_PRIVATE with read-only
images.

This solution is about making read-write images
work.  When a program in the guest calls fsync,
we need to ensure the data has actually hit the
disk on the host side before fsync returns.
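On the host side, the guarantee described here (a guest fsync returns only once the data is on the host's disk) comes down to flushing the shared file mapping that backs the fake DAX region. A minimal sketch, assuming the disk image is mapped MAP_SHARED into the host process: write through the mapping, then call msync(MS_SYNC) before reporting completion. By contrast, stores to a MAP_PRIVATE mapping are copy-on-write and never reach the file. The temporary file, IMAGE_SIZE and `write_through_shared_mapping()` are illustrative; how the guest's flush request would reach such code is what the proposal still has to define.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define IMAGE_SIZE 4096

/*
 * Map a scratch "disk image" shared, store @msg at offset 0 through the
 * mapping and flush it with msync(MS_SYNC). Returns 0 if the data is then
 * visible through the file with pread(), -1 otherwise.
 */
static int write_through_shared_mapping(const char *msg)
{
    char path[] = "/tmp/fake-dax-XXXXXX";
    char buf[IMAGE_SIZE];
    char *image;
    size_t len = strlen(msg) + 1;
    int ret = -1;
    int fd;

    if (len > IMAGE_SIZE) {
        return -1;
    }
    fd = mkstemp(path);
    if (fd < 0) {
        return -1;
    }
    unlink(path);                       /* file disappears on close */

    if (ftruncate(fd, IMAGE_SIZE) != 0) {
        goto out_close;
    }
    image = mmap(NULL, IMAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (image == MAP_FAILED) {
        goto out_close;
    }

    memcpy(image, msg, len);            /* the guest's store into "pmem" */

    /*
     * A guest-visible fsync must not complete before msync() returns.
     * The pread() only shows the store reached the file; durability
     * comes from msync().
     */
    if (msync(image, IMAGE_SIZE, MS_SYNC) == 0 &&
        pread(fd, buf, len, 0) == (ssize_t)len &&
        memcmp(buf, msg, len) == 0) {
        ret = 0;
    }
    munmap(image, IMAGE_SIZE);

out_close:
    close(fd);
    return ret;
}
```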

-- 
All rights reversed



Re: [Qemu-devel] [PATCH v7 06/13] s390x/css: device support for s390-ccw passthrough

2017-05-11 Thread Alex Williamson
On Fri,  5 May 2017 04:03:45 +0200
Dong Jia Shi  wrote:

> In order to support subchannels pass-through, we introduce a s390
> subchannel device called "s390-ccw" to hold the real subchannel info.
> The s390-ccw devices inherit from the abstract CcwDevice which connect
> to the existing virtual-css-bus.
> 
> Signed-off-by: Dong Jia Shi 
> ---
>  hw/s390x/Makefile.objs |   1 +
>  hw/s390x/s390-ccw.c| 138 +
>  hw/s390x/s390-ccw.h|  38 ++
>  3 files changed, 177 insertions(+)
>  create mode 100644 hw/s390x/s390-ccw.c
>  create mode 100644 hw/s390x/s390-ccw.h
> 
> diff --git a/hw/s390x/Makefile.objs b/hw/s390x/Makefile.objs
> index 41ac4ec..72a3d37 100644
> --- a/hw/s390x/Makefile.objs
> +++ b/hw/s390x/Makefile.objs
> @@ -13,3 +13,4 @@ obj-y += ccw-device.o
>  obj-y += s390-pci-bus.o s390-pci-inst.o
>  obj-y += s390-skeys.o
>  obj-$(CONFIG_KVM) += s390-skeys-kvm.o
> +obj-y += s390-ccw.o
> diff --git a/hw/s390x/s390-ccw.c b/hw/s390x/s390-ccw.c
> new file mode 100644
> index 000..b1aadcd
> --- /dev/null
> +++ b/hw/s390x/s390-ccw.c
> @@ -0,0 +1,138 @@
> +/*
> + * s390 CCW Assignment Support
> + *
> + * Copyright 2017 IBM Corp
> + * Author(s): Dong Jia Shi 
> + *Xiao Feng Ren 
> + *Pierre Morel 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2
> + * or (at your option) any later version. See the COPYING file in the
> + * top-level directory.
> + */
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "hw/sysbus.h"
> +#include "libgen.h"
> +#include "hw/s390x/css.h"
> +#include "hw/s390x/css-bridge.h"
> +#include "s390-ccw.h"
> +
> +static void s390_ccw_get_dev_info(S390CCWDevice *cdev,
> +  char *sysfsdev,
> +  Error **errp)
> +{
> +unsigned int cssid, ssid, devid;
> +char dev_path[PATH_MAX] = {0}, *tmp;
> +
> +if (!sysfsdev) {
> +error_setg(errp, "No host device provided");
> +error_append_hint(errp,
> +  "Use -device vfio-ccw,sysfsdev=PATH_TO_DEVICE\n");
> +return;
> +}
> +
> +if (!realpath(sysfsdev, dev_path)) {
> +error_setg_errno(errp, errno, "Host device '%s' not found", sysfsdev);
> +return;
> +}
> +
> +cdev->mdevid = g_strdup(basename(dev_path));
> +
> +tmp = basename(dirname(dev_path));
> +sscanf(tmp, "%2x.%1x.%4x", &cssid, &ssid, &devid);


Seems like an oversight not to check this return value.


> +
> +cdev->hostid.cssid = cssid;
> +cdev->hostid.ssid = ssid;
> +cdev->hostid.devid = devid;
> +cdev->hostid.valid = true;
> +}
> +
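A minimal sketch of the check Alex asks for: sscanf() returns the number of conversions it completed, so anything other than 3 means the parent directory name was not a subchannel id of the form cssid.ssid.devno. The helper name `parse_subchannel_id()` is hypothetical; in the patch a failure would presumably be reported through errp rather than as a bool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Parse a subchannel bus id such as "0.0.0313" with the same format
 * string as s390_ccw_get_dev_info(). Returns false on a partial match.
 */
static bool parse_subchannel_id(const char *name, unsigned int *cssid,
                                unsigned int *ssid, unsigned int *devid)
{
    return sscanf(name, "%2x.%1x.%4x", cssid, ssid, devid) == 3;
}
```

For example, "0.0" yields only two conversions and "vfio_mdev" none, so neither would silently populate hostid with uninitialized values.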



Re: [Qemu-devel] [PATCH 3/7] curl: avoid recursive locking of BDRVCURLState mutex

2017-05-11 Thread Jeff Cody
On Wed, May 10, 2017 at 04:32:01PM +0200, Paolo Bonzini wrote:
> The curl driver has a ugly hack where, if it cannot find an empty CURLState,
> it just uses aio_poll to wait for one to be empty.  This is probably
> buggy when used together with dataplane, and the simplest way to fix it
> is to use coroutines instead.
> 
> A more immediate effect of the bug however is that it can cause a
> recursive call to curl_readv_bh_cb and recursively taking the
> BDRVCURLState mutex.  This causes a deadlock.
> 
> The fix is to unlock the mutex around aio_poll, but for cleanliness we
> should also take the mutex around all calls to curl_init_state, even if
> reaching the unlock/lock pair is impossible.  The same is true for
> curl_clean_state.
> 
> Reported-by: Richard W.M. Jones 
> Cc: jc...@redhat.com
> Cc: qemu-sta...@nongnu.org
> Signed-off-by: Paolo Bonzini 
> ---
>  block/curl.c | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/block/curl.c b/block/curl.c
> index 9a00fdc28e..b18e79bf54 100644
> --- a/block/curl.c
> +++ b/block/curl.c
> @@ -281,6 +281,7 @@ read_end:
>  return size * nmemb;
>  }
>  
> +/* Called with s->mutex held.  */
>  static int curl_find_buf(BDRVCURLState *s, size_t start, size_t len,
>   CURLAIOCB *acb)
>  {
> @@ -453,6 +454,7 @@ static void curl_multi_timeout_do(void *arg)
>  #endif
>  }
>  
> +/* Called with s->mutex held.  */
>  static CURLState *curl_init_state(BlockDriverState *bs, BDRVCURLState *s)
>  {
>  CURLState *state = NULL;
> @@ -471,7 +473,9 @@ static CURLState *curl_init_state(BlockDriverState *bs, BDRVCURLState *s)
>  break;
>  }
>  if (!state) {
> +qemu_mutex_unlock(&s->mutex);
>  aio_poll(bdrv_get_aio_context(bs), true);
> +qemu_mutex_lock(&s->mutex);
>  }
>  } while(!state);
>  
> @@ -534,6 +538,7 @@ static CURLState *curl_init_state(BlockDriverState *bs, BDRVCURLState *s)
>  return state;
>  }
>  
> +/* Called with s->mutex held.  */
>  static void curl_clean_state(CURLState *s)
>  {
>  int j;
> @@ -565,6 +570,7 @@ static void curl_detach_aio_context(BlockDriverState *bs)
>  BDRVCURLState *s = bs->opaque;
>  int i;
>  
> +qemu_mutex_lock(&s->mutex);
>  for (i = 0; i < CURL_NUM_STATES; i++) {
>  if (s->states[i].in_use) {
>  curl_clean_state(&s->states[i]);
> @@ -580,6 +586,7 @@ static void curl_detach_aio_context(BlockDriverState *bs)
>  curl_multi_cleanup(s->multi);
>  s->multi = NULL;
>  }
> +qemu_mutex_unlock(&s->mutex);
>  
>  timer_del(&s->timer);
>  }
> @@ -745,9 +752,12 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags,
>  }
>  
>  DPRINTF("CURL: Opening %s\n", file);
> +qemu_mutex_init(&s->mutex);

This mutex init is now done above possible returns on error, so we should
call qemu_mutex_destroy() on errors after this point.

>  s->aio_context = bdrv_get_aio_context(bs);
>  s->url = g_strdup(file);
> +qemu_mutex_lock(&s->mutex);
>  state = curl_init_state(bs, s);
> +qemu_mutex_unlock(&s->mutex);
>  if (!state)
>  goto out_noclean;
>  
> @@ -791,11 +801,12 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags,
>  }
>  DPRINTF("CURL: Size = %zd\n", s->len);
>  
> +qemu_mutex_lock(&s->mutex);
>  curl_clean_state(state);
> +qemu_mutex_unlock(&s->mutex);
>  curl_easy_cleanup(state->curl);
>  state->curl = NULL;
>  
> -qemu_mutex_init(&s->mutex);
>  curl_attach_aio_context(bs, bdrv_get_aio_context(bs));
>  
>  qemu_opts_del(opts);
> -- 
> 2.12.2
> 
> 
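Jeff's point about the error paths can be illustrated with a hypothetical open routine. The sketch uses POSIX threads as a stand-in for qemu_mutex_init()/qemu_mutex_destroy(): once the mutex is initialised near the top, every later failure exit has to destroy it, which the shared out_destroy label takes care of. `DemoState`, `open_device()` and its failure flag are invented for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t mutex;
    bool mutex_live;   /* tracked only so the example can be checked */
} DemoState;

/* Hypothetical analogue of curl_open(): returns 0 or -1. */
static int open_device(DemoState *s, bool fail_after_init)
{
    bool ok;

    if (pthread_mutex_init(&s->mutex, NULL) != 0) {
        return -1;
    }
    s->mutex_live = true;

    pthread_mutex_lock(&s->mutex);
    ok = !fail_after_init;          /* e.g. curl_init_state() failing */
    pthread_mutex_unlock(&s->mutex);
    if (!ok) {
        goto out_destroy;
    }
    return 0;

out_destroy:
    /* Every exit after the init must undo it. */
    pthread_mutex_destroy(&s->mutex);
    s->mutex_live = false;
    return -1;
}
```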



Re: [Qemu-devel] [Qemu-block] [PATCH v2 00/16] block: Protect AIO context change with perm API

2017-05-11 Thread Stefan Hajnoczi
On Wed, Apr 19, 2017 at 05:43:40PM +0800, Fam Zheng wrote:
> v2: Address Stefan's comments:
> 
> - Clean up redundancy in bdrv_format_default_perms change.
> - Add a test case to check both success/failure cases.
>   A failure case is not possible at user interface level because of other
>   checks we have, so write a unit test in tests/test-blk-perm.c.
> 
> Eject / change of scsi-cd on a virtio-scsi dataplane bus causes abort() because
> the new BDS doesn't get proper bdrv_set_aio_context().
> 
> Store the AioContext in BB and do it in blk_insert_bs. That is done by
> Vladimir's patch.
> 
> Other patches are to make sure such a bdrv_set_aio_context() doesn't interfere
> with other BBs using other nodes from this graph.

Looks pretty good.  I had two comments that apply across all patches:

First, it is not safe to enable the new permission without registering
an aio notifier.  Another user could look up the BDS and call
bdrv_set_aio_context() on it.  I believe this bug is present for block
jobs that have additional BDSes like base/target/etc.

Second, patches that postpone bdrv_set_aio_context() must take care to
acquire the AioContext for BDS accesses that happen before the next
bdrv_set_aio_context() call.


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH 2/7] curl: never invoke callbacks with s->mutex held

2017-05-11 Thread Jeff Cody
On Wed, May 10, 2017 at 04:32:00PM +0200, Paolo Bonzini wrote:
> All curl callbacks go through curl_multi_do, and hence are called with
> s->mutex held.  Note that with comments, and make curl_read_cb drop the
> lock before invoking the callback.
> 
> Likewise for curl_find_buf, where the callback can be invoked by the
> caller.
> 
> Cc: qemu-sta...@nongnu.org
> Cc: jc...@redhat.com
> Signed-off-by: Paolo Bonzini 
> ---
>  block/curl.c | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/block/curl.c b/block/curl.c
> index 25a301e7b4..9a00fdc28e 100644
> --- a/block/curl.c
> +++ b/block/curl.c
> @@ -147,6 +147,7 @@ static void curl_multi_do(void *arg);
>  static void curl_multi_read(void *arg);
>  
>  #ifdef NEED_CURL_TIMER_CALLBACK
> +/* Called from curl_multi_do_locked, with s->mutex held.  */
>  static int curl_timer_cb(CURLM *multi, long timeout_ms, void *opaque)
>  {
>  BDRVCURLState *s = opaque;
> @@ -163,6 +164,7 @@ static int curl_timer_cb(CURLM *multi, long timeout_ms, void *opaque)
>  }
>  #endif
>  
> +/* Called from curl_multi_do_locked, with s->mutex held.  */
>  static int curl_sock_cb(CURL *curl, curl_socket_t fd, int action,
>  void *userp, void *sp)
>  {
> @@ -212,6 +214,7 @@ static int curl_sock_cb(CURL *curl, curl_socket_t fd, int action,
>  return 0;
>  }
>  
> +/* Called from curl_multi_do_locked, with s->mutex held.  */
>  static size_t curl_header_cb(void *ptr, size_t size, size_t nmemb, void *opaque)
>  {
>  BDRVCURLState *s = opaque;
> @@ -226,6 +229,7 @@ static size_t curl_header_cb(void *ptr, size_t size, size_t nmemb, void *opaque)
>  return realsize;
>  }
>  
> +/* Called from curl_multi_do_locked, with s->mutex held.  */
>  static size_t curl_read_cb(void *ptr, size_t size, size_t nmemb, void *opaque)
>  {
>  CURLState *s = ((CURLState*)opaque);
> @@ -264,7 +268,9 @@ static size_t curl_read_cb(void *ptr, size_t size, size_t nmemb, void *opaque)
>request_length - offset);
>  }
>  
> +qemu_mutex_unlock(&s->s->mutex);
>  acb->common.cb(acb->common.opaque, 0);
> +qemu_mutex_lock(&s->s->mutex);
>  qemu_aio_unref(acb);
>  s->acb[i] = NULL;
>  }
> @@ -305,8 +311,6 @@ static int curl_find_buf(BDRVCURLState *s, size_t start, size_t len,
>  if (clamped_len < len) {
>  qemu_iovec_memset(acb->qiov, clamped_len, 0, len - clamped_len);
>  }
> -acb->common.cb(acb->common.opaque, 0);
> -
>  return FIND_RET_OK;
>  }
>  
> @@ -832,8 +836,8 @@ static void curl_readv_bh_cb(void *p)
>  // we can just call the callback and be done.
>  switch (curl_find_buf(s, start, acb->nb_sectors * BDRV_SECTOR_SIZE, acb)) {
>  case FIND_RET_OK:
> -qemu_aio_unref(acb);
> -// fall through
> +ret = 0;
> +goto out;
>  case FIND_RET_WAIT:
>  goto out;
>  default:
> -- 
> 2.12.2
> 
>

Reviewed-by: Jeff Cody 
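Dropping s->mutex around acb->common.cb() and re-taking it afterwards matters because a completion callback may re-enter the driver and take the same lock. A minimal sketch with a POSIX mutex standing in for QemuMutex: `complete_request()` is called with the lock held and releases it only for the duration of the callback, so the callback itself can lock the mutex without self-deadlocking. All names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

typedef struct Driver {
    pthread_mutex_t mutex;
    int completed;          /* protected by mutex */
} Driver;

typedef void CompletionFunc(Driver *d, int ret);

/* Called with d->mutex held; returns with it held again. */
static void complete_request(Driver *d, CompletionFunc *cb, int ret)
{
    pthread_mutex_unlock(&d->mutex);
    cb(d, ret);             /* may take d->mutex itself */
    pthread_mutex_lock(&d->mutex);
}

/* A callback that re-enters the driver, as submitting a new request would. */
static void reentrant_cb(Driver *d, int ret)
{
    /* With a non-recursive mutex this would self-deadlock if still held. */
    pthread_mutex_lock(&d->mutex);
    if (ret == 0) {
        d->completed++;
    }
    pthread_mutex_unlock(&d->mutex);
}
```

The flip side is that any state the caller read before dropping the lock may have changed by the time it is re-acquired, so it should be re-checked after the callback returns.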




Re: [Qemu-devel] [PATCH 1/7] curl: strengthen assertion in curl_clean_state

2017-05-11 Thread Jeff Cody
On Wed, May 10, 2017 at 04:31:59PM +0200, Paolo Bonzini wrote:
> curl_clean_state should only be called after all AIOCBs have been
> completed.  This is not so obvious for the call from curl_detach_aio_context,
> so assert that.
> 
> Cc: qemu-sta...@nongnu.org
> Cc: jc...@redhat.com
> Signed-off-by: Paolo Bonzini 
> ---
>  block/curl.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/block/curl.c b/block/curl.c
> index 2708d57c2f..25a301e7b4 100644
> --- a/block/curl.c
> +++ b/block/curl.c
> @@ -532,6 +532,11 @@ static CURLState *curl_init_state(BlockDriverState *bs, BDRVCURLState *s)
>  
>  static void curl_clean_state(CURLState *s)
>  {
> +int j;
> +for (j=0; j +assert(!s->acb[j]);
> +}
> +
>  if (s->s->multi)
>  curl_multi_remove_handle(s->s->multi, s->curl);
>  
> -- 
> 2.12.2
> 
>

Minor formatting nit aside (if no other revisions needed, I can fix that on
apply):

Reviewed-by: Jeff Cody 



Re: [Qemu-devel] [PATCH v2 16/16] tests: Add test case for BLK_PERM_AIO_CONTEXT_CHANGE

2017-05-11 Thread Stefan Hajnoczi
On Wed, Apr 19, 2017 at 05:43:56PM +0800, Fam Zheng wrote:
> +static void test_aio_context_failure(void)
> +{
> +Error *local_err = NULL;
> +BlockBackend *blk1 = blk_new(BLK_PERM_AIO_CONTEXT_CHANGE,
> + BLK_PERM_ALL & ~BLK_PERM_AIO_CONTEXT_CHANGE);
> +BlockBackend *blk2 = blk_new(BLK_PERM_AIO_CONTEXT_CHANGE, BLK_PERM_ALL);
> +BlockDriverState *bs = bdrv_open("null-co://", NULL, NULL, 0, &error_abort);
> +
> +blk_insert_bs(blk1, bs, &error_abort);
> +blk_insert_bs(blk2, bs, &local_err);
> +
> +g_assert_nonnull(local_err);

The following asserts and also frees the Error object to prevent leaking
memory:

error_free_or_abort(&local_err);
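Helpers of this kind take the address of the error pointer (QEMU's error_free_or_abort() is declared with an Error ** parameter) so that they can assert the error is set, free it, and clear the caller's variable in one step, leaving no dangling pointer behind. A minimal stand-alone sketch of that contract; `DemoError`, `demo_error_new()` and `demo_error_free_or_abort()` are hypothetical stand-ins, not QEMU's Error API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *msg;
} DemoError;

static DemoError *demo_error_new(const char *msg)
{
    DemoError *err = malloc(sizeof(*err));

    assert(err);
    err->msg = strdup(msg);
    return err;
}

/* Assert that *errp is set, free it, and clear the caller's pointer. */
static void demo_error_free_or_abort(DemoError **errp)
{
    assert(errp && *errp);
    free((*errp)->msg);
    free(*errp);
    *errp = NULL;
}
```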




Re: [Qemu-devel] [Qemu-block] [PATCH v2 15/16] block: Add perm assertion on blk_set_aio_context

2017-05-11 Thread Stefan Hajnoczi
On Wed, Apr 19, 2017 at 05:43:55PM +0800, Fam Zheng wrote:
> Now that all BB users comply with the BLK_PERM_AIO_CONTEXT_CHANGE
> rule, we can assert it.
> 
> Signed-off-by: Fam Zheng 
> ---
>  block/block-backend.c | 4 
>  1 file changed, 4 insertions(+)

Reviewed-by: Stefan Hajnoczi 




Re: [Qemu-devel] [Qemu-block] [PATCH v2 13/16] blk: fix aio context loss on media change

2017-05-11 Thread Stefan Hajnoczi
On Wed, Apr 19, 2017 at 05:43:53PM +0800, Fam Zheng wrote:
> From: Vladimir Sementsov-Ogievskiy 
> 
> If we have separate iothread for cdrom, we lose connection to it on
> qmp_blockdev_change_medium, as aio_context is on bds which is dropped
> and switched with new one.
> 
> As an example result, after such media change we have crash on
> virtio_scsi_ctx_check: Assertion `blk_get_aio_context(d->conf.blk) == s->ctx' failed.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> Signed-off-by: Fam Zheng 
> ---
>  block/block-backend.c | 6 ++
>  1 file changed, 6 insertions(+)

Reviewed-by: Stefan Hajnoczi 




Re: [Qemu-devel] [Qemu-block] [PATCH v2 14/16] nbd: Allow BLK_PERM_AIO_CONTEXT_CHANGE on BB

2017-05-11 Thread Stefan Hajnoczi
On Wed, Apr 19, 2017 at 05:43:54PM +0800, Fam Zheng wrote:
> This is safe because of the aio context notifier we'll register on this
> node. So allow it.
> 
> Signed-off-by: Fam Zheng 
> ---
>  nbd/server.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi 



