date:20141007

Re: [Qemu-devel] [PATCH 1/2] qemu-char: Fix reconnect socket error reporting

2014-10-07 Thread Markus Armbruster

miny...@acm.org writes:

> From: Corey Minyard 
>
> If reconnect was set, errors wouldn't always be reported.
> Fix that and also only report a connect error once until a
> connection has been made.
>
> The primary purpose of this is to tell the user that a
> connection failed so they can know they need to figure out
> what went wrong.  So we don't want to spew too much
> out here, just enough so they know.
>
> Signed-off-by: Corey Minyard 
> ---
>  qemu-char.c | 47 ---
>  1 file changed, 32 insertions(+), 15 deletions(-)
>
> diff --git a/qemu-char.c b/qemu-char.c
> index 62af0ef..fb895c7 100644
> --- a/qemu-char.c
> +++ b/qemu-char.c
> @@ -2509,6 +2509,7 @@ typedef struct {
>  
>  guint reconnect_timer;
>  int64_t reconnect_time;
> +bool connect_err_reported;
>  } TCPCharDriver;
>  
>  static gboolean socket_reconnect_timeout(gpointer opaque);

Doesn't apply, obviously depends on some other patch.  Always state your
dependencies explicitly in the cover letter!

[...]

Re: [Qemu-devel] [PATCH v1 7/8] throttle: Add throttle group support

2014-10-07 Thread Fam Zheng

On Tue, 10/07 15:24, Benoît Canet wrote:
> The throttle group support use a cooperative round robin scheduling algorithm.
> 
> The principle of the algorithm are simple:

s/principle/principles/

> - Each BDS of the group is used as a token in a circular way.
> - The active BDS compute if a wait must be done and arm the right timer.
> - If a wait must be done the token timer will be armed so the token will become
>   the next active BDS.
> 
> Signed-off-by: Benoit Canet 
> ---
>  block.c   | 191 
> --
>  block/qapi.c  |   7 +-
>  block/throttle-groups.c   |   2 +-
>  blockdev.c|  19 -
>  hmp.c |   4 +-
>  include/block/block.h |   3 +-
>  include/block/block_int.h |   9 ++-
>  qapi/block-core.json  |   5 +-
>  qemu-options.hx   |   1 +
>  qmp-commands.hx   |   3 +-
>  10 files changed, 209 insertions(+), 35 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 527ea48..e7e5607 100644
> --- a/block.c
> +++ b/block.c
> @@ -36,6 +36,7 @@
>  #include "qmp-commands.h"
>  #include "qemu/timer.h"
>  #include "qapi-event.h"
> +#include "block/throttle-groups.h"
>  
>  #ifdef CONFIG_BSD
>  #include 
> @@ -129,7 +130,9 @@ void bdrv_set_io_limits(BlockDriverState *bs,
>  {
>  int i;
>  
> -throttle_config(&bs->throttle_state, &bs->throttle_timers, cfg);
> +throttle_group_lock(bs->throttle_state);
> +throttle_config(bs->throttle_state, &bs->throttle_timers, cfg);
> +throttle_group_unlock(bs->throttle_state);
>  
>  for (i = 0; i < 2; i++) {
>  qemu_co_enter_next(&bs->throttled_reqs[i]);
> @@ -156,34 +159,99 @@ static bool bdrv_start_throttled_reqs(BlockDriverState *bs)
>  return drained;
>  }
>  
> +static void bdrv_throttle_group_add(BlockDriverState *bs)
> +{
> +int i;
> +BlockDriverState *token;
> +
> +for (i = 0; i < 2; i++) {
> +/* Get the BlockDriverState having the round robin token */
> +token = throttle_group_token(bs->throttle_state, i);
> +
> +/* If the ThrottleGroup is new set the current BlockDriverState as
> + * token
> + */
> +if (!token) {
> +throttle_group_set_token(bs->throttle_state, bs, i);
> +}
> +
> +}
> +
> +throttle_group_register_bs(bs->throttle_state, bs);
> +}
> +
> +static void bdrv_throttle_group_remove(BlockDriverState *bs)
> +{
> +BlockDriverState *token;
> +int i;
> +
> +for (i = 0; i < 2; i++) {
> +/* Get the BlockDriverState having the round robin token */
> +token = throttle_group_token(bs->throttle_state, i);
> +/* if this bs is the current token set the next bs as token */
> +if (token == bs) {
> +token = throttle_group_next_bs(token);
> +/* take care of the case where bs is the only bs of the group */
> +if (token == bs) {
> +token = NULL;
> +}
> +throttle_group_set_token(bs->throttle_state, token, i);
> +}
> +}
> +
> +/* remove the current bs from the list */
> +QLIST_REMOVE(bs, round_robin);
> +}
> +
>  void bdrv_io_limits_disable(BlockDriverState *bs)
>  {
> +
> +throttle_group_lock(bs->throttle_state);
>  bs->io_limits_enabled = false;
> +throttle_group_unlock(bs->throttle_state);
>  
>  bdrv_start_throttled_reqs(bs);
>  
> +throttle_group_lock(bs->throttle_state);
> +bdrv_throttle_group_remove(bs);
> +throttle_group_unlock(bs->throttle_state);
> +
> +throttle_group_unref(bs->throttle_state);
> +bs->throttle_state = NULL;
> +
>  throttle_timers_destroy(&bs->throttle_timers);
>  }
>  
>  static void bdrv_throttle_read_timer_cb(void *opaque)
>  {
>  BlockDriverState *bs = opaque;
> -throttle_timer_fired(&bs->throttle_state, false);
> +
> +throttle_group_lock(bs->throttle_state);
> +throttle_timer_fired(bs->throttle_state, false);
> +throttle_group_unlock(bs->throttle_state);
> +
>  qemu_co_enter_next(&bs->throttled_reqs[0]);
>  }
>  
>  static void bdrv_throttle_write_timer_cb(void *opaque)
>  {
>  BlockDriverState *bs = opaque;
> -throttle_timer_fired(&bs->throttle_state, true);
> +
> +throttle_group_lock(bs->throttle_state);
> +throttle_timer_fired(bs->throttle_state, true);
> +throttle_group_unlock(bs->throttle_state);
> +
>  qemu_co_enter_next(&bs->throttled_reqs[1]);
>  }
>  
>  /* should be called before bdrv_set_io_limits if a limit is set */
> -void bdrv_io_limits_enable(BlockDriverState *bs)
> +void bdrv_io_limits_enable(BlockDriverState *bs, const char *group)

Does this mean that after this series, every ThrottleState must be contained
inside its own throttle group? If so, we could embed the ThrottleGroup fields
in ThrottleState.

It's weird when a function called throttle_group_compare takes a ThrottleState
pointer as its parameter and casts it back to ThrottleGroup with container_of.
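
To make the container_of remark concrete, here is a minimal sketch of the
pattern being questioned. The struct layouts, field names and the body of
throttle_group_compare below are simplified assumptions for illustration only,
not the definitions from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; not the real QEMU definitions. */
typedef struct ThrottleState {
    int dummy_limit;
} ThrottleState;

typedef struct ThrottleGroup {
    char name[32];
    ThrottleState ts;       /* embedded member that callers are handed */
    unsigned refcount;
} ThrottleGroup;

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Takes a ThrottleState but really operates on the group wrapped around it. */
static int throttle_group_compare(ThrottleState *ts, const char *name)
{
    ThrottleGroup *tg = container_of(ts, ThrottleGroup, ts);
    return strcmp(tg->name, name);
}
```

The cast is only valid when every ThrottleState passed in really lives inside
a ThrottleGroup. Nothing in the type system enforces that, which is why
merging the two structures (or passing a ThrottleGroup pointer directly) may
be easier to reason about.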

Re: [Qemu-devel] [PATCH v1 6/8] throttle: Add a way to fire one of the timers asap like a bottom half

2014-10-07 Thread Fam Zheng

On Tue, 10/07 15:24, Benoît Canet wrote:
> This will be needed by the group throttling algorithm.
> 
> Signed-off-by: Benoit Canet 
> ---
>  include/qemu/throttle.h |  2 ++
>  util/throttle.c | 11 +++
>  2 files changed, 13 insertions(+)
> 
> diff --git a/include/qemu/throttle.h b/include/qemu/throttle.h
> index 3a16c48..3b9d1b8 100644
> --- a/include/qemu/throttle.h
> +++ b/include/qemu/throttle.h
> @@ -127,6 +127,8 @@ bool throttle_schedule_timer(ThrottleState *ts,
>   bool is_write,
>   bool *armed);
>  
> +void throttle_fire_timer(ThrottleTimers *tt, bool is_write);
> +
>  void throttle_timer_fired(ThrottleState *ts, bool is_write);
>  
>  void throttle_account(ThrottleState *ts, bool is_write, uint64_t size);
> diff --git a/util/throttle.c b/util/throttle.c
> index a273acb..163b9d0 100644
> --- a/util/throttle.c
> +++ b/util/throttle.c
> @@ -403,6 +403,17 @@ bool throttle_schedule_timer(ThrottleState *ts,
>  return true;
>  }
>  
> +/* Schedule a throttle timer like a BH

Why not use a real BH? It's more efficient than a timer scheduled at now + 1.

Fam
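
To illustrate the difference, here is a toy model of one event loop iteration.
All Toy* names are hypothetical and this is not QEMU's main loop; in QEMU the
corresponding real calls would be aio_bh_new() and qemu_bh_schedule(). In the
sketch, a bottom half is only a flag checked on every iteration, while a timer
armed at now + 1 needs a clock read and fires only once the clock has advanced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef void Callback(void *opaque);

/* Toy bottom half: a "scheduled" flag checked on every loop iteration. */
typedef struct ToyBH {
    Callback *cb;
    void *opaque;
    bool scheduled;
} ToyBH;

/* Toy timer: fires once the clock reaches expire_ns. */
typedef struct ToyTimer {
    Callback *cb;
    void *opaque;
    int64_t expire_ns;
    bool armed;
} ToyTimer;

static void toy_bh_schedule(ToyBH *bh)
{
    bh->scheduled = true;           /* no clock access needed */
}

static void toy_timer_mod(ToyTimer *t, int64_t expire_ns)
{
    t->expire_ns = expire_ns;       /* caller had to read the clock first */
    t->armed = true;
}

/* One loop iteration: run a pending BH, then an expired timer.
 * Returns the number of callbacks that ran. */
static int toy_loop_iterate(ToyBH *bh, ToyTimer *t, int64_t now_ns)
{
    int ran = 0;

    if (bh->scheduled) {
        bh->scheduled = false;
        bh->cb(bh->opaque);
        ran++;
    }
    if (t->armed && t->expire_ns <= now_ns) {
        t->armed = false;
        t->cb(t->opaque);
        ran++;
    }
    return ran;
}

/* Callback used for the demonstration: increments an int counter. */
static void count_cb(void *opaque)
{
    ++*(int *)opaque;
}
```

With both armed at the same instant, the bottom half runs on the very next
iteration, whereas the timer waits until the clock has moved past now.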

> + *
> + * @tt:   The timers structure
> + * @is_write: the type of operation (read/write)
> + */
> +void throttle_fire_timer(ThrottleTimers *tt, bool is_write)
> +{
> +int64_t now = qemu_clock_get_ns(tt->clock_type);
> +timer_mod(tt->timers[is_write], now + 1);
> +}
> +
>  /* Remember that now timers are currently armed
>   *
>   * @ts:   the throttle state we are working on
> -- 
> 2.1.1
>

Re: [Qemu-devel] [PATCH v1 3/8] throttle: Add throttle group infrastructure tests

2014-10-07 Thread Fam Zheng

On Tue, 10/07 15:24, Benoît Canet wrote:
> Signed-off-by: Benoit Canet 
> ---
>  tests/test-throttle.c | 51 
> +++
>  1 file changed, 51 insertions(+)
> 
> diff --git a/tests/test-throttle.c b/tests/test-throttle.c
> index 3e52df3..ecb5504 100644
> --- a/tests/test-throttle.c
> +++ b/tests/test-throttle.c
> @@ -15,6 +15,7 @@
>  #include "block/aio.h"
>  #include "qemu/throttle.h"
>  #include "qemu/error-report.h"
> +#include "block/throttle-groups.h"
>  
>  static AioContext *ctx;
>  static LeakyBucket bkt;
> @@ -500,6 +501,55 @@ static void test_accounting(void)
>  (64.0 / 13)));
>  }
>  
> +static void test_groups(void)
> +{
> +bool removed;
> +
> +ThrottleState *ts_foo, *ts_bar, *tmp;
> +
> +ts_bar = throttle_group_incref("bar");
> +throttle_group_set_token(ts_bar, (BlockDriverState *) 0x5, false);

Why do you have the magic numbers cast to pointers instead of allocated objects
with bdrv_new?

Fam
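
A sketch of what the test could look like with real objects instead of 0x5 and
0x7. DummyBDS, TokenGroup and the group_* helpers are hypothetical stand-ins,
since the real test would use BlockDriverState objects from bdrv_new() and the
throttle_group_* functions from the series. Comparing pointers directly also
avoids the (int64_t) casts, which are not portable where pointers are narrower
than 64 bits.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for BlockDriverState. */
typedef struct DummyBDS {
    int id;
} DummyBDS;

/* Hypothetical token holder: slot 0 is the read token, slot 1 the write one. */
typedef struct TokenGroup {
    DummyBDS *tokens[2];
} TokenGroup;

static void group_set_token(TokenGroup *g, DummyBDS *bs, int is_write)
{
    g->tokens[!!is_write] = bs;
}

static DummyBDS *group_token(const TokenGroup *g, int is_write)
{
    return g->tokens[!!is_write];
}

/* Allocate a real object so the token is a valid, dereferenceable pointer. */
static DummyBDS *dummy_bds_new(int id)
{
    DummyBDS *bs = calloc(1, sizeof(*bs));

    if (bs) {
        bs->id = id;
    }
    return bs;
}
```

The assertions then become plain pointer comparisons such as
group_token(&g, 0) == a, and an accidental dereference of the token inside the
code under test no longer crashes the test.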

> +ts_foo = throttle_group_incref("foo");
> +
> +tmp = throttle_group_incref("foo");
> +throttle_group_set_token(tmp, (BlockDriverState *) 0x7, true);
> +g_assert(tmp == ts_foo);
> +
> +tmp = throttle_group_incref("bar");
> +g_assert(tmp == ts_bar);
> +
> +tmp = throttle_group_incref("bar");
> +g_assert(tmp == ts_bar);
> +
> +g_assert((int64_t) throttle_group_token(ts_bar, false) == 0x5);
> +g_assert((int64_t) throttle_group_token(ts_foo, true) == 0x7);
> +
> +removed = throttle_group_unref(ts_foo);
> +g_assert(removed);
> +removed = throttle_group_unref(ts_bar);
> +g_assert(removed);
> +
> +g_assert((int64_t) throttle_group_token(ts_foo, true) == 0x7);
> +
> +removed = throttle_group_unref(ts_foo);
> +g_assert(removed);
> +removed = throttle_group_unref(ts_bar);
> +g_assert(removed);
> +
> +/* "foo" group should be destroyed when reaching this */
> +removed = throttle_group_unref(ts_foo);
> +g_assert(!removed);
> +
> +g_assert((int64_t) throttle_group_token(ts_bar, false) == 0x5);
> +
> +removed = throttle_group_unref(ts_bar);
> +g_assert(removed);
> +
> +/* "bar" group should be destroyed when reaching this */
> +removed = throttle_group_unref(ts_bar);
> +g_assert(!removed);
> +}
> +
>  int main(int argc, char **argv)
>  {
>  GSource *src;
> @@ -533,6 +583,7 @@ int main(int argc, char **argv)
>  g_test_add_func("/throttle/config/is_valid",test_is_valid);
>  g_test_add_func("/throttle/config_functions",   test_config_functions);
>  g_test_add_func("/throttle/accounting", test_accounting);
> +g_test_add_func("/throttle/groups", test_groups);
>  return g_test_run();
>  }
>  
> -- 
> 2.1.1
>

Re: [Qemu-devel] [PATCH v1 1/8] throttle: Extract timers from ThrottleState into a separate ThrottleTimers structure

2014-10-07 Thread Fam Zheng

On Tue, 10/07 15:24, Benoît Canet wrote:
> Group throttling will share ThrottleState between multiple bs.
> As a consequence the ThrottleState will be accessed by multiple aio context.
> 
> Timers are tied to their aio context so they must go out of the ThrottleState structure.
> 
> This commit pave the way for each bs of a common ThrottleState to have it's own

s/pave/paves/

And a few trivial comments below.

Otherwise looks good.


> timer.
> 
> Signed-off-by: Benoit Canet 
> ---
>  block.c   | 35 
>  include/block/block_int.h |  1 +
>  include/qemu/throttle.h   | 36 +
>  tests/test-throttle.c | 82 
> ++-
>  util/throttle.c   | 73 -
>  5 files changed, 134 insertions(+), 93 deletions(-)
> 
> diff --git a/block.c b/block.c
> index d3aebeb..f209f55 100644
> --- a/block.c
> +++ b/block.c
> @@ -129,7 +129,7 @@ void bdrv_set_io_limits(BlockDriverState *bs,
>  {
>  int i;
>  
> -throttle_config(&bs->throttle_state, cfg);
> +throttle_config(&bs->throttle_state, &bs->throttle_timers, cfg);
>  
>  for (i = 0; i < 2; i++) {
>  qemu_co_enter_next(&bs->throttled_reqs[i]);
> @@ -162,7 +162,7 @@ void bdrv_io_limits_disable(BlockDriverState *bs)
>  
>  bdrv_start_throttled_reqs(bs);
>  
> -throttle_destroy(&bs->throttle_state);
> +throttle_timers_destroy(&bs->throttle_timers);
>  }
>  
>  static void bdrv_throttle_read_timer_cb(void *opaque)
> @@ -181,12 +181,13 @@ static void bdrv_throttle_write_timer_cb(void *opaque)
>  void bdrv_io_limits_enable(BlockDriverState *bs)
>  {
>  assert(!bs->io_limits_enabled);
> -throttle_init(&bs->throttle_state,
> -  bdrv_get_aio_context(bs),
> -  QEMU_CLOCK_VIRTUAL,
> -  bdrv_throttle_read_timer_cb,
> -  bdrv_throttle_write_timer_cb,
> -  bs);
> +throttle_init(&bs->throttle_state);
> +throttle_timers_init(&bs->throttle_timers,
> + bdrv_get_aio_context(bs),
> + QEMU_CLOCK_VIRTUAL,
> + bdrv_throttle_read_timer_cb,
> + bdrv_throttle_write_timer_cb,
> + bs);
>  bs->io_limits_enabled = true;
>  }
>  
> @@ -200,7 +201,9 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
>   bool is_write)
>  {
>  /* does this io must wait */
> -bool must_wait = throttle_schedule_timer(&bs->throttle_state, is_write);
> +bool must_wait = throttle_schedule_timer(&bs->throttle_state,
> + &bs->throttle_timers,
> + is_write);
>  
>  /* if must wait or any request of this type throttled queue the IO */
>  if (must_wait ||
> @@ -213,7 +216,8 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
>  
>  
>  /* if the next request must wait -> do nothing */
> -if (throttle_schedule_timer(&bs->throttle_state, is_write)) {
> +if (throttle_schedule_timer(&bs->throttle_state, &bs->throttle_timers,
> +is_write)) {
>  return;
>  }
>  
> @@ -1990,6 +1994,9 @@ static void bdrv_move_feature_fields(BlockDriverState *bs_dest,
>  memcpy(&bs_dest->throttle_state,
> &bs_src->throttle_state,
> sizeof(ThrottleState));
> +memcpy(&bs_dest->throttle_timers,
> +   &bs_src->throttle_timers,
> +   sizeof(ThrottleTimers));
>  bs_dest->throttled_reqs[0]  = bs_src->throttled_reqs[0];
>  bs_dest->throttled_reqs[1]  = bs_src->throttled_reqs[1];
>  bs_dest->io_limits_enabled  = bs_src->io_limits_enabled;
> @@ -2052,7 +2059,7 @@ void bdrv_swap(BlockDriverState *bs_new, BlockDriverState *bs_old)
>  assert(bs_new->job == NULL);
>  assert(bs_new->dev == NULL);
>  assert(bs_new->io_limits_enabled == false);
> -assert(!throttle_have_timer(&bs_new->throttle_state));
> +assert(!throttle_timers_are_init(&bs_new->throttle_timers));
>  
>  tmp = *bs_new;
>  *bs_new = *bs_old;
> @@ -2070,7 +2077,7 @@ void bdrv_swap(BlockDriverState *bs_new, BlockDriverState *bs_old)
>  assert(bs_new->dev == NULL);
>  assert(bs_new->job == NULL);
>  assert(bs_new->io_limits_enabled == false);
> -assert(!throttle_have_timer(&bs_new->throttle_state));
> +assert(!throttle_timers_are_init(&bs_new->throttle_timers));
>  
>  /* insert the nodes back into the graph node list if needed */
>  if (bs_new->node_name[0] != '\0') {
> @@ -5746,7 +5753,7 @@ void bdrv_detach_aio_context(BlockDriverState *bs)
>  }
>  
>  if (bs->io_limits_enabled) {
> -throttle_detach_aio_context(&bs->throttle_state);
> +throttle_timers_detach_aio_context(&bs->throttle_timers);
>  }
>  if (bs->drv->bdrv_detach_aio_context) {
>

Re: [Qemu-devel] [PATCH v2 37/36] qdev: device_del: search for to be unplugged device in 'peripheral' container

2014-10-07 Thread Zhu Guihua

On Tue, 2014-10-07 at 15:53 +0200, Igor Mammedov wrote:
> On Tue, 07 Oct 2014 15:23:45 +0200
> Andreas Färber  wrote:
> 
> > Am 07.10.2014 um 14:10 schrieb Igor Mammedov:
> > > On Tue, 7 Oct 2014 19:59:51 +0800
> > > Zhu Guihua  wrote:
> > > 
> > >> On Thu, 2014-10-02 at 10:08 +, Igor Mammedov wrote:
> > >>> device_add puts every device with 'id' inside of 'peripheral'
> > >>> container using id's value as the last component name.
> > >>> Use it by replacing recursive search on sysbus with path
> > >>> lookup in 'peripheral' container, which could handle both
> > >>> BUS and BUS-less device cases.
> > >>>
> > >>
> > >> If I want to delete device without id inside of 'peripheral-anon'
> > >> container, the command 'device_del' does not work. 
> > >> My suggestion is deleting device by the last component name, is this
> > >> feasible?
> > > So far device_del was designed to work only with id-ed devices.
> > > 
> > > What's a use-case for unplugging unnamed device from peripheral-anon?
> > 
> > I can think of use cases where you may want to balloon memory or CPUs.
> yep currently initial CPUs are created without dev->id and even without
> device_add help.
> However if/when it's switched to device_add we can make them use
> auto-generated IDs so they would go into peripheral section.
> That would let us keep peripheral-anon for devices that shouldn't
> be unplugged.

When adding a pc-dimm with device_add, only the 'memdev' property is
required; the 'id' property is optional.

So I executed the following commands:
object_add memory-backend-ram,id=ram0,size=128M
device_add pc-dimm,memdev=ram0

Now it is impossible to delete the pc-dimm, because it has no id and it
is inside the 'peripheral-anon' container.

Regards,
Zhu

> 
> > 
> > But that seems orthogonal to this series.
> > 
> > Regards,
> > Andreas
> > 
>

Re: [Qemu-devel] [PATCH v4 2/3] pcie: add check for ari capability of pcie devices

2014-10-07 Thread Gonglei (Arei)

> Subject: Re: [Qemu-devel] [PATCH v4 2/3] pcie: add check for ari capability of
> pcie devices
> 
> On Fri, 2014-10-03 at 13:22 +0200, Knut Omang wrote:
> > On Wed, 2014-10-01 at 17:08 +0300, Marcel Apfelbaum wrote:
> > > On Wed, 2014-10-01 at 07:26 +0200, Knut Omang wrote:
> > > > On Tue, 2014-09-30 at 21:38 +0800, Gonglei wrote:
> > > > > > > Subject: Re: [Qemu-devel] [PATCH v4 2/3] pcie: add check for ari capability of
> > > > > > > pcie devices
> > > > > > >
> > > > > > > On Tue, Sep 30, 2014 at 06:11:25PM +0800, arei.gong...@huawei.com wrote:
> > > > > > > From: Gonglei 
> > > > > > >
> > > > > > > In QEMU, ARI Forwarding is enabled default at emulation of PCIe
> > > > > > > > ports. ARI Forwarding enable setting at firmware/OS Control handoff.
> > > > > > > > If the bit is Set when a non-ARI Device is present, the non-ARI
> > > > > > > > Device can respond to Configuration Space accesses under what it
> > > > > > > > interprets as being different Device Numbers, and its Functions can
> > > > > > > be aliased under multiple Device Numbers, generally leading to
> > > > > > > undesired behavior.
> > > > > >
> > > > > > So what is the undesired behaviour?
> > > > > > Did you observe such?
> > > > >
> > > > > I just observe the PCI device don't work, and the dmesg show me:
> > > > >
> > > > > [ 159.035250] pciehp :05:00.0:pcie24: Button pressed on Slot (0 - 
> > > > > 4)
> > > > > [ 159.035274] pciehp :05:00.0:pcie24: Card present on Slot (0 - 4)
> > > > > [ 159.036517] pciehp :05:00.0:pcie24: PCI slot #0 - 4 - powering 
> > > > > on
> due to button press.
> > > > > [ 159.188049] pciehp :05:00.0:pcie24: Failed to check link status
> > > > > [ 159.201968] pciehp :05:00.0:pcie24: Card not present on Slot (0 
> > > > > -
> 4)
> > > > > [ 159.202529] pciehp :05:00.0:pcie24: Already disabled on Slot (0 
> > > > > -
> 4)
> > > >
> > > > This seems very much like the symptoms I see when I use hotplug after
> > > > the latest changes to the hotplug code - do you have something that
> > > > confirms this has something to do with wrong interpretation of device
> > > > IDs? My suspicion is that something has gone wrong or is missing in the
> > > > hotplug logic but I havent had time to dig deeper into it yet.
> > > Can you please describe me the steps to reproduce the issue?
> >
> > Hmm, while trying to reproduce again I realize my hotplug issues are not
> > the same as the one Gonglei reports, let me come back to that in a
> > separate mail later,
> >
> > My main point here is that I don't see how this particular fix would
> > alleviate Gonglei's issue, as it does not seem to get triggered unless
> > there's a bug in the emulated port/switch?
> >
> > Gonglei, I assume you are use the TI x3130 in your example:
> > IMHO any PCIe x 1 device should fit into any of the (3) PCIe slots
> > provided by the TI x3130, and according to the spec:
> >
> > http://www.ti.com/lit/gpn/xio3130
> >
> > these slots appear as separate buses, which means that all devices will
> > have devfn 0.0 but on different buses, eg. if you have all three switch
> > slots (not to be confused with the slot number given by the PCI_SLOT()
> > macro) populated and no other secondary buses before it, you should see
> > the downstream switch ports as 01:xx:x and devices in 02:00.0, 03:00.0
> > and 04:00.0 for slots 0, 1 and 2. This seems to correspond well with the
> > current Qemu model.
> >
> > So unless your device itself exposes function# > 8 and is not ARI
> > capable (which would be a non-compliant device as far as I read the
> > standard) you should never be able to see any device in any of the
> > downstream ports have PCI_SLOT(devfn) != 0 ?
> >
> > With qemu master + my SR/IOV patch set + the igb patch (just to have an
> > ARI capable device to play with) here:
> >
> > https://github.com/knuto/qemu, branch sriov_patches_v3
> >
> > and a guest running F20 I am able to boot with a device (such as for
> > instance an e1000) inserted in a root port, I am also able to hot plug
> > it from the qemu monitor, eg.
> >
> > qemu-kvm  ... \
> >   -device ioh3420,slot=0,id=pcie_port.0
> >
> > ...
> >
> > (qemu) device_add e1000,vlan=1,bus=pcie_port.0,id=ne
> > (qemu) device_del ne
> >
> > In both cases the ARIfwd bit is disabled by default and not enabled by
> > the port driver (just as I would expect) so at least your comment
> > is wrong as far as I can see.
> >
> > Booting with a two-port downstream switch with no devices plugged, I see
> > the same, ARIfwd is not enabled on the downstream ports as long as no
> > device is in any slots, which is as expected, isn't it?
> >
> > qemu-kvm  ... \
> >   -device x3130-upstream,id=upsw \
> >   -device xio3130-downstream,bus=upsw,addr=0.0,chassis=5,id=ds_port.0 \
> >   -device xio3130-downstream,bus=upsw,addr=1.0,chassis=6,id=ds_port.1
> > ...
> >
> > The upstream port does not support ARIfwd in the QEMU model, which I
> > suppose is correct as it will only contain individual downstre

Re: [Qemu-devel] [PATCH v5 1/2] QEMUSizedBuffer based QEMUFile

2014-10-07 Thread zhanghailiang


On 2014/9/29 17:41, Dr. David Alan Gilbert (git) wrote:

From: "Dr. David Alan Gilbert" 

This is based on Stefan and Joel's patch that creates a QEMUFile that goes
to a memory buffer; from:

http://lists.gnu.org/archive/html/qemu-devel/2013-03/msg05036.html

Using the QEMUFile interface, this patch adds support functions for
operating on in-memory sized buffers that can be written to or read from.

Signed-off-by: Stefan Berger 
Signed-off-by: Joel Schopp 

For fixes/tweaks I've done:
Signed-off-by: Dr. David Alan Gilbert 

Reviewed-by: Eric Blake 
---
  include/migration/qemu-file.h |  28 +++
  include/qemu/typedefs.h   |   1 +
  qemu-file.c   | 456 ++
  3 files changed, 485 insertions(+)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index c90f529..6ef8ebc 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -25,6 +25,8 @@
  #define QEMU_FILE_H 1
  #include "exec/cpu-common.h"

+#include 
+
  /* This function writes a chunk of data to a file at the given position.
   * The pos argument can be ignored if the file is only being used for
   * streaming.  The handler should try to write all of the data it can.
@@ -94,11 +96,21 @@ typedef struct QEMUFileOps {
  QEMURamSaveFunc *save_page;
  } QEMUFileOps;

+struct QEMUSizedBuffer {
+struct iovec *iov;
+size_t n_iov;
+size_t size; /* total allocated size in all iov's */
+size_t used; /* number of used bytes */
+};
+
+typedef struct QEMUSizedBuffer QEMUSizedBuffer;
+
  QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops);
  QEMUFile *qemu_fopen(const char *filename, const char *mode);
  QEMUFile *qemu_fdopen(int fd, const char *mode);
  QEMUFile *qemu_fopen_socket(int fd, const char *mode);
  QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
+QEMUFile *qemu_bufopen(const char *mode, QEMUSizedBuffer *input);
  int qemu_get_fd(QEMUFile *f);
  int qemu_fclose(QEMUFile *f);
  int64_t qemu_ftell(QEMUFile *f);
@@ -111,6 +123,22 @@ void qemu_put_byte(QEMUFile *f, int v);
  void qemu_put_buffer_async(QEMUFile *f, const uint8_t *buf, int size);
  bool qemu_file_mode_is_not_valid(const char *mode);

+QEMUSizedBuffer *qsb_create(const uint8_t *buffer, size_t len);
+QEMUSizedBuffer *qsb_clone(const QEMUSizedBuffer *);
+void qsb_free(QEMUSizedBuffer *);
+size_t qsb_set_length(QEMUSizedBuffer *qsb, size_t length);
+size_t qsb_get_length(const QEMUSizedBuffer *qsb);
+ssize_t qsb_get_buffer(const QEMUSizedBuffer *, off_t start, size_t count,
+   uint8_t *buf);
+ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *buf,
+ off_t pos, size_t count);
+
+
+/*
+ * For use on files opened with qemu_bufopen
+ */
+const QEMUSizedBuffer *qemu_buf_get(QEMUFile *f);
+
  static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
  {
  qemu_put_byte(f, (int)v);
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 5f20b0e..db1153a 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -60,6 +60,7 @@ typedef struct PCIEAERLog PCIEAERLog;
  typedef struct PCIEAERErr PCIEAERErr;
  typedef struct PCIEPort PCIEPort;
  typedef struct PCIESlot PCIESlot;
+typedef struct QEMUSizedBuffer QEMUSizedBuffer;
  typedef struct MSIMessage MSIMessage;
  typedef struct SerialState SerialState;
  typedef struct PCMCIACardState PCMCIACardState;
diff --git a/qemu-file.c b/qemu-file.c
index a8e3912..ccc516c 100644
--- a/qemu-file.c
+++ b/qemu-file.c
@@ -878,3 +878,459 @@ uint64_t qemu_get_be64(QEMUFile *f)
  v |= qemu_get_be32(f);
  return v;
  }
+
+#define QSB_CHUNK_SIZE  (1 << 10)
+#define QSB_MAX_CHUNK_SIZE  (16 * QSB_CHUNK_SIZE)
+
+/**
+ * Create a QEMUSizedBuffer
+ * This type of buffer uses scatter-gather lists internally and
+ * can grow to any size. Any data array in the scatter-gather list
+ * can hold different amount of bytes.
+ *
+ * @buffer: Optional buffer to copy into the QSB
+ * @len: size of initial buffer; if @buffer is given, buffer must
+ *   hold at least len bytes
+ *
+ * Returns a pointer to a QEMUSizedBuffer or NULL on allocation failure
+ */
+QEMUSizedBuffer *qsb_create(const uint8_t *buffer, size_t len)
+{
+QEMUSizedBuffer *qsb;
+size_t alloc_len, num_chunks, i, to_copy;
+size_t chunk_size = (len > QSB_MAX_CHUNK_SIZE)
+? QSB_MAX_CHUNK_SIZE
+: QSB_CHUNK_SIZE;
+
+num_chunks = DIV_ROUND_UP(len ? len : QSB_CHUNK_SIZE, chunk_size);
+alloc_len = num_chunks * chunk_size;
+
+qsb = g_try_new0(QEMUSizedBuffer, 1);
+if (!qsb) {
+return NULL;
+}
+
+qsb->iov = g_try_new0(struct iovec, num_chunks);
+if (!qsb->iov) {
+g_free(qsb);
+return NULL;
+}
+
+qsb->n_iov = num_chunks;
+
+for (i = 0; i < num_chunks; i++) {
+qsb->iov[i].iov_base = g_try_malloc0(chunk_size);
+if (!qsb->iov[i].iov_base) {
+/*

Re: [Qemu-devel] [PATCH v4 36/47] Page request: Process incoming page request

2014-10-07 Thread zhanghailiang


On 2014/10/4 1:47, Dr. David Alan Gilbert (git) wrote:

From: "Dr. David Alan Gilbert" 

On receiving MIG_RPCOMM_REQPAGES look up the address and
queue the page.

Signed-off-by: Dr. David Alan Gilbert 
---
  arch_init.c   | 52 +++
  include/migration/migration.h | 21 +
  include/qemu/typedefs.h   |  3 ++-
  migration.c   | 34 +++-
  4 files changed, 108 insertions(+), 2 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 4a03171..72f9e17 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -660,6 +660,58 @@ static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset,
  }

  /*
+ * Queue the pages for transmission, e.g. a request from postcopy destination
+ *   ms: MigrationStatus in which the queue is held
+ *   rbname: The RAMBlock the request is for - may be NULL (to mean reuse last)
+ *   start: Offset from the start of the RAMBlock
+ *   len: Length (in bytes) to send
+ *   Return: 0 on success
+ */
+int ram_save_queue_pages(MigrationState *ms, const char *rbname,
+ ram_addr_t start, ram_addr_t len)
+{
+RAMBlock *ramblock;
+
+if (!rbname) {
+/* Reuse last RAMBlock */
+ramblock = ms->last_req_rb;
+
+if (!ramblock) {
+error_report("ram_save_queue_pages no previous block");
+return -1;
+}
+} else {
+ramblock = ram_find_block(rbname);
+
+if (!ramblock) {
+error_report("ram_save_queue_pages no block '%s'", rbname);
+return -1;
+}
+}
+DPRINTF("ram_save_queue_pages: Block %s start %zx len %zx",
+ramblock->idstr, start, len);
+
+if (start+len > ramblock->length) {
+error_report("%s request overrun start=%zx len=%zx blocklen=%zx",
+ __func__, start, len, ramblock->length);
+return -1;
+}
+
+struct MigrationSrcPageRequest *new_entry =
+g_malloc0(sizeof(struct MigrationSrcPageRequest));
+new_entry->rb = ramblock;
+new_entry->offset = start;
+new_entry->len = len;
+ms->last_req_rb = ramblock;
+
+qemu_mutex_lock(&ms->src_page_req_mutex);
+QSIMPLEQ_INSERT_TAIL(&ms->src_page_requests, new_entry, next_req);
+qemu_mutex_unlock(&ms->src_page_req_mutex);
+
+return 0;
+}
+
+/*
   * ram_find_and_save_block: Finds a page to send and sends it to f
   *
   * Returns:  The number of bytes written.
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 5e0d30d..5bc01d5 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -102,6 +102,18 @@ MigrationIncomingState *migration_incoming_get_current(void);
  MigrationIncomingState *migration_incoming_state_init(QEMUFile *f);
  void migration_incoming_state_destroy(void);

+/*
+ * An outstanding page request, on the source, having been received
+ * and queued
+ */
+struct MigrationSrcPageRequest {
+RAMBlock *rb;
+hwaddr offset;
+hwaddr len;
+
+QSIMPLEQ_ENTRY(MigrationSrcPageRequest) next_req;
+};
+
  struct MigrationState
  {
  int64_t bandwidth_limit;
@@ -138,6 +150,12 @@ struct MigrationState
   * of the postcopy phase
   */
  unsigned long *sentmap;
+
+/* Queue of outstanding page requests from the destination */
+QemuMutex src_page_req_mutex;
+QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests;
+/* The RAMBlock used in the last src_page_request */
+RAMBlock *last_req_rb;
  };

  void process_incoming_migration(QEMUFile *f);
@@ -273,4 +291,7 @@ size_t ram_control_save_page(QEMUFile *f, ram_addr_t block_offset,
   ram_addr_t offset, size_t size,
   int *bytes_sent);

+int ram_save_queue_pages(MigrationState *ms, const char *rbname,
+ ram_addr_t start, ram_addr_t len);
+
  #endif
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 79f57c0..24c2207 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -8,6 +8,7 @@ typedef struct QEMUTimerListGroup QEMUTimerListGroup;
  typedef struct QEMUFile QEMUFile;
  typedef struct QEMUBH QEMUBH;

+typedef struct AdapterInfo AdapterInfo;
  typedef struct AioContext AioContext;

  typedef struct Visitor Visitor;
@@ -80,6 +81,6 @@ typedef struct FWCfgState FWCfgState;
  typedef struct PcGuestInfo PcGuestInfo;
  typedef struct PostcopyPMI PostcopyPMI;
  typedef struct Range Range;
-typedef struct AdapterInfo AdapterInfo;
+typedef struct RAMBlock RAMBlock;



:(, another redefinition: 'RAMBlock' is also defined in
'include/exec/cpu-all.h:314'.
Am I missing something when compiling QEMU?
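
For background: before C11, repeating a typedef in the same scope is a
constraint violation, even when both declarations name the same type, so
older compilers reject it. C11 allows it. A minimal sketch of the usual way to
avoid the clash, using hypothetical Toy names rather than the real QEMU
headers: keep the typedef in exactly one shared header, and let the defining
header declare only the struct.

```c
#include <assert.h>

/* Shared header part (plays the role of a typedefs.h): the one and only
 * typedef, protected by an include guard. */
#ifndef TOY_TYPEDEFS_H
#define TOY_TYPEDEFS_H
typedef struct RAMBlockToy RAMBlockToy;
#endif

/* Defining header part: "struct RAMBlockToy { ... };" with no second
 * "typedef struct RAMBlockToy { ... } RAMBlockToy;" around it. */
struct RAMBlockToy {
    char idstr[16];
    unsigned long length;
};

/* Users can refer to the type through the single typedef. */
static unsigned long toy_block_length(const RAMBlockToy *rb)
{
    return rb->length;
}
```

Whether the build fails on a duplicated typedef therefore depends on the
compiler version and the language standard it is told to use.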


  #endif /* QEMU_TYPEDEFS_H */
diff --git a/migration.c b/migration.c
index cfdaa52..63d7699 100644
--- a/migration.c
+++ b/migration.c
@@ -26,6 +26,8 @@
  #include "qemu/thread.h"
  #include "qmp-commands.h"
  #include "trace.h"
+#include "exec/

Re: [Qemu-devel] [PATCH v4 23/47] migrate_init: Call from savevm

2014-10-07 Thread zhanghailiang


On 2014/10/4 1:47, Dr. David Alan Gilbert (git) wrote:

From: "Dr. David Alan Gilbert" 

Suspend to file is very much like a migrate, and it makes life
easier if we have the Migration state available, so initialise it
in the savevm.c code for suspending.

Signed-off-by: Dr. David Alan Gilbert 
---
  include/migration/migration.h | 1 +
  include/qemu/typedefs.h   | 1 +
  migration.c   | 2 +-
  savevm.c  | 2 ++
  4 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 2c078c4..3aeae47 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -140,6 +140,7 @@ int migrate_fd_close(MigrationState *s);

  void add_migration_state_change_notifier(Notifier *notify);
  void remove_migration_state_change_notifier(Notifier *notify);
+MigrationState *migrate_init(const MigrationParams *params);
  bool migration_in_setup(MigrationState *);
  bool migration_has_finished(MigrationState *);
  bool migration_has_failed(MigrationState *);
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 0f79b5c..8539de6 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -16,6 +16,7 @@ struct Monitor;
  typedef struct Monitor Monitor;
  typedef struct MigrationIncomingState MigrationIncomingState;
  typedef struct MigrationParams MigrationParams;
+typedef struct MigrationState MigrationState;



Er, another redefinition: when compiling, the compiler complains about
a redefinition of typedef ‘MigrationState’ in
'include/migration/migration.h:59'. Is this a problem?


  typedef struct Property Property;
  typedef struct PropertyInfo PropertyInfo;
diff --git a/migration.c b/migration.c
index 527423e..3a45b2a 100644
--- a/migration.c
+++ b/migration.c
@@ -488,7 +488,7 @@ bool migration_has_failed(MigrationState *s)
  s->state == MIG_STATE_ERROR);
  }

-static MigrationState *migrate_init(const MigrationParams *params)
+MigrationState *migrate_init(const MigrationParams *params)
  {
  MigrationState *s = migrate_get_current();
  int64_t bandwidth_limit = s->bandwidth_limit;
diff --git a/savevm.c b/savevm.c
index bffe890..a368a25 100644
--- a/savevm.c
+++ b/savevm.c
@@ -949,6 +949,8 @@ static int qemu_savevm_state(QEMUFile *f)
  .blk = 0,
  .shared = 0
  };
+MigrationState *ms = migrate_init(&params);
+ms->file = f;

  if (qemu_savevm_state_blocked(NULL)) {
  return -EINVAL;

Re: [Qemu-devel] [PATCH v4 01/47] QEMUSizedBuffer based QEMUFile

2014-10-07 Thread zhanghailiang


On 2014/10/4 1:47, Dr. David Alan Gilbert (git) wrote:

From: "Dr. David Alan Gilbert" 

* Please comment on separate thread for this QEMUSizedBuffer patch *

This is based on Stefan and Joel's patch that creates a QEMUFile that goes
to a memory buffer; from:

http://lists.gnu.org/archive/html/qemu-devel/2013-03/msg05036.html

Using the QEMUFile interface, this patch adds support functions for
operating on in-memory sized buffers that can be written to or read from.

Signed-off-by: Stefan Berger 
Signed-off-by: Joel Schopp 

For fixes/tweaks I've done:
Signed-off-by: Dr. David Alan Gilbert 

Reviewed-by: Eric Blake 
---
  include/migration/qemu-file.h |  28 +++
  include/qemu/typedefs.h   |   1 +
  qemu-file.c   | 456 ++
  3 files changed, 485 insertions(+)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index c90f529..6ef8ebc 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -25,6 +25,8 @@
  #define QEMU_FILE_H 1
  #include "exec/cpu-common.h"

+#include 
+
  /* This function writes a chunk of data to a file at the given position.
   * The pos argument can be ignored if the file is only being used for
   * streaming.  The handler should try to write all of the data it can.
@@ -94,11 +96,21 @@ typedef struct QEMUFileOps {
  QEMURamSaveFunc *save_page;
  } QEMUFileOps;

+struct QEMUSizedBuffer {
+struct iovec *iov;
+size_t n_iov;
+size_t size; /* total allocated size in all iov's */
+size_t used; /* number of used bytes */
+};
+
+typedef struct QEMUSizedBuffer QEMUSizedBuffer;
+


There is a redefinition of typedef ‘QEMUSizedBuffer’ in
'include/qemu/typedefs.h:68'; the compiler complains about it when I build qemu ;)
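
For context, here is a minimal sketch of why this kind of complaint appears and one common way around it. The names are hypothetical (`DemoSizedBuffer` is not a QEMU type). Before C11, repeating a typedef name in the same scope is not allowed even when both declarations name the same type, so a compiler in pre-C11 mode may reject it. The sketch assumes the typedef name is introduced in exactly one header, and the other header only defines the struct body:

```c
#include <assert.h>
#include <stddef.h>

/* "typedefs" header: the only place the typedef name is introduced. */
typedef struct DemoSizedBuffer DemoSizedBuffer;

/* "implementation" header: defines the struct body, no second typedef. */
struct DemoSizedBuffer {
    size_t size; /* total allocated size */
    size_t used; /* number of used bytes */
};

/* Uses the typedef name; the single forward typedef above is enough. */
static size_t demo_free_bytes(const DemoSizedBuffer *b)
{
    assert(b != NULL && b->used <= b->size);
    return b->size - b->used;
}
```

With this layout, any file may include both headers in either order without a second typedef ever being seen.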



  QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops);
  QEMUFile *qemu_fopen(const char *filename, const char *mode);
  QEMUFile *qemu_fdopen(int fd, const char *mode);
  QEMUFile *qemu_fopen_socket(int fd, const char *mode);
  QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
+QEMUFile *qemu_bufopen(const char *mode, QEMUSizedBuffer *input);
  int qemu_get_fd(QEMUFile *f);
  int qemu_fclose(QEMUFile *f);
  int64_t qemu_ftell(QEMUFile *f);
@@ -111,6 +123,22 @@ void qemu_put_byte(QEMUFile *f, int v);
  void qemu_put_buffer_async(QEMUFile *f, const uint8_t *buf, int size);
  bool qemu_file_mode_is_not_valid(const char *mode);

+QEMUSizedBuffer *qsb_create(const uint8_t *buffer, size_t len);
+QEMUSizedBuffer *qsb_clone(const QEMUSizedBuffer *);
+void qsb_free(QEMUSizedBuffer *);
+size_t qsb_set_length(QEMUSizedBuffer *qsb, size_t length);
+size_t qsb_get_length(const QEMUSizedBuffer *qsb);
+ssize_t qsb_get_buffer(const QEMUSizedBuffer *, off_t start, size_t count,
+   uint8_t *buf);
+ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *buf,
+ off_t pos, size_t count);
+
+
+/*
+ * For use on files opened with qemu_bufopen
+ */
+const QEMUSizedBuffer *qemu_buf_get(QEMUFile *f);
+
  static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
  {
  qemu_put_byte(f, (int)v);
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 5f20b0e..db1153a 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -60,6 +60,7 @@ typedef struct PCIEAERLog PCIEAERLog;
  typedef struct PCIEAERErr PCIEAERErr;
  typedef struct PCIEPort PCIEPort;
  typedef struct PCIESlot PCIESlot;
+typedef struct QEMUSizedBuffer QEMUSizedBuffer;


Here! See the comment above. Thanks.


  typedef struct MSIMessage MSIMessage;
  typedef struct SerialState SerialState;
  typedef struct PCMCIACardState PCMCIACardState;
diff --git a/qemu-file.c b/qemu-file.c
index a8e3912..ccc516c 100644
--- a/qemu-file.c
+++ b/qemu-file.c
@@ -878,3 +878,459 @@ uint64_t qemu_get_be64(QEMUFile *f)
  v |= qemu_get_be32(f);
  return v;
  }
+
+#define QSB_CHUNK_SIZE  (1 << 10)
+#define QSB_MAX_CHUNK_SIZE  (16 * QSB_CHUNK_SIZE)
+
+/**
+ * Create a QEMUSizedBuffer
+ * This type of buffer uses scatter-gather lists internally and
+ * can grow to any size. Any data array in the scatter-gather list
+ * can hold different amount of bytes.
+ *
+ * @buffer: Optional buffer to copy into the QSB
+ * @len: size of initial buffer; if @buffer is given, buffer must
+ *   hold at least len bytes
+ *
+ * Returns a pointer to a QEMUSizedBuffer or NULL on allocation failure
+ */
+QEMUSizedBuffer *qsb_create(const uint8_t *buffer, size_t len)
+{
+QEMUSizedBuffer *qsb;
+size_t alloc_len, num_chunks, i, to_copy;
+size_t chunk_size = (len > QSB_MAX_CHUNK_SIZE)
+? QSB_MAX_CHUNK_SIZE
+: QSB_CHUNK_SIZE;
+
+num_chunks = DIV_ROUND_UP(len ? len : QSB_CHUNK_SIZE, chunk_size);
+alloc_len = num_chunks * chunk_size;
+
+qsb = g_try_new0(QEMUSizedBuffer, 1);
+if (!qsb) {
+return NULL;
+}
+
+qsb->iov = g_try_new0(struct iovec, num_chunks);
+if (!qsb->iov) {
+

Re: [Qemu-devel] [PATCH V4 5/8] pc: Update rtc_cmos in pc_cpu_plug

2014-10-07 Thread Gu Zheng

Hi Igor,

On 10/07/2014 09:01 PM, Igor Mammedov wrote:

> On Mon, 29 Sep 2014 18:52:34 +0800
> Gu Zheng  wrote:
> 
>> Update rtc_cmos in pc_cpu_plug directly instead of the notifier.
>>
>> v4:
>>  -Make link property in PCMachine rather than the global
>>   variables.
>>  -Split out the removal of unused notifier into separate patch.
>>
>> Signed-off-by: Gu Zheng 
>> ---
>>  hw/i386/pc.c |   37 -
>>  hw/i386/pc_piix.c|2 +-
>>  hw/i386/pc_q35.c |2 +-
>>  include/hw/i386/pc.h |3 ++-
>>  qom/cpu.c|1 -
>>  5 files changed, 20 insertions(+), 25 deletions(-)
>>
>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>> index dcb9332..301e704 100644
>> --- a/hw/i386/pc.c
>> +++ b/hw/i386/pc.c
>> @@ -355,30 +355,15 @@ static void pc_cmos_init_late(void *opaque)
>>  qemu_unregister_reset(pc_cmos_init_late, opaque);
>>  }
>>  
>> -typedef struct RTCCPUHotplugArg {
>> -Notifier cpu_added_notifier;
>> -ISADevice *rtc_state;
>> -} RTCCPUHotplugArg;
>> -
>> -static void rtc_notify_cpu_added(Notifier *notifier, void *data)
>> -{
>> -RTCCPUHotplugArg *arg = container_of(notifier, RTCCPUHotplugArg,
>> - cpu_added_notifier);
>> -ISADevice *s = arg->rtc_state;
>> -
>> -/* increment the number of CPUs */
>> -rtc_set_memory(s, 0x5f, rtc_get_memory(s, 0x5f) + 1);
>> -}
>> -
>>  void pc_cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size,
>> -  const char *boot_device,
>> +  const char *boot_device, MachineState *machine,
>>ISADevice *floppy, BusState *idebus0, BusState *idebus1,
>>ISADevice *s)
>>  {
>>  int val, nb, i;
>>  FDriveType fd_type[2] = { FDRIVE_DRV_NONE, FDRIVE_DRV_NONE };
>>  static pc_cmos_init_late_arg arg;
>> -static RTCCPUHotplugArg cpu_hotplug_cb;
>> +PCMachineState *pc_machine = PC_MACHINE(machine);
>>  
>>  /* various important CMOS locations needed by PC/Bochs bios */
>>  
>> @@ -417,10 +402,14 @@ void pc_cmos_init(ram_addr_t ram_size, ram_addr_t 
>> above_4g_mem_size,
>>  
>>  /* set the number of CPU */
>>  rtc_set_memory(s, 0x5f, smp_cpus - 1);
>> -/* init CPU hotplug notifier */
>> -cpu_hotplug_cb.rtc_state = s;
>> -cpu_hotplug_cb.cpu_added_notifier.notify = rtc_notify_cpu_added;
>> -qemu_register_cpu_added_notifier(&cpu_hotplug_cb.cpu_added_notifier);
>> +
>> +object_property_add_link(OBJECT(machine), "rtc_state",
>> + TYPE_ISA_DEVICE,
>> + (Object **)&pc_machine->rtc,
>> + object_property_allow_set_link,
>> + OBJ_PROP_LINK_UNREF_ON_RELEASE, &error_abort);
>> +object_property_set_link(OBJECT(machine), OBJECT(s),
>> + "rtc_state", &error_abort);
>>  
>>  if (set_boot_dev(s, boot_device)) {
>>  exit(1);
>> @@ -1633,6 +1622,12 @@ static void pc_cpu_plug(HotplugHandler *hotplug_dev,
>>  
>>  hhc = HOTPLUG_HANDLER_GET_CLASS(pcms->acpi_dev);
>>  hhc->plug(HOTPLUG_HANDLER(pcms->acpi_dev), dev, &local_err);
>> +if (local_err) {
>> +goto out;
>> +}
>> +
>> +/* increment the number of CPUs */
>> +rtc_set_memory(pcms->rtc, 0x5f, rtc_get_memory(pcms->rtc, 0x5f) + 1);
>>  out:
>>  error_propagate(errp, local_err);
>>  }
>> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
>> index 103d756..2c8d4dc 100644
>> --- a/hw/i386/pc_piix.c
>> +++ b/hw/i386/pc_piix.c
>> @@ -266,7 +266,7 @@ static void pc_init1(MachineState *machine,
>>  }
>>  
>>  pc_cmos_init(below_4g_mem_size, above_4g_mem_size, machine->boot_order,
>> - floppy, idebus[0], idebus[1], rtc_state);
>> + machine, floppy, idebus[0], idebus[1], rtc_state);
>>  
>>  if (pci_enabled && usb_enabled(false)) {
>>  pci_create_simple(pci_bus, piix3_devfn + 2, "piix3-usb-uhci");
>> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
>> index d4a907c..94ba98d 100644
>> --- a/hw/i386/pc_q35.c
>> +++ b/hw/i386/pc_q35.c
>> @@ -266,7 +266,7 @@ static void pc_q35_init(MachineState *machine)
>>8, NULL, 0);
>>  
>>  pc_cmos_init(below_4g_mem_size, above_4g_mem_size, machine->boot_order,
>> - floppy, idebus[0], idebus[1], rtc_state);
>> + machine, floppy, idebus[0], idebus[1], rtc_state);
>>  
>>  /* the rest devices to which pci devfn is automatically assigned */
>>  pc_vga_init(isa_bus, host_bus);
>> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
>> index 77316d5..7a4bff4 100644
>> --- a/include/hw/i386/pc.h
>> +++ b/include/hw/i386/pc.h
>> @@ -33,6 +33,7 @@ struct PCMachineState {
>>  MemoryRegion hotplug_memory;
>>  
>>  HotplugHandler *acpi_dev;
>> +ISADevice *rtc;
>>  
>>  uint64_t max_ram_below_4g;
>>  };
>> @@ -210,7 +211,7 @@ void pc_basic_device_init(ISABus *isa_bus, qemu_irq

Re: [Qemu-devel] [PATCH RFC 00/11] qemu: towards virtio-1 host support

2014-10-07 Thread Andy Lutomirski

On 10/07/2014 07:39 AM, Cornelia Huck wrote:
> This patchset aims to get us some way to implement virtio-1 compliant
> and transitional devices in qemu. Branch available at
> 
> git://github.com/cohuck/qemu virtio-1
> 
> I've mainly focused on:
> - endianness handling
> - extended feature bits
> - virtio-ccw new/changed commands

At the risk of some distraction, would it be worth thinking about a
solution to the IOMMU bypassing mess as part of this?

--Andy

Re: [Qemu-devel] [PATCH v7 2/2] dump: Turn some functions to void to make code cleaner

2014-10-07 Thread zhanghailiang


Hi,

Ping...:(

Thanks,
zhanghailiang

On 2014/9/30 17:20, zhanghailiang wrote:

Functions shouldn't return an error code and an Error object at the same time.
Turn all the functions that return an Error object into void functions.
Callers now judge whether a function succeeded or failed by checking local_err.
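
The convention described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `DemoError`, `demo_error_set` and the other `demo_` names are hypothetical stand-ins, not QEMU's Error API. The functions return void, and the caller learns about failure only by checking a local error pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an error object; not QEMU's Error type. */
typedef struct DemoError {
    char msg[64];
} DemoError;

/* Record an error in *errp unless the caller passed NULL or one is set. */
static void demo_error_set(DemoError **errp, const char *msg)
{
    if (errp == NULL || *errp != NULL) {
        return;
    }
    *errp = calloc(1, sizeof(**errp));
    if (*errp == NULL) {
        abort(); /* keep the sketch simple: no recovery from OOM */
    }
    strncpy((*errp)->msg, msg, sizeof((*errp)->msg) - 1);
}

/* Returns void: failure is visible only through errp. */
static void demo_write_header(int simulate_failure, DemoError **errp)
{
    if (simulate_failure) {
        demo_error_set(errp, "failed to write header");
    }
}

/* The caller inspects a local error object instead of a return code. */
static void demo_write_all(int simulate_failure, int *steps_done,
                           DemoError **errp)
{
    DemoError *local_err = NULL;

    demo_write_header(simulate_failure, &local_err);
    if (local_err != NULL) {
        /* Hand the error to our caller, or drop it if it is not wanted. */
        if (errp != NULL && *errp == NULL) {
            *errp = local_err;
        } else {
            free(local_err);
        }
        return;
    }
    (*steps_done)++; /* only reached when the header step succeeded */
}
```

A caller passes the address of a NULL `DemoError *` and checks it afterwards; there is no second channel (a return code) that could disagree with it.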

Signed-off-by: zhanghailiang 
---
  dump.c | 313 ++---
  1 file changed, 143 insertions(+), 170 deletions(-)

diff --git a/dump.c b/dump.c
index 07d2300..de0cd83 100644
--- a/dump.c
+++ b/dump.c
@@ -100,7 +100,7 @@ static int fd_write_vmcore(const void *buf, size_t size, 
void *opaque)
  return 0;
  }

-static int write_elf64_header(DumpState *s, Error **errp)
+static void write_elf64_header(DumpState *s, Error **errp)
  {
  Elf64_Ehdr elf_header;
  int ret;
@@ -128,13 +128,10 @@ static int write_elf64_header(DumpState *s, Error **errp)
  ret = fd_write_vmcore(&elf_header, sizeof(elf_header), s);
  if (ret < 0) {
  dump_error(s, "dump: failed to write elf header", errp);
-return -1;
  }
-
-return 0;
  }

-static int write_elf32_header(DumpState *s, Error **errp)
+static void write_elf32_header(DumpState *s, Error **errp)
  {
  Elf32_Ehdr elf_header;
  int ret;
@@ -162,15 +159,12 @@ static int write_elf32_header(DumpState *s, Error **errp)
  ret = fd_write_vmcore(&elf_header, sizeof(elf_header), s);
  if (ret < 0) {
  dump_error(s, "dump: failed to write elf header", errp);
-return -1;
  }
-
-return 0;
  }

-static int write_elf64_load(DumpState *s, MemoryMapping *memory_mapping,
-int phdr_index, hwaddr offset,
-hwaddr filesz, Error **errp)
+static void write_elf64_load(DumpState *s, MemoryMapping *memory_mapping,
+ int phdr_index, hwaddr offset,
+ hwaddr filesz, Error **errp)
  {
  Elf64_Phdr phdr;
  int ret;
@@ -188,15 +182,12 @@ static int write_elf64_load(DumpState *s, MemoryMapping 
*memory_mapping,
  ret = fd_write_vmcore(&phdr, sizeof(Elf64_Phdr), s);
  if (ret < 0) {
  dump_error(s, "dump: failed to write program header table", errp);
-return -1;
  }
-
-return 0;
  }

-static int write_elf32_load(DumpState *s, MemoryMapping *memory_mapping,
-int phdr_index, hwaddr offset,
-hwaddr filesz, Error **errp)
+static void write_elf32_load(DumpState *s, MemoryMapping *memory_mapping,
+ int phdr_index, hwaddr offset,
+ hwaddr filesz, Error **errp)
  {
  Elf32_Phdr phdr;
  int ret;
@@ -214,13 +205,10 @@ static int write_elf32_load(DumpState *s, MemoryMapping 
*memory_mapping,
  ret = fd_write_vmcore(&phdr, sizeof(Elf32_Phdr), s);
  if (ret < 0) {
  dump_error(s, "dump: failed to write program header table", errp);
-return -1;
  }
-
-return 0;
  }

-static int write_elf64_note(DumpState *s, Error **errp)
+static void write_elf64_note(DumpState *s, Error **errp)
  {
  Elf64_Phdr phdr;
  hwaddr begin = s->memory_offset - s->note_size;
@@ -237,10 +225,7 @@ static int write_elf64_note(DumpState *s, Error **errp)
  ret = fd_write_vmcore(&phdr, sizeof(Elf64_Phdr), s);
  if (ret < 0) {
  dump_error(s, "dump: failed to write program header table", errp);
-return -1;
  }
-
-return 0;
  }

  static inline int cpu_index(CPUState *cpu)
@@ -248,8 +233,8 @@ static inline int cpu_index(CPUState *cpu)
  return cpu->cpu_index + 1;
  }

-static int write_elf64_notes(WriteCoreDumpFunction f, DumpState *s,
- Error **errp)
+static void write_elf64_notes(WriteCoreDumpFunction f, DumpState *s,
+  Error **errp)
  {
  CPUState *cpu;
  int ret;
@@ -260,7 +245,7 @@ static int write_elf64_notes(WriteCoreDumpFunction f, 
DumpState *s,
  ret = cpu_write_elf64_note(f, cpu, id, s);
  if (ret < 0) {
  dump_error(s, "dump: failed to write elf notes", errp);
-return -1;
+return;
  }
  }

@@ -268,14 +253,12 @@ static int write_elf64_notes(WriteCoreDumpFunction f, 
DumpState *s,
  ret = cpu_write_elf64_qemunote(f, cpu, s);
  if (ret < 0) {
  dump_error(s, "dump: failed to write CPU status", errp);
-return -1;
+return;
  }
  }
-
-return 0;
  }

-static int write_elf32_note(DumpState *s, Error **errp)
+static void write_elf32_note(DumpState *s, Error **errp)
  {
  hwaddr begin = s->memory_offset - s->note_size;
  Elf32_Phdr phdr;
@@ -292,14 +275,11 @@ static int write_elf32_note(DumpState *s, Error **errp)
  ret = fd_write_vmcore(&phdr, sizeof(Elf32_Phdr), s);
  if (ret < 0) {
  dump_error(s, "dump: failed to write program header table", errp);
-retu

Re: [Qemu-devel] [PATCH 1/1] target-i386: prevent users from setting threads>1 for AMD CPUs

2014-10-07 Thread Wei Huang




On 10/07/2014 04:36 PM, Paolo Bonzini wrote:

Il 07/10/2014 23:16, Wei Huang ha scritto:

It isn't a bug IMO. I tested various combinations, and current QEMU
handles them very well. It converts threads=n to the proper
CPUID_0000_0001_EBX[LogicalProcCount] and CPUID_8000_0008_ECX[NC]
accordingly for AMD.


So if it ain't broken, don't fix it. :)


I am worried that the default CPU is an AMD one when KVM is disabled,
and thus "qemu-system-x86_64 -smp threads=2" will likely be broken.


"-smp threads=2" will be rejected by the patch.


Yeah, I am afraid that is an incompatible change, and one more reason
not to do this.  AMD not selling hyperthreaded processors does not mean
that they could not be represented properly with the CPUID leaves that
AMD processors support.

I am OK either way. The key question is: should QEMU present
CPUIDs strictly as specified on the command line, or may QEMU tweak them a
little on behalf of end users? For instance, if end users say "-smp
8,cores=2,threads=2,sockets=2", they mean "two sockets, each with two
2-hyperthread cores". Current QEMU will present the CPUID as "two sockets,
each with 4 cores". My patch will forbid the tweaking...


-Wei



Paolo


Unless the meaning of
threads is not limited to threads-per-core, shouldn't end users use
"-smp 2" in this case or something like "-smp 2,cores=2,sockets=1"?
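
The arithmetic behind the "-smp 8,cores=2,threads=2,sockets=2" example in this thread can be sketched as below. The helper names are hypothetical, and the "count minus one" encoding of an NC-style field is an assumption made for illustration, not something taken from the QEMU source.

```c
#include <assert.h>

/* Logical processors per socket for a given -smp style topology. */
static unsigned demo_logical_per_socket(unsigned cores, unsigned threads)
{
    assert(cores > 0 && threads > 0);
    return cores * threads;
}

/* Assumed encoding of an "NC"-style field: processor count minus one. */
static unsigned demo_nc_field(unsigned cores, unsigned threads)
{
    return demo_logical_per_socket(cores, threads) - 1;
}
```

For cores=2 and threads=2 this gives four logical processors per socket, which matches the "two sockets, each with 4 cores" presentation discussed in the thread.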

Re: [Qemu-devel] [Bug 1378554] [NEW] qemu segfault in virtio_scsi_handle_cmd_req_submit on ARM 32 bit

2014-10-07 Thread Paolo Bonzini

Does this work:

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 203e624..c6d4f2e 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -545,11 +545,12 @@ bool virtio_scsi_handle_cmd_req_prepare(VirtIOSCSI *s, 
VirtIOSCSIReq *req)
 
 void virtio_scsi_handle_cmd_req_submit(VirtIOSCSI *s, VirtIOSCSIReq *req)
 {
-if (scsi_req_enqueue(req->sreq)) {
-scsi_req_continue(req->sreq);
+SCSIRequest *sreq = req->sreq;
+bdrv_io_unplug(sreq->dev->conf.bs);
+if (scsi_req_enqueue(sreq)) {
+scsi_req_continue(sreq);
 }
-bdrv_io_unplug(req->sreq->dev->conf.bs);
-scsi_req_unref(req->sreq);
+scsi_req_unref(sreq);
 }
 
 static void virtio_scsi_handle_cmd(VirtIODevice *vdev, VirtQueue *vq)

?

Paolo
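
The idea in the proposed diff can be sketched in isolation. In the bug report, gdb shows req->sreq as 0xc2c2c2c2, a repeated byte pattern that would be consistent with req having been freed and overwritten before it is read again. The sketch below assumes exactly that: enqueueing may complete the request and free the wrapper, so the inner pointer is cached first. All `Demo` and `demo_` names are hypothetical and only mimic the shape of the real code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct DemoDevice {
    int unplug_calls;
} DemoDevice;

typedef struct DemoWrapper DemoWrapper;

typedef struct DemoRequest {
    int refcount;
    DemoDevice *dev;
    DemoWrapper *owner; /* wrapper that is freed when the request completes */
} DemoRequest;

struct DemoWrapper {
    DemoRequest *sreq;
};

static void demo_request_unref(DemoRequest *r)
{
    if (--r->refcount == 0) {
        free(r);
    }
}

/* Assumed behaviour: the request completes at once and frees its wrapper. */
static int demo_request_enqueue(DemoRequest *r)
{
    DemoWrapper *w = r->owner;

    r->owner = NULL;
    memset(w, 0xc2, sizeof(*w)); /* poison, so stale reads are obvious */
    free(w);
    return 0;
}

static void demo_submit(DemoWrapper *w)
{
    DemoRequest *r = w->sreq;  /* cache before anything can free w */

    r->dev->unplug_calls++;    /* stands in for the unplug call */
    if (demo_request_enqueue(r)) {
        /* a real implementation would continue the request here */
    }
    demo_request_unref(r);     /* uses the cached pointer, never w */
}

/* Builds one wrapper/request pair and submits it. */
static void demo_submit_once(DemoDevice *dev)
{
    DemoRequest *r = calloc(1, sizeof(*r));
    DemoWrapper *w = calloc(1, sizeof(*w));

    assert(r != NULL && w != NULL);
    r->refcount = 1;
    r->dev = dev;
    r->owner = w;
    w->sreq = r;
    demo_submit(w);
}
```

After `demo_request_enqueue()` returns, `demo_submit()` touches only `r`, so it does not matter that the wrapper is gone.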

[Qemu-devel] [Bug 1378554] [NEW] qemu segfault in virtio_scsi_handle_cmd_req_submit on ARM 32 bit

2014-10-07 Thread Richard Jones

Public bug reported:

/home/rjones/d/qemu/arm-softmmu/qemu-system-arm \
-global virtio-blk-device.scsi=off \
-nodefconfig \
-enable-fips \
-nodefaults \
-display none \
-M virt \
-machine accel=kvm:tcg \
-m 500 \
-no-reboot \
-rtc driftfix=slew \
-global kvm-pit.lost_tick_policy=discard \
-kernel /home/rjones/d/libguestfs/tmp/.guestfs-1001/appliance.d/kernel \
-initrd /home/rjones/d/libguestfs/tmp/.guestfs-1001/appliance.d/initrd \
-device virtio-scsi-device,id=scsi \
-drive 
file=/home/rjones/d/libguestfs/tmp/libguestfseV4fT5/scratch.1,cache=unsafe,format=raw,id=hd0,if=none
 \
-device scsi-hd,drive=hd0 \
-drive 
file=/home/rjones/d/libguestfs/tmp/.guestfs-1001/appliance.d/root,snapshot=on,id=appliance,cache=unsafe,if=none
 \
-device scsi-hd,drive=appliance \
-device virtio-serial-device \
-serial stdio \
-chardev 
socket,path=/home/rjones/d/libguestfs/tmp/libguestfseV4fT5/guestfsd.sock,id=channel0
 \
-device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 \
-append 'panic=1 mem=500M console=ttyAMA0 udevtimeout=6000 no_timer_check 
lpj=4464640 acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb 
selinux=0 guestfs_verbose=1 TERM=xterm-256color'

The appliance boots, but segfaults as soon as the virtio-scsi driver is
loaded:

supermin: internal insmod virtio_scsi.ko
[3.992963] scsi0 : Virtio SCSI HBA
libguestfs: error: appliance closed the connection unexpectedly, see earlier 
error messages

I captured a core dump:

Core was generated by `/home/rjones/d/qemu/arm-softmmu/qemu-system-arm -global 
virtio-blk-device.scsi='.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000856bc in virtio_scsi_handle_cmd_req_submit (s=, 
req=0x6c03acf8) at /home/rjones/d/qemu/hw/scsi/virtio-scsi.c:551
551 bdrv_io_unplug(req->sreq->dev->conf.bs);
(gdb) bt
#0  0x000856bc in virtio_scsi_handle_cmd_req_submit (s=, 
req=0x6c03acf8) at /home/rjones/d/qemu/hw/scsi/virtio-scsi.c:551
#1  0x0008573a in virtio_scsi_handle_cmd (vdev=0xac4d68, vq=0xafe4b8)
at /home/rjones/d/qemu/hw/scsi/virtio-scsi.c:573
#2  0x0004fdbe in access_with_adjusted_size (addr=80, 
value=value@entry=0x4443e6c0, size=size@entry=4, access_size_min=1, 
access_size_max=, access_size_max@entry=0, 
access=access@entry=0x4fee9 , 
mr=mr@entry=0xa53fa8) at /home/rjones/d/qemu/memory.c:480
#3  0x00054234 in memory_region_dispatch_write (size=4, data=2, 
addr=, mr=0xa53fa8) at /home/rjones/d/qemu/memory.c:1117
#4  io_mem_write (mr=0xa53fa8, addr=, val=val@entry=2, 
size=size@entry=4) at /home/rjones/d/qemu/memory.c:1958
#5  0x00021c88 in address_space_rw (as=0x3b96b4 , 
addr=167788112, buf=buf@entry=0x4443e790 "\002", len=len@entry=4, 
is_write=is_write@entry=true) at /home/rjones/d/qemu/exec.c:2135
#6  0x00021de6 in address_space_write (len=4, buf=0x4443e790 "\002", 
addr=, as=)
at /home/rjones/d/qemu/exec.c:2202
#7  subpage_write (opaque=, addr=, value=2, 
len=4) at /home/rjones/d/qemu/exec.c:1811
#8  0x0004fdbe in access_with_adjusted_size (addr=592, 
value=value@entry=0x4443e820, size=size@entry=4, access_size_min=1, 
access_size_max=, access_size_max@entry=0, 
access=access@entry=0x4fee9 , 
mr=mr@entry=0xaed980) at /home/rjones/d/qemu/memory.c:480
#9  0x00054234 in memory_region_dispatch_write (size=4, data=2, 
addr=, mr=0xaed980) at /home/rjones/d/qemu/memory.c:1117
#10 io_mem_write (mr=0xaed980, addr=, val=2, size=size@entry=4)
at /home/rjones/d/qemu/memory.c:1958
#11 0x00057f24 in io_writel (retaddr=1121296542, Cannot access memory at 
address 0x0
addr=, val=2, 
physaddr=592, env=0x9d6c50) at /home/rjones/d/qemu/softmmu_template.h:381
#12 helper_le_stl_mmu (env=0x9d6c50, addr=, val=2, 
mmu_idx=, retaddr=1121296542)
at /home/rjones/d/qemu/softmmu_template.h:419
#13 0x42d5a0a0 in ?? ()
Cannot access memory at address 0x0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) print req
$1 = (VirtIOSCSIReq *) 0x6c03acf8
(gdb) print req->sreq
$2 = (SCSIRequest *) 0xc2c2c2c2
(gdb) print req->sreq->dev
Cannot access memory at address 0xc2c2c2c6
(gdb) print *req
$3 = {
  dev = 0x6c40, 
  vq = 0x6c40, 
  qsgl = {
sg = 0x0, 
nsg = 0, 
nalloc = -1027423550, 
size = 3267543746, 
dev = 0xc2c2c2c2, 
as = 0xc2c2c2c2
  }, 
  resp_iov = {
iov = 0xc2c2c2c2, 
niov = -1027423550, 
nalloc = -1027423550, 
size = 3267543746
  }, 
  elem = {
index = 3267543746, 
out_num = 3267543746, 
in_num = 3267543746, 
in_addr = {14033993530586874562 }, 
out_addr = {14033993530586874562 }, 
in_sg = {{
iov_base = 0xc2c2c2c2, 
iov_len = 3267543746
  } }, 
out_sg = {{
iov_base = 0xc2c2c2c2, 
iov_len = 3267543746
  } }
  }, 
  vring = 0xc2c2c2c2, 
  {
next = {
  tqe_next = 0xc2c2c2c2, 
  tqe_prev = 0xc2c2c2c2
}, 
rem

[Qemu-devel] [Bug 1378554] Re: qemu segfault in virtio_scsi_handle_cmd_req_submit on ARM 32 bit

2014-10-07 Thread Richard Jones

This is qemu from git today (2014-10-07).

The hardware is 32 bit ARM (ODROID-XU Samsung Exynos 5410).  It is
running Ubuntu 14.04 LTS as the main operating system, but I am NOT
using qemu from Ubuntu (which is ancient).

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1378554

Title:
  qemu segfault in virtio_scsi_handle_cmd_req_submit on ARM 32 bit

Status in QEMU:
  New


Re: [Qemu-devel] [PATCH v5 11/33] target-arm: arrayfying fieldoffset for banking

2014-10-07 Thread Peter Maydell

On 7 October 2014 22:50, Greg Bellows  wrote:
> I'm still trying to wrap my head around it, but I believe there are
> cases where we use a different register set depending on whether a
> given EL is 32 or 64-bit.

Well, if an EL is 64-bit then it sees (effectively) a totally
different register set anyway. We merge together the STATE_AA64
and STATE_AA32 reginfo structures in the sourcecode[*], but the
process of populating the hashtable effectively creates two
disjoint sets of register info, one for STATE_AA32 and one for
STATE_AA64. The 64-bit registers may have the same underlying
state as one or more 32-bit registers (which the ARM ARM refers
to as the registers being "architecturally mapped" to each other,
and which we typically implement by pointing the fieldoffsets to
the same underlying uint64_t), but they're not strictly speaking
the same registers.

[*] This merging is purely for convenience and to avoid having
to write out multiple near-identical reginfo definitions: it's
always possible to write out a non-merged equivalent pair of
reginfo fields. Similarly it's always going to be possible to
write out separate (STATE_AA64, STATE_AA32 BANK_S, STATE_AA32 BANK_NS)
versions, we just want to be able to collapse them together
into one reginfo most of the time.

The other thing to watch out for is differences in what
secure-SVC/etc code sees depending on whether EL3 is 32 bit
or 64 bit. For instance, if EL3 is 32 bit then the SCTLR is banked,
and all the non-secure privileged modes execute at EL1 and
see SCTLR(NS), while the secure privileged modes execute at
EL3 and see SCTLR(S). However if EL3 is 64 bit then the SCTLR
is not banked, and both secure and non-secure 32 bit
privileged modes execute at EL1 and see SCTLR(NS). (It's the
responsibility of a 64-bit EL3 monitor to swap the contents of the
EL1 registers when it switches between Secure and Nonsecure OSes.)
But the USE_SECURE_REG() macro handles this correctly, so we should
be good there.

thanks
-- PMM

Re: [Qemu-devel] [PATCH v5 0/5] add description field in ObjectProperty and PropertyInfo struct

2014-10-07 Thread Paolo Bonzini

Il 07/10/2014 08:33, arei.gong...@huawei.com ha scritto:
> From: Gonglei 
> 
> v5 -> v4:
>  1. add some improvements following Michael's suggestions, thanks. (Michael)
>  2. add 'Reviewed-by' tag (Paolo, Michael, Eric)

Andreas, this series depends on patches in qom-next so you'll have to
take it.

Thanks,

Paolo

> v4 -> v3:
>  1. rebase on qom-next tree (Andreas)
>  2. fix memory leak in PATCH 2, move object_property_set_description calling
> in object_property_add_alias() from PATCH 3 to PATCH 2. (Paolo)
>  3. drop "?:" in PATCH 2, call g_strdup() directly
>  4. rework PATCH 4, change description to an optional field,
> drop "?:" conditional expression (Eric)
>  
> v3 -> v2:
>  1. add a new "description" field to DevicePropertyInfo, and format
> it in qdev_device_help() in PATCH 6 (Paolo)
> 
> v2 -> v1:
>  1. rename "fail" label to "out" in PATCH 1 (Andreas)
>  2. improve descriptions in PATCH 3 (Paolo, adding Signed-off-by Paolo in 
> this patch)
>  3. rework PATCH 5, set description at qdev_property_add_static(),
> then copy the description of target_obj.property. (Paolo)
>  4. free description field of ObjectProperty to avoid memory leak in PATCH 4.
> 
> This patch series is based on the qom-next tree:
>  https://github.com/afaerber/qemu-cpu/commits/qom-next
> 
> Add a description field in both ObjectProperty and PropertyInfo struct.
> The descriptions can serve as documentation in the code,
> and they can be used to provide better help. For example:
> 
> Before this patch series:
> 
> $./qemu-system-x86_64 -device virtio-blk-pci,?
> 
> virtio-blk-pci.iothread=link
> virtio-blk-pci.x-data-plane=bool
> virtio-blk-pci.scsi=bool
> virtio-blk-pci.config-wce=bool
> virtio-blk-pci.serial=str
> virtio-blk-pci.secs=uint32
> virtio-blk-pci.heads=uint32
> virtio-blk-pci.cyls=uint32
> virtio-blk-pci.discard_granularity=uint32
> virtio-blk-pci.bootindex=int32
> virtio-blk-pci.opt_io_size=uint32
> virtio-blk-pci.min_io_size=uint16
> virtio-blk-pci.physical_block_size=uint16
> virtio-blk-pci.logical_block_size=uint16
> virtio-blk-pci.drive=str
> virtio-blk-pci.virtio-backend=child
> virtio-blk-pci.command_serr_enable=on/off
> virtio-blk-pci.multifunction=on/off
> virtio-blk-pci.rombar=uint32
> virtio-blk-pci.romfile=str
> virtio-blk-pci.addr=pci-devfn
> virtio-blk-pci.event_idx=on/off
> virtio-blk-pci.indirect_desc=on/off
> virtio-blk-pci.vectors=uint32
> virtio-blk-pci.ioeventfd=on/off
> virtio-blk-pci.class=uint32
> 
> After:
> 
> $./qemu-system-x86_64 -device virtio-blk-pci,?
> 
> virtio-blk-pci.iothread=link
> virtio-blk-pci.x-data-plane=bool (on/off)
> virtio-blk-pci.scsi=bool (on/off)
> virtio-blk-pci.config-wce=bool (on/off)
> virtio-blk-pci.serial=str
> virtio-blk-pci.secs=uint32
> virtio-blk-pci.heads=uint32
> virtio-blk-pci.cyls=uint32
> virtio-blk-pci.discard_granularity=uint32
> virtio-blk-pci.bootindex=int32
> virtio-blk-pci.opt_io_size=uint32
> virtio-blk-pci.min_io_size=uint16
> virtio-blk-pci.physical_block_size=uint16 (A power of two between 512 and 32768)
> virtio-blk-pci.logical_block_size=uint16 (A power of two between 512 and 32768)
> virtio-blk-pci.drive=str (ID of a drive to use as a backend)
> virtio-blk-pci.virtio-backend=child
> virtio-blk-pci.command_serr_enable=bool (on/off)
> virtio-blk-pci.multifunction=bool (on/off)
> virtio-blk-pci.rombar=uint32
> virtio-blk-pci.romfile=str
> virtio-blk-pci.addr=int32 (Slot and optional function number, example: 06.0 or 06)
> virtio-blk-pci.event_idx=bool (on/off)
> virtio-blk-pci.indirect_desc=bool (on/off)
> virtio-blk-pci.vectors=uint32
> virtio-blk-pci.ioeventfd=bool (on/off)
> virtio-blk-pci.class=uint32
> 
> 
> Gonglei (5):
>   qdev: add description field in PropertyInfo struct
>   qom: add description field in ObjectProperty struct
>   qdev: set the object property's description to the qdev property's.
>   qmp: print descriptions of object properties
>   qdev: drop legacy_name from qdev properties
> 
>  hw/core/qdev-properties-system.c |  8 
>  hw/core/qdev-properties.c| 14 --
>  hw/core/qdev.c   |  5 +
>  include/hw/qdev-core.h   |  2 +-
>  include/qom/object.h | 14 ++
>  qapi-schema.json |  4 +++-
>  qdev-monitor.c   |  7 ++-
>  qmp.c| 13 ++---
>  qom/object.c | 20 
>  target-ppc/translate_init.c  |  2 +-
>  10 files changed, 72 insertions(+), 17 deletions(-)
>

Re: [Qemu-devel] [PATCH 2/2] qemu-sockets: Add error to non-blocking connect handler

2014-10-07 Thread Paolo Bonzini

On 06/10/2014 19:59, miny...@acm.org wrote:
> +error_setg_errno(&err, errno, "Error connecting to socket");
>  closesocket(s->fd);
>  s->fd = rc;
>  }
> @@ -257,9 +259,14 @@ static void wait_for_connect(void *opaque)
>  while (s->current_addr->ai_next != NULL && s->fd < 0) {
>  s->current_addr = s->current_addr->ai_next;
>  s->fd = inet_connect_addr(s->current_addr, &in_progress, s, NULL);
> +if (s->fd < 0) {
> +error_free(err);
> +err = NULL;
> +error_setg_errno(&err, errno, "Unable to start socket connect");

So the errors set in the snippet above are the ones actually passed here:

> +static void check_report_connect_error(CharDriverState *chr, const char *str,
> +   Error *err)
>  {
>  TCPCharDriver *s = chr->opaque;
>  
>  if (!s->connect_err_reported) {
> -error_report("%s char device %s\n", str, chr->label);
> +error_report("%s char device %s: %s\n", str, chr->label,
> + error_get_pretty(err));


If you just make it

error_report("%s: %s", chr->label, error_get_pretty(err));

we still get a good error.  It would arguably be better, since there's
no duplication, except that it doesn't mention character devices
anymore.  But something like "serial0: error connecting to socket" is a
decent error.
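
For illustration only, here is a minimal sketch of the report-once behaviour with the shorter message format. DemoChardev, demo_report_connect_error() and demo_connected() are hypothetical stand-ins, not QEMU's TCPCharDriver or Error API, and the message is formatted into a caller-supplied buffer rather than passed to error_report() so that the sketch stays self-contained:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the relevant bits of a socket chardev. */
typedef struct {
    const char *label;          /* chardev id, e.g. "serial0" */
    bool connect_err_reported;  /* set once a failure has been reported */
} DemoChardev;

/*
 * Format "<label>: <detail>" into buf the first time a connect fails.
 * Returns true if a message was produced, false if it was suppressed
 * because an earlier failure was already reported.
 */
static bool demo_report_connect_error(DemoChardev *chr, const char *detail,
                                      char *buf, size_t buflen)
{
    assert(chr != NULL && detail != NULL && buf != NULL);

    if (chr->connect_err_reported) {
        return false;
    }
    snprintf(buf, buflen, "%s: %s", chr->label, detail);
    chr->connect_err_reported = true;
    return true;
}

/* Call when a connection succeeds so the next failure is reported again. */
static void demo_connected(DemoChardev *chr)
{
    chr->connect_err_reported = false;
}
```

With a label of "serial0" and a detail string of "Error connecting to socket", the first failure yields "serial0: Error connecting to socket"; repeated failures stay silent until demo_connected() has been called.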

Thanks,

Paolo

Re: [Qemu-devel] [PATCH v5 11/33] target-arm: arrayfying fieldoffset for banking

2014-10-07 Thread Greg Bellows

On 7 October 2014 02:12, Peter Maydell  wrote:

> On 7 October 2014 06:06, Greg Bellows  wrote:
> >
> >
> > On 6 October 2014 11:19, Peter Maydell  wrote:
> >>
> >> On 30 September 2014 22:49, Greg Bellows wrote:
> >> > From: Fabian Aggeler 
> >> >
> >> > Prepare ARMCPRegInfo to support specifying two fieldoffsets per
> >> > register definition. This will allow us to keep one register
> >> > definition for banked registers (different offsets for secure/
> >> > non-secure world).
> >> >
> >> > Signed-off-by: Fabian Aggeler 
> >> > Signed-off-by: Greg Bellows 
> >> >
> >> > --
> >> > v4 -> v5
> >> > - Added ARM CP register secure and non-secure bank flags
> >> > - Added setting of secure and non-secure flags during registration
> >> > ---
> >> >  target-arm/cpu.h| 23 +++-
> >> >  target-arm/helper.c | 60
> >> > +
> >> >  2 files changed, 65 insertions(+), 18 deletions(-)
> >> >
> >> > diff --git a/target-arm/cpu.h b/target-arm/cpu.h
> >> > index 1700676..9681d45 100644
> >> > --- a/target-arm/cpu.h
> >> > +++ b/target-arm/cpu.h
> >> > @@ -958,10 +958,12 @@ static inline uint64_t cpreg_to_kvm_id(uint32_t cpregid)
> >> >  #define ARM_CP_CURRENTEL (ARM_CP_SPECIAL | (4 << 8))
> >> >  #define ARM_CP_DC_ZVA (ARM_CP_SPECIAL | (5 << 8))
> >> >  #define ARM_LAST_SPECIAL ARM_CP_DC_ZVA
> >> > +#define ARM_CP_BANK_S   (1 << 16)
> >> > +#define ARM_CP_BANK_NS  (2 << 16)
> >>
> >> I thought we were going to put these flags into a reginfo->secure
> >> field? Mixing them into the 'type' bits seems unnecessarily
> >> confusing to me.
> >
> >
> > Hmmm... that's not how I interpreted our discussion.  We discussed having
> > BANK_ flags which I figured we were talking about the existing flags.  So,
> > you are thinking that the "secure" field becomes a separate flags, so we
> > would have 2 flags fields.  Not sure that is any less confusing, maybe more
> > because then you have to worry about the flags being put in the right place.
>
> Sorry for any confusion. My intention was that the previous
> 'secure' field which just had a 1/0 value should have flags
> in it instead. Note that we don't have a generic "flags" field;
> we have a "type" field which indicates properties of how the
> register itself behaves (unrelated to what encodings and
> states it is visible from), we have a "state" field which has
> the flags for whether it is visible from AArch32 or AArch64
> or both, and we have an "access" field which has flags for
> whether it is readable or writable from various exception
> levels. I think having a separate "secure" field is easier
> to understand and fits into that approach.
>
>
Fixed in v6 by separating out the security flags and adding a secure field.


>
>
> >> > -ptrdiff_t fieldoffset; /* offsetof(CPUARMState, field) */
> >> > +union { /* offsetof(CPUARMState, field) */
> >> > +struct {
> >> > +ptrdiff_t fieldoffset_padding;
> >> > +ptrdiff_t fieldoffset;
> >>
> >> ...why is the padding field first? Given that we always write
> >> fieldoffset when we put the banked versions into the hash table
> >> I don't think it should matter, should it?
> >
> >
> > The padding aligns the existing fieldoffset with the non-secure bank.  For
> > correctness, I added the padding to truly align the default fieldoffset with
> > the non-secure bank.  I don't think it matters otherwise.
>
> But do we ever write to "fieldoffset" and then read from
> "bank_fieldoffsets[1]" (or vice versa)? If we don't then it's
> not necessary for correctness at all... (If we do do that, where
> does it happen?)
>
>
In short, no.  The only time we use bank_fieldoffset is during registration,
and that is to fix up fieldoffset to contain the correct bank offset.
Otherwise, we always use fieldoffset when determining the register offset.

I'm still trying to wrap my head around it, but I believe there are cases
where we use a different register set depending on whether a given EL is 32
or 64-bit.  I need to spend a bit more time working through the scenarios.
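
To make the layout question concrete, here is a rough sketch of the union and the registration-time fixup described above. It is not the code from the patch: DemoRegInfo, the DEMO_BANK_* indices and demo_fixup_fieldoffset() are made-up names, and the assumption that index 1 is the non-secure bank is inferred from the padding discussion rather than taken from the series.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed bank indices: 0 = secure, 1 = non-secure. */
enum { DEMO_BANK_S = 0, DEMO_BANK_NS = 1 };

/* Hypothetical cut-down register description with only the offset fields. */
typedef struct {
    union {
        struct {
            ptrdiff_t fieldoffset_padding; /* overlays bank_fieldoffsets[0] */
            ptrdiff_t fieldoffset;         /* overlays bank_fieldoffsets[1] */
        };
        ptrdiff_t bank_fieldoffsets[2];
    };
} DemoRegInfo;

/* The padding member is what makes fieldoffset share storage with bank 1. */
_Static_assert(offsetof(DemoRegInfo, fieldoffset) ==
               offsetof(DemoRegInfo, bank_fieldoffsets) + sizeof(ptrdiff_t),
               "fieldoffset must alias the second bank slot");

/*
 * Registration-time fixup on a per-bank copy of the definition: read the
 * offset for the chosen bank first, then store it in fieldoffset, which is
 * the only member consulted afterwards.
 */
static void demo_fixup_fieldoffset(DemoRegInfo *copy, int bank)
{
    ptrdiff_t off;

    assert(bank == DEMO_BANK_S || bank == DEMO_BANK_NS);
    off = copy->bank_fieldoffsets[bank];
    copy->fieldoffset = off;
}
```

If nothing ever writes one member and reads the other, the aliasing itself is not needed, which is the point of the question above; the sketch only shows what the padding achieves when it is relied upon.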


> thanks
> -- PMM
>

Re: [Qemu-devel] [PATCH 1/1] target-i386: prevent users from setting threads>1 for AMD CPUs

2014-10-07 Thread Paolo Bonzini

On 07/10/2014 23:16, Wei Huang wrote:
> It isn't a bug IMO. I tested various combinations; and current QEMU
> handles it very well. It converts threads=n to proper
> CPUID__0001_EBX[LogicalProcCount] and CPUID_8000_0008_ECX[NC]
> accordingly for AMD.

So if it ain't broken, don't fix it. :)

>> I am worried that the default CPU is an AMD one when KVM is disabled,
>> and thus "qemu-system-x86_64 -smp threads=2" will likely be broken.
> 
> "-smp threads=2" will be rejected by the patch.

Yeah, I am afraid that is an incompatible change, and one more reason
not to do this.  AMD not selling hyperthreaded processors does not mean
that they could not be represented properly with the CPUID leaves that
AMD processors support.

Paolo

> Unless the meaning of
> threads is not limited to threads-per-core, shouldn't end users use
> "-smp 2" in this case or something like "-smp 2,cores=2,sockets=1"?

Re: [Qemu-devel] [PATCH 1/1] target-i386: prevent users from setting threads>1 for AMD CPUs

2014-10-07 Thread Wei Huang




On 10/07/2014 03:58 PM, Paolo Bonzini wrote:
> On 07/10/2014 21:44, Wei Huang wrote:
>> AMD CPU doesn't support hyperthreading. Even though QEMU fixes
>> this issue by setting CPUID__0001_EBX and CPUID_8000_0008_ECX
>> via conversion, it is better to stop end-users in the first place
>> with a warning message.
>
> Hi Wei,
>
> what exactly breaks if you try creating an AMD VM with hyperthreading?

Hi Paolo,

It isn't a bug IMO. I tested various combinations; and current QEMU 
handles it very well. It converts threads=n to proper 
CPUID__0001_EBX[LogicalProcCount] and CPUID_8000_0008_ECX[NC] 
accordingly for AMD.
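
As a rough illustration of the kind of conversion meant here (a sketch, not QEMU's code): the bit positions follow AMD's CPUID documentation, where Fn0000_0001 EBX[23:16] is LogicalProcessorCount and Fn8000_0008 ECX[7:0] is NC, the core count minus one. Folding threads into both counts is my assumption about how threads=n ends up represented, and the function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/*
 * CPUID Fn0000_0001 EBX[23:16], LogicalProcessorCount: every logical CPU
 * in the package, i.e. cores * threads.
 */
static uint32_t demo_cpuid_1_ebx_logical_count(unsigned cores, unsigned threads)
{
    unsigned logical = cores * threads;

    assert(logical >= 1 && logical <= 255);
    return (uint32_t)logical << 16;
}

/*
 * CPUID Fn8000_0008 ECX[7:0], NC: number of cores minus one.  Threads are
 * counted as additional cores here, so the guest sees cores * threads of
 * them and no threads-per-core level at all.
 */
static uint32_t demo_cpuid_8000_0008_ecx_nc(unsigned cores, unsigned threads)
{
    unsigned logical = cores * threads;

    assert(logical >= 1 && logical <= 256);
    return (uint32_t)(logical - 1) & 0xff;
}
```

Under these assumptions "-smp 4,cores=2,threads=2" would give a LogicalProcessorCount of 4 and an NC of 3.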


There is a Bugzilla report for such a configuration:
https://bugzilla.redhat.com/show_bug.cgi?id=1135772. So I thought such a
check might be a good thing to do.


> I am worried that the default CPU is an AMD one when KVM is disabled,
> and thus "qemu-system-x86_64 -smp threads=2" will likely be broken.

"-smp threads=2" will be rejected by the patch. Unless the meaning of
threads is not limited to threads-per-core, shouldn't end users use
"-smp 2" in this case or something like "-smp 2,cores=2,sockets=1"?

> Paolo

Re: [Qemu-devel] [PATCH 1/1] target-i386: prevent users from setting threads>1 for AMD CPUs

2014-10-07 Thread Paolo Bonzini

On 07/10/2014 21:44, Wei Huang wrote:
> AMD CPU doesn't support hyperthreading. Even though QEMU fixes
> this issue by setting CPUID__0001_EBX and CPUID_8000_0008_ECX
> via conversion, it is better to stop end-users in the first place
> with a warning message.

Hi Wei,

what exactly breaks if you try creating an AMD VM with hyperthreading?

I am worried that the default CPU is an AMD one when KVM is disabled,
and thus "qemu-system-x86_64 -smp threads=2" will likely be broken.

Paolo

Re: [Qemu-devel] [PATCH 10/17] mm: rmap preparation for remap_anon_pages

2014-10-07 Thread Peter Feiner

On Tue, Oct 07, 2014 at 05:52:47PM +0200, Andrea Arcangeli wrote:
> I probably grossly overestimated the benefits of resolving the
> userfault with a zerocopy page move, sorry. [...]

For posterity, I think it's worth noting that the most expensive aspect of a TLB
shootdown is the interprocessor interrupt necessary to flush other CPUs' TLBs.
On a many-core machine, copying 4K of data looks pretty cheap compared to
taking an interrupt and invalidating TLBs on many cores :-)

> [...] So if we entirely drop the
> zerocopy behavior and the TLB flush of the old page like you
> suggested, the way to keep the userfaultfd mechanism decoupled from
> the userfault resolution mechanism would be to implement an
> atomic-copy syscall. That would work for SIGBUS userfaults too without
> requiring a pseudofd then. It would be enough then to call
> mcopy_atomic(userfault_addr,tmp_addr,len) with the only constraints
> that len must be a multiple of PAGE_SIZE. Of course mcopy_atomic
> wouldn't page fault or call GUP into the destination address (it can't
> otherwise the in-flight partial copy would be visible to the process,
> breaking the atomicity of the copy), but it would fill in the
> pte/trans_huge_pmd with the same strict behavior that remap_anon_pages
> currently has (in turn it would by design bypass the VM_USERFAULT
> check and be ideal for resolving userfaults).
> 
> mcopy_atomic could then be also extended to tmpfs and it would work
> without requiring the source page to be a tmpfs page too without
> having to convert page types on the fly.
> 
> If I add mcopy_atomic, the patch in subject (10/17) can be dropped of
> course so it'd be even less intrusive than the current
> remap_anon_pages and it would require zero TLB flush during its
> runtime (it would just require an atomic copy).

I like this new approach. It will be good to have a single interface for
resolving anon and tmpfs userfaults.

> So should I try to embed a mcopy_atomic inside userfault_write or can
> I expose it to userland as a standalone new syscall? Or should I do
> something different? Comments?

One interesting (ab)use of userfault_write would be that the faulting process
and the fault-handling process could be different, which would be necessary
for post-copy live migration in CRIU (http://criu.org).

Aside from the aesthetic difference, I can't think of any advantage in favor of
a syscall.

Peter

[Qemu-devel] [PATCH 1/1] target-i386: prevent users from setting threads>1 for AMD CPUs

2014-10-07 Thread Wei Huang

AMD CPU doesn't support hyperthreading. Even though QEMU fixes
this issue by setting CPUID__0001_EBX and CPUID_8000_0008_ECX
via conversion, it is better to stop end-users in the first place
with a warning message.

Signed-off-by: Wei Huang 
---
 target-i386/cpu.c | 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index e7bf9de..c377cd2 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -2696,6 +2696,10 @@ static void x86_cpu_apic_realize(X86CPU *cpu, Error **errp)
 }
 #endif
 
+#define IS_AMD_CPU(env) ((env)->cpuid_vendor1 == CPUID_VENDOR_AMD_1 && \
+ (env)->cpuid_vendor2 == CPUID_VENDOR_AMD_2 && \
+ (env)->cpuid_vendor3 == CPUID_VENDOR_AMD_3)
+
 static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
 {
 CPUState *cs = CPU(dev);
@@ -2711,9 +2715,7 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
 /* On AMD CPUs, some CPUID[8000_0001].EDX bits must match the bits on
  * CPUID[1].EDX.
  */
-if (env->cpuid_vendor1 == CPUID_VENDOR_AMD_1 &&
-env->cpuid_vendor2 == CPUID_VENDOR_AMD_2 &&
-env->cpuid_vendor3 == CPUID_VENDOR_AMD_3) {
+if (IS_AMD_CPU(env)) {
 env->features[FEAT_8000_0001_EDX] &= ~CPUID_EXT2_AMD_ALIASES;
 env->features[FEAT_8000_0001_EDX] |= (env->features[FEAT_1_EDX]
& CPUID_EXT2_AMD_ALIASES);
@@ -2742,6 +2744,21 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
 mce_init(cpu);
 qemu_init_vcpu(cs);
 
+/* AMD CPU doesn't support hyperthreading. Even though QEMU does fix
+ * this issue by setting CPUID__0001_EBX and CPUID_8000_0008_ECX
+ * correctly, it is still better to stop end-users in the first place
+ * by giving out a warning message.
+ *
+ * NOTE: cs->nr_threads is initialized in qemu_init_vcpu(). So the
+ * following code has to follow qemu_init_vcpu().
+ */
+if (IS_AMD_CPU(env) && (cs->nr_threads > 1)) {
+error_setg(&local_err,
+   "AMD CPU doesn't support hyperthreading. Please configure "
+   "-smp options correctly.");
+goto out;
+}
+
 x86_cpu_apic_realize(cpu, &local_err);
 if (local_err != NULL) {
 goto out;
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH 1/1] target-i386: prevent users from setting threads>1 for AMD CPUs

2014-10-07 Thread Wei Huang


Sorry, please skip this version. I am sending out an updated one.

-Wei

On 10/07/2014 02:17 PM, Wei Huang wrote:

AMD CPU doesn't support hyperthreading. Even though QEMU fixes
this issue by setting CPUID__0001_EBX and CPUID_8000_0008_ECX
via conversion, it is better to stop end-users in the first place
with a warning message.

Signed-off-by: Wei Huang 
---
  target-i386/cpu.c | 18 ++
  1 file changed, 18 insertions(+)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index e7bf9de..01bbcaf 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -2742,6 +2742,24 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
  mce_init(cpu);
  qemu_init_vcpu(cs);

+/* AMD CPU doesn't support hyperthreading. Even though QEMU does fix
+ * this issue by setting CPUID__0001_EBX and CPUID_8000_0008_ECX
+ * correctly, it is still better to stop end-users in the first place
+ * by giving out a warning message.
+ *
+ * NOTE: cs->nr_threads is initialized in qemu_init_vcpu(). So the
+ * following code has to follow qemu_init_vcpu().
+ */
+if (env->cpuid_vendor1 == CPUID_VENDOR_AMD_1 &&
+env->cpuid_vendor2 == CPUID_VENDOR_AMD_2 &&
+env->cpuid_vendor3 == CPUID_VENDOR_AMD_3 &&
+(cs->nr_threads > 1)) {
+error_setg(&local_err,
+   "AMD CPU doesn't support hyperthreading. Please configure "
+   "-smp options correctly.");
+goto out;
+}
+
  x86_cpu_apic_realize(cpu, &local_err);
  if (local_err != NULL) {
  goto out;

[Qemu-devel] [PATCH 1/1] target-i386: prevent users from setting threads>1 for AMD CPUs

2014-10-07 Thread Wei Huang

AMD CPU doesn't support hyperthreading. Even though QEMU fixes
this issue by setting CPUID__0001_EBX and CPUID_8000_0008_ECX
via conversion, it is better to stop end-users in the first place
with a warning message.

Signed-off-by: Wei Huang 
---
 target-i386/cpu.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index e7bf9de..01bbcaf 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -2742,6 +2742,24 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
 mce_init(cpu);
 qemu_init_vcpu(cs);
 
+/* AMD CPU doesn't support hyperthreading. Even though QEMU does fix
+ * this issue by setting CPUID__0001_EBX and CPUID_8000_0008_ECX
+ * correctly, it is still better to stop end-users in the first place
+ * by giving out a warning message.
+ *
+ * NOTE: cs->nr_threads is initialized in qemu_init_vcpu(). So the
+ * following code has to follow qemu_init_vcpu().
+ */
+if (env->cpuid_vendor1 == CPUID_VENDOR_AMD_1 &&
+env->cpuid_vendor2 == CPUID_VENDOR_AMD_2 &&
+env->cpuid_vendor3 == CPUID_VENDOR_AMD_3 &&
+(cs->nr_threads > 1)) {
+error_setg(&local_err,
+   "AMD CPU doesn't support hyperthreading. Please configure "
+   "-smp options correctly.");
+goto out;
+}
+
 x86_cpu_apic_realize(cpu, &local_err);
 if (local_err != NULL) {
 goto out;
-- 
1.8.3.1

[Qemu-devel] [PATCH RFC 07/11] dataplane: allow virtio-1 devices

2014-10-07 Thread Cornelia Huck

Handle endianness conversion for virtio-1 virtqueues correctly.

Note that dataplane now needs to be built per-target.

Signed-off-by: Cornelia Huck 
---
 hw/block/dataplane/virtio-blk.c |3 +-
 hw/scsi/virtio-scsi-dataplane.c |2 +-
 hw/virtio/Makefile.objs |2 +-
 hw/virtio/dataplane/Makefile.objs   |2 +-
 hw/virtio/dataplane/vring.c |   85 +++
 include/hw/virtio/dataplane/vring.h |   64 --
 6 files changed, 113 insertions(+), 45 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 5458f9d..eb45a3d 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -16,6 +16,7 @@
 #include "qemu/iov.h"
 #include "qemu/thread.h"
 #include "qemu/error-report.h"
+#include "hw/virtio/virtio-access.h"
 #include "hw/virtio/dataplane/vring.h"
 #include "block/block.h"
 #include "hw/virtio/virtio-blk.h"
@@ -75,7 +76,7 @@ static void complete_request_vring(VirtIOBlockReq *req, unsigned char status)
 VirtIOBlockDataPlane *s = req->dev->dataplane;
 stb_p(&req->in->status, status);
 
-vring_push(&req->dev->dataplane->vring, &req->elem,
+vring_push(s->vdev, &req->dev->dataplane->vring, &req->elem,
req->qiov.size + sizeof(*req->in));
 
 /* Suppress notification to guest by BH and its scheduled
diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index b778e05..3e2b706 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -81,7 +81,7 @@ VirtIOSCSIReq *virtio_scsi_pop_req_vring(VirtIOSCSI *s,
 
 void virtio_scsi_vring_push_notify(VirtIOSCSIReq *req)
 {
-vring_push(&req->vring->vring, &req->elem,
+vring_push((VirtIODevice *)req->dev, &req->vring->vring, &req->elem,
req->qsgl.size + req->resp_iov.size);
 event_notifier_set(&req->vring->guest_notifier);
 }
diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
index d21c397..19b224a 100644
--- a/hw/virtio/Makefile.objs
+++ b/hw/virtio/Makefile.objs
@@ -2,7 +2,7 @@ common-obj-y += virtio-rng.o
 common-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
 common-obj-y += virtio-bus.o
 common-obj-y += virtio-mmio.o
-common-obj-$(CONFIG_VIRTIO) += dataplane/
+obj-$(CONFIG_VIRTIO) += dataplane/
 
 obj-y += virtio.o virtio-balloon.o 
 obj-$(CONFIG_LINUX) += vhost.o vhost-backend.o vhost-user.o
diff --git a/hw/virtio/dataplane/Makefile.objs b/hw/virtio/dataplane/Makefile.objs
index 9a8cfc0..753a9ca 100644
--- a/hw/virtio/dataplane/Makefile.objs
+++ b/hw/virtio/dataplane/Makefile.objs
@@ -1 +1 @@
-common-obj-y += vring.o
+obj-y += vring.o
diff --git a/hw/virtio/dataplane/vring.c b/hw/virtio/dataplane/vring.c
index b84957f..4624521 100644
--- a/hw/virtio/dataplane/vring.c
+++ b/hw/virtio/dataplane/vring.c
@@ -18,6 +18,7 @@
 #include "hw/hw.h"
 #include "exec/memory.h"
 #include "exec/address-spaces.h"
+#include "hw/virtio/virtio-access.h"
 #include "hw/virtio/dataplane/vring.h"
 #include "qemu/error-report.h"
 
@@ -83,7 +84,7 @@ bool vring_setup(Vring *vring, VirtIODevice *vdev, int n)
 vring_init(&vring->vr, virtio_queue_get_num(vdev, n), vring_ptr, 4096);
 
 vring->last_avail_idx = virtio_queue_get_last_avail_idx(vdev, n);
-vring->last_used_idx = vring->vr.used->idx;
+vring->last_used_idx = vring_get_used_idx(vdev, vring);
 vring->signalled_used = 0;
 vring->signalled_used_valid = false;
 
@@ -104,7 +105,7 @@ void vring_teardown(Vring *vring, VirtIODevice *vdev, int n)
 void vring_disable_notification(VirtIODevice *vdev, Vring *vring)
 {
 if (!(vdev->guest_features[0] & (1 << VIRTIO_RING_F_EVENT_IDX))) {
-vring->vr.used->flags |= VRING_USED_F_NO_NOTIFY;
+vring_set_used_flags(vdev, vring, VRING_USED_F_NO_NOTIFY);
 }
 }
 
@@ -117,10 +118,10 @@ bool vring_enable_notification(VirtIODevice *vdev, Vring *vring)
 if (vdev->guest_features[0] & (1 << VIRTIO_RING_F_EVENT_IDX)) {
 vring_avail_event(&vring->vr) = vring->vr.avail->idx;
 } else {
-vring->vr.used->flags &= ~VRING_USED_F_NO_NOTIFY;
+vring_clear_used_flags(vdev, vring, VRING_USED_F_NO_NOTIFY);
 }
 smp_mb(); /* ensure update is seen before reading avail_idx */
-return !vring_more_avail(vring);
+return !vring_more_avail(vdev, vring);
 }
 
 /* This is stolen from linux/drivers/vhost/vhost.c:vhost_notify() */
@@ -134,12 +135,13 @@ bool vring_should_notify(VirtIODevice *vdev, Vring *vring)
 smp_mb();
 
 if ((vdev->guest_features[0] & VIRTIO_F_NOTIFY_ON_EMPTY) &&
-unlikely(vring->vr.avail->idx == vring->last_avail_idx)) {
+unlikely(!vring_more_avail(vdev, vring))) {
 return true;
 }
 
 if (!(vdev->guest_features[0] & VIRTIO_RING_F_EVENT_IDX)) {
-return !(vring->vr.avail->flags & VRING_AVAIL_F_NO_INTERRUPT);
+return !(vring_get_avail_flags(vdev, vring) &
+ VRING_AVAIL_F_NO_INTERRUPT);
 }

[Qemu-devel] [PATCH RFC 07/11] virtio_net: use v1.0 endian.

2014-10-07 Thread Cornelia Huck

From: Rusty Russell 

[Cornelia Huck: converted some missed fields]
Signed-off-by: Rusty Russell 
Signed-off-by: Cornelia Huck 
---
 drivers/net/virtio_net.c |   31 +++
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 59caa06..cd18946 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -353,13 +353,14 @@ err:
 }
 
 static struct sk_buff *receive_mergeable(struct net_device *dev,
+struct virtnet_info *vi,
 struct receive_queue *rq,
 unsigned long ctx,
 unsigned int len)
 {
void *buf = mergeable_ctx_to_buf_address(ctx);
struct skb_vnet_hdr *hdr = buf;
-   int num_buf = hdr->mhdr.num_buffers;
+   u16 num_buf = virtio_to_cpu_u16(rq->vq->vdev, hdr->mhdr.num_buffers);
struct page *page = virt_to_head_page(buf);
int offset = buf - page_address(page);
unsigned int truesize = max(len, mergeable_ctx_to_buf_truesize(ctx));
@@ -375,7 +376,9 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
ctx = (unsigned long)virtqueue_get_buf(rq->vq, &len);
if (unlikely(!ctx)) {
pr_debug("%s: rx error: %d buffers out of %d missing\n",
-dev->name, num_buf, hdr->mhdr.num_buffers);
+dev->name, num_buf,
+virtio_to_cpu_u16(rq->vq->vdev,
+  hdr->mhdr.num_buffers));
dev->stats.rx_length_errors++;
goto err_buf;
}
@@ -460,7 +463,7 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
}
 
if (vi->mergeable_rx_bufs)
-   skb = receive_mergeable(dev, rq, (unsigned long)buf, len);
+   skb = receive_mergeable(dev, vi, rq, (unsigned long)buf, len);
else if (vi->big_packets)
skb = receive_big(dev, rq, buf, len);
else
@@ -479,8 +482,8 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
if (hdr->hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
pr_debug("Needs csum!\n");
if (!skb_partial_csum_set(skb,
- hdr->hdr.csum_start,
- hdr->hdr.csum_offset))
+ virtio_to_cpu_u16(vi->vdev, hdr->hdr.csum_start),
+ virtio_to_cpu_u16(vi->vdev, hdr->hdr.csum_offset)))
goto frame_err;
} else if (hdr->hdr.flags & VIRTIO_NET_HDR_F_DATA_VALID) {
skb->ip_summed = CHECKSUM_UNNECESSARY;
@@ -511,7 +514,8 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
if (hdr->hdr.gso_type & VIRTIO_NET_HDR_GSO_ECN)
skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
 
-   skb_shinfo(skb)->gso_size = hdr->hdr.gso_size;
+   skb_shinfo(skb)->gso_size = virtio_to_cpu_u16(vi->vdev,
+   hdr->hdr.gso_size);
if (skb_shinfo(skb)->gso_size == 0) {
net_warn_ratelimited("%s: zero gso size.\n", dev->name);
goto frame_err;
@@ -871,16 +875,19 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 
if (skb->ip_summed == CHECKSUM_PARTIAL) {
hdr->hdr.flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
-   hdr->hdr.csum_start = skb_checksum_start_offset(skb);
-   hdr->hdr.csum_offset = skb->csum_offset;
+   hdr->hdr.csum_start = cpu_to_virtio_u16(vi->vdev,
+   skb_checksum_start_offset(skb));
+   hdr->hdr.csum_offset = cpu_to_virtio_u16(vi->vdev,
+skb->csum_offset);
} else {
hdr->hdr.flags = 0;
hdr->hdr.csum_offset = hdr->hdr.csum_start = 0;
}
 
if (skb_is_gso(skb)) {
-   hdr->hdr.hdr_len = skb_headlen(skb);
-   hdr->hdr.gso_size = skb_shinfo(skb)->gso_size;
+   hdr->hdr.hdr_len = cpu_to_virtio_u16(vi->vdev, skb_headlen(skb));
+   hdr->hdr.gso_size = cpu_to_virtio_u16(vi->vdev,
+   skb_shinfo(skb)->gso_size);
if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
hdr->hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
@@ -1181,7 +1188,7 @@ static void virtnet_set_rx_mode(struct net_device *dev)
sg_init_table(sg, 2);
 
/* Store the unicast

Re: [Qemu-devel] IDs in QOM

2014-10-07 Thread Paolo Bonzini

On 07/10/2014 20:41, Kevin Wolf wrote:
> Is there any way to add netdevs/chardevs/devices in a non-QemuOpts way?

For chardevs, yes.

> I think always checking for the same allowed set of characters is the
> only sane way to do things. Otherwise you end up with names that can be
> used in one place, but not in another, e.g. you can create an object,
> but then not delete it again using HMP.

Note that deletion does not use QemuOpts, so device_del and netdev_del
could delete things with funny names.

> Or you can use a name for
> hotplug, but not on the initial startup when the command line
> configures the device. That's definitely something to be avoided.

I agree.

>>> > > So the "automatic arrayification" convenience feature added a property
>>> > > name restriction.  What makes us sure this is the last time we add name
>>> > > restrictions?
>> > 
>> > Nothing.  However, does it matter, as long as the now-disallowed names
>> > were not possible at all in -object/object_add?
> They were possible in QMP object-add, weren't they?

Yes.  I think we agree that we're going to change that.

Paolo

Re: [Qemu-devel] IDs in QOM

2014-10-07 Thread Paolo Bonzini

On 07/10/2014 20:39, Markus Armbruster wrote:
 >> > 1) always use the same restriction when a user creates objects;
 >> >
 >> > 2) do not introduce restrictions when a user is not using QemuOpts.
 >> >
 >> > We've been doing (2) so far; often it is just because QMP wrappers also
 >> > used QemuOpts, but not necessarily.  So object_add just does the same.
>>> >>
>>> >> We've been doing (2) so far simply because we've never wasted a thought
>>> >> on it!  Since we're wasting thoughts now: which one do we like better?
>> >
>> > User interfaces other than QOM have been doing (2) too.
> From an implementation point of view, doing nothing is doing (2).
> 
> From an interface point of view, it's (2) only when interfaces bypassing
> QemuOpts exist.
> 
>> > netdev-add and device-add have been doing (2) because they use QemuOpts
>> > under the hood.
> qdev_device_add() uses QemuOpts.  Until relatively recently, that was
> the only way to create a qdev.  Nowadays, you could create one using QOM
> directly, bypassing QemuOpts's ID restriction.
> 
> netdev-add is similar iirc.
> 
>> > blockdev-add has been consciously doing (2) for node-name.
> Actually, we consciously fixed it to do (1):
> 
> commit 9aebf3b89281a173d2dfeee379b800be5e3f363e
> Author: Kevin Wolf 
> Date:   Thu Sep 25 09:54:02 2014 +0200
> 
> block: Validate node-name
> 
> The device_name of a BlockDriverState is currently checked because it is
> always used as a QemuOpts ID and qemu_opts_create() checks whether such
> IDs are wellformed.
> 
> node-name is supposed to share the same namespace, but it isn't checked
> currently. This patch adds explicit checks both for device_name and
> node-name so that the same rules will still apply even if QemuOpts won't
> be used any more at some point.
> 
> qemu-img used to use names with spaces in them, which isn't allowed any
> more. Replace them with underscores.
> 
>> > chardev-add has been doing (1), and I'd argue that this is a bug in
>> > chardev-add.
> I'm not sure I got you here.
> 

Never mind, I've consistently swapped (1) and (2).

Paolo

Re: [Qemu-devel] IDs in QOM

2014-10-07 Thread Kevin Wolf

On 07.10.2014 at 17:14, Paolo Bonzini wrote:
> On 07/10/2014 14:16, Markus Armbruster wrote:
> >> > Possibly, except this would propagate all the way through the APIs.  For
> >> > example, right now [*] is added automatically to MemoryRegion
> >> > properties, but this can change in the future since many MemoryRegions
> >> > do not need array-like names.  Then you would have two sets of
> >> > MemoryRegion creation APIs, one that array-ifies names and one that 
> >> > doesn't.
> > Why couldn't you have a separate name generator there as well?
> > 
> > QOM:
> > * object_property_add() takes a non-magical name argument
> > * object_gen_name() takes a base name and generates a stream of
> >   derived names suitable for object_property_add()
> > 
> > Memory:
> > * memory_region_init() takes a non-magical name argument
> > * memory_gen_name() takes a base name... you get the idea
> >   actually a wrapper around object_gen_name()
> 
> I see what you mean; you could even reuse object_gen_name().  It looks
> sane, I guess one has to see a patch to judge if it also _is_ sane. :)
> 
> > > > Why is it a good idea to have two separate restrictions on property names?
> > > > A looser one that applies always (anything but '\0' and '/'), and a
> > > > stricter one that applies sometimes (letters, digits, '-', '.', '_',
> > > > starting with a letter).
> > > > 
> > > > If yes, how is "sometimes" defined?
> > >
> > > It applies to objects created by the user (either in
> > > /machine/peripheral, or in /objects).  Why the restriction?  For
> > > -object, because creating the object involves QemuOpts.  You then have
> > > two ways to satisfy the principle of least astonishment:
> > >
> > > 1) always use the same restriction when a user creates objects;
> > >
> > > 2) do not introduce restrictions when a user is not using QemuOpts.
> > >
> > > We've been doing (2) so far; often it is just because QMP wrappers also
> > > used QemuOpts, but not necessarily.  So object_add just does the same.
> >
> > We've been doing (2) so far simply because we've never wasted a thought
> > on it!  Since we're wasting thoughts now: which one do we like better?
> 
> User interfaces other than QOM have been doing (2) too.
> 
> netdev-add and device-add have been doing (2) because they use QemuOpts
> under the hood.
> 
> blockdev-add has been consciously doing (2) for node-name.
> 
> chardev-add has been doing (1), and I'd argue that this is a bug in
> chardev-add.

Is there any way to add netdevs/chardevs/devices in a non-QemuOpts way?
If not, (1) and (2) are the same thing, and the checks are always
applied.

I think always checking for the same allowed set of characters is the
only sane way to do things. Otherwise you end up with names that can be
used in one place, but not in another, e.g. you can create an object,
but then not delete it again using HMP. Or you can use a name for
hotplug, but not on the initial startup when the command line
configures the device. That's definitely something to be avoided.
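
To make the stricter rule from this thread concrete, here is a minimal
sketch of such a check in C. The rule (letters, digits, '-', '.', '_',
starting with a letter) is the one quoted above; the function name
id_is_wellformed() is made up for this illustration and is not claimed
to match anything in QEMU's source.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if id starts with a letter and
 * otherwise contains only letters, digits, '-', '.' and '_'.
 * An empty string is rejected.
 */
static bool id_is_wellformed(const char *id)
{
    assert(id != NULL);

    if (!isalpha((unsigned char)id[0])) {
        return false;
    }
    for (size_t i = 1; id[i] != '\0'; i++) {
        unsigned char c = (unsigned char)id[i];

        if (!isalnum(c) && strchr("-._", c) == NULL) {
            return false;
        }
    }
    return true;
}
```

If every user-facing entry point (command line, HMP, QMP) went through
one function like this, a name accepted in one place would be accepted
in all of them, which is the property argued for above.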

> > So the "automatic arrayification" convenience feature added a property
> > name restriction.  What makes us sure this is the last time we add name
> > restrictions?
> 
> Nothing.  However, does it matter, as long as the now-disallowed names
> were not possible at all in -object/object_add?

They were possible in QMP object-add, weren't they?

Kevin

Re: [Qemu-devel] IDs in QOM

2014-10-07 Thread Markus Armbruster

Paolo Bonzini  writes:

> Il 07/10/2014 14:16, Markus Armbruster ha scritto:
>>> > Possibly, except this would propagate all the way through the APIs.  For
>>> > example, right now [*] is added automatically to MemoryRegion
>>> > properties, but this can change in the future since many MemoryRegions
>>> > do not need array-like names.  Then you would have two sets of
>>> > MemoryRegion creation APIs, one that array-ifies names and one
>>> > that doesn't.
>> Why couldn't you have a separate name generator there as well?
>> 
>> QOM:
>> * object_property_add() takes a non-magical name argument
>> * object_gen_name() takes a base name and generates a stream of
>>   derived names suitable for object_property_add()
>> 
>> Memory:
>> * memory_region_init() takes a non-magical name argument
>> * memory_gen_name() takes a base name... you get the idea
>>   actually a wrapper around object_gen_name()
>
> I see what you mean; you could even reuse object_gen_name().  It looks
> sane, I guess one has to see a patch to judge if it also _is_ sane. :)

Yup.  Any takers?
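
For anyone considering it, the name generator idea could look roughly
like the sketch below. This is only an illustration of the proposal:
object_gen_name() does not exist as far as this thread shows, and the
NameGen type and name_gen_next() function are hypothetical names. The
"base[N]" output format is an assumption modelled on the [*]
arrayification discussed above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical generator state: a base name plus the next index to use. */
typedef struct NameGen {
    const char *base;
    unsigned next;
} NameGen;

/*
 * Writes the next derived name, "base[N]", into buf and advances the
 * generator. Returns 0 on success, -1 if buf is too small (in which
 * case the generator is not advanced).
 */
static int name_gen_next(NameGen *gen, char *buf, size_t len)
{
    assert(gen != NULL && gen->base != NULL && buf != NULL);

    int n = snprintf(buf, len, "%s[%u]", gen->base, gen->next);
    if (n < 0 || (size_t)n >= len) {
        return -1;
    }
    gen->next++;
    return 0;
}
```

The point of the split is that the function consuming the name (the
equivalent of object_property_add()) never has to interpret magic
characters; only callers that want array-like names use the generator.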

>> > > Why is it a good idea to have two separate restrictions on property names?
>> > > A looser one that applies always (anything but '\0' and '/'), and a
>> > > stricter one that applies sometimes (letters, digits, '-', '.', '_',
>> > > starting with a letter).
>> > > 
>> > > If yes, how is "sometimes" defined?
>> >
>> > It applies to objects created by the user (either in
>> > /machine/peripheral, or in /objects).  Why the restriction?  For
>> > -object, because creating the object involves QemuOpts.  You then have
>> > two ways to satisfy the principle of least astonishment:
>> >
>> > 1) always use the same restriction when a user creates objects;
>> >
>> > 2) do not introduce restrictions when a user is not using QemuOpts.
>> >
>> > We've been doing (2) so far; often it is just because QMP wrappers also
>> > used QemuOpts, but not necessarily.  So object_add just does the same.
>>
>> We've been doing (2) so far simply because we've never wasted a thought
>> on it!  Since we're wasting thoughts now: which one do we like better?
>
> User interfaces other than QOM have been doing (2) too.

From an implementation point of view, doing nothing is doing (2).

From an interface point of view, it's (2) only when interfaces bypassing
QemuOpts exist.

> netdev-add and device-add have been doing (2) because they use QemuOpts
> under the hood.

qdev_device_add() uses QemuOpts.  Until relatively recently, that was
the only way to create a qdev.  Nowadays, you could create one using QOM
directly, bypassing QemuOpts's ID restriction.

netdev-add is similar iirc.

> blockdev-add has been consciously doing (2) for node-name.

Actually, we consciously fixed it to do (1):

commit 9aebf3b89281a173d2dfeee379b800be5e3f363e
Author: Kevin Wolf 
Date:   Thu Sep 25 09:54:02 2014 +0200

block: Validate node-name

The device_name of a BlockDriverState is currently checked because it is
always used as a QemuOpts ID and qemu_opts_create() checks whether such
IDs are wellformed.

node-name is supposed to share the same namespace, but it isn't checked
currently. This patch adds explicit checks both for device_name and
node-name so that the same rules will still apply even if QemuOpts won't
be used any more at some point.

qemu-img used to use names with spaces in them, which isn't allowed any
more. Replace them with underscores.

> chardev-add has been doing (1), and I'd argue that this is a bug in
> chardev-add.

I'm not sure I got you here.

> QOM has two families of operations.
>
> One is -object/object-add/object-del.  This is a high-level operation
> that only works with specific QOM classes (those that implement
> UserCreatable) and only operate on a specific part of the QOM tree
> (/objects).
>
> The other is qom-get/qom-set.  This is a low-level operation that can
> explore all of the QOM tree.  It cannot _create_ new objects and
> properties, however, so the user cannot escape the naming sandbox that
> we put in place for him.
>
> I think it's fair to limit the high-level operations to the same id
> space, no matter if they're QemuOpts based or not.

Yes.

"QemuOpts-based" should never be a concern at the interface.  It's an
implementation detail.

>> Based on experience, I'd rather not make "user-created"
>> vs. "system-created" a hard boundary.  Once a system-created funny name
>> has become ABI, we can't ever let the user create it.  One reason for me
>> to prefer (1).
>
> Anything that is outside /objects is "funny", not just anything that has
> weird characters in its name.  The QOM API consists of "magic" object
> canonical paths and magic property names which, as far as I know, can be
> easily listed:
>
> * the aforementioned /machine.rtc-time that lets you detect missed
> RTC_CHANGE events
>
> * the /backend tree that includes info on the graphic consoles.  Not
> sure if this is considered stable, but it's there.
>

[Qemu-devel] [PATCH RFC 06/11] virtio: allow transports to get avail/used addresses

2014-10-07 Thread Cornelia Huck

For virtio-1, we can theoretically have a more complex virtqueue
layout with avail and used buffers not on a contiguous memory area
with the descriptor table. For now, it's fine for a transport driver
to stay with the old layout: It needs, however, a way to access
the locations of the avail/used rings so it can register them with
the host.

Reviewed-by: David Hildenbrand 
Signed-off-by: Cornelia Huck 
---
 drivers/virtio/virtio_ring.c |   16 
 include/linux/virtio.h   |3 +++
 2 files changed, 19 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 350c39b..dd0d4ec 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -961,4 +961,20 @@ void virtio_break_device(struct virtio_device *dev)
 }
 EXPORT_SYMBOL_GPL(virtio_break_device);
 
+void *virtqueue_get_avail(struct virtqueue *_vq)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+
+   return vq->vring.avail;
+}
+EXPORT_SYMBOL_GPL(virtqueue_get_avail);
+
+void *virtqueue_get_used(struct virtqueue *_vq)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+
+   return vq->vring.used;
+}
+EXPORT_SYMBOL_GPL(virtqueue_get_used);
+
 MODULE_LICENSE("GPL");
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 68cadd4..f10e6e7 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -76,6 +76,9 @@ unsigned int virtqueue_get_vring_size(struct virtqueue *vq);
 
 bool virtqueue_is_broken(struct virtqueue *vq);
 
+void *virtqueue_get_avail(struct virtqueue *vq);
+void *virtqueue_get_used(struct virtqueue *vq);
+
 /**
  * virtio_device - representation of a device using virtio
  * @index: unique position on the virtio bus
-- 
1.7.9.5

[Qemu-devel] [PATCH RFC 01/11] linux-headers/virtio_config: Update with VIRTIO_F_VERSION_1

2014-10-07 Thread Cornelia Huck

From: Thomas Huth 

Add the new VIRTIO_F_VERSION_1 definition to the virtio_config.h
linux header.

Signed-off-by: Thomas Huth 
Signed-off-by: Cornelia Huck 
---
 linux-headers/linux/virtio_config.h |3 +++
 1 file changed, 3 insertions(+)

diff --git a/linux-headers/linux/virtio_config.h b/linux-headers/linux/virtio_config.h
index 75dc20b..16aa289 100644
--- a/linux-headers/linux/virtio_config.h
+++ b/linux-headers/linux/virtio_config.h
@@ -54,4 +54,7 @@
 /* Can the device handle any descriptor layout? */
 #define VIRTIO_F_ANY_LAYOUT 27
 
+/* v1.0 compliant. */
+#define VIRTIO_F_VERSION_1 32
+
 #endif /* _LINUX_VIRTIO_CONFIG_H */
-- 
1.7.9.5

Re: [Qemu-devel] [PULL v2 0/5] linux-user patches for 2.2

2014-10-07 Thread Peter Maydell

On 6 October 2014 20:11,   wrote:
> From: Riku Voipio 
>
> The following changes since commit 2472b6c07bb50179019589af1c22f43935ab7f5c:
>
>   gdbstub: Allow target CPUs to specify watchpoint STOP_BEFORE_ACCESS flag 
> (2014-10-06 14:25:43 +0100)
>
> are available in the git repository at:
>
>   git://git.linaro.org/people/riku.voipio/qemu.git 
> tags/pull-linux-user-20141006-2
>
> for you to fetch changes up to 1a1c4db9b298956e89caf53b09b6a7a960d55d66:
>
>   translate-all.c: memory walker initial address miscalculation (2014-10-06 
> 21:53:35 +0300)
>
> 
> linux-user pull for 2.2
>
> Clearest linux-user patches sent to the list since August.
> Apart from Mikhail's patch, the rest are quite trivial.
>
> v2: check for CONFIG_TIMERFD only after it has been defined
>
> 

Applied, thanks.

-- PMM

[Qemu-devel] [PATCH RFC 04/11] s390x/virtio-ccw: fix check for WRITE_FEAT

2014-10-07 Thread Cornelia Huck

We need to check guest feature size, not host feature size to find
out whether we should call virtio_set_features(). This check is
possible now that vdev->guest_features is an array.

Reviewed-by: Thomas Huth 
Signed-off-by: Cornelia Huck 
---
 hw/s390x/virtio-ccw.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index 8aa79a7..69add47 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -399,7 +399,7 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
 features.index = ldub_phys(&address_space_memory,
ccw.cda + sizeof(features.features));
 features.features = ldl_le_phys(&address_space_memory, ccw.cda);
-if (features.index < ARRAY_SIZE(dev->host_features)) {
+if (features.index < ARRAY_SIZE(vdev->guest_features)) {
 virtio_set_features(vdev, features.index, features.features);
 } else {
 /*
-- 
1.7.9.5

[Qemu-devel] [PATCH RFC 00/11] linux: towards virtio-1 guest support

2014-10-07 Thread Cornelia Huck

This patchset tries to go towards implementing both virtio-1 compliant and
transitional virtio drivers in Linux. Branch available at

git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux virtio-1

This is based on some old patches by Rusty to handle extended feature bits
and endianness conversions. Thomas implemented the new virtio-ccw transport
revision command, and I hacked up some further endianness stuff and
virtio-ccw enablement. Probably a lot still missing, but I can run a
virtio-ccw guest that enables virtio-1 accesses if the host supports it
(via the qemu host patchset) - virtio-net and virtio-blk only so far.

I consider this patchset a starting point for further discussions.

Cornelia Huck (5):
  virtio: endianess conversion helpers
  virtio: allow transports to get avail/used addresses
  virtio_blk: use virtio v1.0 endian
  KVM: s390: virtio-ccw revision 1 SET_VQ
  KVM: s390: enable virtio-ccw revision 1

Rusty Russell (5):
  virtio: use u32, not bitmap for struct virtio_device's features
  virtio: add support for 64 bit features.
  virtio_ring: implement endian reversal based on VERSION_1 feature.
  virtio_config: endian conversion for v1.0.
  virtio_net: use v1.0 endian.

Thomas Huth (1):
  KVM: s390: Set virtio-ccw transport revision

 drivers/block/virtio_blk.c |4 +
 drivers/char/virtio_console.c  |2 +-
 drivers/lguest/lguest_device.c |   16 +--
 drivers/net/virtio_net.c   |   31 +++--
 drivers/remoteproc/remoteproc_virtio.c |7 +-
 drivers/s390/kvm/kvm_virtio.c  |   10 +-
 drivers/s390/kvm/virtio_ccw.c  |  165 -
 drivers/virtio/virtio.c|   22 ++--
 drivers/virtio/virtio_mmio.c   |   20 +--
 drivers/virtio/virtio_pci.c|8 +-
 drivers/virtio/virtio_ring.c   |  213 +++-
 include/linux/virtio.h |   46 ++-
 include/linux/virtio_config.h  |   17 +--
 include/uapi/linux/virtio_config.h |3 +
 tools/virtio/linux/virtio.h|   22 +---
 tools/virtio/linux/virtio_config.h |2 +-
 tools/virtio/virtio_test.c |5 +-
 tools/virtio/vringh_test.c |   16 +--
 18 files changed, 428 insertions(+), 181 deletions(-)

-- 
1.7.9.5

[Qemu-devel] [PATCH RFC 06/11] virtio: allow virtio-1 queue layout

2014-10-07 Thread Cornelia Huck

For virtio-1 devices, we allow a more complex queue layout that doesn't
require descriptor table and rings on a physically-contiguous memory area:
add virtio_queue_set_rings() to allow transports to set this up.

Signed-off-by: Cornelia Huck 
---
 hw/virtio/virtio.c |   16 
 include/hw/virtio/virtio.h |2 ++
 2 files changed, 18 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index e6ae3a0..147d227 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -96,6 +96,13 @@ static void virtqueue_init(VirtQueue *vq)
 {
 hwaddr pa = vq->pa;
 
+if (pa == -1ULL) {
+/*
+ * This is a virtio-1 style vq that has already been setup
+ * in virtio_queue_set.
+ */
+return;
+}
 vq->vring.desc = pa;
 vq->vring.avail = pa + vq->vring.num * sizeof(VRingDesc);
 vq->vring.used = vring_align(vq->vring.avail +
@@ -719,6 +726,15 @@ hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n)
 return vdev->vq[n].pa;
 }
 
+void virtio_queue_set_rings(VirtIODevice *vdev, int n, hwaddr desc,
+hwaddr avail, hwaddr used)
+{
+vdev->vq[n].pa = -1ULL;
+vdev->vq[n].vring.desc = desc;
+vdev->vq[n].vring.avail = avail;
+vdev->vq[n].vring.used = used;
+}
+
 void virtio_queue_set_num(VirtIODevice *vdev, int n, int num)
 {
 /* Don't allow guest to flip queue between existent and
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 40e567c..f840320 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -227,6 +227,8 @@ void virtio_queue_set_addr(VirtIODevice *vdev, int n, hwaddr addr);
 hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n);
 void virtio_queue_set_num(VirtIODevice *vdev, int n, int num);
 int virtio_queue_get_num(VirtIODevice *vdev, int n);
+void virtio_queue_set_rings(VirtIODevice *vdev, int n, hwaddr desc,
+hwaddr avail, hwaddr used);
 void virtio_queue_set_align(VirtIODevice *vdev, int n, int align);
 void virtio_queue_notify(VirtIODevice *vdev, int n);
 uint16_t virtio_queue_vector(VirtIODevice *vdev, int n);
-- 
1.7.9.5
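
For comparison with the free-form layout this patch allows, the legacy
contiguous layout that virtqueue_init() derives from a single base
address can be sketched as follows. This is a self-contained
approximation, not QEMU code: hwaddr is redefined locally, RingAddrs,
align_up() and legacy_ring_layout() are hypothetical names, and the
sizes (16-byte descriptors, a 4-byte avail header followed by 2-byte
entries) are assumed from the legacy virtio ring format.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t hwaddr;   /* local stand-in for a guest physical address */

#define DESC_SIZE 16u      /* u64 addr, u32 len, u16 flags, u16 next */

typedef struct RingAddrs {
    hwaddr desc;
    hwaddr avail;
    hwaddr used;
} RingAddrs;

/* Round addr up to the next multiple of align (a power of two). */
static hwaddr align_up(hwaddr addr, hwaddr align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Derive all three ring addresses from one base address. */
static RingAddrs legacy_ring_layout(hwaddr pa, unsigned num, hwaddr align)
{
    RingAddrs r;

    assert(align != 0 && (align & (align - 1)) == 0);

    r.desc = pa;
    /* The avail ring directly follows the descriptor table. */
    r.avail = pa + (hwaddr)num * DESC_SIZE;
    /* avail ring: u16 flags, u16 idx, then num u16 entries. */
    r.used = align_up(r.avail + 4 + 2 * (hwaddr)num, align);
    return r;
}
```

With virtio_queue_set_rings() the transport supplies desc, avail and
used separately instead, and pa is set to -1ULL so that this derivation
is skipped.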

[Qemu-devel] [PATCH RFC 10/11] s390x/virtio-ccw: support virtio-1 set_vq format

2014-10-07 Thread Cornelia Huck

Support the new CCW_CMD_SET_VQ format for virtio-1 devices.

While we're at it, refactor the code a bit and enforce big endian
fields (which had always been required, even for legacy).

Reviewed-by: Thomas Huth 
Signed-off-by: Cornelia Huck 
---
 hw/s390x/virtio-ccw.c |  114 ++---
 1 file changed, 80 insertions(+), 34 deletions(-)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index 0d414f6..80efe88 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -238,11 +238,20 @@ VirtualCssBus *virtual_css_bus_init(void)
 }
 
 /* Communication blocks used by several channel commands. */
-typedef struct VqInfoBlock {
+typedef struct VqInfoBlockLegacy {
 uint64_t queue;
 uint32_t align;
 uint16_t index;
 uint16_t num;
+} QEMU_PACKED VqInfoBlockLegacy;
+
+typedef struct VqInfoBlock {
+uint64_t desc;
+uint32_t res0;
+uint16_t index;
+uint16_t num;
+uint64_t avail;
+uint64_t used;
 } QEMU_PACKED VqInfoBlock;
 
 typedef struct VqConfigBlock {
@@ -269,17 +278,20 @@ typedef struct VirtioRevInfo {
 } QEMU_PACKED VirtioRevInfo;
 
 /* Specify where the virtqueues for the subchannel are in guest memory. */
-static int virtio_ccw_set_vqs(SubchDev *sch, uint64_t addr, uint32_t align,
-  uint16_t index, uint16_t num)
+static int virtio_ccw_set_vqs(SubchDev *sch, VqInfoBlock *info,
+  VqInfoBlockLegacy *linfo)
 {
 VirtIODevice *vdev = virtio_ccw_get_vdev(sch);
+uint16_t index = info ? info->index : linfo->index;
+uint16_t num = info ? info->num : linfo->num;
+uint64_t desc = info ? info->desc : linfo->queue;
 
 if (index > VIRTIO_PCI_QUEUE_MAX) {
 return -EINVAL;
 }
 
 /* Current code in virtio.c relies on 4K alignment. */
-if (addr && (align != 4096)) {
+if (linfo && desc && (linfo->align != 4096)) {
 return -EINVAL;
 }
 
@@ -287,8 +299,12 @@ static int virtio_ccw_set_vqs(SubchDev *sch, uint64_t addr, uint32_t align,
 return -EINVAL;
 }
 
-virtio_queue_set_addr(vdev, index, addr);
-if (!addr) {
+if (info) {
+virtio_queue_set_rings(vdev, index, desc, info->avail, info->used);
+} else {
+virtio_queue_set_addr(vdev, index, desc);
+}
+if (!desc) {
 virtio_queue_set_vector(vdev, index, 0);
 } else {
 /* Fail if we don't have a big enough queue. */
@@ -303,10 +319,66 @@ static int virtio_ccw_set_vqs(SubchDev *sch, uint64_t addr, uint32_t align,
 return 0;
 }
 
-static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
+static int virtio_ccw_handle_set_vq(SubchDev *sch, CCW1 ccw, bool check_len,
+bool is_legacy)
 {
 int ret;
 VqInfoBlock info;
+VqInfoBlockLegacy linfo;
+size_t info_len = is_legacy ? sizeof(linfo) : sizeof(info);
+
+if (check_len) {
+if (ccw.count != info_len) {
+return -EINVAL;
+}
+} else if (ccw.count < info_len) {
+/* Can't execute command. */
+return -EINVAL;
+}
+if (!ccw.cda) {
+return -EFAULT;
+}
+if (is_legacy) {
+linfo.queue = ldq_be_phys(&address_space_memory, ccw.cda);
+linfo.align = ldl_be_phys(&address_space_memory,
+  ccw.cda + sizeof(linfo.queue));
+linfo.index = lduw_be_phys(&address_space_memory,
+   ccw.cda + sizeof(linfo.queue)
+   + sizeof(linfo.align));
+linfo.num = lduw_be_phys(&address_space_memory,
+ ccw.cda + sizeof(linfo.queue)
+ + sizeof(linfo.align)
+ + sizeof(linfo.index));
+ret = virtio_ccw_set_vqs(sch, NULL, &linfo);
+} else {
+info.desc = ldq_be_phys(&address_space_memory, ccw.cda);
+info.index = lduw_be_phys(&address_space_memory,
+  ccw.cda + sizeof(info.desc)
+  + sizeof(info.res0));
+info.num = lduw_be_phys(&address_space_memory,
+ccw.cda + sizeof(info.desc)
+  + sizeof(info.res0)
+  + sizeof(info.index));
+info.avail = ldq_be_phys(&address_space_memory,
+ ccw.cda + sizeof(info.desc)
+ + sizeof(info.res0)
+ + sizeof(info.index)
+ + sizeof(info.num));
+info.used = ldq_be_phys(&address_space_memory,
+ccw.cda + sizeof(info.desc)
++ sizeof(info.res0)
++ sizeof(info.index)
++ sizeof(info.num)
++ sizeof(info.avail));
+ret = virtio_ccw_set_vqs(sch, &info, NULL);
+}
+
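
The chain of sizeof() offsets in the non-legacy branch above amounts to
decoding a 32-byte big-endian block. A compact sketch of the same
decoding from a byte buffer is shown below. It is an illustration only:
VqInfo, load_be() and parse_vq_info() are hypothetical names, the
buffer stands in for guest memory at ccw.cda, and the offsets are
assumed from the packed VqInfoBlock definition in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded, host-endian view of the virtio-1 SET_VQ info block. */
typedef struct VqInfo {
    uint64_t desc;
    uint16_t index;
    uint16_t num;
    uint64_t avail;
    uint64_t used;
} VqInfo;

#define VQ_INFO_WIRE_LEN 32u   /* 8 + 4 (reserved) + 2 + 2 + 8 + 8 */

/* Read an nbytes-wide big-endian unsigned integer starting at p. */
static uint64_t load_be(const uint8_t *p, size_t nbytes)
{
    uint64_t v = 0;

    for (size_t i = 0; i < nbytes; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Returns 0 on success, -1 if the buffer is too short. */
static int parse_vq_info(const uint8_t *buf, size_t len, VqInfo *out)
{
    assert(buf != NULL && out != NULL);

    if (len < VQ_INFO_WIRE_LEN) {
        return -1;
    }
    out->desc  = load_be(buf + 0, 8);
    /* bytes 8..11 are the reserved res0 field and are skipped */
    out->index = (uint16_t)load_be(buf + 12, 2);
    out->num   = (uint16_t)load_be(buf + 14, 2);
    out->avail = load_be(buf + 16, 8);
    out->used  = load_be(buf + 24, 8);
    return 0;
}
```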

[Qemu-devel] [PATCH RFC 02/11] virtio: add support for 64 bit features.

2014-10-07 Thread Cornelia Huck

From: Rusty Russell 

Change the u32 to a u64, and make sure to use 1ULL everywhere!

Cc: Brian Swetland 
Cc: Christian Borntraeger 
[Thomas Huth: fix up virtio-ccw get_features]
Signed-off-by: Rusty Russell 
Signed-off-by: Cornelia Huck 
Acked-by: Pawel Moll 
Acked-by: Ohad Ben-Cohen 
---
 drivers/char/virtio_console.c  |2 +-
 drivers/lguest/lguest_device.c |   10 +-
 drivers/remoteproc/remoteproc_virtio.c |5 -
 drivers/s390/kvm/kvm_virtio.c  |   10 +-
 drivers/s390/kvm/virtio_ccw.c  |   29 -
 drivers/virtio/virtio.c|   12 ++--
 drivers/virtio/virtio_mmio.c   |   14 +-
 drivers/virtio/virtio_pci.c|5 ++---
 drivers/virtio/virtio_ring.c   |2 +-
 include/linux/virtio.h |2 +-
 include/linux/virtio_config.h  |8 
 tools/virtio/linux/virtio.h|2 +-
 tools/virtio/linux/virtio_config.h |2 +-
 13 files changed, 64 insertions(+), 39 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index c4a437e..f9c6288 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -355,7 +355,7 @@ static inline bool use_multiport(struct ports_device *portdev)
 */
if (!portdev->vdev)
return 0;
-   return portdev->vdev->features & (1 << VIRTIO_CONSOLE_F_MULTIPORT);
+   return portdev->vdev->features & (1ULL << VIRTIO_CONSOLE_F_MULTIPORT);
 }
 
 static DEFINE_SPINLOCK(dma_bufs_lock);
diff --git a/drivers/lguest/lguest_device.c b/drivers/lguest/lguest_device.c
index c831c47..4d29bcd 100644
--- a/drivers/lguest/lguest_device.c
+++ b/drivers/lguest/lguest_device.c
@@ -94,17 +94,17 @@ static unsigned desc_size(const struct lguest_device_desc *desc)
 }
 
 /* This gets the device's feature bits. */
-static u32 lg_get_features(struct virtio_device *vdev)
+static u64 lg_get_features(struct virtio_device *vdev)
 {
unsigned int i;
-   u32 features = 0;
+   u64 features = 0;
struct lguest_device_desc *desc = to_lgdev(vdev)->desc;
u8 *in_features = lg_features(desc);
 
/* We do this the slow but generic way. */
-   for (i = 0; i < min(desc->feature_len * 8, 32); i++)
+   for (i = 0; i < min(desc->feature_len * 8, 64); i++)
if (in_features[i / 8] & (1 << (i % 8)))
-   features |= (1 << i);
+   features |= (1ULL << i);
 
return features;
 }
@@ -144,7 +144,7 @@ static void lg_finalize_features(struct virtio_device *vdev)
memset(out_features, 0, desc->feature_len);
bits = min_t(unsigned, desc->feature_len, sizeof(vdev->features)) * 8;
for (i = 0; i < bits; i++) {
-   if (vdev->features & (1 << i))
+   if (vdev->features & (1ULL << i))
out_features[i / 8] |= (1 << (i % 8));
}
 
diff --git a/drivers/remoteproc/remoteproc_virtio.c b/drivers/remoteproc/remoteproc_virtio.c
index dafaf38..627737e 100644
--- a/drivers/remoteproc/remoteproc_virtio.c
+++ b/drivers/remoteproc/remoteproc_virtio.c
@@ -207,7 +207,7 @@ static void rproc_virtio_reset(struct virtio_device *vdev)
 }
 
 /* provide the vdev features as retrieved from the firmware */
-static u32 rproc_virtio_get_features(struct virtio_device *vdev)
+static u64 rproc_virtio_get_features(struct virtio_device *vdev)
 {
struct rproc_vdev *rvdev = vdev_to_rvdev(vdev);
struct fw_rsc_vdev *rsc;
@@ -227,6 +227,9 @@ static void rproc_virtio_finalize_features(struct virtio_device *vdev)
/* Give virtio_ring a chance to accept features */
vring_transport_features(vdev);
 
+   /* Make sure we don't have any features > 32 bits! */
+   BUG_ON((u32)vdev->features != vdev->features);
+
/*
 * Remember the finalized features of our vdev, and provide it
 * to the remote processor once it is powered on.
diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c
index d747ca4..6d4cbea 100644
--- a/drivers/s390/kvm/kvm_virtio.c
+++ b/drivers/s390/kvm/kvm_virtio.c
@@ -80,16 +80,16 @@ static unsigned desc_size(const struct kvm_device_desc *desc)
 }
 
 /* This gets the device's feature bits. */
-static u32 kvm_get_features(struct virtio_device *vdev)
+static u64 kvm_get_features(struct virtio_device *vdev)
 {
unsigned int i;
-   u32 features = 0;
+   u64 features = 0;
struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
u8 *in_features = kvm_vq_features(desc);
 
-   for (i = 0; i < min(desc->feature_len * 8, 32); i++)
+   for (i = 0; i < min(desc->feature_len * 8, 64); i++)
if (in_features[i / 8] & (1 << (i % 8)))
-   features |= (1 << i);
+   features |= (1ULL << i);
return features;
 }
 
@@ -106,7 +106,7 @@ static void kvm_finalize_feat

[Qemu-devel] [PATCH RFC 05/11] virtio: introduce legacy virtio devices

2014-10-07 Thread Cornelia Huck

Introduce a helper function to indicate whether a virtio device is
operating in legacy or virtio standard mode.

It may be used to make decisions about the endianness of virtio accesses
and other virtio-1 specific changes, enabling us to support transitional
devices.

Reviewed-by: Thomas Huth 
Signed-off-by: Cornelia Huck 
---
 hw/virtio/virtio.c|6 +-
 include/hw/virtio/virtio-access.h |4 
 include/hw/virtio/virtio.h|   13 +++--
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 7aaa953..e6ae3a0 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -883,7 +883,11 @@ static bool virtio_device_endian_needed(void *opaque)
 VirtIODevice *vdev = opaque;
 
 assert(vdev->device_endian != VIRTIO_DEVICE_ENDIAN_UNKNOWN);
-return vdev->device_endian != virtio_default_endian();
+if (virtio_device_is_legacy(vdev)) {
+return vdev->device_endian != virtio_default_endian();
+}
+/* Devices conforming to VIRTIO 1.0 or later are always LE. */
+return vdev->device_endian != VIRTIO_DEVICE_ENDIAN_LITTLE;
 }
 
 static const VMStateDescription vmstate_virtio_device_endian = {
diff --git a/include/hw/virtio/virtio-access.h b/include/hw/virtio/virtio-access.h
index 46456fd..c123ee0 100644
--- a/include/hw/virtio/virtio-access.h
+++ b/include/hw/virtio/virtio-access.h
@@ -19,6 +19,10 @@
 
 static inline bool virtio_access_is_big_endian(VirtIODevice *vdev)
 {
+if (!virtio_device_is_legacy(vdev)) {
+/* Devices conforming to VIRTIO 1.0 or later are always LE. */
+return false;
+}
 #if defined(TARGET_IS_BIENDIAN)
 return virtio_is_big_endian(vdev);
 #elif defined(TARGET_WORDS_BIGENDIAN)
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index b408166..40e567c 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -275,9 +275,18 @@ void virtio_queue_set_host_notifier_fd_handler(VirtQueue *vq, bool assign,
 void virtio_queue_notify_vq(VirtQueue *vq);
 void virtio_irq(VirtQueue *vq);
 
+static inline bool virtio_device_is_legacy(VirtIODevice *vdev)
+{
+return !(vdev->guest_features[1] & (1 << (VIRTIO_F_VERSION_1 - 32)));
+}
+
 static inline bool virtio_is_big_endian(VirtIODevice *vdev)
 {
-assert(vdev->device_endian != VIRTIO_DEVICE_ENDIAN_UNKNOWN);
-return vdev->device_endian == VIRTIO_DEVICE_ENDIAN_BIG;
+if (virtio_device_is_legacy(vdev)) {
+assert(vdev->device_endian != VIRTIO_DEVICE_ENDIAN_UNKNOWN);
+return vdev->device_endian == VIRTIO_DEVICE_ENDIAN_BIG;
+}
+/* Devices conforming to VIRTIO 1.0 or later are always LE. */
+return false;
 }
 #endif
-- 
1.7.9.5

[Qemu-devel] [PATCH RFC 05/11] virtio_config: endian conversion for v1.0.

2014-10-07 Thread Cornelia Huck

From: Rusty Russell 

Reviewed-by: David Hildenbrand 
Signed-off-by: Rusty Russell 
Signed-off-by: Cornelia Huck 
---
 include/linux/virtio_config.h |9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index a0e16d8..ca22e3a 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -222,12 +222,13 @@ static inline u16 virtio_cread16(struct virtio_device *vdev,
 {
u16 ret;
vdev->config->get(vdev, offset, &ret, sizeof(ret));
-   return ret;
+   return virtio_to_cpu_u16(vdev, ret);
 }
 
 static inline void virtio_cwrite16(struct virtio_device *vdev,
   unsigned int offset, u16 val)
 {
+   val = cpu_to_virtio_u16(vdev, val);
vdev->config->set(vdev, offset, &val, sizeof(val));
 }
 
@@ -236,12 +237,13 @@ static inline u32 virtio_cread32(struct virtio_device *vdev,
 {
u32 ret;
vdev->config->get(vdev, offset, &ret, sizeof(ret));
-   return ret;
+   return virtio_to_cpu_u32(vdev, ret);
 }
 
 static inline void virtio_cwrite32(struct virtio_device *vdev,
   unsigned int offset, u32 val)
 {
+   val = cpu_to_virtio_u32(vdev, val);
vdev->config->set(vdev, offset, &val, sizeof(val));
 }
 
@@ -250,12 +252,13 @@ static inline u64 virtio_cread64(struct virtio_device *vdev,
 {
u64 ret;
vdev->config->get(vdev, offset, &ret, sizeof(ret));
-   return ret;
+   return virtio_to_cpu_u64(vdev, ret);
 }
 
 static inline void virtio_cwrite64(struct virtio_device *vdev,
   unsigned int offset, u64 val)
 {
+   val = cpu_to_virtio_u64(vdev, val);
vdev->config->set(vdev, offset, &val, sizeof(val));
 }
 
-- 
1.7.9.5
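
The conversion helpers used in this patch are defined elsewhere in the
series and are not shown here. As a rough idea of what a helper like
virtio_to_cpu_u16() has to do, the sketch below assumes the rule stated
in the host-side patches: virtio 1.0 devices are always little-endian,
while legacy devices use the guest's native byte order. The function
names and the is_v1 flag (standing in for a feature lookup on vdev) are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Interpret the in-memory bytes of raw as a little-endian 16-bit
 * value, independent of the host's byte order.
 */
static uint16_t le16_bytes_to_cpu(uint16_t raw)
{
    uint8_t b[2];

    memcpy(b, &raw, sizeof(b));
    return (uint16_t)(b[0] | (b[1] << 8));
}

/*
 * Hypothetical stand-in for virtio_to_cpu_u16(): virtio-1 values are
 * converted from little-endian, legacy values are passed through.
 */
static uint16_t sketch_virtio_to_cpu_u16(bool is_v1, uint16_t raw)
{
    return is_v1 ? le16_bytes_to_cpu(raw) : raw;
}
```

On a little-endian host both branches return the same value, so the
conversion only changes behaviour for big-endian guests such as s390.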

[Qemu-devel] [PATCH RFC 00/11] qemu: towards virtio-1 host support

2014-10-07 Thread Cornelia Huck

This patchset aims to get us some way to implement virtio-1 compliant
and transitional devices in qemu. Branch available at

git://github.com/cohuck/qemu virtio-1

I've mainly focused on:
- endianness handling
- extended feature bits
- virtio-ccw new/changed commands

Thanks go to Thomas for some preliminary work in this area.

I've been able to start guests both with and without the virtio-1 patches
in the linux guest patchset, with virtio-net and virtio-blk devices (with
and without dataplane). virtio-ccw only :) vhost, migration and loads of
other things have been ignored for now.

I'd like to know whether I'm heading in the right direction, so let's consider
this a starting point.

Cornelia Huck (8):
  virtio: cull virtio_bus_set_vdev_features
  virtio: support more feature bits
  s390x/virtio-ccw: fix check for WRITE_FEAT
  virtio: introduce legacy virtio devices
  virtio: allow virtio-1 queue layout
  dataplane: allow virtio-1 devices
  s390x/virtio-ccw: support virtio-1 set_vq format
  s390x/virtio-ccw: enable virtio 1.0

Thomas Huth (3):
  linux-headers/virtio_config: Update with VIRTIO_F_VERSION_1
  s390x/css: Add a callback for when subchannel gets disabled
  s390x/virtio-ccw: add virtio set-revision call

 hw/9pfs/virtio-9p-device.c  |7 +-
 hw/block/dataplane/virtio-blk.c |3 +-
 hw/block/virtio-blk.c   |9 +-
 hw/char/virtio-serial-bus.c |9 +-
 hw/net/virtio-net.c |   38 ---
 hw/s390x/css.c  |   12 +++
 hw/s390x/css.h  |1 +
 hw/s390x/s390-virtio-bus.c  |9 +-
 hw/s390x/virtio-ccw.c   |  188 +++
 hw/s390x/virtio-ccw.h   |7 +-
 hw/scsi/vhost-scsi.c|7 +-
 hw/scsi/virtio-scsi-dataplane.c |2 +-
 hw/scsi/virtio-scsi.c   |8 +-
 hw/virtio/Makefile.objs |2 +-
 hw/virtio/dataplane/Makefile.objs   |2 +-
 hw/virtio/dataplane/vring.c |   95 ++
 hw/virtio/virtio-balloon.c  |8 +-
 hw/virtio/virtio-bus.c  |   23 +
 hw/virtio/virtio-mmio.c |9 +-
 hw/virtio/virtio-pci.c  |   13 +--
 hw/virtio/virtio-rng.c  |2 +-
 hw/virtio/virtio.c  |   51 +++---
 include/hw/virtio/dataplane/vring.h |   64 +++-
 include/hw/virtio/virtio-access.h   |4 +
 include/hw/virtio/virtio-bus.h  |   10 +-
 include/hw/virtio/virtio.h  |   34 +--
 linux-headers/linux/virtio_config.h |3 +
 27 files changed, 442 insertions(+), 178 deletions(-)

-- 
1.7.9.5

[Qemu-devel] [PATCH RFC 11/11] KVM: s390: enable virtio-ccw revision 1

2014-10-07 Thread Cornelia Huck

Now that virtio-ccw has everything needed to support virtio 1.0 in
place, try to enable it if the host supports it.

Reviewed-by: David Hildenbrand 
Signed-off-by: Cornelia Huck 
---
 drivers/s390/kvm/virtio_ccw.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
index f97d3fb..a2e0c33 100644
--- a/drivers/s390/kvm/virtio_ccw.c
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -103,7 +103,7 @@ struct virtio_rev_info {
 };
 
 /* the highest virtio-ccw revision we support */
-#define VIRTIO_CCW_REV_MAX 0
+#define VIRTIO_CCW_REV_MAX 1
 
 struct virtio_ccw_vq_info {
struct virtqueue *vq;
-- 
1.7.9.5

[Qemu-devel] [PATCH RFC 09/11] KVM: s390: Set virtio-ccw transport revision

2014-10-07 Thread Cornelia Huck

From: Thomas Huth 

With the new SET-VIRTIO-REVISION command of the virtio 1.0 standard, we
can now negotiate the virtio-ccw revision after setting a channel online.

Note that we don't negotiate version 1 yet.

[Cornelia Huck: reworked revision loop a bit]
Reviewed-by: David Hildenbrand 
Signed-off-by: Thomas Huth 
Signed-off-by: Cornelia Huck 
---
 drivers/s390/kvm/virtio_ccw.c |   63 +
 1 file changed, 63 insertions(+)

diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
index 4173b59..cbe2ba8 100644
--- a/drivers/s390/kvm/virtio_ccw.c
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -55,6 +55,7 @@ struct virtio_ccw_device {
struct ccw_device *cdev;
__u32 curr_io;
int err;
+   unsigned int revision; /* Transport revision */
wait_queue_head_t wait_q;
spinlock_t lock;
struct list_head virtqueues;
@@ -86,6 +87,15 @@ struct virtio_thinint_area {
u8 isc;
 } __packed;
 
+struct virtio_rev_info {
+   __u16 revision;
+   __u16 length;
+   __u8 data[];
+};
+
+/* the highest virtio-ccw revision we support */
+#define VIRTIO_CCW_REV_MAX 0
+
 struct virtio_ccw_vq_info {
struct virtqueue *vq;
int num;
@@ -122,6 +132,7 @@ static struct airq_info *airq_areas[MAX_AIRQ_AREAS];
 #define CCW_CMD_WRITE_STATUS 0x31
 #define CCW_CMD_READ_VQ_CONF 0x32
 #define CCW_CMD_SET_IND_ADAPTER 0x73
+#define CCW_CMD_SET_VIRTIO_REV 0x83
 
 #define VIRTIO_CCW_DOING_SET_VQ 0x0001
 #define VIRTIO_CCW_DOING_RESET 0x0004
@@ -134,6 +145,7 @@ static struct airq_info *airq_areas[MAX_AIRQ_AREAS];
 #define VIRTIO_CCW_DOING_READ_VQ_CONF 0x0200
 #define VIRTIO_CCW_DOING_SET_CONF_IND 0x0400
 #define VIRTIO_CCW_DOING_SET_IND_ADAPTER 0x0800
+#define VIRTIO_CCW_DOING_SET_VIRTIO_REV 0x1000
 #define VIRTIO_CCW_INTPARM_MASK 0x
 
 static struct virtio_ccw_device *to_vc_device(struct virtio_device *vdev)
@@ -934,6 +946,7 @@ static void virtio_ccw_int_handler(struct ccw_device *cdev,
case VIRTIO_CCW_DOING_RESET:
case VIRTIO_CCW_DOING_READ_VQ_CONF:
case VIRTIO_CCW_DOING_SET_IND_ADAPTER:
+   case VIRTIO_CCW_DOING_SET_VIRTIO_REV:
vcdev->curr_io &= ~activity;
wake_up(&vcdev->wait_q);
break;
@@ -1053,6 +1066,51 @@ static int virtio_ccw_offline(struct ccw_device *cdev)
return 0;
 }
 
+static int virtio_ccw_set_transport_rev(struct virtio_ccw_device *vcdev)
+{
+   struct virtio_rev_info *rev;
+   struct ccw1 *ccw;
+   int ret;
+
+   ccw = kzalloc(sizeof(*ccw), GFP_DMA | GFP_KERNEL);
+   if (!ccw)
+   return -ENOMEM;
+   rev = kzalloc(sizeof(*rev), GFP_DMA | GFP_KERNEL);
+   if (!rev) {
+   kfree(ccw);
+   return -ENOMEM;
+   }
+
+   /* Set transport revision */
+   ccw->cmd_code = CCW_CMD_SET_VIRTIO_REV;
+   ccw->flags = 0;
+   ccw->count = sizeof(*rev);
+   ccw->cda = (__u32)(unsigned long)rev;
+
+   vcdev->revision = VIRTIO_CCW_REV_MAX;
+   do {
+   rev->revision = vcdev->revision;
+   /* none of our supported revisions carry payload */
+   rev->length = 0;
+   ret = ccw_io_helper(vcdev, ccw,
+   VIRTIO_CCW_DOING_SET_VIRTIO_REV);
+   if (ret == -EOPNOTSUPP) {
+   if (vcdev->revision == 0)
+   /*
+* The host device does not support setting
+* the revision: let's operate it in legacy
+* mode.
+*/
+   ret = 0;
+   else
+   vcdev->revision--;
+   }
+   } while (ret == -EOPNOTSUPP);
+
+   kfree(ccw);
+   kfree(rev);
+   return ret;
+}
 
 static int virtio_ccw_online(struct ccw_device *cdev)
 {
@@ -1093,6 +1151,11 @@ static int virtio_ccw_online(struct ccw_device *cdev)
spin_unlock_irqrestore(get_ccwdev_lock(cdev), flags);
vcdev->vdev.id.vendor = cdev->id.cu_type;
vcdev->vdev.id.device = cdev->id.cu_model;
+
+   ret = virtio_ccw_set_transport_rev(vcdev);
+   if (ret)
+   goto out_free;
+
ret = register_virtio_device(&vcdev->vdev);
if (ret) {
dev_warn(&cdev->dev, "Failed to register virtio device: %d\n",
-- 
1.7.9.5
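
The downward negotiation in virtio_ccw_set_transport_rev() can be tried
outside the kernel. What follows is a minimal userspace sketch, not the
driver code: host_max_rev and fake_host_set_rev() are hypothetical
stand-ins for the host and for ccw_io_helper(), and it assumes that a
rejected revision is reported as -EOPNOTSUPP, as in the patch above.

```c
#include <errno.h>

/* Hypothetical host model: highest revision the host accepts, or -1 if
 * it does not implement the set-revision command at all. */
static int host_max_rev;

/* Stand-in for ccw_io_helper(): 0 on success, -EOPNOTSUPP on rejection. */
static int fake_host_set_rev(unsigned int rev)
{
    if (host_max_rev < 0 || rev > (unsigned int)host_max_rev)
        return -EOPNOTSUPP;
    return 0;
}

/* Try revisions from max_rev downwards and store the outcome in
 * *revision. A host that rejects even revision 0 is treated as a
 * legacy host, which is not an error. */
static int negotiate_revision(unsigned int max_rev, unsigned int *revision)
{
    int ret;

    *revision = max_rev;
    do {
        ret = fake_host_set_rev(*revision);
        if (ret == -EOPNOTSUPP) {
            if (*revision == 0)
                ret = 0;        /* operate in legacy mode */
            else
                (*revision)--;  /* retry with the next lower revision */
        }
    } while (ret == -EOPNOTSUPP);

    return ret;
}
```

With host_max_rev set to -1 the loop ends at revision 0 and returns 0,
which corresponds to the legacy fallback described in the comment in the
patch.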

[Qemu-devel] [PATCH RFC 02/11] virtio: cull virtio_bus_set_vdev_features

2014-10-07 Thread Cornelia Huck

The only user of this function was virtio-ccw, and it should use
virtio_set_features() like everybody else: We need to make sure
that bad features are masked out properly, which this function did
not do.

Reviewed-by: Thomas Huth 
Signed-off-by: Cornelia Huck 
---
 hw/s390x/virtio-ccw.c  |3 +--
 hw/virtio/virtio-bus.c |   14 --
 include/hw/virtio/virtio-bus.h |3 ---
 3 files changed, 1 insertion(+), 19 deletions(-)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index e7d3ea1..7833dd2 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -400,8 +400,7 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
ccw.cda + sizeof(features.features));
 features.features = ldl_le_phys(&address_space_memory, ccw.cda);
 if (features.index < ARRAY_SIZE(dev->host_features)) {
-virtio_bus_set_vdev_features(&dev->bus, features.features);
-vdev->guest_features = features.features;
+virtio_set_features(vdev, features.features);
 } else {
 /*
  * If the guest supports more feature bits, assert that it
diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
index eb77019..a8ffa07 100644
--- a/hw/virtio/virtio-bus.c
+++ b/hw/virtio/virtio-bus.c
@@ -109,20 +109,6 @@ uint32_t virtio_bus_get_vdev_features(VirtioBusState *bus,
 return k->get_features(vdev, requested_features);
 }
 
-/* Set the features of the plugged device. */
-void virtio_bus_set_vdev_features(VirtioBusState *bus,
-  uint32_t requested_features)
-{
-VirtIODevice *vdev = virtio_bus_get_device(bus);
-VirtioDeviceClass *k;
-
-assert(vdev != NULL);
-k = VIRTIO_DEVICE_GET_CLASS(vdev);
-if (k->set_features != NULL) {
-k->set_features(vdev, requested_features);
-}
-}
-
 /* Get bad features of the plugged device. */
 uint32_t virtio_bus_get_vdev_bad_features(VirtioBusState *bus)
 {
diff --git a/include/hw/virtio/virtio-bus.h b/include/hw/virtio/virtio-bus.h
index 0756545..0d2e7b4 100644
--- a/include/hw/virtio/virtio-bus.h
+++ b/include/hw/virtio/virtio-bus.h
@@ -84,9 +84,6 @@ size_t virtio_bus_get_vdev_config_len(VirtioBusState *bus);
 /* Get the features of the plugged device. */
 uint32_t virtio_bus_get_vdev_features(VirtioBusState *bus,
 uint32_t requested_features);
-/* Set the features of the plugged device. */
-void virtio_bus_set_vdev_features(VirtioBusState *bus,
-  uint32_t requested_features);
 /* Get bad features of the plugged device. */
 uint32_t virtio_bus_get_vdev_bad_features(VirtioBusState *bus);
 /* Get config of the plugged device. */
-- 
1.7.9.5

[Qemu-devel] [PATCH RFC 11/11] s390x/virtio-ccw: enable virtio 1.0

2014-10-07 Thread Cornelia Huck

virtio-ccw should now have everything in place to operate virtio 1.0
devices, so let's enable revision 1.

Signed-off-by: Cornelia Huck 
---
 hw/s390x/virtio-ccw.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/s390x/virtio-ccw.h b/hw/s390x/virtio-ccw.h
index 03d5955..08edd8d 100644
--- a/hw/s390x/virtio-ccw.h
+++ b/hw/s390x/virtio-ccw.h
@@ -73,7 +73,7 @@ typedef struct VirtIOCCWDeviceClass {
 #define VIRTIO_CCW_FEATURE_SIZE NR_VIRTIO_FEATURE_WORDS
 
 /* The maximum virtio revision we support. */
-#define VIRTIO_CCW_REV_MAX 0
+#define VIRTIO_CCW_REV_MAX 1
 
 /* Performance improves when virtqueue kick processing is decoupled from the
  * vcpu thread using ioeventfd for some devices. */
-- 
1.7.9.5

[Qemu-devel] [PATCH RFC 08/11] virtio_blk: use virtio v1.0 endian

2014-10-07 Thread Cornelia Huck

Note that we care only about the fields still in use for virtio v1.0.

Reviewed-by: Thomas Huth 
Reviewed-by: David Hildenbrand 
Signed-off-by: Cornelia Huck 
---
 drivers/block/virtio_blk.c |4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 0a58140..08a8012 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -119,6 +119,10 @@ static int __virtblk_add_req(struct virtqueue *vq,
sg_init_one(&status, &vbr->status, sizeof(vbr->status));
sgs[num_out + num_in++] = &status;
 
+   /* we only care about fields valid for virtio-1 */
+   vbr->out_hdr.type = cpu_to_virtio_u32(vq->vdev, vbr->out_hdr.type);
+   vbr->out_hdr.sector = cpu_to_virtio_u64(vq->vdev, vbr->out_hdr.sector);
+
return virtqueue_add_sgs(vq, sgs, num_out, num_in, vbr, GFP_ATOMIC);
 }
 
-- 
1.7.9.5

[Qemu-devel] [PATCH RFC 04/11] virtio_ring: implement endian reversal based on VERSION_1 feature.

2014-10-07 Thread Cornelia Huck

From: Rusty Russell 

[Cornelia Huck: we don't need the vq->vring.num -> vq->ring_mask change]
Signed-off-by: Rusty Russell 
Signed-off-by: Cornelia Huck 
---
 drivers/virtio/virtio_ring.c |  195 ++
 1 file changed, 138 insertions(+), 57 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 1cfb5ba..350c39b 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -145,42 +145,54 @@ static inline int vring_add_indirect(struct vring_virtqueue *vq,
i = 0;
for (n = 0; n < out_sgs; n++) {
for (sg = sgs[n]; sg; sg = next(sg, &total_out)) {
-   desc[i].flags = VRING_DESC_F_NEXT;
-   desc[i].addr = sg_phys(sg);
-   desc[i].len = sg->length;
-   desc[i].next = i+1;
+   desc[i].flags = cpu_to_virtio_u16(vq->vq.vdev,
+ VRING_DESC_F_NEXT);
+   desc[i].addr = cpu_to_virtio_u64(vq->vq.vdev,
+sg_phys(sg));
+   desc[i].len = cpu_to_virtio_u32(vq->vq.vdev,
+   sg->length);
+   desc[i].next = cpu_to_virtio_u16(vq->vq.vdev,
+i+1);
i++;
}
}
for (; n < (out_sgs + in_sgs); n++) {
for (sg = sgs[n]; sg; sg = next(sg, &total_in)) {
-   desc[i].flags = VRING_DESC_F_NEXT|VRING_DESC_F_WRITE;
-   desc[i].addr = sg_phys(sg);
-   desc[i].len = sg->length;
-   desc[i].next = i+1;
+   desc[i].flags = cpu_to_virtio_u16(vq->vq.vdev,
+ VRING_DESC_F_NEXT|
+ VRING_DESC_F_WRITE);
+   desc[i].addr = cpu_to_virtio_u64(vq->vq.vdev,
+sg_phys(sg));
+   desc[i].len = cpu_to_virtio_u32(vq->vq.vdev,
+   sg->length);
+   desc[i].next = cpu_to_virtio_u16(vq->vq.vdev, i+1);
i++;
}
}
-   BUG_ON(i != total_sg);
 
/* Last one doesn't continue. */
-   desc[i-1].flags &= ~VRING_DESC_F_NEXT;
+   desc[i-1].flags &= ~cpu_to_virtio_u16(vq->vq.vdev, VRING_DESC_F_NEXT);
desc[i-1].next = 0;
 
-   /* We're about to use a buffer */
-   vq->vq.num_free--;
-
/* Use a single buffer which doesn't continue */
head = vq->free_head;
-   vq->vring.desc[head].flags = VRING_DESC_F_INDIRECT;
-   vq->vring.desc[head].addr = virt_to_phys(desc);
+   vq->vring.desc[head].flags =
+   cpu_to_virtio_u16(vq->vq.vdev, VRING_DESC_F_INDIRECT);
+   vq->vring.desc[head].addr =
+   cpu_to_virtio_u64(vq->vq.vdev, virt_to_phys(desc));
/* kmemleak gives a false positive, as it's hidden by virt_to_phys */
kmemleak_ignore(desc);
-   vq->vring.desc[head].len = i * sizeof(struct vring_desc);
+   vq->vring.desc[head].len =
+   cpu_to_virtio_u32(vq->vq.vdev, i * sizeof(struct vring_desc));
 
-   /* Update free pointer */
+   BUG_ON(i != total_sg);
+
+   /* Update free pointer (we store this in native endian) */
vq->free_head = vq->vring.desc[head].next;
 
+   /* We've just used a buffer */
+   vq->vq.num_free--;
+
return head;
 }
 
@@ -199,6 +211,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
struct scatterlist *sg;
unsigned int i, n, avail, uninitialized_var(prev), total_sg;
int head;
+   u16 nexti;
 
START_USE(vq);
 
@@ -253,26 +266,46 @@ static inline int virtqueue_add(struct virtqueue *_vq,
vq->vq.num_free -= total_sg;
 
head = i = vq->free_head;
+
for (n = 0; n < out_sgs; n++) {
for (sg = sgs[n]; sg; sg = next(sg, &total_out)) {
-   vq->vring.desc[i].flags = VRING_DESC_F_NEXT;
-   vq->vring.desc[i].addr = sg_phys(sg);
-   vq->vring.desc[i].len = sg->length;
+   vq->vring.desc[i].flags =
+   cpu_to_virtio_u16(vq->vq.vdev,
+ VRING_DESC_F_NEXT);
+   vq->vring.desc[i].addr =
+   cpu_to_virtio_u64(vq->vq.vdev, sg_phys(sg));
+   vq->vring.desc[i].len =
+   cpu_to_virtio_u32(vq->vq.vdev, sg->length);
+
+   /* We chained .next in native: fix endian. */
+   nexti = vq->vring.d

[Qemu-devel] [PATCH RFC 03/11] virtio: support more feature bits

2014-10-07 Thread Cornelia Huck

With virtio-1, we support more than 32 feature bits. Let's make
vdev->guest_features depend on the number of supported feature bits,
allowing us to grow the feature bits automatically.

We also need to enhance the internal functions dealing with getting
and setting features with an additional index field, so that all feature
bits may be accessed (in chunks of 32 bits).

vhost and migration have been ignored for now.

Reviewed-by: Thomas Huth 
Signed-off-by: Cornelia Huck 
---
 hw/9pfs/virtio-9p-device.c |7 ++-
 hw/block/virtio-blk.c  |9 +++--
 hw/char/virtio-serial-bus.c|9 +++--
 hw/net/virtio-net.c|   38 ++
 hw/s390x/s390-virtio-bus.c |9 +
 hw/s390x/virtio-ccw.c  |   17 ++---
 hw/scsi/vhost-scsi.c   |7 +--
 hw/scsi/virtio-scsi.c  |8 
 hw/virtio/dataplane/vring.c|   10 +-
 hw/virtio/virtio-balloon.c |8 ++--
 hw/virtio/virtio-bus.c |9 +
 hw/virtio/virtio-mmio.c|9 +
 hw/virtio/virtio-pci.c |   13 +++--
 hw/virtio/virtio-rng.c |2 +-
 hw/virtio/virtio.c |   29 +
 include/hw/virtio/virtio-bus.h |7 ---
 include/hw/virtio/virtio.h |   19 ++-
 17 files changed, 134 insertions(+), 76 deletions(-)

diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c
index 2572747..c29c8c8 100644
--- a/hw/9pfs/virtio-9p-device.c
+++ b/hw/9pfs/virtio-9p-device.c
@@ -21,8 +21,13 @@
 #include "virtio-9p-coth.h"
 #include "hw/virtio/virtio-access.h"
 
-static uint32_t virtio_9p_get_features(VirtIODevice *vdev, uint32_t features)
+static uint32_t virtio_9p_get_features(VirtIODevice *vdev, unsigned int index,
+   uint32_t features)
 {
+if (index > 0) {
+return features;
+}
+
 features |= 1 << VIRTIO_9P_MOUNT_TAG;
 return features;
 }
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 45e0c8f..5abc327 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -561,10 +561,15 @@ static void virtio_blk_set_config(VirtIODevice *vdev, const uint8_t *config)
 aio_context_release(bdrv_get_aio_context(s->bs));
 }
 
-static uint32_t virtio_blk_get_features(VirtIODevice *vdev, uint32_t features)
+static uint32_t virtio_blk_get_features(VirtIODevice *vdev, unsigned int index,
+uint32_t features)
 {
 VirtIOBlock *s = VIRTIO_BLK(vdev);
 
+if (index > 0) {
+return features;
+}
+
 features |= (1 << VIRTIO_BLK_F_SEG_MAX);
 features |= (1 << VIRTIO_BLK_F_GEOMETRY);
 features |= (1 << VIRTIO_BLK_F_TOPOLOGY);
@@ -597,7 +602,7 @@ static void virtio_blk_set_status(VirtIODevice *vdev, uint8_t status)
 return;
 }
 
-features = vdev->guest_features;
+features = vdev->guest_features[0];
 
 /* A guest that supports VIRTIO_BLK_F_CONFIG_WCE must be able to send
  * cache flushes.  Thus, the "auto writethrough" behavior is never
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 3931085..0d843fe 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -75,7 +75,7 @@ static VirtIOSerialPort *find_port_by_name(char *name)
 static bool use_multiport(VirtIOSerial *vser)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(vser);
-return vdev->guest_features & (1 << VIRTIO_CONSOLE_F_MULTIPORT);
+return vdev->guest_features[0] & (1 << VIRTIO_CONSOLE_F_MULTIPORT);
 }
 
 static size_t write_to_port(VirtIOSerialPort *port,
@@ -467,10 +467,15 @@ static void handle_input(VirtIODevice *vdev, VirtQueue *vq)
 {
 }
 
-static uint32_t get_features(VirtIODevice *vdev, uint32_t features)
+static uint32_t get_features(VirtIODevice *vdev, unsigned int index,
+ uint32_t features)
 {
 VirtIOSerial *vser;
 
+if (index > 0) {
+return features;
+}
+
 vser = VIRTIO_SERIAL(vdev);
 
 if (vser->bus.max_nr_ports > 1) {
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 2040eac..67f91c0 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -86,7 +86,7 @@ static void virtio_net_set_config(VirtIODevice *vdev, const uint8_t *config)
 
 memcpy(&netcfg, config, n->config_size);
 
-if (!(vdev->guest_features >> VIRTIO_NET_F_CTRL_MAC_ADDR & 1) &&
+if (!(vdev->guest_features[0] >> VIRTIO_NET_F_CTRL_MAC_ADDR & 1) &&
 memcmp(netcfg.mac, n->mac, ETH_ALEN)) {
 memcpy(n->mac, netcfg.mac, ETH_ALEN);
 qemu_format_nic_info_str(qemu_get_queue(n->nic), n->mac);
@@ -305,7 +305,7 @@ static RxFilterInfo *virtio_net_query_rxfilter(NetClientState *nc)
 info->multicast_table = str_list;
 info->vlan_table = get_vlan_table(n);
 
-if (!((1 << VIRTIO_NET_F_CTRL_VLAN) & vdev->guest_features)) {
+if (!((1 << VIRTIO_NET_F_CTRL_VLAN) & vdev->gue

[Qemu-devel] [PATCH RFC 01/11] virtio: use u32, not bitmap for struct virtio_device's features

2014-10-07 Thread Cornelia Huck

From: Rusty Russell 

It seemed like a good idea, but it's actually a pain when we get more
than 32 feature bits.  Just change it to a u32 for now.

Cc: Brian Swetland 
Cc: Christian Borntraeger 
Signed-off-by: Rusty Russell 
Signed-off-by: Cornelia Huck 
Acked-by: Pawel Moll 
Acked-by: Ohad Ben-Cohen 
---
 drivers/char/virtio_console.c  |2 +-
 drivers/lguest/lguest_device.c |8 
 drivers/remoteproc/remoteproc_virtio.c |2 +-
 drivers/s390/kvm/kvm_virtio.c  |2 +-
 drivers/s390/kvm/virtio_ccw.c  |   23 +--
 drivers/virtio/virtio.c|   10 +-
 drivers/virtio/virtio_mmio.c   |8 ++--
 drivers/virtio/virtio_pci.c|3 +--
 drivers/virtio/virtio_ring.c   |2 +-
 include/linux/virtio.h |3 +--
 include/linux/virtio_config.h  |2 +-
 tools/virtio/linux/virtio.h|   22 +-
 tools/virtio/linux/virtio_config.h |2 +-
 tools/virtio/virtio_test.c |5 ++---
 tools/virtio/vringh_test.c |   16 
 15 files changed, 39 insertions(+), 71 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index b585b47..c4a437e 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -355,7 +355,7 @@ static inline bool use_multiport(struct ports_device *portdev)
 */
if (!portdev->vdev)
return 0;
-   return portdev->vdev->features[0] & (1 << VIRTIO_CONSOLE_F_MULTIPORT);
+   return portdev->vdev->features & (1 << VIRTIO_CONSOLE_F_MULTIPORT);
 }
 
 static DEFINE_SPINLOCK(dma_bufs_lock);
diff --git a/drivers/lguest/lguest_device.c b/drivers/lguest/lguest_device.c
index d0a1d8a..c831c47 100644
--- a/drivers/lguest/lguest_device.c
+++ b/drivers/lguest/lguest_device.c
@@ -137,14 +137,14 @@ static void lg_finalize_features(struct virtio_device *vdev)
vring_transport_features(vdev);
 
/*
-* The vdev->feature array is a Linux bitmask: this isn't the same as a
-* the simple array of bits used by lguest devices for features.  So we
-* do this slow, manual conversion which is completely general.
+* Since lguest is currently x86-only, we're little-endian.  That
+* means we could just memcpy.  But it's not time critical, and in
+* case someone copies this code, we do it the slow, obvious way.
 */
memset(out_features, 0, desc->feature_len);
bits = min_t(unsigned, desc->feature_len, sizeof(vdev->features)) * 8;
for (i = 0; i < bits; i++) {
-   if (test_bit(i, vdev->features))
+   if (vdev->features & (1 << i))
out_features[i / 8] |= (1 << (i % 8));
}
 
diff --git a/drivers/remoteproc/remoteproc_virtio.c b/drivers/remoteproc/remoteproc_virtio.c
index a34b506..dafaf38 100644
--- a/drivers/remoteproc/remoteproc_virtio.c
+++ b/drivers/remoteproc/remoteproc_virtio.c
@@ -231,7 +231,7 @@ static void rproc_virtio_finalize_features(struct virtio_device *vdev)
 * Remember the finalized features of our vdev, and provide it
 * to the remote processor once it is powered on.
 */
-   rsc->gfeatures = vdev->features[0];
+   rsc->gfeatures = vdev->features;
 }
 
 static void rproc_virtio_get(struct virtio_device *vdev, unsigned offset,
diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c
index a134965..d747ca4 100644
--- a/drivers/s390/kvm/kvm_virtio.c
+++ b/drivers/s390/kvm/kvm_virtio.c
@@ -106,7 +106,7 @@ static void kvm_finalize_features(struct virtio_device *vdev)
memset(out_features, 0, desc->feature_len);
bits = min_t(unsigned, desc->feature_len, sizeof(vdev->features)) * 8;
for (i = 0; i < bits; i++) {
-   if (test_bit(i, vdev->features))
+   if (vdev->features & (1 << i))
out_features[i / 8] |= (1 << (i % 8));
}
 }
diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
index d2c0b44..c5acd19 100644
--- a/drivers/s390/kvm/virtio_ccw.c
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -701,7 +701,6 @@ static void virtio_ccw_finalize_features(struct virtio_device *vdev)
 {
struct virtio_ccw_device *vcdev = to_vc_device(vdev);
struct virtio_feature_desc *features;
-   int i;
struct ccw1 *ccw;
 
ccw = kzalloc(sizeof(*ccw), GFP_DMA | GFP_KERNEL);
@@ -715,19 +714,15 @@ static void virtio_ccw_finalize_features(struct virtio_device *vdev)
/* Give virtio_ring a chance to accept features. */
vring_transport_features(vdev);
 
-   for (i = 0; i < sizeof(*vdev->features) / sizeof(features->features);
-i++) {
-   int highbits = i % 2 ? 32 : 0;
-   features->index = i;
-   features->features = cpu_to_le32(vdev->features[i / 2]
-

[Qemu-devel] [PATCH RFC 08/11] s390x/css: Add a callback for when subchannel gets disabled

2014-10-07 Thread Cornelia Huck

From: Thomas Huth 

We need a way to run code when a subchannel gets disabled.
This patch adds the necessary infrastructure.

Signed-off-by: Thomas Huth 
Signed-off-by: Cornelia Huck 
---
 hw/s390x/css.c |   12 
 hw/s390x/css.h |1 +
 2 files changed, 13 insertions(+)

diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index b67c039..735ec55 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -588,6 +588,7 @@ int css_do_msch(SubchDev *sch, SCHIB *orig_schib)
 {
 SCSW *s = &sch->curr_status.scsw;
 PMCW *p = &sch->curr_status.pmcw;
+uint16_t oldflags;
 int ret;
 SCHIB schib;
 
@@ -610,6 +611,7 @@ int css_do_msch(SubchDev *sch, SCHIB *orig_schib)
 copy_schib_from_guest(&schib, orig_schib);
 /* Only update the program-modifiable fields. */
 p->intparm = schib.pmcw.intparm;
+oldflags = p->flags;
 p->flags &= ~(PMCW_FLAGS_MASK_ISC | PMCW_FLAGS_MASK_ENA |
   PMCW_FLAGS_MASK_LM | PMCW_FLAGS_MASK_MME |
   PMCW_FLAGS_MASK_MP);
@@ -625,6 +627,12 @@ int css_do_msch(SubchDev *sch, SCHIB *orig_schib)
 (PMCW_CHARS_MASK_MBFC | PMCW_CHARS_MASK_CSENSE);
 sch->curr_status.mba = schib.mba;
 
+/* Has the channel been disabled? */
+if (sch->disable_cb && (oldflags & PMCW_FLAGS_MASK_ENA) != 0
+&& (p->flags & PMCW_FLAGS_MASK_ENA) == 0) {
+sch->disable_cb(sch);
+}
+
 ret = 0;
 
 out:
@@ -1443,6 +1451,10 @@ void css_reset_sch(SubchDev *sch)
 {
 PMCW *p = &sch->curr_status.pmcw;
 
+if ((p->flags & PMCW_FLAGS_MASK_ENA) != 0 && sch->disable_cb) {
+sch->disable_cb(sch);
+}
+
 p->intparm = 0;
 p->flags &= ~(PMCW_FLAGS_MASK_ISC | PMCW_FLAGS_MASK_ENA |
   PMCW_FLAGS_MASK_LM | PMCW_FLAGS_MASK_MME |
diff --git a/hw/s390x/css.h b/hw/s390x/css.h
index 33104ac..7fa807b 100644
--- a/hw/s390x/css.h
+++ b/hw/s390x/css.h
@@ -81,6 +81,7 @@ struct SubchDev {
 uint8_t ccw_no_data_cnt;
 /* transport-provided data: */
 int (*ccw_cb) (SubchDev *, CCW1);
+void (*disable_cb)(SubchDev *);
 SenseId id;
 void *driver_data;
 };
-- 
1.7.9.5
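
The edge detection added to css_do_msch() (call the callback only when
the enable bit goes from set to clear) can be shown in isolation. This
is a hedged sketch with made-up names: struct demo_sch, DEMO_FLAG_ENA
and demo_update_flags() are illustrative only and are not QEMU
definitions, and the flag value is arbitrary.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_FLAG_ENA 0x0080 /* arbitrary value, not PMCW_FLAGS_MASK_ENA */

struct demo_sch {
    uint16_t flags;
    void (*disable_cb)(struct demo_sch *);
    int disable_count; /* bookkeeping for this demo only */
};

/* Example callback: just count how often we were invoked. */
static void demo_count_disable(struct demo_sch *sch)
{
    sch->disable_count++;
}

/* Store the new flags and invoke the callback only on an
 * enabled -> disabled transition, mirroring the oldflags check. */
static void demo_update_flags(struct demo_sch *sch, uint16_t new_flags)
{
    uint16_t oldflags = sch->flags;

    sch->flags = new_flags;
    if (sch->disable_cb && (oldflags & DEMO_FLAG_ENA) != 0
        && (sch->flags & DEMO_FLAG_ENA) == 0) {
        sch->disable_cb(sch);
    }
}
```

Saving the old flags before they are overwritten is what makes the
transition visible; a check on the new flags alone would also fire for a
subchannel that was never enabled.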

[Qemu-devel] [PATCH RFC 03/11] virtio: endianness conversion helpers

2014-10-07 Thread Cornelia Huck

Provide helper functions that convert from/to LE for virtio devices that
are not operating in legacy mode. We check for the VERSION_1 feature bit
to determine that.

Based on original patches by Rusty Russell and Thomas Huth.

Reviewed-by: David Hildenbrand 
Signed-off-by: Cornelia Huck 
---
 drivers/virtio/virtio.c|4 
 include/linux/virtio.h |   40 
 include/uapi/linux/virtio_config.h |3 +++
 3 files changed, 47 insertions(+)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index cfd5d00..8f74cd6 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -144,6 +144,10 @@ static int virtio_dev_probe(struct device *_d)
if (device_features & (1ULL << i))
dev->features |= (1ULL << i);
 
+   /* Version 1.0 compliant devices set the VIRTIO_F_VERSION_1 bit */
+   if (device_features & (1ULL << VIRTIO_F_VERSION_1))
+   dev->features |= (1ULL << VIRTIO_F_VERSION_1);
+
dev->config->finalize_features(dev);
 
err = drv->probe(dev);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index a24b41f..68cadd4 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /**
  * virtqueue - a queue to register buffers for sending or receiving.
@@ -102,6 +103,11 @@ static inline struct virtio_device *dev_to_virtio(struct device *_dev)
return container_of(_dev, struct virtio_device, dev);
 }
 
+static inline bool virtio_device_legacy(const struct virtio_device *dev)
+{
+   return !(dev->features & (1ULL << VIRTIO_F_VERSION_1));
+}
+
 int register_virtio_device(struct virtio_device *dev);
 void unregister_virtio_device(struct virtio_device *dev);
 
@@ -149,4 +155,38 @@ void unregister_virtio_driver(struct virtio_driver *drv);
 #define module_virtio_driver(__virtio_driver) \
module_driver(__virtio_driver, register_virtio_driver, \
unregister_virtio_driver)
+
+/*
+ * v1.0 specifies LE headers, legacy was native endian. Therefore, we must
+ * convert from/to LE if and only if vdev is not legacy.
+ */
+static inline u16 virtio_to_cpu_u16(const struct virtio_device *vdev, u16 v)
+{
+   return virtio_device_legacy(vdev) ? v : le16_to_cpu(v);
+}
+
+static inline u32 virtio_to_cpu_u32(const struct virtio_device *vdev, u32 v)
+{
+   return virtio_device_legacy(vdev) ? v : le32_to_cpu(v);
+}
+
+static inline u64 virtio_to_cpu_u64(const struct virtio_device *vdev, u64 v)
+{
+   return virtio_device_legacy(vdev) ? v : le64_to_cpu(v);
+}
+
+static inline u16 cpu_to_virtio_u16(const struct virtio_device *vdev, u16 v)
+{
+   return virtio_device_legacy(vdev) ? v : cpu_to_le16(v);
+}
+
+static inline u32 cpu_to_virtio_u32(const struct virtio_device *vdev, u32 v)
+{
+   return virtio_device_legacy(vdev) ? v : cpu_to_le32(v);
+}
+
+static inline u64 cpu_to_virtio_u64(const struct virtio_device *vdev, u64 v)
+{
+   return virtio_device_legacy(vdev) ? v : cpu_to_le64(v);
+}
 #endif /* _LINUX_VIRTIO_H */
diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
index 3ce768c..80e7381 100644
--- a/include/uapi/linux/virtio_config.h
+++ b/include/uapi/linux/virtio_config.h
@@ -54,4 +54,7 @@
 /* Can the device handle any descriptor layout? */
 #define VIRTIO_F_ANY_LAYOUT 27
 
+/* v1.0 compliant. */
+#define VIRTIO_F_VERSION_1 32
+
 #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
-- 
1.7.9.5
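
To make the "convert if and only if not legacy" rule concrete, here is
a small userspace sketch of the 16-bit pair. It is an illustration
under stated assumptions, not the kernel helpers: struct demo_vdev and
all demo_* functions are hypothetical, the legacy state is a plain flag
instead of a feature-bit test, and the byte swapping is done by hand so
that the sketch behaves the same on little- and big-endian hosts.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct virtio_device: only the legacy flag. */
struct demo_vdev {
    bool legacy;
};

/* cpu -> little-endian: lay the value out in memory LSB first. */
static uint16_t demo_cpu_to_le16(uint16_t v)
{
    uint8_t b[2] = { (uint8_t)(v & 0xff), (uint8_t)(v >> 8) };
    uint16_t out;

    memcpy(&out, b, sizeof(out));
    return out;
}

/* little-endian -> cpu: read the bytes back LSB first. */
static uint16_t demo_le16_to_cpu(uint16_t v)
{
    uint8_t b[2];

    memcpy(b, &v, sizeof(b));
    return (uint16_t)(b[0] | (b[1] << 8));
}

/* Legacy devices use native endian, so the value passes through. */
static uint16_t demo_cpu_to_virtio_u16(const struct demo_vdev *vdev,
                                       uint16_t v)
{
    return vdev->legacy ? v : demo_cpu_to_le16(v);
}

static uint16_t demo_virtio_to_cpu_u16(const struct demo_vdev *vdev,
                                       uint16_t v)
{
    return vdev->legacy ? v : demo_le16_to_cpu(v);
}
```

On a little-endian host both directions are no-ops for either kind of
device; the distinction only becomes visible on big-endian hosts such as
s390.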

[Qemu-devel] [PATCH RFC 10/11] KVM: s390: virtio-ccw revision 1 SET_VQ

2014-10-07 Thread Cornelia Huck

The CCW_CMD_SET_VQ command has a different format for revision 1+
devices, which allows a more complex virtqueue layout to be specified.
For now, however, we keep the old layout and simply use the new
command format for virtio-1 devices.

Signed-off-by: Cornelia Huck 
---
 drivers/s390/kvm/virtio_ccw.c |   54 -
 1 file changed, 42 insertions(+), 12 deletions(-)

diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
index cbe2ba8..f97d3fb 100644
--- a/drivers/s390/kvm/virtio_ccw.c
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -68,13 +68,22 @@ struct virtio_ccw_device {
void *airq_info;
 };
 
-struct vq_info_block {
+struct vq_info_block_legacy {
__u64 queue;
__u32 align;
__u16 index;
__u16 num;
 } __packed;
 
+struct vq_info_block {
+   __u64 desc;
+   __u32 res0;
+   __u16 index;
+   __u16 num;
+   __u64 avail;
+   __u64 used;
+} __packed;
+
 struct virtio_feature_desc {
__u32 features;
__u8 index;
@@ -100,7 +109,10 @@ struct virtio_ccw_vq_info {
struct virtqueue *vq;
int num;
void *queue;
-   struct vq_info_block *info_block;
+   union {
+   struct vq_info_block s;
+   struct vq_info_block_legacy l;
+   } *info_block;
int bit_nr;
struct list_head node;
long cookie;
@@ -411,13 +423,22 @@ static void virtio_ccw_del_vq(struct virtqueue *vq, struct ccw1 *ccw)
spin_unlock_irqrestore(&vcdev->lock, flags);
 
/* Release from host. */
-   info->info_block->queue = 0;
-   info->info_block->align = 0;
-   info->info_block->index = index;
-   info->info_block->num = 0;
+   if (vcdev->revision == 0) {
+   info->info_block->l.queue = 0;
+   info->info_block->l.align = 0;
+   info->info_block->l.index = index;
+   info->info_block->l.num = 0;
+   ccw->count = sizeof(info->info_block->l);
+   } else {
+   info->info_block->s.desc = 0;
+   info->info_block->s.index = index;
+   info->info_block->s.num = 0;
+   info->info_block->s.avail = 0;
+   info->info_block->s.used = 0;
+   ccw->count = sizeof(info->info_block->s);
+   }
ccw->cmd_code = CCW_CMD_SET_VQ;
ccw->flags = 0;
-   ccw->count = sizeof(*info->info_block);
ccw->cda = (__u32)(unsigned long)(info->info_block);
ret = ccw_io_helper(vcdev, ccw,
VIRTIO_CCW_DOING_SET_VQ | index);
@@ -500,13 +521,22 @@ static struct virtqueue *virtio_ccw_setup_vq(struct virtio_device *vdev,
}
 
/* Register it with the host. */
-   info->info_block->queue = (__u64)info->queue;
-   info->info_block->align = KVM_VIRTIO_CCW_RING_ALIGN;
-   info->info_block->index = i;
-   info->info_block->num = info->num;
+   if (vcdev->revision == 0) {
+   info->info_block->l.queue = (__u64)info->queue;
+   info->info_block->l.align = KVM_VIRTIO_CCW_RING_ALIGN;
+   info->info_block->l.index = i;
+   info->info_block->l.num = info->num;
+   ccw->count = sizeof(info->info_block->l);
+   } else {
+   info->info_block->s.desc = (__u64)info->queue;
+   info->info_block->s.index = i;
+   info->info_block->s.num = info->num;
+   info->info_block->s.avail = (__u64)virtqueue_get_avail(vq);
+   info->info_block->s.used = (__u64)virtqueue_get_used(vq);
+   ccw->count = sizeof(info->info_block->s);
+   }
ccw->cmd_code = CCW_CMD_SET_VQ;
ccw->flags = 0;
-   ccw->count = sizeof(*info->info_block);
ccw->cda = (__u32)(unsigned long)(info->info_block);
err = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_SET_VQ | i);
if (err) {
-- 
1.7.9.5
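
The size difference between the two SET_VQ payloads, and how the count
follows the negotiated revision, can be checked with a short sketch.
The field order is copied from the patch above; the demo_* names are
hypothetical, and __attribute__((packed)) is a GCC/Clang assumption
used here in place of the kernel's __packed.

```c
#include <stddef.h>
#include <stdint.h>

/* Revision 0 layout: one ring address plus alignment. */
struct demo_vq_info_block_legacy {
    uint64_t queue;
    uint32_t align;
    uint16_t index;
    uint16_t num;
} __attribute__((packed));

/* Revision 1+ layout: separate descriptor, avail and used addresses. */
struct demo_vq_info_block {
    uint64_t desc;
    uint32_t res0;
    uint16_t index;
    uint16_t num;
    uint64_t avail;
    uint64_t used;
} __attribute__((packed));

/* One buffer can carry either layout, as with the union in the patch. */
union demo_info_block {
    struct demo_vq_info_block s;
    struct demo_vq_info_block_legacy l;
};

/* Byte count to transfer for the negotiated revision. */
static size_t demo_set_vq_count(unsigned int revision)
{
    return revision == 0 ? sizeof(struct demo_vq_info_block_legacy)
                         : sizeof(struct demo_vq_info_block);
}
```

Because index and num sit at the same offsets in both layouts, only the
trailing avail and used fields and the count differ between revisions.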

[Qemu-devel] [PATCH RFC 09/11] s390x/virtio-ccw: add virtio set-revision call

2014-10-07 Thread Cornelia Huck

From: Thomas Huth 

Handle the virtio-ccw revision according to what the guest sets.
When revision 1 is selected, we have a virtio-1 standard device
with byteswapping for the virtio rings.

When a channel gets disabled, we have to revert to the legacy behavior
in case the next user of the device no longer negotiates revision 1
(e.g. the boot firmware uses revision 1, but the operating system only
uses the legacy mode).

Note that revisions > 0 are still disabled, but we already extend the
feature bit size to be able to handle the VERSION_1 bit.

Signed-off-by: Thomas Huth 
Signed-off-by: Cornelia Huck 
---
 hw/s390x/virtio-ccw.c |   54 +
 hw/s390x/virtio-ccw.h |7 ++-
 2 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index 69add47..0d414f6 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -20,9 +20,11 @@
 #include "hw/virtio/virtio-net.h"
 #include "hw/sysbus.h"
 #include "qemu/bitops.h"
+#include "hw/virtio/virtio-access.h"
 #include "hw/virtio/virtio-bus.h"
 #include "hw/s390x/adapter.h"
 #include "hw/s390x/s390_flic.h"
+#include "linux/virtio_config.h"
 
 #include "ioinst.h"
 #include "css.h"
@@ -260,6 +262,12 @@ typedef struct VirtioThinintInfo {
 uint8_t isc;
 } QEMU_PACKED VirtioThinintInfo;
 
+typedef struct VirtioRevInfo {
+uint16_t revision;
+uint16_t length;
+uint8_t data[0];
+} QEMU_PACKED VirtioRevInfo;
+
 /* Specify where the virtqueues for the subchannel are in guest memory. */
 static int virtio_ccw_set_vqs(SubchDev *sch, uint64_t addr, uint32_t align,
   uint16_t index, uint16_t num)
@@ -299,6 +307,7 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
 {
 int ret;
 VqInfoBlock info;
+VirtioRevInfo revinfo;
 uint8_t status;
 VirtioFeatDesc features;
 void *config;
@@ -373,6 +382,13 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
ccw.cda + sizeof(features.features));
 if (features.index < ARRAY_SIZE(dev->host_features)) {
 features.features = dev->host_features[features.index];
+/*
+ * Don't offer version 1 to the guest if it did not
+ * negotiate at least revision 1.
+ */
+if (features.index == 1 && dev->revision <= 0) {
+features.features &= ~(1 << (VIRTIO_F_VERSION_1 - 32));
+}
 } else {
 /* Return zeroes if the guest supports more feature bits. */
 features.features = 0;
@@ -400,6 +416,13 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
ccw.cda + sizeof(features.features));
 features.features = ldl_le_phys(&address_space_memory, ccw.cda);
 if (features.index < ARRAY_SIZE(vdev->guest_features)) {
+/*
+ * The guest should not set version 1 if it didn't
+ * negotiate a revision >= 1.
+ */
+if (features.index == 1 && dev->revision <= 0) {
+features.features &= ~(1 << (VIRTIO_F_VERSION_1 - 32));
+}
 virtio_set_features(vdev, features.index, features.features);
 } else {
 /*
@@ -600,6 +623,25 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
 }
 }
 break;
+case CCW_CMD_SET_VIRTIO_REV:
+len = sizeof(revinfo);
+if (ccw.count < len || (check_len && ccw.count > len)) {
+ret = -EINVAL;
+break;
+}
+if (!ccw.cda) {
+ret = -EFAULT;
+break;
+}
+cpu_physical_memory_read(ccw.cda, &revinfo, len);
+if (dev->revision >= 0 ||
+revinfo.revision > VIRTIO_CCW_REV_MAX) {
+ret = -ENOSYS;
+break;
+}
+ret = 0;
+dev->revision = revinfo.revision;
+break;
 default:
 ret = -ENOSYS;
 break;
@@ -607,6 +649,13 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
 return ret;
 }
 
+static void virtio_sch_disable_cb(SubchDev *sch)
+{
+VirtioCcwDevice *dev = sch->driver_data;
+
+dev->revision = -1;
+}
+
 static int virtio_ccw_device_init(VirtioCcwDevice *dev, VirtIODevice *vdev)
 {
 unsigned int cssid = 0;
@@ -733,6 +782,7 @@ static int virtio_ccw_device_init(VirtioCcwDevice *dev, VirtIODevice *vdev)
 css_sch_build_virtual_schib(sch, 0, VIRTIO_CCW_CHPID_TYPE);
 
 sch->ccw_cb = virtio_ccw_cb;
+sch->disable_cb = virtio_sch_disable_cb;
 
 /* Build senseid data. */
 memset(&sch->id, 0, sizeof(SenseId));
@@ -740,6 +790,8 @@ static int virtio_ccw_device_init(VirtioCcwDevice *dev, VirtIODevice *vdev)
 sch->id.cu_type = VIRTIO_CCW_CU_TYPE;
 sch->id.cu_model = vdev->device_id;
 
+dev->revision = -1;
+
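
For readers following the revision logic in the hunks above, here is a
minimal standalone sketch of the two rules the patch adds: hide
VIRTIO_F_VERSION_1 from a guest that has not negotiated revision 1, and
accept a revision only once. The struct and function names are
hypothetical, and the value of VIRTIO_CCW_REV_MAX is an assumption; only
the comparisons are taken from the diff.

```c
#include <errno.h>
#include <stdint.h>

#define VIRTIO_F_VERSION_1 32      /* feature bit number from the virtio spec */
#define VIRTIO_CCW_REV_MAX 1       /* assumed: highest revision the host accepts */

/* Toy stand-in for VirtioCcwDevice; only the field used here. */
struct ccw_dev_sketch {
    int revision;                  /* -1 until the guest sets a revision */
};

/* Hide VERSION_1 in feature word 1 (bits 32..63) from legacy guests. */
static uint32_t filter_features(const struct ccw_dev_sketch *dev,
                                uint8_t index, uint32_t features)
{
    if (index == 1 && dev->revision <= 0) {
        features &= ~(UINT32_C(1) << (VIRTIO_F_VERSION_1 - 32));
    }
    return features;
}

/* Accept a revision once; reject repeats and unknown revisions. */
static int set_revision(struct ccw_dev_sketch *dev, uint16_t requested)
{
    if (dev->revision >= 0 || requested > VIRTIO_CCW_REV_MAX) {
        return -ENOSYS;
    }
    dev->revision = requested;
    return 0;
}
```

Resetting revision to -1 in the disable callback is what lets a guest
negotiate again after the subchannel has been disabled.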

Re: [Qemu-devel] [PATCH v5 09/33] target-arm: add macros to access banked registers

2014-10-07 Thread Greg Bellows

Converted in v6

On 7 October 2014 01:54, Peter Maydell  wrote:

> On 7 October 2014 05:02, Greg Bellows  wrote:
> > Right, we need the macros to do string concatenation so they have to be
> > macros.  That combination occurs 3 times from a quick look.  I agree that
> > it may be cumbersome to try and invent a name.
> >
> > Anything to do on this?
>
> Make USE_SECURE_REG into an inline function (with a
> decapitalised name), leave the rest.
>
> -- PMM
>
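
A rough illustration of the split being suggested, not the code from the
series: the predicate needs no token pasting and can be an inline
function, while the field selection pastes a suffix onto the register
name and therefore has to stay a macro. The struct, its fields and
BANKED_REG_GET are hypothetical names chosen for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical CPU state with one banked register (secure / non-secure). */
struct cpu_sketch {
    bool secure;          /* assumed flag: CPU is currently in secure state */
    uint32_t ttbr0_s;
    uint32_t ttbr0_ns;
};

/* No token pasting needed here, so this can be a type-checked function. */
static inline bool use_secure_reg(const struct cpu_sketch *env)
{
    return env->secure;
}

/* Pastes the _s / _ns suffix onto the field name, so it must be a macro. */
#define BANKED_REG_GET(env, reg) \
    (use_secure_reg(env) ? (env)->reg##_s : (env)->reg##_ns)
```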

Re: [Qemu-devel] [PATCH 10/17] mm: rmap preparation for remap_anon_pages

2014-10-07 Thread Dr. David Alan Gilbert

* Linus Torvalds (torva...@linux-foundation.org) wrote:
> On Mon, Oct 6, 2014 at 12:41 PM, Andrea Arcangeli  wrote:
> >
> > Of course if somebody has better ideas on how to resolve an anonymous
> > userfault they're welcome.
> 
> So I'd *much* rather have a "write()" style interface (ie _copying_
> bytes from user space into a newly allocated page that gets mapped)
> than a "remap page" style interface

Something like that might work for the postcopy case; it doesn't work
for some of the other uses that need to stop a page being changed by the
guest, but then need to somehow get a copy of that page internally to QEMU,
and perhaps provide it back later.  remap_anon_pages worked for those cases
as well; I can't think of another current way of doing it in userspace.

I'm thinking here of systems for making VMs with memory larger than a single
host; that's something that's not as well thought out.  I've also seen people
writing emulation that want to trap and emulate some page accesses while
still having the original data available to the emulator itself.

So yes, OK for now, but the result is less general.

Dave

> remapping anonymous pages involves page table games that really aren't
> necessarily a good idea, and tlb invalidates for the old page etc.
> Just don't do it.
> 
>Linus
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH 10/17] mm: rmap preparation for remap_anon_pages

2014-10-07 Thread Dr. David Alan Gilbert

* Paolo Bonzini (pbonz...@redhat.com) wrote:
> Il 07/10/2014 19:07, Dr. David Alan Gilbert ha scritto:
> >> > 
> >> > So I'd *much* rather have a "write()" style interface (ie _copying_
> >> > bytes from user space into a newly allocated page that gets mapped)
> >> > than a "remap page" style interface
> > Something like that might work for the postcopy case; it doesn't work
> > for some of the other uses that need to stop a page being changed by the
> > guest, but then need to somehow get a copy of that page internally to QEMU,
> > and perhaps provide it back later.
> 
> I cannot parse this.  Which uses do you have in mind?  Is it
> QEMU-specific, or is it for other applications of userfaults?

> As long as the page is atomically mapped, I'm not sure what the
> difference from remap_anon_pages is (as far as the destination page is
> concerned).  Are you thinking of having userfaults enabled on the source
> as well?

What I'm talking about here is when I want to stop a page being accessed by
the guest, do something with the data in qemu, and give it back to the guest
sometime later.

The main example is: Memory pools for guests where you swap RAM between a
series of VM hosts.  You have to take the page out, send it over the wire,
and sometime later, if the guest tries to access it, you userfault and pull
it back.
(There's at least one or two companies selling something like this, and at
least one Linux-based implementation with its own much more involved kernel
hacks)

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH 10/17] mm: rmap preparation for remap_anon_pages

2014-10-07 Thread Paolo Bonzini

Il 07/10/2014 19:07, Dr. David Alan Gilbert ha scritto:
>> > 
>> > So I'd *much* rather have a "write()" style interface (ie _copying_
>> > bytes from user space into a newly allocated page that gets mapped)
>> > than a "remap page" style interface
> Something like that might work for the postcopy case; it doesn't work
> for some of the other uses that need to stop a page being changed by the
> guest, but then need to somehow get a copy of that page internally to QEMU,
> and perhaps provide it back later.

I cannot parse this.  Which uses do you have in mind?  Is it
QEMU-specific, or is it for other applications of userfaults?

As long as the page is atomically mapped, I'm not sure what the
difference from remap_anon_pages is (as far as the destination page is
concerned).  Are you thinking of having userfaults enabled on the source
as well?

Paolo

> remap_anon_pages worked for those cases
> as well; I can't think of another current way of doing it in userspace.

Re: [Qemu-devel] [PATCH 10/17] mm: rmap preparation for remap_anon_pages

2014-10-07 Thread Linus Torvalds

On Tue, Oct 7, 2014 at 10:19 AM, Andrea Arcangeli  wrote:
>
> I see what you mean. The only con I see is that we then couldn't use
> recv(tmp_addr, PAGE_SIZE), remap_anon_pages(faultaddr, tmp_addr,
> PAGE_SIZE, ..)  and retain the zerocopy behavior. Or how could we?
> There's no recvfile(userfaultfd, socketfd, PAGE_SIZE).

You're doing completely faulty math, and you haven't thought it through.

Your "zero-copy" case is no such thing. Who cares if some packet
receive is zero-copy, when you need to set up the anonymous page to
*receive* the zero copy into, which involves page allocation, page
zeroing, page table setup with VM and page table locking, etc etc.

The thing is, the whole concept of "zero-copy" is pure and utter
bullshit. Sun made a big deal about the whole concept back in the
nineties, and IT DOES NOT WORK. It's a scam. Don't buy into it. It's
crap. It's made-up and not real.

Then, once you've allocated and cleared the page, mapped it in, your
"zero-copy" model involves looking up the page in the page tables
again (locking etc), then doing that zero-copy to the page. Then, when
you remap it, you look it up in the page tables AGAIN, with locking,
move it around, have to clear the old page table entry (which involves
a locked cmpxchg64), a TLB flush with most likely a cross-CPU IPI -
since the people who do this are all threaded and want many CPU's, and
then you insert the page into the new place.

That's *insane*. It's crap. All just to try to avoid one page copy.

Don't do it. remapping games really are complete BS. They never beat
just copying the data. It's that simple.

> As things stand now, I'm afraid with a write() syscall we couldn't do
> it zerocopy.

Really, you need to rethink your whole "zerocopy" model. It's broken.
Nobody sane cares. You've bought into a model that Sun already showed
doesn't work.

The only time zero-copy works is in random benchmarks that are very
careful to not touch the data at all at any point, and also try to
make sure that the benchmark is very much single-threaded so that you
never have the whole cross-CPU IPI issue for the TLB invalidate. Then,
and only then, can zero-copy win. And it's just not a realistic
scenario.

> If it wasn't for the TLB flush of the old page, the remap_anon_pages
> variant would be more optimal than doing a copy through a write
> syscall. Is the copy cheaper than a TLB flush? I probably naively
> assumed the TLB flush was always cheaper.

A local TLB flush is cheap. That's not the problem. The problem is the
setup of the page, and the clearing of the page, and the cross-CPU TLB
flush. And the page table locking, etc etc.

So no, I'm not AT ALL worried about a single "invlpg" instruction.
That's nothing. Local CPU TLB flushes of single pages are basically
free. But that really isn't where the costs are.

Quite frankly, the *only* time page remapping has ever made sense is
when it is used for realloc() kind of purposes, where you need to
remap pages not because of zero-copy, but because you need to change
the virtual address space layout. And then you make sure it's not a
common operation, because you're not doing it as a performance win,
you're doing it because you're changing your virtual layout.

Really. Zerocopy is for benchmarks, and for things like "splice()"
when you can avoid the page tables *entirely*. But the notion that
page remapping of user pages is faster than a copy is pure and utter
garbage. It's simply not true.

So I really think you should aim for a "write()" kind of interface.

With write, you may not get the zero-copy, but on the other hand it
allows you to re-use the source buffer many times without having to
allocate new pages and map it in etc. So a "read()+write()" loop (or,
quite commonly a "generate data computationally from a compressed
source + write()" loop) is actually much more efficient than the
zero-copy remapping, because you don't have all the complexities and
overheads in creating the source page.

It is possible that that could involve "splice()" too, although I
don't really think the source data tends to be in page-aligned chunks.
But hey, splice() at least *can* be faster than copying (and then we
have vmsplice() not because it's magically faster, but because it can
under certain circumstances be worth it, and it kind of made sense to
allow the interface, but I really don't think it's used very much or
very useful).

   Linus
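
To make the "read()+write()" loop concrete, here is a minimal sketch of
the pattern: one source buffer is allocated once and reused for every
chunk, so no page is allocated, zeroed or mapped per transfer. The
function name and buffer size are assumptions for illustration; only
read(2) and write(2) are used.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy everything from in_fd to out_fd through one reusable buffer.
 * Returns the number of bytes copied, or -1 on error. */
static ssize_t copy_fd(int in_fd, int out_fd)
{
    static char buf[4096];     /* set up once, reused for every chunk */
    ssize_t total = 0;

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof(buf));
        if (n == 0) {
            return total;      /* end of input */
        }
        if (n < 0) {
            if (errno == EINTR) {
                continue;      /* interrupted, retry the read */
            }
            return -1;
        }
        /* write() may be short, so loop until the chunk is out. */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR) {
                    continue;
                }
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```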

[Qemu-devel] [Bug 1378407] [NEW] [feature request] Partition table wrapper for single-filesystem images

2014-10-07 Thread felix

Public bug reported:

Suppose you have a single filesystem image. It would be nice if QEMU
could generate a virtual partition table for it and make it available to
the guest as a partitioned disk. Otherwise you have to use workarounds
like this:
wiki.archlinux.org/index.php/QEMU#Simulate_virtual_disk_with_MBR_using_linear_RAID

It should be relatively easy to do on top of existing vvfat code.
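
As a rough idea of what such a wrapper would have to synthesize, the
sketch below fills a 512-byte MBR sector with a single partition entry.
It is not QEMU or vvfat code; the function names are made up for this
example, and the CHS fields are simply set to the conventional filler
that tells readers to use the LBA fields instead.

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512

/* Store a 32-bit value in little-endian byte order. */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = v & 0xff;
    p[1] = (v >> 8) & 0xff;
    p[2] = (v >> 16) & 0xff;
    p[3] = (v >> 24) & 0xff;
}

/* Fill mbr[] with a partition table holding one partition that starts at
 * sector start_lba and spans num_sectors sectors. */
static void build_mbr(uint8_t mbr[SECTOR_SIZE], uint8_t type,
                      uint32_t start_lba, uint32_t num_sectors)
{
    uint8_t *entry = mbr + 446;          /* first of four 16-byte entries */

    memset(mbr, 0, SECTOR_SIZE);
    entry[0] = 0x00;                     /* not marked bootable */
    entry[1] = 0xfe;                     /* CHS start: filler, use LBA */
    entry[2] = 0xff;
    entry[3] = 0xff;
    entry[4] = type;                     /* e.g. 0x83 for a Linux filesystem */
    entry[5] = 0xfe;                     /* CHS end: filler, use LBA */
    entry[6] = 0xff;
    entry[7] = 0xff;
    put_le32(entry + 8, start_lba);      /* first sector of the partition */
    put_le32(entry + 12, num_sectors);   /* partition length in sectors */
    mbr[510] = 0x55;                     /* boot signature */
    mbr[511] = 0xaa;
}
```

The guest would then see this sector at offset 0, and the filesystem
image mapped starting at start_lba.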

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1378407

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1378407/+subscriptions

Re: [Qemu-devel] [PATCH v6 16/24] hw: Convert from BlockDriverState to BlockBackend, mostly

2014-10-07 Thread Max Reitz


On 07.10.2014 13:59, Markus Armbruster wrote:

Device models should access their block backends only through the
block-backend.h API.  Convert them, and drop direct includes of
inappropriate headers.

Just four uses of BlockDriverState are left:

* The Xen paravirtual block device backend (xen_disk.c) opens images
   itself when set up via xenbus, bypassing blockdev.c.  I figure it
   should go through qmp_blockdev_add() instead.

* Device model "usb-storage" prompts for keys.  No other device model
   does, and this one probably shouldn't do it, either.

* ide_issue_trim_cb() uses bdrv_aio_discard() instead of
   blk_aio_discard() because it fishes its backend out of a BlockAIOCB,
   which has only the BlockDriverState.

* PC87312State has an unused BlockDriverState[] member.

The next two commits take care of the latter two.

Signed-off-by: Markus Armbruster 
---
  block/block-backend.c| 262 +++
  blockdev.c   |  12 +-
  dma-helpers.c|  61 +++
  hw/arm/collie.c  |   5 +-
  hw/arm/gumstix.c |   5 +-
  hw/arm/highbank.c|   2 +-
  hw/arm/mainstone.c   |   2 +-
  hw/arm/musicpal.c|  10 +-
  hw/arm/nseries.c |   3 +-
  hw/arm/omap1.c   |   2 +-
  hw/arm/omap2.c   |   2 +-
  hw/arm/omap_sx1.c|   5 +-
  hw/arm/pxa2xx.c  |   4 +-
  hw/arm/realview.c|   2 +-
  hw/arm/spitz.c   |   4 +-
  hw/arm/tosa.c|   3 +-
  hw/arm/versatilepb.c |   3 +-
  hw/arm/vexpress.c|   3 +-
  hw/arm/virt.c|   2 +-
  hw/arm/xilinx_zynq.c |   3 +-
  hw/arm/z2.c  |   3 +-
  hw/block/block.c |   7 +-
  hw/block/dataplane/virtio-blk.c  |  24 +--
  hw/block/fdc.c   |  78 +
  hw/block/hd-geometry.c   |  24 +--
  hw/block/m25p80.c|  28 ++--
  hw/block/nand.c  |  50 +++---
  hw/block/nvme.c  |  19 +--
  hw/block/onenand.c   |  67 
  hw/block/pflash_cfi01.c  |  24 +--
  hw/block/pflash_cfi02.c  |  24 +--
  hw/block/virtio-blk.c|  95 +--
  hw/block/xen_disk.c  |  83 +-
  hw/core/qdev-properties-system.c |  26 +--
  hw/core/qdev-properties.c|   2 +-
  hw/cris/axis_dev88.c |   3 +-
  hw/display/tc6393xb.c|   2 +-
  hw/i386/pc.c |   2 +-
  hw/i386/pc_piix.c|   2 +-
  hw/i386/pc_sysfw.c   |   9 +-
  hw/i386/xen/xen_platform.c   |   5 +-
  hw/ide/ahci.c|  31 ++--
  hw/ide/atapi.c   |  33 ++--
  hw/ide/cmd646.c  |   2 +-
  hw/ide/core.c| 184 --
  hw/ide/ich.c |   2 +-
  hw/ide/internal.h|   6 +-
  hw/ide/isa.c |   2 +-
  hw/ide/macio.c   |  50 +++---
  hw/ide/microdrive.c  |   4 +-
  hw/ide/mmio.c|   2 +-
  hw/ide/pci.c |   4 +-
  hw/ide/piix.c|   8 +-
  hw/ide/qdev.c|  11 +-
  hw/ide/via.c |   2 +-
  hw/isa/pc87312.c |   4 +-
  hw/lm32/lm32_boards.c|   5 +-
  hw/lm32/milkymist.c  |   3 +-
  hw/microblaze/petalogix_ml605_mmu.c  |   3 +-
  hw/microblaze/petalogix_s3adsp1800_mmu.c |   3 +-
  hw/mips/mips_fulong2e.c  |   2 +-
  hw/mips/mips_jazz.c  |   2 +-
  hw/mips/mips_malta.c |   6 +-
  hw/mips/mips_r4k.c   |   3 +-
  hw/nvram/spapr_nvram.c   |  17 +-
  hw/pci/pci-hotplug-old.c |   5 +-
  hw/ppc/mac_newworld.c|   2 +-
  hw/ppc/mac_oldworld.c|   2 +-
  hw/ppc/ppc405_boards.c   |  26 +--
  hw/ppc/prep.c|   2 +-
  hw/ppc/spapr.c   |   4 +-
  hw/ppc/virtex_ml507.c|   3 +-
  hw/s390x/s390-virtio-bus.c   |   2 +-
  hw/s390x/s390-virtio.c   |   2 +-
  hw/s390x/virtio-ccw.c|   2 +-
  hw/scsi/megasas.c|  15 +-
  hw/scsi/scsi-bus.c   |  12 +-
  hw/scsi/s

Re: [Qemu-devel] [PATCH v6 07/24] blockdev: Eliminate drive_del()

2014-10-07 Thread Max Reitz


On 07.10.2014 13:59, Markus Armbruster wrote:

drive_del() has become a trivial wrapper around blk_unref().  Get rid
of it.

Signed-off-by: Markus Armbruster 
---
  blockdev.c| 9 ++---
  device-hotplug.c  | 3 ++-
  hw/ide/piix.c | 4 +++-
  include/sysemu/blockdev.h | 1 -
  4 files changed, 7 insertions(+), 10 deletions(-)


Reviewed-by: Max Reitz

Re: [Qemu-devel] [PATCH v6 04/24] block: Connect BlockBackend and DriveInfo

2014-10-07 Thread Max Reitz


On 07.10.2014 13:59, Markus Armbruster wrote:

Make the BlockBackend own the DriveInfo.  Change blockdev_init() to
return the BlockBackend instead of the DriveInfo.

Signed-off-by: Markus Armbruster 
---
  block.c   |  2 --
  block/block-backend.c | 38 
  blockdev.c| 73 ---
  include/sysemu/blockdev.h |  4 +++
  4 files changed, 79 insertions(+), 38 deletions(-)


Reviewed-by: Max Reitz

Re: [Qemu-devel] [PATCH 10/17] mm: rmap preparation for remap_anon_pages

2014-10-07 Thread Andy Lutomirski

On Tue, Oct 7, 2014 at 8:52 AM, Andrea Arcangeli  wrote:
> On Tue, Oct 07, 2014 at 04:19:13PM +0200, Andrea Arcangeli wrote:
>> mremap like interface, or file+commands protocol interface. I tend to
>> like mremap more, that's why I opted for a remap_anon_pages syscall
>> kept orthogonal to the userfaultfd functionality (remap_anon_pages
>> could be also used standalone as an accelerated mremap in some
>> circumstances) but nothing prevents to just embed the same mechanism
>
> Sorry for the self followup, but something else comes to mind to
> elaborate this further.
>
> In terms of interfaces, the most efficient way I could think of to
> minimize kernel entries/exits would be to append the "source address" of
> the data received from the network transport to the userfaultfd_write()
> command (by appending 8 bytes to the wakeup command). That said,
> mixing the mechanism to be notified about userfaults with the
> mechanism to resolve a userfault looks like a complication to me. I kind
> of liked keeping the userfaultfd protocol very simple and doing
> just its thing. The userfaultfd doesn't need to know how the userfault
> was resolved; even mremap would work theoretically (until we run out
> of vmas). I thought it was simpler to keep it that way. However, if we
> want to resolve the fault with a "write()" syscall, this may be the
> most efficient way to do it: we're already doing a write() into the
> pseudofd, carrying the destination address, to wake up the page fault,
> so I just need to append the source address to the wakeup command.
>
> I probably grossly overestimated the benefits of resolving the
> userfault with a zerocopy page move, sorry. So if we entirely drop the
> zerocopy behavior and the TLB flush of the old page like you
> suggested, the way to keep the userfaultfd mechanism decoupled from
> the userfault resolution mechanism would be to implement an
> atomic-copy syscall. That would work for SIGBUS userfaults too without
> requiring a pseudofd then. It would be enough then to call
> mcopy_atomic(userfault_addr,tmp_addr,len) with the only constraint
> that len must be a multiple of PAGE_SIZE. Of course mcopy_atomic
> wouldn't page fault or call GUP into the destination address (it can't
> otherwise the in-flight partial copy would be visible to the process,
> breaking the atomicity of the copy), but it would fill in the
> pte/trans_huge_pmd with the same strict behavior that remap_anon_pages
> currently has (in turn it would by design bypass the VM_USERFAULT
> check and be ideal for resolving userfaults).

At the risk of asking a possibly useless question, would it make sense
to splice data into a userfaultfd?

--Andy

>
> mcopy_atomic could then be also extended to tmpfs and it would work
> without requiring the source page to be a tmpfs page too without
> having to convert page types on the fly.
>
> If I add mcopy_atomic, the patch in subject (10/17) can be dropped of
> course so it'd be even less intrusive than the current
> remap_anon_pages and it would require zero TLB flush during its
> runtime (it would just require an atomic copy).
>
> So should I try to embed a mcopy_atomic inside userfault_write or can
> I expose it to userland as a standalone new syscall? Or should I do
> something different? Comments?
>
> Thanks,
> Andrea



-- 
Andy Lutomirski
AMA Capital Management, LLC

Re: [Qemu-devel] [PATCH 10/17] mm: rmap preparation for remap_anon_pages

2014-10-07 Thread Andrea Arcangeli

On Tue, Oct 07, 2014 at 04:19:13PM +0200, Andrea Arcangeli wrote:
> mremap like interface, or file+commands protocol interface. I tend to
> like mremap more, that's why I opted for a remap_anon_pages syscall
> kept orthogonal to the userfaultfd functionality (remap_anon_pages
> could be also used standalone as an accelerated mremap in some
> circumstances) but nothing prevents to just embed the same mechanism

Sorry for the self followup, but something else comes to mind to
elaborate this further.

In terms of interfaces, the most efficient way I could think of to
minimize kernel entries/exits would be to append the "source address" of
the data received from the network transport to the userfaultfd_write()
command (by appending 8 bytes to the wakeup command). That said,
mixing the mechanism to be notified about userfaults with the
mechanism to resolve a userfault looks like a complication to me. I kind
of liked keeping the userfaultfd protocol very simple and doing
just its thing. The userfaultfd doesn't need to know how the userfault
was resolved; even mremap would work theoretically (until we run out
of vmas). I thought it was simpler to keep it that way. However, if we
want to resolve the fault with a "write()" syscall, this may be the
most efficient way to do it: we're already doing a write() into the
pseudofd, carrying the destination address, to wake up the page fault,
so I just need to append the source address to the wakeup command.

I probably grossly overestimated the benefits of resolving the
userfault with a zerocopy page move, sorry. So if we entirely drop the
zerocopy behavior and the TLB flush of the old page like you
suggested, the way to keep the userfaultfd mechanism decoupled from
the userfault resolution mechanism would be to implement an
atomic-copy syscall. That would work for SIGBUS userfaults too without
requiring a pseudofd then. It would be enough then to call
mcopy_atomic(userfault_addr,tmp_addr,len) with the only constraint
that len must be a multiple of PAGE_SIZE. Of course mcopy_atomic
wouldn't page fault or call GUP into the destination address (it can't
otherwise the in-flight partial copy would be visible to the process,
breaking the atomicity of the copy), but it would fill in the
pte/trans_huge_pmd with the same strict behavior that remap_anon_pages
currently has (in turn it would by design bypass the VM_USERFAULT
check and be ideal for resolving userfaults).
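
Purely as an illustration of the calling contract described above, here
is a userland stand-in. It is not the proposed syscall: memcpy() is not
atomic with respect to other threads, and the requirement that the
destination be page aligned is an assumption of this sketch. Only the
"len must be a multiple of PAGE_SIZE" rule comes from the text.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Userland stand-in for the proposed call. The real thing would install
 * the pages atomically in the kernel, which memcpy() does not do. */
static int mcopy_atomic_sketch(void *dst, const void *src, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    if (len == 0 || len % page != 0) {
        return -EINVAL;           /* len must be a multiple of PAGE_SIZE */
    }
    if ((uintptr_t)dst % page != 0) {
        return -EINVAL;           /* assumed: destination is page aligned */
    }
    memcpy(dst, src, len);        /* placeholder for the atomic install */
    return 0;
}
```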

mcopy_atomic could then be also extended to tmpfs and it would work
without requiring the source page to be a tmpfs page too without
having to convert page types on the fly.

If I add mcopy_atomic, the patch in subject (10/17) can be dropped of
course so it'd be even less intrusive than the current
remap_anon_pages and it would require zero TLB flush during its
runtime (it would just require an atomic copy).

So should I try to embed a mcopy_atomic inside userfault_write or can
I expose it to userland as a standalone new syscall? Or should I do
something different? Comments?

Thanks,
Andrea

Re: [Qemu-devel] [PATCH v6 03/24] block: Connect BlockBackend to BlockDriverState

2014-10-07 Thread Max Reitz


On 07.10.2014 13:59, Markus Armbruster wrote:

Convenience function blk_new_with_bs() creates a BlockBackend with its
BlockDriverState.  Callers have to unref both.  The commit after next
will relieve them of the need to unref the BlockDriverState.

Complication: due to the silly way drive_del works, we need a way to
hide a BlockBackend, just like bdrv_make_anon().  To emphasize its
"special" status, give the function a suitably off-putting name:
blk_hide_on_behalf_of_do_drive_del().  Unfortunately, hiding turns the
BlockBackend's name into the empty string.  Can't avoid that without
breaking the blk->bs->device_name equals blk->name invariant.

Signed-off-by: Markus Armbruster 
---
  block.c|  12 ++--
  block/block-backend.c  |  71 ++-
  blockdev.c |  19 +++
  hw/block/xen_disk.c|   8 +--
  include/block/block_int.h  |   2 +
  include/sysemu/block-backend.h |   5 ++
  qemu-img.c | 125 +++--
  qemu-io.c  |   4 +-
  qemu-nbd.c |   4 +-
  9 files changed, 156 insertions(+), 94 deletions(-)


Reviewed-by: Max Reitz

Re: [Qemu-devel] [PATCH 08/17] mm: madvise MADV_USERFAULT

2014-10-07 Thread Kirill A. Shutemov

On Tue, Oct 07, 2014 at 03:24:58PM +0200, Andrea Arcangeli wrote:
> Hi Kirill,
> 
> On Tue, Oct 07, 2014 at 01:36:45PM +0300, Kirill A. Shutemov wrote:
> > On Fri, Oct 03, 2014 at 07:07:58PM +0200, Andrea Arcangeli wrote:
> > > MADV_USERFAULT is a new madvise flag that will set VM_USERFAULT in the
> > > vma flags. Whenever VM_USERFAULT is set in an anonymous vma, if
> > > userland touches a still unmapped virtual address, a sigbus signal is
> > > sent instead of allocating a new page. The sigbus signal handler will
> > > then resolve the page fault in userland by calling the
> > > remap_anon_pages syscall.
> > 
> > Hm. I wonder if this functionality really fits the madvise(2) interface: as
> > far as I understand it, it provides a way to give a *hint* to the kernel,
> > which may or may not trigger an action on the kernel side. I don't think an
> > application will behave reasonably if the kernel ignores the *advice* and
> > allocates memory instead of sending SIGBUS.
> > 
> > I would suggest considering some other interface for the
> > functionality: a new syscall or, perhaps, mprotect().
> 
> I didn't feel like adding PROT_USERFAULT to mprotect, which looks
> hardwired to just these flags:

PROT_NOALLOC, maybe?

> 
>PROT_NONE  The memory cannot be accessed at all.
> 
>PROT_READ  The memory can be read.
> 
>PROT_WRITE The memory can be modified.
> 
>PROT_EXEC  The memory can be executed.

To be complete: PROT_GROWSDOWN, PROT_GROWSUP and unused PROT_SEM.

> So here somebody should comment and choose between:
> 
> 1) set VM_USERFAULT with mprotect(PROT_USERFAULT) instead of
>the current madvise(MADV_USERFAULT)
> 
> 2) drop MADV_USERFAULT and VM_USERFAULT and force the usage of the
>userfaultfd protocol as the only way for userland to catch
>userfaults (each userfaultfd must already register itself into its
>own virtual memory ranges so it's a trivial change for userfaultfd
>users that deletes just 1 or 2 lines of userland code, but it would
>prevent to use the SIGBUS behavior with info->si_addr=faultaddr for
>other users)
> 
> 3) keep things as they are now: use MADV_USERFAULT for SIGBUS
>userfaults, with optional intersection between the
>vm_flags&VM_USERFAULT ranges and the userfaultfd registered ranges
>with vma->vm_userfaultfd_ctx!=NULL to know if to engage the
>userfaultfd protocol instead of the plain SIGBUS

4) new syscall?
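
For reference, the userland half of the SIGBUS flavour discussed in
these options would look roughly like the sketch below, whichever
interface ends up setting the flag. It only uses the standard
sigaction(2) API; the step that actually fills in the page
(remap_anon_pages in the proposal) is left as a comment because that
call is not available outside the patch series. Function and variable
names are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Page base of the last fault; a plain volatile is enough for a sketch. */
static volatile uintptr_t last_fault_page;

/* Round an address down to the start of its page (power-of-two size). */
static uintptr_t page_base(uintptr_t addr, uintptr_t page_size)
{
    return addr & ~(page_size - 1);
}

/* SA_SIGINFO handler: si_addr carries the faulting address. */
static void sigbus_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    last_fault_page = page_base((uintptr_t)info->si_addr,
                                (uintptr_t)sysconf(_SC_PAGESIZE));
    /* A real handler would now provide the page contents and return,
     * so that the faulting access is retried. */
}

/* Install the handler; returns 0 on success, -1 on error. */
static int install_sigbus_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```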
 
> I will update the code accordingly to feedback, so please comment.

I don't have strong points on this. Just *feel* it doesn't fit advice
semantics.

The only userspace interface I've designed has not stood the test of time.
I would listen to what the senior maintainers say. :)
 
-- 
 Kirill A. Shutemov

Re: [Qemu-devel] [PATCH 10/11] block: let commit blockjob run in BDS AioContext

2014-10-07 Thread Max Reitz


On 06.10.2014 11:30, Stefan Hajnoczi wrote:

On Sat, Oct 04, 2014 at 11:28:22PM +0200, Max Reitz wrote:

On 01.10.2014 19:01, Stefan Hajnoczi wrote:

The commit block job must run in the BlockDriverState AioContext so that
it works with dataplane.

Acquire the AioContext in blockdev.c so starting the block job is safe.
One detail here is that the bdrv_drain_all() must be moved inside the
aio_context_acquire() region so requests cannot sneak in between the
drain and acquire.

Hm, I see the intent, but in patch 5 you said bdrv_drain_all() should never
be called outside of the main loop (at least that's how it appeared to me).
Wouldn't it be enough to use bdrv_drain() on the source BDS, like in patch
9?

There is no contradiction here because qmp_block_commit() is invoked by
the QEMU monitor from the main loop.

The problem with bdrv_drain_all() is that it acquires AioContexts.  If
called outside the main loop without taking special care, it could
result in lock ordering problems (e.g. two threads trying to acquire all
AioContexts at the same time while already holding their respective
contexts).

qmp_block_commit() is just traditional QEMU global mutex code so it is
allowed to call bdrv_drain_all().


Hm, okay then.


@@ -140,27 +173,14 @@ wait:
  ret = 0;
-if (!block_job_is_cancelled(&s->common) && sector_num == end) {
-/* success */
-ret = bdrv_drop_intermediate(active, top, base, s->backing_file_str);
+out:
+if (buf) {
+qemu_vfree(buf);
  }

Is this new condition really necessary? However, it won't hurt, so:

This was a mistake.  Since commit
94c8ff3a01d9bd1005f066a0ee3fe43c842a43b7 ("w32: Make qemu_vfree() accept
NULL like the POSIX implementation") it is no longer necessary to check
for NULL pointers.

You can't teach an old dog new tricks :).

Thanks, will fix in the next revision!


Reviewed-by: Max Reitz 

A general question regarding the assertions here and in patch 8: I tried to
break them, but I couldn't find a way. The way I tried was by creating two
devices in different threads with just one qcow2 behind each of them, and
then trying to attach one of those qcow2 BDS to the other as a backing file.
I couldn't find out how, but I guess this is something we might want to
support in the future. Can we actually be sure that all of the BDS in one
tree are always running in the same AIO context? Are we already enforcing
this?

bdrv_set_aio_context() is recursive so it also sets all the child nodes.
That is the only mechanism to ensure AioContext is consistent across
nodes.

When the BDS graph is manipulated (e.g. attaching new roots, swapping
nodes, etc) we don't perform checks today.

Markus has asked that I add the appropriate assertions so errors are
caught early.  I haven't done that yet but it's a good next step.


Okay, seems good to me. It's not possible to break them now, and if it
will ever be, the assertions will at least catch it.



As far as I'm aware, these patches don't introduce cases where we would
make the AioContext in the graph inconsistent.  So I see the AioContext
consistency assertions as a separate patch series (which I will work on
next...hopefully not to discover horrible problems!).


And furthermore, basically all the calls to acquire an AIO context are of
the form "aio_context = bdrv_get_aio_context(bs);
aio_context_acquire(aio_context);". It is *extremely* unlikely if possible
at all, but wouldn't it be possible to change the BDS's AIO context from
another thread after the first function returned and before the lock is
acquired? If that is really the case, I think we should have some atomic
bdrv_acquire_aio_context() function.

No, because only the main loop calls bdrv_set_aio_context().  At the
moment the case you mentioned cannot happen.

Ultimately, we should move away from "this only works in the main loop"
constraints.  In order to provide atomic BDS AioContext acquire we need
a global root that is thread-safe.  That doesn't exist today -
bdrv_states is protected by the QEMU global mutex only.

I thought about adding the infrastructure in this patch series but it is
not necessary yet and would make the series more complicated.

The idea is:

  * Add bdrv_states_lock to protect the global list of BlockDriverStates
  * Integrate bdrv_ref()/bdrv_unref() as well as bdrv_get_aio_context()
so they are atomic and protected by the bdrv_states_lock

So bdrv_find() and other functions that access bdrv_states become the
entry points to acquiring BlockDriverStates in a thread-safe fashion.
bdrv_unref() will need rethinking too to prevent races between freeing a
BDS and bdrv_find().
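
A toy model of that idea, with no claim to match any eventual QEMU
implementation: the list lock covers both the lookup and the reference
count, so a node cannot be freed between "find" and "ref". All names
below are hypothetical stand-ins (node_sketch for BlockDriverState,
states_lock for bdrv_states_lock).

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Toy node; stands in for a BlockDriverState. */
struct node_sketch {
    const char *name;
    int refcnt;
    struct node_sketch *next;
};

static pthread_mutex_t states_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node_sketch *states;        /* global list head */

/* Link a node into the global list with an initial reference. */
static void node_register(struct node_sketch *n)
{
    pthread_mutex_lock(&states_lock);
    n->refcnt = 1;
    n->next = states;
    states = n;
    pthread_mutex_unlock(&states_lock);
}

/* Look up by name and take a reference in one critical section. */
static struct node_sketch *node_find_and_ref(const char *name)
{
    struct node_sketch *n;

    pthread_mutex_lock(&states_lock);
    for (n = states; n != NULL; n = n->next) {
        if (strcmp(n->name, name) == 0) {
            n->refcnt++;
            break;
        }
    }
    pthread_mutex_unlock(&states_lock);
    return n;
}

/* Drop a reference; unlink on the last one. Returns 1 if unlinked. */
static int node_unref(struct node_sketch *n)
{
    int gone = 0;

    pthread_mutex_lock(&states_lock);
    if (--n->refcnt == 0) {
        struct node_sketch **p = &states;
        while (*p != n) {
            p = &(*p)->next;
        }
        *p = n->next;
        gone = 1;
    }
    pthread_mutex_unlock(&states_lock);
    return gone;
}
```

Because the unlink happens under the same lock as the lookup, a
concurrent node_find_and_ref() either sees the node with a valid
reference or does not see it at all.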

Can you think of a place where we need this today?  I haven't found one
yet but would like one to develop the code against.


No, I can't think of anything, as long as QMP commands always arrive
through the main loop.


Thank you for your explanations!

Max

Re: [Qemu-devel] IDs in QOM

2014-10-07 Thread Paolo Bonzini

Il 07/10/2014 14:16, Markus Armbruster ha scritto:
>> > Possibly, except this would propagate all the way through the APIs.  For
>> > example, right now [*] is added automatically to MemoryRegion
>> > properties, but this can change in the future since many MemoryRegions
>> > do not need array-like names.  Then you would have two sets of
>> > MemoryRegion creation APIs, one that array-ifies names and one that 
>> > doesn't.
> Why couldn't you have a separate name generator there as well?
> 
> QOM:
> * object_property_add() takes a non-magical name argument
> * object_gen_name() takes a base name and generates a stream of
>   derived names suitable for object_property_add()
> 
> Memory:
> * memory_region_init() takes a non-magical name argument
> * memory_gen_name() takes a base name... you get the idea
>   actually a wrapper around object_gen_name()

I see what you mean; you could even reuse object_gen_name().  It looks
sane, I guess one has to see a patch to judge if it also _is_ sane. :)
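
To make the proposal concrete, here is a rough sketch of what a separate name generator could look like. `NameGen` and `name_gen_next()` are hypothetical names chosen for illustration; the thread only names `object_gen_name()` and does not define its signature, and the `base[N]` output format is an assumption modelled on the `[*]` array-like names mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical generator state; not an existing QOM type. */
typedef struct NameGen {
    const char *base;   /* non-magical base name, e.g. "ram" */
    unsigned next;      /* index used for the next derived name */
} NameGen;

/* Write the next derived name ("base[0]", "base[1]", ...) into buf.
 * Returns 0 on success, -1 if buf is too small; on failure the
 * generator does not advance. */
static int name_gen_next(NameGen *g, char *buf, size_t len)
{
    int n;

    assert(g != NULL && g->base != NULL && buf != NULL);
    n = snprintf(buf, len, "%s[%u]", g->base, g->next);
    if (n < 0 || (size_t)n >= len) {
        return -1;
    }
    g->next++;
    return 0;
}
```

The point of the split is that the property-adding function would receive the already-derived name and would not need to interpret any magic suffix itself.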

> > > Why is it a good idea to have two separate restrictions on property names?
> > > A looser one that applies always (anything but '\0' and '/'), and a
> > > stricter one that applies sometimes (letters, digits, '-', '.', '_',
> > > starting with a letter).
> > > 
> > > If yes, how is "sometimes" defined?
> >
> > It applies to objects created by the user (either in
> > /machine/peripheral, or in /objects).  Why the restriction?  For
> > -object, because creating the object involves QemuOpts.  You then have
> > two ways to satisfy the principle of least astonishment:
> >
> > 1) always use the same restriction when a user creates objects;
> >
> > 2) do not introduce restrictions when a user is not using QemuOpts.
> >
> > We've been doing (2) so far; often it is just because QMP wrappers also
> > used QemuOpts, but not necessarily.  So object_add just does the same.
>
> We've been doing (2) so far simply because we've never wasted a thought
> on it!  Since we're wasting thoughts now: which one do we like better?

User interfaces other than QOM have been doing (2) too.

netdev-add and device-add have been doing (2) because they use QemuOpts
under the hood.

blockdev-add has been consciously doing (2) for node-name.

chardev-add has been doing (1), and I'd argue that this is a bug in
chardev-add.

QOM has two families of operations.

One is -object/object-add/object-del.  This is a high-level operation
that only works with specific QOM classes (those that implement
UserCreatable) and only operates on a specific part of the QOM tree
(/objects).

The other is qom-get/qom-set.  This is a low-level operation that can
explore all of the QOM tree.  It cannot _create_ new objects and
properties, however, so the user cannot escape the naming sandbox that
we put in place for him.

I think it's fair to limit the high-level operations to the same id
space, no matter if they're QemuOpts based or not.
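
The two restrictions discussed above can be written down as a pair of predicates. This is a sketch only: `name_ok_loose()` and `id_ok_strict()` are hypothetical helpers, not existing QEMU functions, and the strict check assumes the "C" locale so that `isalpha()`/`isalnum()` accept ASCII letters and digits only.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Looser rule: anything but '/' ('\0' cannot occur inside a C string).
 * Whether an empty name should pass is not settled by the thread;
 * this sketch lets it through. */
static bool name_ok_loose(const char *name)
{
    assert(name != NULL);
    return strchr(name, '/') == NULL;
}

/* Stricter rule: starts with a letter, followed by letters, digits,
 * '-', '.' or '_'.  An empty string is rejected. */
static bool id_ok_strict(const char *id)
{
    const unsigned char *p = (const unsigned char *)id;

    assert(id != NULL);
    if (!isalpha(*p)) {
        return false;
    }
    for (p++; *p != '\0'; p++) {
        if (!isalnum(*p) && strchr("-._", *p) == NULL) {
            return false;
        }
    }
    return true;
}
```

Under these predicates an automatically arrayified name such as `foo[0]` passes the loose rule but fails the strict one, which is the kind of name a user could then never create through the high-level operations.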

> Based on experience, I'd rather not make "user-created"
> vs. "system-created" a hard boundary.  Once a system-created funny name
> has become ABI, we can't ever let the user create it.  One reason for me
> to prefer (1).

Anything that is outside /objects is "funny", not just anything that has
weird characters in its name.  The QOM API consists of "magic" object
canonical paths and magic property names which, as far as I know, can be
easily listed:

* the aforementioned /machine.rtc-time that lets you detect missed
RTC_CHANGE events

* the /backend tree that includes info on the graphic consoles.  Not
sure if this is considered stable, but it's there.

* /machine/peripheral/foo lets you peek at run-time properties of
-device id=foo - virtio-balloon has a couple of run-time properties,
whose status I am not certain of.  Probably stable but undocumented.

* /objects/bar lets you reconstruct the properties of -object id=bar -
there are no such run-time properties with any promised stability.


In other words, practically all of the QOM API is outside /objects.

But not all hope is lost.  Were we to provide user access to the
creation of graphic consoles, we could preserve the /backend API via
aliases and links.  This way, anything that currently happens in
/machine or /backend can tomorrow happen in /objects, without breaking
backwards compatibility.

Similarly, a QOMified block-backend could be either:

* an object that QEMU creates for you when you give -device
scsi-disk,id=disk,drive=foo.  The canonical path could be something like
/machine/peripheral/disk/drive-backend, with a link in
/machine/peripheral/disk/backend.

* an object that you create with -object
block-backend,id=bar,blockdev=myimg and reference with -device
scsi-disk,backend=bar.  The canonical path would be of course
/objects/bar, but the same link would exist in
/machine/peripheral/disk/backend.

In either case, you would be able to find the block-backend using the
same QOM path and property.

> So the "automatic arrayification" convenience feature added a property
> name

Re: [Qemu-devel] [PATCH 10/17] mm: rmap preparation for remap_anon_pages

2014-10-07 Thread Andrea Arcangeli

Hello,

On Tue, Oct 07, 2014 at 08:47:59AM -0400, Linus Torvalds wrote:
> On Mon, Oct 6, 2014 at 12:41 PM, Andrea Arcangeli  wrote:
> >
> > Of course if somebody has better ideas on how to resolve an anonymous
> > userfault they're welcome.
> 
> So I'd *much* rather have a "write()" style interface (ie _copying_
> bytes from user space into a newly allocated page that gets mapped)
> than a "remap page" style interface
> 
> remapping anonymous pages involves page table games that really aren't
> necessarily a good idea, and tlb invalidates for the old page etc.
> Just don't do it.

I see what you mean. The only con I see is that we then couldn't use
recv(tmp_addr, PAGE_SIZE), remap_anon_pages(faultaddr, tmp_addr,
PAGE_SIZE, ..)  and retain the zerocopy behavior. Or how could we?
There's no recvfile(userfaultfd, socketfd, PAGE_SIZE).

Ideally, if we could prevent the page data coming from the network from
ever becoming visible in the kernel, we could avoid the TLB flush and
also be zerocopy, but I can't see how we could achieve that.

The page data could come through an ssh pipe or anything (qemu supports
all kinds of network transports for live migration), which is why
leaving the network protocol in userland is preferable.

As things stand now, I'm afraid with a write() syscall we couldn't do
it zerocopy. We'd still need to receive the memory in a temporary page
and then copy it to a kernel page (invisible to userland while we
write to it) to later map into the userfault address.

If it wasn't for the TLB flush of the old page, the remap_anon_pages
variant would be more optimal than doing a copy through a write
syscall. Is the copy cheaper than a TLB flush? I probably naively
assumed the TLB flush was always cheaper.

Another idea that comes to mind, to make it possible to switch between
copy and TLB flush, is a RAP_FORCE_COPY flag that would then do a copy
inside remap_anon_pages and leave the original page mapped in place...
(and such a flag would also disable the -EBUSY error if page_mapcount
is > 1).

So then if the RAP_FORCE_COPY flag is set remap_anon_pages would
behave like you suggested (but with a mremap-like interface, instead
of a write syscall) and we could benchmark the difference between copy
and TLB flush too. We could even periodically benchmark it at runtime
and switch over to the faster method (the more CPUs there are in the host
and the more threads the process has, the faster the copy will be
compared to the TLB flush).

Of course in terms of API I could implement the exact same mechanism
as described above for remap_anon_pages inside a write() to the
userfaultfd (it's a pseudo inode). It'd need two different commands to
prepare for the coming write (with a len multiple of PAGE_SIZE), so it
knows the address where the page should be mapped and whether to behave
zerocopy or to skip the TLB flush and copy.

Because the copy vs TLB flush trade-off is possible to achieve with
both interfaces, I think it really boils down to choosing between an
mremap-like interface and a file+commands protocol interface. I tend to
like mremap more, which is why I opted for a remap_anon_pages syscall
kept orthogonal to the userfaultfd functionality (remap_anon_pages
could also be used standalone as an accelerated mremap in some
circumstances), but nothing prevents us from just embedding the same
mechanism inside userfaultfd if a file+commands API is preferable. Or
we could add a different syscall (separate from userfaultfd) that
creates another pseudofd to write a command plus the page data into. I
just wouldn't see the point of creating a pseudofd only to copy a page
atomically; the write() syscall would look more ideal if the
userfaultfd is already open for other reasons and the pseudofd
overhead is required anyway.

One last thing to keep in mind: if userfaults are used with SIGBUS and
without userfaultfd, remap_anon_pages would still have been useful, so
if we retain the SIGBUS behavior for volatile pages and we don't force
the use of userfaultfd, it may be cleaner not to use userfaultfd
but a separate pseudofd to do the write() syscall. Otherwise
the app would need to open the userfaultfd to resolve the fault even
though it's not using the userfaultfd protocol, which doesn't look like
an intuitive interface to me.

Comments welcome.

Thanks,
Andrea

Re: [Qemu-devel] [Patch v4 6/8] target_arm: Change the reset values based on the ELF entry

2014-10-07 Thread Martin Galvan

On Tue, Oct 7, 2014 at 11:13 AM, Alistair Francis  wrote:
> The Netduino 2 machine won't run unless the reset_pc is based
> on the ELF entry point.
>
> Signed-off-by: Alistair Francis 
> Signed-off-by: Peter Crosthwaite 
> ---
> V2:
>  - Malloc straight away, thanks to Peter C
>
>  hw/arm/armv7m.c | 19 ---
>  1 file changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/hw/arm/armv7m.c b/hw/arm/armv7m.c
> index 7169027..07b36e2 100644
> --- a/hw/arm/armv7m.c
> +++ b/hw/arm/armv7m.c
> @@ -155,11 +155,19 @@ static void armv7m_bitband_init(void)
>
>  /* Board init.  */
>
> +typedef struct ARMV7MResetArgs {
> +ARMCPU *cpu;
> +uint32_t reset_pc;
> +} ARMV7MResetArgs;
> +
>  static void armv7m_reset(void *opaque)
>  {
> -ARMCPU *cpu = opaque;
> +ARMV7MResetArgs *args = opaque;
> +
> +cpu_reset(CPU(args->cpu));
>
> -cpu_reset(CPU(cpu));
> +args->cpu->env.thumb = args->reset_pc & 1;
> +args->cpu->env.regs[15] = args->reset_pc & ~1;
>  }
>
>  /* Init CPU and memory for a v7-M based board.
> @@ -180,6 +188,7 @@ qemu_irq *armv7m_init(MemoryRegion *system_memory, int mem_size, int num_irq,
>  int i;
>  int big_endian;
>  MemoryRegion *hack = g_new(MemoryRegion, 1);
> +ARMV7MResetArgs *reset_args = g_new0(ARMV7MResetArgs, 1);
>
>  if (cpu_model == NULL) {
> cpu_model = "cortex-m3";
> @@ -234,7 +243,11 @@ qemu_irq *armv7m_init(MemoryRegion *system_memory, int mem_size, int num_irq,
>  vmstate_register_ram_global(hack);
>  memory_region_add_subregion(system_memory, 0xf000, hack);
>
> -qemu_register_reset(armv7m_reset, cpu);
> +*reset_args = (ARMV7MResetArgs) {
> +.cpu = cpu,
> +.reset_pc = entry,
> +};
> +qemu_register_reset(armv7m_reset, reset_args);
>  return pic;
>  }

How does this differ from what's being done in arm_cpu_reset for
ARMv7-M? What about the initial MSP?
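
For readers following the question: the quoted hunk derives both the Thumb state and the PC from one entry address, using bit 0 as the Thumb flag. The sketch below shows that split, plus the architectural ARMv7-M reset behaviour of taking the initial main stack pointer from word 0 of the vector table and the reset handler address from word 1. The types and function names are hypothetical, this is not QEMU's arm_cpu_reset code, and the stack pointer is copied as-is as a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result types for illustration; not QEMU structures. */
typedef struct ResetState {
    bool thumb;      /* bit 0 of the entry address */
    uint32_t pc;     /* entry address with bit 0 cleared */
} ResetState;

typedef struct VectorReset {
    uint32_t msp;    /* initial main stack pointer */
    ResetState rs;   /* state derived from the reset vector */
} VectorReset;

/* Same split as the patch: thumb = entry & 1, pc = entry & ~1. */
static ResetState reset_state_from_entry(uint32_t entry)
{
    ResetState rs;

    rs.thumb = (entry & 1u) != 0;
    rs.pc = entry & ~UINT32_C(1);
    return rs;
}

/* Vector-table variant: word 0 is the initial MSP, word 1 the reset
 * handler address (with bit 0 as the Thumb flag). */
static VectorReset reset_state_from_vectors(const uint32_t *vt)
{
    VectorReset v;

    assert(vt != NULL);
    v.msp = vt[0];
    v.rs = reset_state_from_entry(vt[1]);
    return v;
}
```

The difference between the two helpers is essentially the question raised here: an ELF entry point supplies a PC but says nothing about the initial stack pointer, while the vector table supplies both.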

-- 

Martín Galván

Software Engineer

Taller Technologies Argentina


San Lorenzo 47, 3rd Floor, Office 5

Córdoba, Argentina

Phone: 54 351 4217888 / +54 351 4218211

[Qemu-devel] [Patch v4 6/8] target_arm: Change the reset values based on the ELF entry

2014-10-07 Thread Alistair Francis

The Netduino 2 machine won't run unless the reset_pc is based
on the ELF entry point.

Signed-off-by: Alistair Francis 
Signed-off-by: Peter Crosthwaite 
---
V2:
 - Malloc straight away, thanks to Peter C

 hw/arm/armv7m.c | 19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/hw/arm/armv7m.c b/hw/arm/armv7m.c
index 7169027..07b36e2 100644
--- a/hw/arm/armv7m.c
+++ b/hw/arm/armv7m.c
@@ -155,11 +155,19 @@ static void armv7m_bitband_init(void)
 
 /* Board init.  */
 
+typedef struct ARMV7MResetArgs {
+ARMCPU *cpu;
+uint32_t reset_pc;
+} ARMV7MResetArgs;
+
 static void armv7m_reset(void *opaque)
 {
-ARMCPU *cpu = opaque;
+ARMV7MResetArgs *args = opaque;
+
+cpu_reset(CPU(args->cpu));
 
-cpu_reset(CPU(cpu));
+args->cpu->env.thumb = args->reset_pc & 1;
+args->cpu->env.regs[15] = args->reset_pc & ~1;
 }
 
 /* Init CPU and memory for a v7-M based board.
@@ -180,6 +188,7 @@ qemu_irq *armv7m_init(MemoryRegion *system_memory, int mem_size, int num_irq,
 int i;
 int big_endian;
 MemoryRegion *hack = g_new(MemoryRegion, 1);
+ARMV7MResetArgs *reset_args = g_new0(ARMV7MResetArgs, 1);
 
 if (cpu_model == NULL) {
cpu_model = "cortex-m3";
@@ -234,7 +243,11 @@ qemu_irq *armv7m_init(MemoryRegion *system_memory, int mem_size, int num_irq,
 vmstate_register_ram_global(hack);
 memory_region_add_subregion(system_memory, 0xf000, hack);
 
-qemu_register_reset(armv7m_reset, cpu);
+*reset_args = (ARMV7MResetArgs) {
+.cpu = cpu,
+.reset_pc = entry,
+};
+qemu_register_reset(armv7m_reset, reset_args);
 return pic;
 }
 
-- 
1.9.1

[Qemu-devel] [Patch v4 8/8] netduino2: Add the Netduino 2 Machine

2014-10-07 Thread Alistair Francis

This patch adds the Netduino 2 Machine.

This is a Cortex-M3 based machine. Information can be found at:
http://www.netduino.com/netduino2/specs.htm

Signed-off-by: Alistair Francis 
---
Changes from RFC:
 - Remove CPU passthrough

 hw/arm/Makefile.objs |  1 +
 hw/arm/netduino2.c   | 54 
 2 files changed, 55 insertions(+)
 create mode 100644 hw/arm/netduino2.c

diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs
index 9769317..2577f68 100644
--- a/hw/arm/Makefile.objs
+++ b/hw/arm/Makefile.objs
@@ -3,6 +3,7 @@ obj-$(CONFIG_DIGIC) += digic_boards.o
 obj-y += integratorcp.o kzm.o mainstone.o musicpal.o nseries.o
 obj-y += omap_sx1.o palm.o realview.o spitz.o stellaris.o
 obj-y += tosa.o versatilepb.o vexpress.o virt.o xilinx_zynq.o z2.o
+obj-y += netduino2.o
 
 obj-y += armv7m.o exynos4210.o pxa2xx.o pxa2xx_gpio.o pxa2xx_pic.o
 obj-$(CONFIG_DIGIC) += digic.o
diff --git a/hw/arm/netduino2.c b/hw/arm/netduino2.c
new file mode 100644
index 000..305983f
--- /dev/null
+++ b/hw/arm/netduino2.c
@@ -0,0 +1,54 @@
+/*
+ * Netduino 2 Machine Model
+ *
+ * Copyright (c) 2014 Alistair Francis 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "hw/arm/stm32f205_soc.h"
+
+static void netduino2_init(MachineState *machine)
+{
+DeviceState *dev;
+Error *err = NULL;
+
+dev = qdev_create(NULL, TYPE_STM32F205_SOC);
+if (machine->kernel_filename) {
+qdev_prop_set_string(dev, "kernel-filename", machine->kernel_filename);
+}
+object_property_set_bool(OBJECT(dev), true, "realized", &err);
+if (err != NULL) {
+error_report("%s", error_get_pretty(err));
+exit(1);
+}
+}
+
+static QEMUMachine netduino2_machine = {
+.name = "netduino2",
+.desc = "Netduino 2 Machine",
+.init = netduino2_init,
+};
+
+static void netduino2_machine_init(void)
+{
+qemu_register_machine(&netduino2_machine);
+}
+
+machine_init(netduino2_machine_init);
-- 
1.9.1

[Qemu-devel] [Patch v4 5/8] target_arm: Parameterise the irq lines for armv7m_init

2014-10-07 Thread Alistair Francis

This patch allows the board to specify the number of NVIC interrupt
lines when using armv7m_init.

Signed-off-by: Alistair Francis 
---

 hw/arm/armv7m.c  | 7 ---
 hw/arm/stellaris.c   | 5 -
 include/hw/arm/arm.h | 2 +-
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/hw/arm/armv7m.c b/hw/arm/armv7m.c
index 50281f7..7169027 100644
--- a/hw/arm/armv7m.c
+++ b/hw/arm/armv7m.c
@@ -166,14 +166,14 @@ static void armv7m_reset(void *opaque)
mem_size is in bytes.
Returns the NVIC array.  */
 
-qemu_irq *armv7m_init(MemoryRegion *system_memory, int mem_size,
+qemu_irq *armv7m_init(MemoryRegion *system_memory, int mem_size, int num_irq,
   const char *kernel_filename, const char *cpu_model)
 {
 ARMCPU *cpu;
 CPUARMState *env;
 DeviceState *nvic;
 /* FIXME: make this local state.  */
-static qemu_irq pic[64];
+qemu_irq *pic = g_new(qemu_irq, num_irq);
 int image_size;
 uint64_t entry;
 uint64_t lowaddr;
@@ -194,11 +194,12 @@ qemu_irq *armv7m_init(MemoryRegion *system_memory, int mem_size,
 armv7m_bitband_init();
 
 nvic = qdev_create(NULL, "armv7m_nvic");
+qdev_prop_set_uint32(nvic, "num-irq", num_irq);
 env->nvic = nvic;
 qdev_init_nofail(nvic);
 sysbus_connect_irq(SYS_BUS_DEVICE(nvic), 0,
qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_IRQ));
-for (i = 0; i < 64; i++) {
+for (i = 0; i < num_irq; i++) {
 pic[i] = qdev_get_gpio_in(nvic, i);
 }
 
diff --git a/hw/arm/stellaris.c b/hw/arm/stellaris.c
index d0c61c5..6fad10f 100644
--- a/hw/arm/stellaris.c
+++ b/hw/arm/stellaris.c
@@ -29,6 +29,8 @@
 #define BP_OLED_SSI  0x02
 #define BP_GAMEPAD   0x04
 
+#define NUM_IRQ_LINES 64
+
 typedef const struct {
 const char *name;
 uint32_t did0;
@@ -1239,7 +1241,8 @@ static void stellaris_init(const char *kernel_filename, const char *cpu_model,
 vmstate_register_ram_global(sram);
 memory_region_add_subregion(system_memory, 0x2000, sram);
 
-pic = armv7m_init(system_memory, flash_size, kernel_filename, cpu_model);
+pic = armv7m_init(system_memory, flash_size, NUM_IRQ_LINES,
+  kernel_filename, cpu_model);
 
 if (board->dc1 & (1 << 16)) {
 dev = sysbus_create_varargs(TYPE_STELLARIS_ADC, 0x40038000,
diff --git a/include/hw/arm/arm.h b/include/hw/arm/arm.h
index a112930..94e55a4 100644
--- a/include/hw/arm/arm.h
+++ b/include/hw/arm/arm.h
@@ -15,7 +15,7 @@
 #include "hw/irq.h"
 
 /* armv7m.c */
-qemu_irq *armv7m_init(MemoryRegion *system_memory, int mem_size,
+qemu_irq *armv7m_init(MemoryRegion *system_memory, int mem_size, int num_irq,
   const char *kernel_filename, const char *cpu_model);
 
 /* arm_boot.c */
-- 
1.9.1
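
As a side note on the change from `static qemu_irq pic[64]` to a heap allocation sized by `num_irq`: a rough, self-contained sketch of the pattern follows. `irq_line` and `irq_lines_new()` are hypothetical stand-ins (plain `malloc()` replaces `g_new()`, and an integer replaces the result of `qdev_get_gpio_in()`), so this illustrates the shape of the change, not the patch itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef int irq_line;   /* hypothetical stand-in for qemu_irq */

/* Allocate one slot per interrupt line and fill it in.
 * The caller owns the array and frees it with free().
 * Returns NULL if the allocation fails. */
static irq_line *irq_lines_new(int num_irq)
{
    irq_line *pic;
    int i;

    assert(num_irq > 0);
    pic = malloc((size_t)num_irq * sizeof(*pic));
    if (pic == NULL) {
        return NULL;
    }
    for (i = 0; i < num_irq; i++) {
        pic[i] = i;   /* stands in for qdev_get_gpio_in(nvic, i) */
    }
    return pic;
}
```

Compared with a function-local static array, each call gets its own storage and the size follows the board's interrupt count instead of a fixed 64.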

[Qemu-devel] [Patch v4 7/8] stm32f205: Add the stm32f205 SoC

2014-10-07 Thread Alistair Francis

This patch adds the stm32f205 SoC. This will be used by the
Netduino 2 to create a machine.

Signed-off-by: Alistair Francis 
---
Changes from RFC:
 - Small changes thanks to Peter C
 - Split the config settings to device level

 default-configs/arm-softmmu.mak |   1 +
 hw/arm/Makefile.objs|   1 +
 hw/arm/stm32f205_soc.c  | 157 
 include/hw/arm/stm32f205_soc.h  |  69 ++
 4 files changed, 228 insertions(+)
 create mode 100644 hw/arm/stm32f205_soc.c
 create mode 100644 include/hw/arm/stm32f205_soc.h

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index a2ea8f7..8068100 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -81,6 +81,7 @@ CONFIG_ZYNQ=y
 CONFIG_STM32F205_TIMER=y
 CONFIG_STM32F205_USART=y
 CONFIG_STM32F205_SYSCFG=y
+CONFIG_STM32F205_SOC=y
 
 CONFIG_VERSATILE_PCI=y
 CONFIG_VERSATILE_I2C=y
diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs
index 6088e53..9769317 100644
--- a/hw/arm/Makefile.objs
+++ b/hw/arm/Makefile.objs
@@ -8,3 +8,4 @@ obj-y += armv7m.o exynos4210.o pxa2xx.o pxa2xx_gpio.o pxa2xx_pic.o
 obj-$(CONFIG_DIGIC) += digic.o
 obj-y += omap1.o omap2.o strongarm.o
 obj-$(CONFIG_ALLWINNER_A10) += allwinner-a10.o cubieboard.o
+obj-$(CONFIG_STM32F205_SOC) += stm32f205_soc.o
diff --git a/hw/arm/stm32f205_soc.c b/hw/arm/stm32f205_soc.c
new file mode 100644
index 000..bd9514e
--- /dev/null
+++ b/hw/arm/stm32f205_soc.c
@@ -0,0 +1,157 @@
+/*
+ * STM32F205 SoC
+ *
+ * Copyright (c) 2014 Alistair Francis 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "hw/arm/stm32f205_soc.h"
+
+/* At the moment only Timer 2 to 5 are modelled */
+static const uint32_t timer_addr[] = { 0x4000, 0x4400,
+0x4800, 0x4C00 };
+static const uint32_t usart_addr[] = { 0x40011000, 0x40004400,
+0x40004800, 0x40004C00, 0x40005000, 0x40011400 };
+
+static const int timer_irq[] = {28, 29, 30, 50};
+static const int usart_irq[] = {37, 38, 39, 52, 53, 71, 82, 83};
+
+static void stm32f205_soc_initfn(Object *obj)
+{
+STM32F205State *s = STM32F205_SOC(obj);
+int i;
+
+object_initialize(&s->syscfg, sizeof(s->syscfg), TYPE_STM32F205_SYSCFG);
+qdev_set_parent_bus(DEVICE(&s->syscfg), sysbus_get_default());
+
+for (i = 0; i < 5; i++) {
+object_initialize(&s->usart[i], sizeof(s->usart[i]),
+  TYPE_STM32F205_USART);
+qdev_set_parent_bus(DEVICE(&s->usart[i]), sysbus_get_default());
+}
+
+for (i = 0; i < 4; i++) {
+object_initialize(&s->timer[i], sizeof(s->timer[i]),
+  TYPE_STM32F205_TIMER);
+qdev_set_parent_bus(DEVICE(&s->timer[i]), sysbus_get_default());
+}
+}
+
+static void stm32f205_soc_realize(DeviceState *dev_soc, Error **errp)
+{
+STM32F205State *s = STM32F205_SOC(dev_soc);
+DeviceState *syscfgdev, *usartdev, *timerdev;
+SysBusDevice *syscfgbusdev, *usartbusdev, *timerbusdev;
+qemu_irq *pic;
+Error *err = NULL;
+int i;
+
+MemoryRegion *system_memory = get_system_memory();
+MemoryRegion *sram = g_new(MemoryRegion, 1);
+MemoryRegion *flash = g_new(MemoryRegion, 1);
+MemoryRegion *flash_alias = g_new(MemoryRegion, 1);
+
+memory_region_init_ram(flash, NULL, "netduino.flash", FLASH_SIZE,
+   &error_abort);
+memory_region_init_alias(flash_alias, NULL, "netduino.flash.alias",
+ flash, 0, FLASH_SIZE);
+
+vmstate_register_ram_global(flash);
+
+memory_region_set_readonly(flash, true);
+memory_region_set_readonly(flash_alias, true);
+
+memory_region_add_subregion(system_memory, FLASH_BASE_ADDRESS, flash);
+memory_region_add_subregion(system_memory, 0, flash_alias);
+
+memory_region_init_ram(sram, NULL, "netduino.sram", SRAM_SIZE,
+   &error_abort);
+vmstate_register_ram_

[Qemu-devel] [Patch v4 3/8] stm32f205_SYSCFG: Add the stm32f205 SYSCFG

2014-10-07 Thread Alistair Francis

This patch adds the stm32f205 System Configuration
Controller. This is used to configure what memory is mapped
at address 0 (although that is not supported) as well
as configure how the EXTI interrupts work (also not
supported at the moment).

This device is not required for basic examples, but more
complex systems will require it (as well as the EXTI device)

Signed-off-by: Alistair Francis 
---
V3:
 - Update debug printing
Changes from RFC:
 - Split the config settings to device level

 default-configs/arm-softmmu.mak|   1 +
 hw/misc/Makefile.objs  |   1 +
 hw/misc/stm32f205_syscfg.c | 160 +
 include/hw/misc/stm32f205_syscfg.h |  61 ++
 4 files changed, 223 insertions(+)
 create mode 100644 hw/misc/stm32f205_syscfg.c
 create mode 100644 include/hw/misc/stm32f205_syscfg.h

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index 422dec0..a2ea8f7 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -80,6 +80,7 @@ CONFIG_ZAURUS=y
 CONFIG_ZYNQ=y
 CONFIG_STM32F205_TIMER=y
 CONFIG_STM32F205_USART=y
+CONFIG_STM32F205_SYSCFG=y
 
 CONFIG_VERSATILE_PCI=y
 CONFIG_VERSATILE_I2C=y
diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index 979e532..63f03bd 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -39,5 +39,6 @@ obj-$(CONFIG_OMAP) += omap_sdrc.o
 obj-$(CONFIG_OMAP) += omap_tap.o
 obj-$(CONFIG_SLAVIO) += slavio_misc.o
 obj-$(CONFIG_ZYNQ) += zynq_slcr.o
+obj-$(CONFIG_STM32F205_SYSCFG) += stm32f205_syscfg.o
 
 obj-$(CONFIG_PVPANIC) += pvpanic.o
diff --git a/hw/misc/stm32f205_syscfg.c b/hw/misc/stm32f205_syscfg.c
new file mode 100644
index 000..98b2030
--- /dev/null
+++ b/hw/misc/stm32f205_syscfg.c
@@ -0,0 +1,160 @@
+/*
+ * STM32F205 SYSCFG
+ *
+ * Copyright (c) 2014 Alistair Francis 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "hw/misc/stm32f205_syscfg.h"
+
+#ifndef STM_SYSCFG_ERR_DEBUG
+#define STM_SYSCFG_ERR_DEBUG 0
+#endif
+
+#define DB_PRINT_L(lvl, fmt, args...) do { \
+if (STM_SYSCFG_ERR_DEBUG >= lvl) { \
+qemu_log("%s: " fmt, __func__, ## args); \
+} \
+} while (0)
+
+#define DB_PRINT(fmt, args...) DB_PRINT_L(1, fmt, ## args)
+
+static void stm32f205_syscfg_reset(DeviceState *dev)
+{
+STM32f205SyscfgState *s = STM32F205_SYSCFG(dev);
+
+s->syscfg_memrmp = 0x;
+s->syscfg_pmc = 0x;
+s->syscfg_exticr1 = 0x;
+s->syscfg_exticr2 = 0x;
+s->syscfg_exticr3 = 0x;
+s->syscfg_exticr4 = 0x;
+s->syscfg_cmpcr = 0x;
+}
+
+static uint64_t stm32f205_syscfg_read(void *opaque, hwaddr addr,
+ unsigned int size)
+{
+STM32f205SyscfgState *s = opaque;
+
+DB_PRINT("0x%x\n", (uint) addr);
+
+switch (addr) {
+case SYSCFG_MEMRMP:
+return s->syscfg_memrmp;
+case SYSCFG_PMC:
+return s->syscfg_pmc;
+case SYSCFG_EXTICR1:
+return s->syscfg_exticr1;
+case SYSCFG_EXTICR2:
+return s->syscfg_exticr2;
+case SYSCFG_EXTICR3:
+return s->syscfg_exticr3;
+case SYSCFG_EXTICR4:
+return s->syscfg_exticr4;
+case SYSCFG_CMPCR:
+return s->syscfg_cmpcr;
+default:
+qemu_log_mask(LOG_GUEST_ERROR,
+  "STM32F205_syscfg_read: Bad offset %x\n", (int)addr);
+return 0;
+}
+
+return 0;
+}
+
+static void stm32f205_syscfg_write(void *opaque, hwaddr addr,
+   uint64_t val64, unsigned int size)
+{
+STM32f205SyscfgState *s = opaque;
+uint32_t value = val64;
+
+DB_PRINT("0x%x, 0x%x\n", value, (uint) addr);
+
+switch (addr) {
+case SYSCFG_MEMRMP:
+qemu_log_mask(LOG_UNIMP,
+  "STM32F205_syscfg_write: Changing the memory mapping " \
+  "isn't supported in QEMU\n");
+return;
+case SYSCFG_PMC:
+

[Qemu-devel] [Patch v4 1/8] stm32f205_timer: Add the stm32f205 Timer

2014-10-07 Thread Alistair Francis

This patch adds the stm32f205 timers: TIM2, TIM3, TIM4 and TIM5
to QEMU.

Signed-off-by: Alistair Francis 
---
V4:
 - Update timer units again
- Thanks to Peter C
V3:
 - Update debug statements
 - Correct the units for timer_mod
 - Correctly set timer_offset from resets
V2:
 - Reorder the Makefile config
 - Fix up the debug printing
 - Correct the timer event trigger
Changes from RFC:
 - Small changes to functionality and style. Thanks to Peter C
 - Rename to make the timer more generic
 - Split the config settings to device level

 default-configs/arm-softmmu.mak|   1 +
 hw/timer/Makefile.objs |   2 +
 hw/timer/stm32f205_timer.c | 318 +
 include/hw/timer/stm32f205_timer.h | 101 
 4 files changed, 422 insertions(+)
 create mode 100644 hw/timer/stm32f205_timer.c
 create mode 100644 include/hw/timer/stm32f205_timer.h

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index f3513fa..cf23b24 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -78,6 +78,7 @@ CONFIG_NSERIES=y
 CONFIG_REALVIEW=y
 CONFIG_ZAURUS=y
 CONFIG_ZYNQ=y
+CONFIG_STM32F205_TIMER=y
 
 CONFIG_VERSATILE_PCI=y
 CONFIG_VERSATILE_I2C=y
diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs
index 2c86c3d..4bd9617 100644
--- a/hw/timer/Makefile.objs
+++ b/hw/timer/Makefile.objs
@@ -31,3 +31,5 @@ obj-$(CONFIG_DIGIC) += digic-timer.o
 obj-$(CONFIG_MC146818RTC) += mc146818rtc.o
 
 obj-$(CONFIG_ALLWINNER_A10_PIT) += allwinner-a10-pit.o
+
+common-obj-$(CONFIG_STM32F205_TIMER) += stm32f205_timer.o
diff --git a/hw/timer/stm32f205_timer.c b/hw/timer/stm32f205_timer.c
new file mode 100644
index 000..aace8df
--- /dev/null
+++ b/hw/timer/stm32f205_timer.c
@@ -0,0 +1,318 @@
+/*
+ * STM32F205 Timer
+ *
+ * Copyright (c) 2014 Alistair Francis 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "hw/timer/stm32f205_timer.h"
+
+#ifndef STM_TIMER_ERR_DEBUG
+#define STM_TIMER_ERR_DEBUG 0
+#endif
+
+#define DB_PRINT_L(lvl, fmt, args...) do { \
+if (STM_TIMER_ERR_DEBUG >= lvl) { \
+qemu_log("%s: " fmt, __func__, ## args); \
+} \
+} while (0)
+
+#define DB_PRINT(fmt, args...) DB_PRINT_L(1, fmt, ## args)
+
+static void stm32f205_timer_set_alarm(STM32f205TimerState *s);
+
+static void stm32f205_timer_interrupt(void *opaque)
+{
+STM32f205TimerState *s = opaque;
+
+DB_PRINT("Interrupt\n");
+
+if (s->tim_dier & TIM_DIER_UIE && s->tim_cr1 & TIM_CR1_CEN) {
+s->tim_sr |= 1;
+qemu_irq_pulse(s->irq);
+stm32f205_timer_set_alarm(s);
+}
+}
+
+static void stm32f205_timer_set_alarm(STM32f205TimerState *s)
+{
+uint32_t ticks;
+int64_t now;
+
+DB_PRINT("Alarm set at: 0x%x\n", s->tim_cr1);
+
+now = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
+ticks = s->tim_arr - ((s->tick_offset + (now * (s->freq_hz / 1000))) /
+(s->tim_psc + 1));
+
+DB_PRINT("Alarm set in %d ticks\n", ticks);
+
+if (ticks == 0) {
+timer_del(s->timer);
+stm32f205_timer_interrupt(s);
+} else {
+timer_mod(s->timer, ((now * (s->freq_hz / 1000)) / (s->tim_psc + 1)) +
+ (int64_t) ticks);
+DB_PRINT("Wait Time: %" PRId64 " ticks\n",
+ ((now * (s->freq_hz / 1000)) / (s->tim_psc + 1)) +
+ (int64_t) ticks);
+}
+}
+
+static void stm32f205_timer_reset(DeviceState *dev)
+{
+STM32f205TimerState *s = STM32F205TIMER(dev);
+
+s->tim_cr1 = 0;
+s->tim_cr2 = 0;
+s->tim_smcr = 0;
+s->tim_dier = 0;
+s->tim_sr = 0;
+s->tim_egr = 0;
+s->tim_ccmr1 = 0;
+s->tim_ccmr2 = 0;
+s->tim_ccer = 0;
+s->tim_cnt = 0;
+s->tim_psc = 0;
+s->tim_arr = 0;
+s->tim_ccr1 = 0;
+s->tim_ccr2 = 0;
+s->tim_ccr3 = 0;
+s->tim_ccr4 = 0;
+s->tim_dcr = 0;
+s->tim_dmar = 0;
+s->tim_or = 0;
+
+s->tick_offset = qemu_cloc

[Qemu-devel] [Patch v4 4/8] target_arm: Remove memory region init from armv7m_init

2014-10-07 Thread Alistair Francis

This patch moves the memory region init code from the
armv7m_init function to the stellaris_init function.

Signed-off-by: Alistair Francis 
---
V3:
 - Rename the flash_size argument to mem_size
 - Remove the sram_size and related code
- Thanks to Peter C
V2:
 - Change the memory region names to match the machine

 hw/arm/armv7m.c  | 33 +++--
 hw/arm/stellaris.c   | 24 
 include/hw/arm/arm.h |  3 +--
 3 files changed, 24 insertions(+), 36 deletions(-)

diff --git a/hw/arm/armv7m.c b/hw/arm/armv7m.c
index ef24ca4..50281f7 100644
--- a/hw/arm/armv7m.c
+++ b/hw/arm/armv7m.c
@@ -163,11 +163,10 @@ static void armv7m_reset(void *opaque)
 }
 
 /* Init CPU and memory for a v7-M based board.
-   flash_size and sram_size are in kb.
+   mem_size is in bytes.
Returns the NVIC array.  */
 
-qemu_irq *armv7m_init(MemoryRegion *system_memory,
-  int flash_size, int sram_size,
+qemu_irq *armv7m_init(MemoryRegion *system_memory, int mem_size,
   const char *kernel_filename, const char *cpu_model)
 {
 ARMCPU *cpu;
@@ -180,13 +179,8 @@ qemu_irq *armv7m_init(MemoryRegion *system_memory,
 uint64_t lowaddr;
 int i;
 int big_endian;
-MemoryRegion *sram = g_new(MemoryRegion, 1);
-MemoryRegion *flash = g_new(MemoryRegion, 1);
 MemoryRegion *hack = g_new(MemoryRegion, 1);
 
-flash_size *= 1024;
-sram_size *= 1024;
-
 if (cpu_model == NULL) {
cpu_model = "cortex-m3";
 }
@@ -197,27 +191,6 @@ qemu_irq *armv7m_init(MemoryRegion *system_memory,
 }
 env = &cpu->env;
 
-#if 0
-/* > 32Mb SRAM gets complicated because it overlaps the bitband area.
-   We don't have proper commandline options, so allocate half of memory
-   as SRAM, up to a maximum of 32Mb, and the rest as code.  */
-if (ram_size > (512 + 32) * 1024 * 1024)
-ram_size = (512 + 32) * 1024 * 1024;
-sram_size = (ram_size / 2) & TARGET_PAGE_MASK;
-if (sram_size > 32 * 1024 * 1024)
-sram_size = 32 * 1024 * 1024;
-code_size = ram_size - sram_size;
-#endif
-
-/* Flash programming is done via the SCU, so pretend it is ROM.  */
-memory_region_init_ram(flash, NULL, "armv7m.flash", flash_size,
-   &error_abort);
-vmstate_register_ram_global(flash);
-memory_region_set_readonly(flash, true);
-memory_region_add_subregion(system_memory, 0, flash);
-memory_region_init_ram(sram, NULL, "armv7m.sram", sram_size, &error_abort);
-vmstate_register_ram_global(sram);
-memory_region_add_subregion(system_memory, 0x20000000, sram);
 armv7m_bitband_init();
 
 nvic = qdev_create(NULL, "armv7m_nvic");
@@ -244,7 +217,7 @@ qemu_irq *armv7m_init(MemoryRegion *system_memory,
 image_size = load_elf(kernel_filename, NULL, NULL, &entry, &lowaddr,
   NULL, big_endian, ELF_MACHINE, 1);
 if (image_size < 0) {
-image_size = load_image_targphys(kernel_filename, 0, flash_size);
+image_size = load_image_targphys(kernel_filename, 0, mem_size);
 lowaddr = 0;
 }
 if (image_size < 0) {
diff --git a/hw/arm/stellaris.c b/hw/arm/stellaris.c
index 64bd4b4..d0c61c5 100644
--- a/hw/arm/stellaris.c
+++ b/hw/arm/stellaris.c
@@ -1220,10 +1220,26 @@ static void stellaris_init(const char *kernel_filename, const char *cpu_model,
 int i;
 int j;
 
-flash_size = ((board->dc0 & 0xffff) + 1) << 1;
-sram_size = (board->dc0 >> 18) + 1;
-pic = armv7m_init(get_system_memory(),
-  flash_size, sram_size, kernel_filename, cpu_model);
+MemoryRegion *sram = g_new(MemoryRegion, 1);
+MemoryRegion *flash = g_new(MemoryRegion, 1);
+MemoryRegion *system_memory = get_system_memory();
+
+flash_size = (((board->dc0 & 0xffff) + 1) << 1) * 1024;
+sram_size = ((board->dc0 >> 18) + 1) * 1024;
+
+/* Flash programming is done via the SCU, so pretend it is ROM.  */
+memory_region_init_ram(flash, NULL, "stellaris.flash", flash_size,
+   &error_abort);
+vmstate_register_ram_global(flash);
+memory_region_set_readonly(flash, true);
+memory_region_add_subregion(system_memory, 0, flash);
+
+memory_region_init_ram(sram, NULL, "stellaris.sram", sram_size,
+   &error_abort);
+vmstate_register_ram_global(sram);
+memory_region_add_subregion(system_memory, 0x20000000, sram);
+
+pic = armv7m_init(system_memory, flash_size, kernel_filename, cpu_model);
 
 if (board->dc1 & (1 << 16)) {
 dev = sysbus_create_varargs(TYPE_STELLARIS_ADC, 0x40038000,
diff --git a/include/hw/arm/arm.h b/include/hw/arm/arm.h
index cefc9e6..a112930 100644
--- a/include/hw/arm/arm.h
+++ b/include/hw/arm/arm.h
@@ -15,8 +15,7 @@
 #include "hw/irq.h"
 
 /* armv7m.c */
-qemu_irq *armv7m_init(MemoryRegion *system_memory,
-  int flash_size, int sram_size,
+qemu_irq *armv7m_i

[Qemu-devel] [PATCH] Tracing docs fix configure option and description

2014-10-07 Thread Dr. David Alan Gilbert (git)

From: "Dr. David Alan Gilbert" 

Fix the example trace configure option.
Update the text to say that multiple backends are allowed and what
happens when multiple backends are enabled.

Signed-off-by: Dr. David Alan Gilbert 
---
 docs/tracing.txt | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/tracing.txt b/docs/tracing.txt
index 7d38926..7117c5e 100644
--- a/docs/tracing.txt
+++ b/docs/tracing.txt
@@ -139,12 +139,12 @@ events are not tightly coupled to a specific trace backend, such as LTTng or
 SystemTap.  Support for trace backends can be added by extending the "tracetool"
 script.
 
-The trace backend is chosen at configure time and only one trace backend can
-be built into the binary:
+The trace backends are chosen at configure time:
 
-./configure --trace-backends=simple
+./configure --enable-trace-backends=simple
 
 For a list of supported trace backends, try ./configure --help or see below.
+If multiple backends are enabled, the trace is sent to them all.
 
 The following subsections describe the supported trace backends.
 
-- 
1.9.3

[Qemu-devel] [Patch v4 2/8] stm32f205_USART: Add the stm32f205 USART Controller

2014-10-07 Thread Alistair Francis

This patch adds the stm32f205 USART controller
(UART also uses the same controller).

Signed-off-by: Alistair Francis 
---
V3:
 - Update debug printing
V2:
 - Drop characters if the device is not enabled
- Thanks to Peter C
Changes from RFC:
 - Small changes thanks to Peter C
 - USART now implements QEMU blocking functions
 - Split the config settings to device level

 default-configs/arm-softmmu.mak   |   1 +
 hw/char/Makefile.objs |   1 +
 hw/char/stm32f205_usart.c | 218 ++
 include/hw/char/stm32f205_usart.h |  69 
 4 files changed, 289 insertions(+)
 create mode 100644 hw/char/stm32f205_usart.c
 create mode 100644 include/hw/char/stm32f205_usart.h

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index cf23b24..422dec0 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -79,6 +79,7 @@ CONFIG_REALVIEW=y
 CONFIG_ZAURUS=y
 CONFIG_ZYNQ=y
 CONFIG_STM32F205_TIMER=y
+CONFIG_STM32F205_USART=y
 
 CONFIG_VERSATILE_PCI=y
 CONFIG_VERSATILE_I2C=y
diff --git a/hw/char/Makefile.objs b/hw/char/Makefile.objs
index 317385d..c7b3ce4 100644
--- a/hw/char/Makefile.objs
+++ b/hw/char/Makefile.objs
@@ -15,6 +15,7 @@ obj-$(CONFIG_OMAP) += omap_uart.o
 obj-$(CONFIG_SH4) += sh_serial.o
 obj-$(CONFIG_PSERIES) += spapr_vty.o
 obj-$(CONFIG_DIGIC) += digic-uart.o
+obj-$(CONFIG_STM32F205_USART) += stm32f205_usart.o
 
 common-obj-$(CONFIG_ETRAXFS) += etraxfs_ser.o
 common-obj-$(CONFIG_ISA_DEBUG) += debugcon.o
diff --git a/hw/char/stm32f205_usart.c b/hw/char/stm32f205_usart.c
new file mode 100644
index 000..9d399b8
--- /dev/null
+++ b/hw/char/stm32f205_usart.c
@@ -0,0 +1,218 @@
+/*
+ * STM32F205 USART
+ *
+ * Copyright (c) 2014 Alistair Francis 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "hw/char/stm32f205_usart.h"
+
+#ifndef STM_USART_ERR_DEBUG
+#define STM_USART_ERR_DEBUG 0
+#endif
+
+#define DB_PRINT_L(lvl, fmt, args...) do { \
+if (STM_USART_ERR_DEBUG >= lvl) { \
+qemu_log("%s: " fmt, __func__, ## args); \
+} \
+} while (0);
+
+#define DB_PRINT(fmt, args...) DB_PRINT_L(1, fmt, ## args)
+
+static int stm32f205_usart_can_receive(void *opaque)
+{
+STM32f205UsartState *s = opaque;
+
+if (!(s->usart_sr & USART_SR_RXNE)) {
+return 1;
+}
+
+return 0;
+}
+
+static void stm32f205_usart_receive(void *opaque, const uint8_t *buf, int size)
+{
+STM32f205UsartState *s = opaque;
+
+s->usart_dr = *buf;
+
+if (!(s->usart_cr1 & USART_CR1_UE && s->usart_cr1 & USART_CR1_RE)) {
+/* USART not enabled - drop the chars */
+DB_PRINT("Dropping the chars\n");
+return;
+}
+
+s->usart_sr |= USART_SR_RXNE;
+
+if (s->usart_cr1 & USART_CR1_RXNEIE) {
+qemu_set_irq(s->irq, 1);
+}
+
+DB_PRINT("Receiving: %c\n", s->usart_dr);
+}
+
+static void stm32f205_usart_reset(DeviceState *dev)
+{
+STM32f205UsartState *s = STM32F205_USART(dev);
+
+s->usart_sr = USART_SR_RESET;
+s->usart_dr = 0x00000000;
+s->usart_brr = 0x00000000;
+s->usart_cr1 = 0x00000000;
+s->usart_cr2 = 0x00000000;
+s->usart_cr3 = 0x00000000;
+s->usart_gtpr = 0x00000000;
+}
+
+static uint64_t stm32f205_usart_read(void *opaque, hwaddr addr,
+   unsigned int size)
+{
+STM32f205UsartState *s = opaque;
+uint64_t retvalue;
+
+DB_PRINT("Read 0x%"HWADDR_PRIx"\n", addr);
+
+switch (addr) {
+case USART_SR:
+retvalue = s->usart_sr;
+s->usart_sr &= ~USART_SR_TC;
+if (s->chr) {
+qemu_chr_accept_input(s->chr);
+}
+return retvalue;
+case USART_DR:
+DB_PRINT("Value: 0x%" PRIx32 ", %c\n", s->usart_dr, (char) s->usart_dr);
+s->usart_sr |= USART_SR_TXE;
+s->usart_sr &= ~USART_SR_RXNE;
+return s->usart_dr & 0x3FF;
+case USART_BRR:
+return s->usart_brr;
+

[Qemu-devel] [Patch v4 0/8] Netduino 2 Machine Model

2014-10-07 Thread Alistair Francis

This patch series adds the Netduino 2 Machine to QEMU

Information on the board is available at:
http://www.netduino.com/netduino2/specs.htm

The git tree can be found at:
https://github.com/alistair23/qemu/tree/netduino2.4

This patch series makes some changes to the armv7m_init function
that allows the code to be reused with the Netduino 2 and the
Stellaris machines.

Some example code that runs on QEMU is available at:
https://github.com/alistair23/CSSE3010-QEMU-Examples

I have more devices in the works; I figured I would just start
with these three.

V4:
 - Rebase
 - Correct timer units
V3:
 - Correct the timer interrupts
 - Update debug printing
 - Remove the sram_size argument from armv7m_init
V2:
 - Fix up the Timer device
 - Fix up the USART device
 - Change the memory region names to match the Stellaris board
Changes from RFC:
 - Code cleanup thanks to Peter C's comments
 - Split the Makefile configs to device level
 - Changes to armv7m_init with interrupt and memory passing
- See the individual patches for more details


Alistair Francis (8):
  stm32f205_timer: Add the stm32f205 Timer
  stm32f205_USART: Add the stm32f205 USART Controller
  stm32f205_SYSCFG: Add the stm32f205 SYSCFG
  target_arm: Remove memory region init from armv7m_init
  target_arm: Parameterise the irq lines for armv7m_init
  target_arm: Change the reset values based on the ELF entry
  stm32f205: Add the stm32f205 SoC
  netduino2: Add the Netduino 2 Machine

 default-configs/arm-softmmu.mak|   4 +
 hw/arm/Makefile.objs   |   2 +
 hw/arm/armv7m.c|  57 +++
 hw/arm/netduino2.c |  54 +++
 hw/arm/stellaris.c |  27 +++-
 hw/arm/stm32f205_soc.c | 157 ++
 hw/char/Makefile.objs  |   1 +
 hw/char/stm32f205_usart.c  | 218 +
 hw/misc/Makefile.objs  |   1 +
 hw/misc/stm32f205_syscfg.c | 160 +++
 hw/timer/Makefile.objs |   2 +
 hw/timer/stm32f205_timer.c | 318 +
 include/hw/arm/arm.h   |   3 +-
 include/hw/arm/stm32f205_soc.h |  69 
 include/hw/char/stm32f205_usart.h  |  69 
 include/hw/misc/stm32f205_syscfg.h |  61 +++
 include/hw/timer/stm32f205_timer.h | 101 
 17 files changed, 1263 insertions(+), 41 deletions(-)
 create mode 100644 hw/arm/netduino2.c
 create mode 100644 hw/arm/stm32f205_soc.c
 create mode 100644 hw/char/stm32f205_usart.c
 create mode 100644 hw/misc/stm32f205_syscfg.c
 create mode 100644 hw/timer/stm32f205_timer.c
 create mode 100644 include/hw/arm/stm32f205_soc.h
 create mode 100644 include/hw/char/stm32f205_usart.h
 create mode 100644 include/hw/misc/stm32f205_syscfg.h
 create mode 100644 include/hw/timer/stm32f205_timer.h

-- 
1.9.1

[Qemu-devel] qemu-nbd and discard

2014-10-07 Thread Peter Lieven


Hi,

If I use qemu-nbd from master with a 3.13 host kernel, attach
a QCOW2 image via a local device to /dev/nbd0, and then mount
an ext4 partition inside the image with -o discard, is fstrim
supposed to work?

Thank you,
Peter

Re: [Qemu-devel] [PATCH v2 37/36] qdev: device_del: search for to be unplugged device in 'peripheral' container

2014-10-07 Thread Igor Mammedov

On Tue, 07 Oct 2014 15:23:45 +0200
Andreas Färber  wrote:

> Am 07.10.2014 um 14:10 schrieb Igor Mammedov:
> > On Tue, 7 Oct 2014 19:59:51 +0800
> > Zhu Guihua  wrote:
> > 
> >> On Thu, 2014-10-02 at 10:08 +, Igor Mammedov wrote:
> >>> device_add puts every device with 'id' inside of 'peripheral'
> >>> container using id's value as the last component name.
> >>> Use it by replacing recursive search on sysbus with path
> >>> lookup in 'peripheral' container, which could handle both
> >>> BUS and BUS-less device cases.
> >>>
> >>
> >> If I want to delete device without id inside of 'peripheral-anon'
> >> container, the command 'device_del' does not work. 
> >> My suggestion is deleting device by the last component name, is this
> >> feasible?
> > So far device_del was designed to work only with id-ed devices.
> > 
> > What's a use-case for unplugging unnamed device from peripheral-anon?
> 
> I can think of use cases where you may want to balloon memory or CPUs.
Yep, currently initial CPUs are created without dev->id and even without
device_add's help.
However, if/when that is switched to device_add, we can make them use
auto-generated IDs so they would go into the peripheral section.
That would let us keep peripheral-anon for devices that shouldn't
be unplugged.

> 
> But that seems orthogonal to this series.
> 
> Regards,
> Andreas
>

Re: [Qemu-devel] [PATCH 10/17] mm: rmap preparation for remap_anon_pages

2014-10-07 Thread Andrea Arcangeli

Hi Kirill,

On Tue, Oct 07, 2014 at 02:10:26PM +0300, Kirill A. Shutemov wrote:
> On Fri, Oct 03, 2014 at 07:08:00PM +0200, Andrea Arcangeli wrote:
> > There's one constraint enforced to allow this simplification: the
> > source pages passed to remap_anon_pages must be mapped only in one
> > vma, but this is not a limitation when used to handle userland page
> > faults with MADV_USERFAULT. The source addresses passed to
> > remap_anon_pages should be set as VM_DONTCOPY with MADV_DONTFORK to
> > avoid any risk of the mapcount of the pages increasing, if fork runs
> > in parallel in another thread, before or while remap_anon_pages runs.
> 
> Have you considered triggering COW instead of adding limitation on
> pages' mapcount? The limitation looks artificial from interface POV.

I haven't considered it, mostly because I see it as a feature that it
returns -EBUSY. I prefer to avoid the risk of userland getting a
successful retval but internally the kernel silently behaving
non-zerocopy by mistake because some userland bug forgot to set
MADV_DONTFORK on the src_vma.

COW would not be zerocopy, so it's not OK. We get sub 1msec latency for
userfaults through 10gbit and we don't want to risk wasting CPU
caches.

I did, however, consider allowing the strict behavior (i.e. the
feature) to be extended later in a backwards compatible way. We could
provide a non-zerocopy behavior with a RAP_ALLOW_COW flag that would
then turn the -EBUSY error into a copy.

It's also more complex to implement the COW now, and it would make the
code that really matters harder to review. So it may be preferable to
extend this later in a backwards compatible way with a new
RAP_ALLOW_COW flag.

The current handling of the flags is already written in a way that should
allow backwards compatible extension with RAP_ALLOW_*:

#define RAP_ALLOW_SRC_HOLES (1UL<<0)

SYSCALL_DEFINE4(remap_anon_pages,
unsigned long, dst_start, unsigned long, src_start,
unsigned long, len, unsigned long, flags)
[..]
long err = -EINVAL;
[..]
if (flags & ~RAP_ALLOW_SRC_HOLES)
return err;
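
The backwards compatible extension described above can be sketched in
plain userland C. This is only an illustration under stated assumptions:
RAP_ALLOW_COW is the hypothetical future flag discussed in this mail, and
rap_check_flags(), RAP_KNOWN_FLAGS_V1/V2 and rap_flags_example() are names
made up for the sketch, not part of the patch.

```c
#include <assert.h>
#include <errno.h>

/* Flag taken from the snippet above. */
#define RAP_ALLOW_SRC_HOLES (1UL << 0)
/* Hypothetical future flag, only discussed in this thread. */
#define RAP_ALLOW_COW       (1UL << 1)

/* Hypothetical masks: the flags an older and a newer kernel would know. */
#define RAP_KNOWN_FLAGS_V1  (RAP_ALLOW_SRC_HOLES)
#define RAP_KNOWN_FLAGS_V2  (RAP_ALLOW_SRC_HOLES | RAP_ALLOW_COW)

/* Return 0 if every bit set in flags is known, -EINVAL otherwise. */
long rap_check_flags(unsigned long flags, unsigned long known)
{
    if (flags & ~known) {
        return -EINVAL;
    }
    return 0;
}

/* Usage sketch: the old mask rejects the new flag, the new mask accepts it. */
void rap_flags_example(void)
{
    assert(rap_check_flags(RAP_ALLOW_SRC_HOLES, RAP_KNOWN_FLAGS_V1) == 0);
    assert(rap_check_flags(RAP_ALLOW_COW, RAP_KNOWN_FLAGS_V1) == -EINVAL);
    assert(rap_check_flags(RAP_ALLOW_COW, RAP_KNOWN_FLAGS_V2) == 0);
}
```

Because unknown bits are rejected with -EINVAL today, userland that
passes a new flag to an old kernel gets a clear error instead of the flag
being silently ignored, which is what makes adding bits later safe.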

Re: [Qemu-devel] [PATCH V4 4/8] pc: add cpu hotplug handler to PC_MACHINE

2014-10-07 Thread Igor Mammedov

On Mon, 29 Sep 2014 18:52:33 +0800
Gu Zheng  wrote:

> Add cpu hotplug handler to PC_MACHINE, which will perform the acpi
> cpu hotplug callback via hotplug_handler API.
> 
> v3:
>  -deal with start up cpus in a more neat way as Igor suggested.
> v2:
>  -just rebase.
> 
> Signed-off-by: Gu Zheng 
> ---
>  hw/i386/pc.c |   26 +-
>  1 files changed, 25 insertions(+), 1 deletions(-)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 82a7daa..dcb9332 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1616,11 +1616,34 @@ out:
>  error_propagate(errp, local_err);
>  }
>  
> +static void pc_cpu_plug(HotplugHandler *hotplug_dev,
> +DeviceState *dev, Error **errp)
> +{
> +HotplugHandlerClass *hhc;
> +Error *local_err = NULL;
> +PCMachineState *pcms = PC_MACHINE(hotplug_dev);
> +
> +if (!pcms->acpi_dev) {
> +if (dev->hotplugged) {
It could be better to move this check out of the acpi_dev block
so it wouldn't rely on acpi_dev being uninitialized for
initial CPUs.
 
> +error_setg(&local_err,
> +   "cpu hotplug is not enabled: missing acpi device");
> +}
> +goto out;
> +}

something like this:

if (!pcms->acpi_dev) {
error_setg(&local_err, "missing acpi device");
goto out;
}

if (!dev->hotplugged) {
goto out;
}

> +hhc = HOTPLUG_HANDLER_GET_CLASS(pcms->acpi_dev);
> +hhc->plug(HOTPLUG_HANDLER(pcms->acpi_dev), dev, &local_err);
> +out:
> +error_propagate(errp, local_err);
> +}
> +
>  static void pc_machine_device_plug_cb(HotplugHandler *hotplug_dev,
>DeviceState *dev, Error **errp)
>  {
>  if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
>  pc_dimm_plug(hotplug_dev, dev, errp);
> +} else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
> +pc_cpu_plug(hotplug_dev, dev, errp);
>  }
>  }
>  
> @@ -1629,7 +1652,8 @@ static HotplugHandler *pc_get_hotpug_handler(MachineState *machine,
>  {
>  PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(machine);
>  
> -if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> +if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) ||
> +object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
>  return HOTPLUG_HANDLER(machine);
>  }
>

[Qemu-devel] [PATCH v1 4/8] throttle: Prepare to have multiple timers for one ThrottleState

2014-10-07 Thread Benoît Canet

This patch transforms the timer_pending call into two boolean values in the
ThrottleState structure.

This way we are sure that when multiple timers are used, only
one can be armed at a time.

Signed-off-by: Benoit Canet 
---
 block.c |  2 ++
 include/qemu/throttle.h |  3 +++
 util/throttle.c | 17 ++---
 3 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index f209f55..079b0dc 100644
--- a/block.c
+++ b/block.c
@@ -168,12 +168,14 @@ void bdrv_io_limits_disable(BlockDriverState *bs)
 static void bdrv_throttle_read_timer_cb(void *opaque)
 {
 BlockDriverState *bs = opaque;
+throttle_timer_fired(&bs->throttle_state, false);
 qemu_co_enter_next(&bs->throttled_reqs[0]);
 }
 
 static void bdrv_throttle_write_timer_cb(void *opaque)
 {
 BlockDriverState *bs = opaque;
+throttle_timer_fired(&bs->throttle_state, true);
 qemu_co_enter_next(&bs->throttled_reqs[1]);
 }
 
diff --git a/include/qemu/throttle.h b/include/qemu/throttle.h
index b89d4d8..3aece3a 100644
--- a/include/qemu/throttle.h
+++ b/include/qemu/throttle.h
@@ -65,6 +65,7 @@ typedef struct ThrottleConfig {
 typedef struct ThrottleState {
 ThrottleConfig cfg;   /* configuration */
 int64_t previous_leak;/* timestamp of the last leak done */
+bool any_timer_armed[2];  /* is any timer armed for this throttle state */
 } ThrottleState;
 
 typedef struct ThrottleTimers {
@@ -125,6 +126,8 @@ bool throttle_schedule_timer(ThrottleState *ts,
  ThrottleTimers *tt,
  bool is_write);
 
+void throttle_timer_fired(ThrottleState *ts, bool is_write);
+
 void throttle_account(ThrottleState *ts, bool is_write, uint64_t size);
 
 #endif
diff --git a/util/throttle.c b/util/throttle.c
index 4d7c8d4..0e305e3 100644
--- a/util/throttle.c
+++ b/util/throttle.c
@@ -172,6 +172,7 @@ void throttle_timers_attach_aio_context(ThrottleTimers *tt,
 void throttle_init(ThrottleState *ts)
 {
 memset(ts, 0, sizeof(ThrottleState));
+ts->any_timer_armed[0] = ts->any_timer_armed[1] = false;
 }
 
 /* To be called first on the ThrottleTimers */
@@ -390,16 +391,26 @@ bool throttle_schedule_timer(ThrottleState *ts,
 return false;
 }
 
-/* request throttled and timer pending -> do nothing */
-if (timer_pending(tt->timers[is_write])) {
+/* request throttled and any timer pending -> do nothing */
+if (ts->any_timer_armed[is_write]) {
 return true;
 }
 
-/* request throttled and timer not pending -> arm timer */
+ts->any_timer_armed[is_write] = true;
 timer_mod(tt->timers[is_write], next_timestamp);
 return true;
 }
 
+/* Remember that no timers are currently armed
+ *
+ * @ts:   the throttle state we are working on
+ * @is_write: the type of operation (read/write)
+ */
+void throttle_timer_fired(ThrottleState *ts, bool is_write)
+{
+ts->any_timer_armed[is_write] = false;
+}
+
 /* do the accounting for this operation
  *
  * @is_write: the type of operation (read/write)
-- 
2.1.1
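
The bookkeeping introduced by this patch can be illustrated outside QEMU
with a minimal sketch. SketchState, SketchTimers, sketch_schedule() and
sketch_timer_fired() are hypothetical stand-ins, not QEMU's API, and
unlike throttle_schedule_timer() the sketch returns whether this
particular call armed a timer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared ThrottleState. */
typedef struct {
    bool any_timer_armed[2];   /* [0] = read, [1] = write */
} SketchState;

/* Hypothetical stand-in for one user's ThrottleTimers. */
typedef struct {
    bool armed[2];
    int64_t expire[2];
} SketchTimers;

/* Arm this user's timer unless a timer sharing the state is already armed.
 * Returns true only if this call armed a timer.
 */
bool sketch_schedule(SketchState *ts, SketchTimers *tt,
                     bool is_write, int64_t when)
{
    if (ts->any_timer_armed[is_write]) {
        return false;
    }
    ts->any_timer_armed[is_write] = true;
    tt->armed[is_write] = true;
    tt->expire[is_write] = when;
    return true;
}

/* Timer callback side: forget that a timer was armed. */
void sketch_timer_fired(SketchState *ts, SketchTimers *tt, bool is_write)
{
    assert(tt->armed[is_write]);
    tt->armed[is_write] = false;
    ts->any_timer_armed[is_write] = false;
}
```

With two SketchTimers sharing one SketchState, the second
sketch_schedule() call for the same direction is a no-op until
sketch_timer_fired() has run for the first one; reads and writes are
tracked independently.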

[Qemu-devel] [PATCH v1 1/8] throttle: Extract timers from ThrottleState into a separate ThrottleTimers structure

2014-10-07 Thread Benoît Canet

Group throttling will share ThrottleState between multiple bs.
As a consequence the ThrottleState will be accessed by multiple aio contexts.

Timers are tied to their aio context, so they must be moved out of the
ThrottleState structure.

This commit paves the way for each bs of a common ThrottleState to have its own
timer.

Signed-off-by: Benoit Canet 
---
 block.c   | 35 
 include/block/block_int.h |  1 +
 include/qemu/throttle.h   | 36 +
 tests/test-throttle.c | 82 ++-
 util/throttle.c   | 73 -
 5 files changed, 134 insertions(+), 93 deletions(-)

diff --git a/block.c b/block.c
index d3aebeb..f209f55 100644
--- a/block.c
+++ b/block.c
@@ -129,7 +129,7 @@ void bdrv_set_io_limits(BlockDriverState *bs,
 {
 int i;
 
-throttle_config(&bs->throttle_state, cfg);
+throttle_config(&bs->throttle_state, &bs->throttle_timers, cfg);
 
 for (i = 0; i < 2; i++) {
 qemu_co_enter_next(&bs->throttled_reqs[i]);
@@ -162,7 +162,7 @@ void bdrv_io_limits_disable(BlockDriverState *bs)
 
 bdrv_start_throttled_reqs(bs);
 
-throttle_destroy(&bs->throttle_state);
+throttle_timers_destroy(&bs->throttle_timers);
 }
 
 static void bdrv_throttle_read_timer_cb(void *opaque)
@@ -181,12 +181,13 @@ static void bdrv_throttle_write_timer_cb(void *opaque)
 void bdrv_io_limits_enable(BlockDriverState *bs)
 {
 assert(!bs->io_limits_enabled);
-throttle_init(&bs->throttle_state,
-  bdrv_get_aio_context(bs),
-  QEMU_CLOCK_VIRTUAL,
-  bdrv_throttle_read_timer_cb,
-  bdrv_throttle_write_timer_cb,
-  bs);
+throttle_init(&bs->throttle_state);
+throttle_timers_init(&bs->throttle_timers,
+ bdrv_get_aio_context(bs),
+ QEMU_CLOCK_VIRTUAL,
+ bdrv_throttle_read_timer_cb,
+ bdrv_throttle_write_timer_cb,
+ bs);
 bs->io_limits_enabled = true;
 }
 
@@ -200,7 +201,9 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
  bool is_write)
 {
 /* does this io must wait */
-bool must_wait = throttle_schedule_timer(&bs->throttle_state, is_write);
+bool must_wait = throttle_schedule_timer(&bs->throttle_state,
+ &bs->throttle_timers,
+ is_write);
 
 /* if must wait or any request of this type throttled queue the IO */
 if (must_wait ||
@@ -213,7 +216,8 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
 
 
 /* if the next request must wait -> do nothing */
-if (throttle_schedule_timer(&bs->throttle_state, is_write)) {
+if (throttle_schedule_timer(&bs->throttle_state, &bs->throttle_timers,
+is_write)) {
 return;
 }
 
@@ -1990,6 +1994,9 @@ static void bdrv_move_feature_fields(BlockDriverState *bs_dest,
 memcpy(&bs_dest->throttle_state,
&bs_src->throttle_state,
sizeof(ThrottleState));
+memcpy(&bs_dest->throttle_timers,
+   &bs_src->throttle_timers,
+   sizeof(ThrottleTimers));
 bs_dest->throttled_reqs[0]  = bs_src->throttled_reqs[0];
 bs_dest->throttled_reqs[1]  = bs_src->throttled_reqs[1];
 bs_dest->io_limits_enabled  = bs_src->io_limits_enabled;
@@ -2052,7 +2059,7 @@ void bdrv_swap(BlockDriverState *bs_new, BlockDriverState *bs_old)
 assert(bs_new->job == NULL);
 assert(bs_new->dev == NULL);
 assert(bs_new->io_limits_enabled == false);
-assert(!throttle_have_timer(&bs_new->throttle_state));
+assert(!throttle_timers_are_init(&bs_new->throttle_timers));
 
 tmp = *bs_new;
 *bs_new = *bs_old;
@@ -2070,7 +2077,7 @@ void bdrv_swap(BlockDriverState *bs_new, BlockDriverState *bs_old)
 assert(bs_new->dev == NULL);
 assert(bs_new->job == NULL);
 assert(bs_new->io_limits_enabled == false);
-assert(!throttle_have_timer(&bs_new->throttle_state));
+assert(!throttle_timers_are_init(&bs_new->throttle_timers));
 
 /* insert the nodes back into the graph node list if needed */
 if (bs_new->node_name[0] != '\0') {
@@ -5746,7 +5753,7 @@ void bdrv_detach_aio_context(BlockDriverState *bs)
 }
 
 if (bs->io_limits_enabled) {
-throttle_detach_aio_context(&bs->throttle_state);
+throttle_timers_detach_aio_context(&bs->throttle_timers);
 }
 if (bs->drv->bdrv_detach_aio_context) {
 bs->drv->bdrv_detach_aio_context(bs);
@@ -5782,7 +5789,7 @@ void bdrv_attach_aio_context(BlockDriverState *bs,
 bs->drv->bdrv_attach_aio_context(bs, new_context);
 }
 if (bs->io_limits_enabled) {
-throttle_attach_aio_context(&bs->throttle_state, new_context);
+throttle_timers_attach_aio_context(&bs->throttle_timers, new_conte

[Qemu-devel] [PATCH v1 7/8] throttle: Add throttle group support

2014-10-07 Thread Benoît Canet

The throttle group support uses a cooperative round robin scheduling algorithm.

The principles of the algorithm are simple:
- Each BDS of the group is used as a token in a circular way.
- The active BDS computes if a wait must be done and arms the right timer.
- If a wait must be done the token timer will be armed so the token will become
  the next active BDS.
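
The token rotation listed above can be sketched on its own, without any
QEMU types. This is a simplified model, not the patch's implementation:
SketchGroup and the sketch_group_*() helpers are hypothetical names,
members are plain integer ids kept in a fixed-size array instead of a
QLIST of BlockDriverState, and -1 plays the role of the NULL token.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_MEMBERS 8

/* Hypothetical group: member ids in insertion order plus the token index. */
typedef struct {
    int members[SKETCH_MAX_MEMBERS];
    size_t count;
    size_t token;   /* index of the member currently holding the token */
} SketchGroup;

/* Add a member; the first member of a new group receives the token. */
int sketch_group_add(SketchGroup *g, int id)
{
    if (g->count == SKETCH_MAX_MEMBERS) {
        return -1;
    }
    g->members[g->count++] = id;
    if (g->count == 1) {
        g->token = 0;
    }
    return 0;
}

/* Return the id holding the token, or -1 for an empty group. */
int sketch_group_token(const SketchGroup *g)
{
    return g->count ? g->members[g->token] : -1;
}

/* Pass the token to the next member, wrapping around the ring. */
int sketch_group_advance(SketchGroup *g)
{
    if (g->count == 0) {
        return -1;
    }
    g->token = (g->token + 1) % g->count;
    assert(g->token < g->count);
    return g->members[g->token];
}

/* Remove a member; if it held the token, the next member inherits it. */
int sketch_group_remove(SketchGroup *g, int id)
{
    size_t i, j;

    for (i = 0; i < g->count; i++) {
        if (g->members[i] == id) {
            break;
        }
    }
    if (i == g->count) {
        return -1;
    }
    for (j = i; j + 1 < g->count; j++) {
        g->members[j] = g->members[j + 1];
    }
    g->count--;
    if (g->count == 0) {
        g->token = 0;
    } else if (i < g->token) {
        g->token--;
    } else if (g->token == g->count) {
        g->token = 0;
    }
    return 0;
}
```

Removing the token holder hands the token to its successor, and removing
the last remaining member leaves the group without a token, which mirrors
the two cases handled in bdrv_throttle_group_remove().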

Signed-off-by: Benoit Canet 
---
 block.c   | 191 --
 block/qapi.c  |   7 +-
 block/throttle-groups.c   |   2 +-
 blockdev.c|  19 -
 hmp.c |   4 +-
 include/block/block.h |   3 +-
 include/block/block_int.h |   9 ++-
 qapi/block-core.json  |   5 +-
 qemu-options.hx   |   1 +
 qmp-commands.hx   |   3 +-
 10 files changed, 209 insertions(+), 35 deletions(-)

diff --git a/block.c b/block.c
index 527ea48..e7e5607 100644
--- a/block.c
+++ b/block.c
@@ -36,6 +36,7 @@
 #include "qmp-commands.h"
 #include "qemu/timer.h"
 #include "qapi-event.h"
+#include "block/throttle-groups.h"
 
 #ifdef CONFIG_BSD
 #include 
@@ -129,7 +130,9 @@ void bdrv_set_io_limits(BlockDriverState *bs,
 {
 int i;
 
-throttle_config(&bs->throttle_state, &bs->throttle_timers, cfg);
+throttle_group_lock(bs->throttle_state);
+throttle_config(bs->throttle_state, &bs->throttle_timers, cfg);
+throttle_group_unlock(bs->throttle_state);
 
 for (i = 0; i < 2; i++) {
 qemu_co_enter_next(&bs->throttled_reqs[i]);
@@ -156,34 +159,99 @@ static bool bdrv_start_throttled_reqs(BlockDriverState *bs)
 return drained;
 }
 
+static void bdrv_throttle_group_add(BlockDriverState *bs)
+{
+int i;
+BlockDriverState *token;
+
+for (i = 0; i < 2; i++) {
+/* Get the BlockDriverState having the round robin token */
+token = throttle_group_token(bs->throttle_state, i);
+
+/* If the ThrottleGroup is new set the current BlockDriverState as
+ * token
+ */
+if (!token) {
+throttle_group_set_token(bs->throttle_state, bs, i);
+}
+
+}
+
+throttle_group_register_bs(bs->throttle_state, bs);
+}
+
+static void bdrv_throttle_group_remove(BlockDriverState *bs)
+{
+BlockDriverState *token;
+int i;
+
+for (i = 0; i < 2; i++) {
+/* Get the BlockDriverState having the round robin token */
+token = throttle_group_token(bs->throttle_state, i);
+/* if this bs is the current token set the next bs as token */
+if (token == bs) {
+token = throttle_group_next_bs(token);
+/* take care of the case where bs is the only bs of the group */
+if (token == bs) {
+token = NULL;
+}
+throttle_group_set_token(bs->throttle_state, token, i);
+}
+}
+
+/* remove the current bs from the list */
+QLIST_REMOVE(bs, round_robin);
+}
+
 void bdrv_io_limits_disable(BlockDriverState *bs)
 {
+
+throttle_group_lock(bs->throttle_state);
 bs->io_limits_enabled = false;
+throttle_group_unlock(bs->throttle_state);
 
 bdrv_start_throttled_reqs(bs);
 
+throttle_group_lock(bs->throttle_state);
+bdrv_throttle_group_remove(bs);
+throttle_group_unlock(bs->throttle_state);
+
+throttle_group_unref(bs->throttle_state);
+bs->throttle_state = NULL;
+
 throttle_timers_destroy(&bs->throttle_timers);
 }
 
 static void bdrv_throttle_read_timer_cb(void *opaque)
 {
 BlockDriverState *bs = opaque;
-throttle_timer_fired(&bs->throttle_state, false);
+
+throttle_group_lock(bs->throttle_state);
+throttle_timer_fired(bs->throttle_state, false);
+throttle_group_unlock(bs->throttle_state);
+
 qemu_co_enter_next(&bs->throttled_reqs[0]);
 }
 
 static void bdrv_throttle_write_timer_cb(void *opaque)
 {
 BlockDriverState *bs = opaque;
-throttle_timer_fired(&bs->throttle_state, true);
+
+throttle_group_lock(bs->throttle_state);
+throttle_timer_fired(bs->throttle_state, true);
+throttle_group_unlock(bs->throttle_state);
+
 qemu_co_enter_next(&bs->throttled_reqs[1]);
 }
 
 /* should be called before bdrv_set_io_limits if a limit is set */
-void bdrv_io_limits_enable(BlockDriverState *bs)
+void bdrv_io_limits_enable(BlockDriverState *bs, const char *group)
 {
 assert(!bs->io_limits_enabled);
-throttle_init(&bs->throttle_state);
+bs->throttle_state = throttle_group_incref(group ? group : bs->device_name);
+
+throttle_group_lock(bs->throttle_state);
+bdrv_throttle_group_add(bs);
 throttle_timers_init(&bs->throttle_timers,
  bdrv_get_aio_context(bs),
  QEMU_CLOCK_VIRTUAL,
@@ -191,6 +259,53 @@ void bdrv_io_limits_enable(BlockDriverState *bs)
  bdrv_throttle_write_timer_cb,
  bs);
 bs->io_limits_enabled = true;
+throttle_group_unlock(bs->throttle_state);
+}
+
+void bdrv_io_limits_update_group(BlockDriv

[Qemu-devel] [PATCH v1 3/8] throttle: Add throttle group infrastructure tests

2014-10-07 Thread Benoît Canet

Signed-off-by: Benoit Canet 
---
 tests/test-throttle.c | 51 +++
 1 file changed, 51 insertions(+)

diff --git a/tests/test-throttle.c b/tests/test-throttle.c
index 3e52df3..ecb5504 100644
--- a/tests/test-throttle.c
+++ b/tests/test-throttle.c
@@ -15,6 +15,7 @@
 #include "block/aio.h"
 #include "qemu/throttle.h"
 #include "qemu/error-report.h"
+#include "block/throttle-groups.h"
 
 static AioContext *ctx;
 static LeakyBucket bkt;
@@ -500,6 +501,55 @@ static void test_accounting(void)
 (64.0 / 13)));
 }
 
+static void test_groups(void)
+{
+bool removed;
+
+ThrottleState *ts_foo, *ts_bar, *tmp;
+
+ts_bar = throttle_group_incref("bar");
+throttle_group_set_token(ts_bar, (BlockDriverState *) 0x5, false);
+ts_foo = throttle_group_incref("foo");
+
+tmp = throttle_group_incref("foo");
+throttle_group_set_token(tmp, (BlockDriverState *) 0x7, true);
+g_assert(tmp == ts_foo);
+
+tmp = throttle_group_incref("bar");
+g_assert(tmp == ts_bar);
+
+tmp = throttle_group_incref("bar");
+g_assert(tmp == ts_bar);
+
+g_assert((int64_t) throttle_group_token(ts_bar, false) == 0x5);
+g_assert((int64_t) throttle_group_token(ts_foo, true) == 0x7);
+
+removed = throttle_group_unref(ts_foo);
+g_assert(removed);
+removed = throttle_group_unref(ts_bar);
+g_assert(removed);
+
+g_assert((int64_t) throttle_group_token(ts_foo, true) == 0x7);
+
+removed = throttle_group_unref(ts_foo);
+g_assert(removed);
+removed = throttle_group_unref(ts_bar);
+g_assert(removed);
+
+/* "foo" group should be destroyed when reaching this */
+removed = throttle_group_unref(ts_foo);
+g_assert(!removed);
+
+g_assert((int64_t) throttle_group_token(ts_bar, false) == 0x5);
+
+removed = throttle_group_unref(ts_bar);
+g_assert(removed);
+
+/* "bar" group should be destroyed when reaching this */
+removed = throttle_group_unref(ts_bar);
+g_assert(!removed);
+}
+
 int main(int argc, char **argv)
 {
 GSource *src;
@@ -533,6 +583,7 @@ int main(int argc, char **argv)
 g_test_add_func("/throttle/config/is_valid",test_is_valid);
 g_test_add_func("/throttle/config_functions",   test_config_functions);
 g_test_add_func("/throttle/accounting", test_accounting);
+g_test_add_func("/throttle/groups", test_groups);
 return g_test_run();
 }
 
-- 
2.1.1

[Qemu-devel] [PATCH v1 6/8] throttle: Add a way to fire one of the timers asap like a bottom half

2014-10-07 Thread Benoît Canet

This will be needed by the group throttling algorithm.

Signed-off-by: Benoit Canet 
---
 include/qemu/throttle.h |  2 ++
 util/throttle.c | 11 +++
 2 files changed, 13 insertions(+)

diff --git a/include/qemu/throttle.h b/include/qemu/throttle.h
index 3a16c48..3b9d1b8 100644
--- a/include/qemu/throttle.h
+++ b/include/qemu/throttle.h
@@ -127,6 +127,8 @@ bool throttle_schedule_timer(ThrottleState *ts,
  bool is_write,
  bool *armed);
 
+void throttle_fire_timer(ThrottleTimers *tt, bool is_write);
+
 void throttle_timer_fired(ThrottleState *ts, bool is_write);
 
 void throttle_account(ThrottleState *ts, bool is_write, uint64_t size);
diff --git a/util/throttle.c b/util/throttle.c
index a273acb..163b9d0 100644
--- a/util/throttle.c
+++ b/util/throttle.c
@@ -403,6 +403,17 @@ bool throttle_schedule_timer(ThrottleState *ts,
 return true;
 }
 
+/* Schedule a throttle timer like a BH
+ *
+ * @tt:   The timers structure
+ * @is_write: the type of operation (read/write)
+ */
+void throttle_fire_timer(ThrottleTimers *tt, bool is_write)
+{
+int64_t now = qemu_clock_get_ns(tt->clock_type);
+timer_mod(tt->timers[is_write], now + 1);
+}
+
 /* Remember that now timers are currently armed
  *
  * @ts:   the throttle state we are working on
-- 
2.1.1

[Qemu-devel] [PATCH v1 8/8] throttle: Update throttle infrastructure copyright

2014-10-07 Thread Benoît Canet

Signed-off-by: Benoit Canet 
---
 include/qemu/throttle.h | 4 ++--
 tests/test-throttle.c   | 4 ++--
 util/throttle.c | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/qemu/throttle.h b/include/qemu/throttle.h
index 3b9d1b8..8abd94d 100644
--- a/include/qemu/throttle.h
+++ b/include/qemu/throttle.h
@@ -1,10 +1,10 @@
 /*
  * QEMU throttling infrastructure
  *
- * Copyright (C) Nodalink, SARL. 2013
+ * Copyright (C) Nodalink, EURL. 2013-2014
  *
  * Author:
- *   Benoît Canet 
+ *   Benoît Canet 
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License as
diff --git a/tests/test-throttle.c b/tests/test-throttle.c
index ecb5504..0efd372 100644
--- a/tests/test-throttle.c
+++ b/tests/test-throttle.c
@@ -1,10 +1,10 @@
 /*
  * Throttle infrastructure tests
  *
- * Copyright Nodalink, SARL. 2013
+ * Copyright Nodalink, EURL. 2013-2014
  *
  * Authors:
- *  Benoît Canet 
+ *  Benoît Canet 
  *
  * This work is licensed under the terms of the GNU LGPL, version 2 or later.
  * See the COPYING.LIB file in the top-level directory.
diff --git a/util/throttle.c b/util/throttle.c
index 163b9d0..4884c08 100644
--- a/util/throttle.c
+++ b/util/throttle.c
@@ -1,10 +1,10 @@
 /*
  * QEMU throttling infrastructure
  *
- * Copyright (C) Nodalink, SARL. 2013
+ * Copyright (C) Nodalink, EURL. 2013-2014
  *
  * Author:
- *   Benoît Canet 
+ *   Benoît Canet 
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License as
-- 
2.1.1

[Qemu-devel] [PATCH v1 2/8] throttle: Add throttle group infrastructure

2014-10-07 Thread Benoît Canet

The throttle_group_incref function increments the refcount of a throttle group
given its name and returns the associated throttle state.

The throttle_group_unref is the mirror function for cleaning up.
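
The incref/unref pairing described above can be illustrated outside QEMU with
a minimal sketch. It is not the patch's code: the Group type, group_incref and
group_unref are hypothetical names, the list is a plain singly linked list
rather than a QTAILQ, and no locking is done.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a throttle group: a name plus a refcount. */
typedef struct Group {
    char name[32];
    unsigned refcount;
    struct Group *next;
} Group;

static Group *groups; /* singly linked list of live groups */

/* Return the group called @name, creating it on first use. */
Group *group_incref(const char *name)
{
    Group *g;

    for (g = groups; g != NULL; g = g->next) {
        if (strcmp(g->name, name) == 0) {
            g->refcount++;
            return g;
        }
    }

    g = calloc(1, sizeof(*g));
    if (g == NULL) {
        return NULL;
    }
    /* calloc zeroed the buffer, so the copy stays NUL-terminated */
    strncpy(g->name, name, sizeof(g->name) - 1);
    g->refcount = 1;
    g->next = groups;
    groups = g;
    return g;
}

/* Drop one reference; free the group when the count reaches zero.
 * Returns false if @target is not a registered group. */
bool group_unref(Group *target)
{
    Group **link;

    for (link = &groups; *link != NULL; link = &(*link)->next) {
        if (*link != target) {
            continue;
        }
        if (--target->refcount == 0) {
            *link = target->next; /* unlink before freeing */
            free(target);
        }
        return true;
    }
    return false;
}
```

Two group_incref("foo") calls return the same pointer, and the group is only
freed by the second matching group_unref.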

Signed-off-by: Benoit Canet 
---
 block/Makefile.objs |   1 +
 block/throttle-groups.c | 212 
 include/block/block_int.h   |   1 +
 include/block/throttle-groups.h |  45 +
 4 files changed, 259 insertions(+)
 create mode 100644 block/throttle-groups.c
 create mode 100644 include/block/throttle-groups.h

diff --git a/block/Makefile.objs b/block/Makefile.objs
index a833ed5..d257b05 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -10,6 +10,7 @@ block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
 block-obj-$(CONFIG_POSIX) += raw-posix.o
 block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
 block-obj-y += null.o
+block-obj-y += throttle-groups.o
 
 block-obj-y += nbd.o nbd-client.o sheepdog.o
 block-obj-$(CONFIG_LIBISCSI) += iscsi.o
diff --git a/block/throttle-groups.c b/block/throttle-groups.c
new file mode 100644
index 000..ea5baca
--- /dev/null
+++ b/block/throttle-groups.c
@@ -0,0 +1,212 @@
+/*
+ * QEMU block throttling group infrastructure
+ *
+ * Copyright (C) Nodalink, EURL. 2014
+ *
+ * Author:
+ *   Benoît Canet 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 or
+ * (at your option) version 3 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "block/throttle-groups.h"
+#include "qemu/queue.h"
+#include "qemu/thread.h"
+
+typedef struct ThrottleGroup {
+char name[32];
+ThrottleState ts;
+uint64_t refcount;
+QTAILQ_ENTRY(ThrottleGroup) list;
+QLIST_HEAD(, BlockDriverState) head;
+BlockDriverState *tokens[2]; /* current round-robin tokens */
+QemuMutex lock; /* Used to synchronize all elements belonging to a group */
+} ThrottleGroup;
+
+static QTAILQ_HEAD(, ThrottleGroup) throttle_groups =
+QTAILQ_HEAD_INITIALIZER(throttle_groups);
+
+/* increments a ThrottleGroup reference count given its name
+ *
+ * If no ThrottleGroup is found with the given name a new one is created.
+ *
+ * @name: the name of the ThrottleGroup
+ * @ret:  the ThrottleGroup's ThrottleState address
+ */
+ThrottleState *throttle_group_incref(const char *name)
+{
+ThrottleGroup *tg;
+
+/* return the correct ThrottleState if a group with this name exists */
+QTAILQ_FOREACH(tg, &throttle_groups, list) {
+/* group not found -> continue */
+if (strcmp(name, tg->name)) {
+continue;
+}
+/* group found -> increment its refcount and return ThrottleState */
+tg->refcount++;
+return &tg->ts;
+}
+
+/* throttle group not found -> prepare new entry */
+tg = g_new0(ThrottleGroup, 1);
+pstrcpy(tg->name, sizeof(tg->name), name);
+qemu_mutex_init(&tg->lock);
+throttle_init(&tg->ts);
+QLIST_INIT(&tg->head);
+tg->refcount = 1;
+
+/* insert new entry in the list */
+QTAILQ_INSERT_TAIL(&throttle_groups, tg, list);
+
+/* return newly allocated ThrottleState */
+return &tg->ts;
+}
+
+/* decrement a ThrottleGroup refcount given its ThrottleState address
+ *
+ * When the refcount reaches zero the ThrottleGroup is destroyed
+ *
+ * @ts:  The ThrottleState address belonging to the ThrottleGroup to unref
+ * @ret: true on success else false
+ */
+bool throttle_group_unref(ThrottleState *ts)
+{
+ThrottleGroup *tg;
+bool found = false;
+
+/* Find the ThrottleGroup of the given ThrottleState */
+QTAILQ_FOREACH(tg, &throttle_groups, list) {
+/* correct group found stop iterating */
+if (&tg->ts == ts) {
+qemu_mutex_lock(&tg->lock);
+found = true;
+break;
+}
+}
+
+/* If the ThrottleState was not found something is seriously broken */
+if (!found) {
+return false;
+}
+
+tg->refcount--;
+
+/* If ThrottleGroup is used keep it. */
+if (tg->refcount) {
+qemu_mutex_unlock(&tg->lock);
+return true;
+}
+
+/* Else destroy it */
+QTAILQ_REMOVE(&throttle_groups, tg, list);
+qemu_mutex_unlock(&tg->lock);
+qemu_mutex_destroy(&tg->lock);
+g_free(tg);
+return true;
+}
+
+/* Compare a name with a given ThrottleState group name
+ *
+ * @ts:   the throttle state whose group we are inspecting
+ * @name: the name to compare
+ * @ret:  true if names are equal else false
+ */
+bool throttle_group

[Qemu-devel] [PATCH v1 5/8] throttle: Add a way to know if throttle_schedule_timer had armed a timer

2014-10-07 Thread Benoît Canet

This will be needed by the group throttling patches for the algorithm to be
accurate.
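
To make the intent of the extra out-parameter concrete, here is a minimal
standalone sketch. SketchTimer and sketch_schedule are hypothetical names and
the must_wait decision is passed in rather than computed; only the "report
whether this call armed the timer" behaviour is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified timer state for one direction (read or write). */
typedef struct {
    bool timer_armed;  /* is a wake-up already pending? */
    int64_t expiry_ns; /* when the pending wake-up fires */
} SketchTimer;

/* Returns true if the request must wait. *armed reports whether this
 * particular call armed the timer (false if one was already pending). */
bool sketch_schedule(SketchTimer *t, bool must_wait, int64_t next_ns,
                     bool *armed)
{
    *armed = false;

    if (!must_wait) {
        return false;
    }

    if (!t->timer_armed) {
        t->timer_armed = true;
        t->expiry_ns = next_ns;
        *armed = true;
    }
    return true;
}
```

A caller can then tell "I must wait and I own the pending timer" apart from
"I must wait behind somebody else's timer".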

Signed-off-by: Benoit Canet 
---
 block.c |  7 +--
 include/qemu/throttle.h |  3 ++-
 util/throttle.c | 12 +++-
 3 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/block.c b/block.c
index 079b0dc..527ea48 100644
--- a/block.c
+++ b/block.c
@@ -202,10 +202,13 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
  unsigned int bytes,
  bool is_write)
 {
+bool armed;
+
 /* does this io must wait */
 bool must_wait = throttle_schedule_timer(&bs->throttle_state,
  &bs->throttle_timers,
- is_write);
+ is_write,
+ &armed);
 
 /* if must wait or any request of this type throttled queue the IO */
 if (must_wait ||
@@ -219,7 +222,7 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
 
 /* if the next request must wait -> do nothing */
 if (throttle_schedule_timer(&bs->throttle_state, &bs->throttle_timers,
-is_write)) {
+is_write, &armed)) {
 return;
 }
 
diff --git a/include/qemu/throttle.h b/include/qemu/throttle.h
index 3aece3a..3a16c48 100644
--- a/include/qemu/throttle.h
+++ b/include/qemu/throttle.h
@@ -124,7 +124,8 @@ void throttle_get_config(ThrottleState *ts, ThrottleConfig *cfg);
 /* usage */
 bool throttle_schedule_timer(ThrottleState *ts,
  ThrottleTimers *tt,
- bool is_write);
+ bool is_write,
+ bool *armed);
 
 void throttle_timer_fired(ThrottleState *ts, bool is_write);
 
diff --git a/util/throttle.c b/util/throttle.c
index 0e305e3..a273acb 100644
--- a/util/throttle.c
+++ b/util/throttle.c
@@ -375,11 +375,13 @@ void throttle_get_config(ThrottleState *ts, ThrottleConfig *cfg)
  */
 bool throttle_schedule_timer(ThrottleState *ts,
  ThrottleTimers *tt,
- bool is_write)
+ bool is_write,
+ bool *armed)
 {
 int64_t now = qemu_clock_get_ns(tt->clock_type);
 int64_t next_timestamp;
 bool must_wait;
+*armed = false;
 
 must_wait = throttle_compute_timer(ts,
is_write,
@@ -392,12 +394,12 @@ bool throttle_schedule_timer(ThrottleState *ts,
 }
 
 /* request throttled and any timer pending -> do nothing */
-if (ts->any_timer_armed[is_write]) {
-return true;
+if (!ts->any_timer_armed[is_write]) {
+*armed = true;
+ts->any_timer_armed[is_write] = true;
+timer_mod(tt->timers[is_write], next_timestamp);
 }
 
-ts->any_timer_armed[is_write] = true;
-timer_mod(tt->timers[is_write], next_timestamp);
 return true;
 }
 
-- 
2.1.1

[Qemu-devel] [PATCH v1 0/8] Block Throttle Group Support

2014-10-07 Thread Benoît Canet

Hi,

For the user interface I implemented Stefanha's idea proposed in Stuttgart.

For the throttling algorithm I use a cooperative round robin scheduler.

Classical round robin works with a fixed HZ tick and is totally incompatible
with the throttling algorithm.

So the cooperative round robin scheduler is a way for each block device to
decide whether a pause must be made and a timer armed and, most important of
all, which other block device of the group must resume the work once the timer
has fired.

The advantages of this algorithm are:

- Only one timer is active at a given time (no more CPU usage than regular
  throttling)
- No central place dictating the scheduling policy like a dictatorship:
  we love collaboration, don't we? :)
- No need to deal with incoming queues to collect requests before scheduling,
  and then with dispatch queues
- Compatible with the throttling code with almost no changes
- As-you-go scheduling
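
As a rough illustration of the token idea (one timer per group, and the
member that arms it also decides who runs next), here is a toy model. It is
not the series' implementation: SketchGroup and the sketch_* functions are
hypothetical, members are plain array indexes, and the rate computation is
replaced by an over_limit flag supplied by the caller.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MEMBERS 3

/* Hypothetical group: each member stands for one block device. */
typedef struct {
    int pending[SKETCH_MEMBERS]; /* queued requests per member */
    size_t token;                /* index of the member holding the token */
    bool timer_armed;            /* at most one timer for the whole group */
} SketchGroup;

/* Pass the token to the next member, in circular order, that has queued
 * requests. If no member has queued requests the token does not move. */
size_t sketch_pass_token(SketchGroup *g)
{
    for (size_t step = 1; step <= SKETCH_MEMBERS; step++) {
        size_t candidate = (g->token + step) % SKETCH_MEMBERS;
        if (g->pending[candidate] > 0) {
            g->token = candidate;
            break;
        }
    }
    return g->token;
}

/* Called by the member submitting I/O. Returns true if it must wait. */
bool sketch_submit(SketchGroup *g, size_t member, bool over_limit)
{
    if (!over_limit && !g->timer_armed) {
        return false;          /* within budget: run immediately */
    }
    g->pending[member]++;
    if (!g->timer_armed) {
        g->timer_armed = true; /* only the first waiter arms the timer */
        sketch_pass_token(g);  /* choose who resumes when it fires */
    }
    return true;
}

/* Timer callback: the token holder runs one queued request. */
size_t sketch_timer_fired(SketchGroup *g)
{
    size_t runner = g->token;

    g->timer_armed = false;
    if (g->pending[runner] > 0) {
        g->pending[runner]--;
    }
    return runner;
}
```

There is no central dispatcher in this model: each call to sketch_submit
makes its own decision and, when it arms the timer, hands the token on.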

Best regards

Benoît

Benoît Canet (8):
  throttle: Extract timers from ThrottleState into a separate
ThrottleTimers structure
  throttle: Add throttle group infrastructure
  throttle: Add throttle group infrastructure tests
  throttle: Prepare to have multiple timers for one ThrottleState
  throttle: Add a way to know if throttle_schedule_timer had armed a
timer
  throttle: Add a way to fire one of the timers asap like a bottom half
  throttle: Add throttle group support
  throttle: Update throttle infrastructure copyright

 block.c | 211 ++-
 block/Makefile.objs |   1 +
 block/qapi.c|   7 +-
 block/throttle-groups.c | 212 
 blockdev.c  |  19 +++-
 hmp.c   |   4 +-
 include/block/block.h   |   3 +-
 include/block/block_int.h   |   9 +-
 include/block/throttle-groups.h |  45 +
 include/qemu/throttle.h |  46 ++---
 qapi/block-core.json|   5 +-
 qemu-options.hx |   1 +
 qmp-commands.hx |   3 +-
 tests/test-throttle.c   | 137 +++---
 util/throttle.c | 107 +---
 15 files changed, 685 insertions(+), 125 deletions(-)
 create mode 100644 block/throttle-groups.c
 create mode 100644 include/block/throttle-groups.h

-- 
2.1.1

Re: [Qemu-devel] [PATCH 08/17] mm: madvise MADV_USERFAULT

2014-10-07 Thread Andrea Arcangeli

Hi Kirill,

On Tue, Oct 07, 2014 at 01:36:45PM +0300, Kirill A. Shutemov wrote:
> On Fri, Oct 03, 2014 at 07:07:58PM +0200, Andrea Arcangeli wrote:
> > MADV_USERFAULT is a new madvise flag that will set VM_USERFAULT in the
> > vma flags. Whenever VM_USERFAULT is set in an anonymous vma, if
> > userland touches a still unmapped virtual address, a sigbus signal is
> > sent instead of allocating a new page. The sigbus signal handler will
> > then resolve the page fault in userland by calling the
> > remap_anon_pages syscall.
> 
> Hm. I wonder if this functionality really fits the madvise(2) interface: as
> far as I understand it, it provides a way to give a *hint* to the kernel which
> may or may not trigger an action from the kernel side. I don't think an
> application will behave reasonably if the kernel ignores the *advice* and does
> not send SIGBUS, but allocates memory.
> 
> I would suggest to consider to use some other interface for the
> functionality: a new syscall or, perhaps, mprotect().

I didn't feel like adding PROT_USERFAULT to mprotect, which looks
hardwired to just these flags:

   PROT_NONE  The memory cannot be accessed at all.

   PROT_READ  The memory can be read.

   PROT_WRITE The memory can be modified.

   PROT_EXEC  The memory can be executed.

Normally mprotect doesn't just alter the vmas but it also alters
pte/hugepmds protection bits, that's something that is never needed
with VM_USERFAULT so I didn't feel like VM_USERFAULT is a protection
change to the VMA.

mprotect is also hardwired to mangle only the VM_READ|WRITE|EXEC
flags, while madvise is ideal to set arbitrary vma flags.

From an implementation standpoint the perfect place to set a flag in a
vma is madvise. This is what MADV_DONTFORK (it sets VM_DONTCOPY)
already does too in an identical way to MADV_USERFAULT/VM_USERFAULT.

MADV_DONTFORK is as critical as MADV_USERFAULT because people depend
on it for example to prevent the O_DIRECT vs fork race condition that
results in silent data corruption during I/O with threads that may
fork. The other reason why MADV_DONTFORK is critical is that fork()
would otherwise fail with OOM unless full overcommit is enabled
(i.e. pci hotplug crashes the guest if you forget to set
MADV_DONTFORK).
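
For readers unfamiliar with the flag, this is roughly how an application
marks a range MADV_DONTFORK on Linux. The helper name map_dontfork is made up
for this sketch; mmap, madvise and MADV_DONTFORK are the real Linux
interfaces.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS and madvise() on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Map @len bytes of anonymous memory and mark the range MADV_DONTFORK so
 * that a child created by fork() does not inherit it.
 * Returns NULL on failure. */
void *map_dontfork(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED) {
        return NULL;
    }
    if (madvise(p, len, MADV_DONTFORK) != 0) {
        munmap(p, len);
        return NULL;
    }
    return p;
}
```

The sketch treats a madvise failure as fatal for the mapping, which matches
the point made above: the caller relies on the flag being honoured.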

Another madvise that would generate a failure if not obeyed by the
kernel is MADV_DONTNEED: if it did nothing, it could lead to OOM
killing. To give just one example, we don't inflate virt balloons
using munmap. Various other apps (maybe JVM garbage collection too)
make extensive use of MADV_DONTNEED and depend on it.
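
The dependency on MADV_DONTNEED can be seen in a small Linux-only sketch: on
a private anonymous mapping, the pages are dropped and the next access
observes zero-filled memory. The helper name dontneed_readback is
hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS and madvise() on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Dirty a private anonymous mapping, drop its pages with MADV_DONTNEED,
 * and return the first byte read back afterwards (or -1 on error). */
int dontneed_readback(size_t len)
{
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    int value;

    if (p == MAP_FAILED) {
        return -1;
    }
    p[0] = 0xAA; /* touch the page so it is actually allocated */
    if (madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }
    value = p[0]; /* faults in a fresh zero page on Linux */
    munmap(p, len);
    return value;
}
```

An application that frees memory this way would keep its pages resident if
the kernel treated the call as an ignorable hint.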

That said, I can change it to mprotect; the only thing that I don't
like is that it'll result in a less clean patch and I can't possibly
see a practical risk in keeping it simpler with madvise, as long as we
always return -EINVAL whenever we encounter a vma type that cannot
raise userfaults yet (that is something I already enforced).

Yet another option would be to drop MADV_USERFAULT and
vm_flags&VM_USERFAULT entirely and in turn the ability to handle
userfaults with SIGBUS, and retain only the userfaultfd. The new
userfaultfd protocol requires registering each created userfaultfd
into its own private virtual memory ranges (that is to allow an
unlimited number of userfaultfd per process). Currently the
userfaultfd engages iff the fault address intersects both the
MADV_USERFAULT range and the userfaultfd registered ranges. So I could
drop MADV_USERFAULT and VM_USERFAULT and just check for
vma->vm_userfaultfd_ctx!=NULL to know if the userfaultfd protocol
needs to be engaged during the first page fault for a still unmapped
virtual address. I just thought it would be more flexible to also
allow SIGBUS without forcing people to use userfaultfd (that's in fact
the only reason to still retain madvise(MADV_USERFAULT)!).

The earlier volatile pages patches, for example, only supported SIGBUS
behavior, and I didn't intend to force them to use userfaultfd if
they're guaranteed to access the memory with the CPU and never through
a kernel syscall (that is something the app can enforce by
design). userfaultfd becomes necessary the moment you want to handle
userfaults through syscalls/gup etc... qemu obviously requires
userfaultfd and it never uses the userfaultfd-less SIGBUS behavior as
it touches the memory in all possible ways (first and foremost with
the KVM page fault that uses almost all variants of gup..).

So here somebody should comment and choose between:

1) set VM_USERFAULT with mprotect(PROT_USERFAULT) instead of
   the current madvise(MADV_USERFAULT)

2) drop MADV_USERFAULT and VM_USERFAULT and force the usage of the
   userfaultfd protocol as the only way for userland to catch
   userfaults (each userfaultfd must already register itself into its
   own virtual memory ranges so it's a trivial change for userfaultfd
   users that deletes just 1 or 2 lines of userland code, but it would
   prevent use of the SIGBUS behavior with info->si_addr=faultaddr for
   other users)

3) keep

Re: [Qemu-devel] [PATCH v2 37/36] qdev: device_del: search for to be unplugged device in 'peripheral' container

2014-10-07 Thread Andreas Färber

On 07.10.2014 at 14:10, Igor Mammedov wrote:
> On Tue, 7 Oct 2014 19:59:51 +0800
> Zhu Guihua  wrote:
> 
>> On Thu, 2014-10-02 at 10:08 +, Igor Mammedov wrote:
>>> device_add puts every device with 'id' inside of 'peripheral'
>>> container using id's value as the last component name.
>>> Use it by replacing recursive search on sysbus with path
>>> lookup in 'peripheral' container, which could handle both
>>> BUS and BUS-less device cases.
>>>
>>
>> If I want to delete device without id inside of 'peripheral-anon'
>> container, the command 'device_del' does not work. 
>> My suggestion is deleting device by the last component name, is this
>> feasible?
> So far device_del was designed to work only with id-ed devices.
> 
> What's a use-case for unplugging unnamed device from peripheral-anon?

I can think of use cases where you may want to balloon memory or CPUs.

But that seems orthogonal to this series.

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
