date:20151110

Re: [Qemu-devel] [PATCH v10 23/30] qapi: Check for qapi collisions of flat union branches

2015-11-10 Thread Markus Armbruster

Eric Blake  writes:

> On 11/09/2015 05:56 AM, Markus Armbruster wrote:
>> Eric Blake  writes:
>> 
>>> Right now, our ad hoc parser ensures that we cannot have a
>>> flat union that introduces any qapi member names that would
>>> conflict with the non-variant qapi members already present
>>> from the union's base type (see flat-union-clash-member.json).
>>> We want QAPISchemaObjectType.check() to make the same check,
>>> so we can later reduce some of the ad hoc checks.
>>>
>
>>> In general, a type used as a branch of a flat union cannot
>>> also be the base type of the flat union, so even though we are
>>> adding a call to variant.type.check() in order to populate
>>> variant.type.members, this is merely a case of gaining
>>> topological sorting of how types are visited (and type.check()
>>> is already set up to allow multiple calls due to base types).
>> 
>> Yes, a type cannot contain itself, neither as base nor as variant.
>> 
>> We have tests covering attempts to do the former
>> (struct-cycle-direct.json, struct-cycle-indirect.json).  As far as I can

Actually, these are just local, unpublished tests.  They both make
check_member_clash() recurse infinitely.

# Direct inheritance loop
# FIXME triggers infinite recursion
{ 'struct': 'Loopy', 'base': 'Loopy',
  'data': {} }

# we reject a loop in base classes
{ 'struct': 'Base1', 'base': 'Base2', 'data': {} }
{ 'struct': 'Base2', 'base': 'Base1', 'data': {} }

The latter is actually yours, proposed as base-cycle.json in
Subject: qapi: Detect collisions in C member names
Message-Id: <1442872682-6523-17-git-send-email-ebl...@redhat.com>

If I disable the recursive call, the cycle detection in
QAPISchemaObjectType.check() is reached, and works.

Completing the move of clash detection to check() methods should improve
things from "accidental infinite recursion" to "intentional assertion
failure", because it should get rid of check_member_clash() and should
not break the cycle detection.

Then we can turn the assertion into a proper error message, and add the
tests.
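
As a rough illustration of the cycle check discussed above, here is a minimal sketch in C. It is not the actual QAPI generator code (which is Python); the ObjType struct, its fields and obj_type_check() are hypothetical names made up for this example. It also reports a cycle by returning false, where the real check() currently trips an assertion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a schema object type; not QEMU's class. */
typedef struct ObjType {
    const char *name;
    struct ObjType *base;   /* NULL when the type has no base */
    bool checking;          /* true while the check is in progress */
    bool checked;           /* true once the check has completed */
} ObjType;

/*
 * Returns true if the type and all of its bases check cleanly, and
 * false if the base chain loops back to a type still being checked.
 */
bool obj_type_check(ObjType *t)
{
    bool ok = true;

    assert(t != NULL);
    if (t->checked) {
        return true;        /* repeated calls are harmless */
    }
    if (t->checking) {
        return false;       /* re-entered while in progress: a cycle */
    }
    t->checking = true;
    if (t->base) {
        ok = obj_type_check(t->base);
    }
    t->checking = false;
    t->checked = ok;
    return ok;
}
```

With this shape, both the direct loop ('Loopy' based on itself) and the indirect one ('Base1' and 'Base2') are rejected without unbounded recursion, because the second visit to a type finds the "checking" flag already set.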

>> see, we don't have tests covering the latter.  Do we catch it?
>
> Yes, at least by virtue of the ad hoc tests: attempting to reuse a base
> type of the flat union as a variant member will cause the qapi members
> of the base type to appear more than once in the JSON object (that is,
> the checks that reject flat-union-clash-member.json would also reject
> this scenario). To test:
>
> diff --git i/tests/qapi-schema/qapi-schema-test.json
> w/tests/qapi-schema/qapi-schema-test.json
> index 44638da..16b2ffb 100644
> --- i/tests/qapi-schema/qapi-schema-test.json
> +++ w/tests/qapi-schema/qapi-schema-test.json
> @@ -67,7 +67,7 @@
>    'discriminator': 'enum1',
>    'data': { 'value1' : 'UserDefA',
>              'value2' : 'UserDefB',
> -            'value3' : 'UserDefB' } }
> +            'value3' : 'UserDefUnionBase' } }
>
>  { 'struct': 'UserDefUnionBase',
>    'base': 'UserDefZero',
>
>   GEN   tests/test-qapi-types.h
> /home/eblake/qemu/tests/qapi-schema/qapi-schema-test.json:65: Member
> name 'string' of branch 'value3' clashes with base 'UserDefUnionBase'
> /home/eblake/qemu/tests/Makefile:415: recipe for target
> 'tests/test-qapi-types.h' failed
>
> But you have me curious if this collision is still caught when the ad
> hoc tests are gone.  If so, great; if not, I'll add a test here.  (I'll
> know later when I get through rebasing to all of your comments.)
>
>>> No change to generated code.
>>>
>>> Signed-off-by: Eric Blake 
>> 
>> Patch looks good.
>
> Yay; it's nice to see results after all our mental gymnastics over how
> collision testing should work.

Re: [Qemu-devel] [POC]colo-proxy in qemu

2015-11-10 Thread zhanghailiang


On 2015/11/10 15:35, Jason Wang wrote:



On 11/10/2015 01:26 PM, Tkid wrote:

Hi all,

We are planning to reimplement the COLO proxy in userspace (that is, in
QEMU) to cache and compare network packets. This module is one of the
important components of the COLO project and is still at an early stage,
so any comments and feedback are warmly welcomed. Thanks in advance.

## Background
The COLO FT/HA (COarse-grain LOck-stepping Virtual Machines for Non-stop
Service) project is a high availability solution. Both the Primary VM (PVM)
and the Secondary VM (SVM) run in parallel. They receive the same requests
from the client and generate responses in parallel too. If the response
packets from the PVM and SVM are identical, they are released immediately.
Otherwise, a VM checkpoint (on demand) is conducted.
Paper:
http://www.socc2013.org/home/program/a3-dong.pdf?attredirects=0
COLO on Xen:
http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
COLO on Qemu/KVM:
http://wiki.qemu.org/Features/COLO

Because we need to capture response packets from the PVM and SVM and find
out whether they are identical, we introduce a new module to QEMU
networking called colo-proxy.

This document describes the design of the colo-proxy module.

## Glossary
   PVM - Primary VM, which provides services to clients.
   SVM - Secondary VM, a hot standby and replication of PVM.
   PN - Primary Node, the host which PVM runs on
   SN - Secondary Node, the host which SVM runs on

## Our Idea ##

COLO-Proxy
COLO-Proxy is a part of COLO. It is based on the QEMU net filter and is a
plugin for it. Its function is to keep the SVM's connections working
normally relative to the PVM and to compare the PVM's packets with the
SVM's packets. If they differ, it notifies COLO to do a checkpoint.

== Workflow ==


[Workflow diagram, layout lost in the archive. Components shown: a QEMU on
the PN hosting the PVM and a QEMU on the SN hosting the SVM, each with a
COLO CheckPoint module and a COLO Proxy. Labelled arrows: Forward(socket)
(1) from the PN proxy to the SN proxy, seq&ack adjust (2) in the SN proxy,
Forward(socket) (3) back to Compare (4) in the PN proxy, (5) from the PN
proxy to the PN CheckPoint module, and (socket) (6) between the two
CheckPoint modules. The Client connects to the PN.]




(1) When the PN receives client packets, the PN COLO-Proxy copies them and
forwards them to the SN COLO-Proxy.
(2) The SN COLO-Proxy records the initial seq of the PVM's packets, adjusts
the client's ack, and sends the adjusted packets to the SVM.
(3) The SN QEMU COLO-Proxy receives the SVM's packets and forwards them to
the PN QEMU COLO-Proxy.
(4) The PN QEMU COLO-Proxy enqueues the SVM's packets and the PVM's packets,
then compares the PVM's packet data with the SVM's packet data. If the
packets differ, the compare module notifies the COLO CheckPoint module to do
a checkpoint, then sends the PVM's packets to the client and drops the SVM's
packets. Otherwise, it just sends the PVM's packets to the client and drops
the SVM's packets.
(5) Notify the COLO-Checkpoint module that a checkpoint is needed.
(6) Do the COLO checkpoint.

### QEMU space TCP/IP stack(Based on SLIRP) ###
We need a QEMU space TCP/IP stack to help us analyze packets. After looking
into QEMU, we found that SLIRP

http://wiki.qemu.org/Documentation/Networking#User_Networking_.28SLIRP.29

is a good choice for us. SLIRP provides a full TCP/IP stack within QEMU; it
can help us handle the packets written to or read from the backend (tap)
device, which are just like link layer (L2) packets.

### Packet enqueue and compare ###
Together with the QEMU space TCP/IP stack, we enqueue all packets sent by
the PVM and SVM on the Primary QEMU, and then compare the packet payload for
each connection.
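
The compare step can be sketched roughly as follows. This is only an illustration under assumptions: the Packet struct, the Verdict enum and compare_packets() are hypothetical names, and a real implementation would first have to match packets per connection and strip headers, which is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical packet record; field names are illustrative only. */
typedef struct Packet {
    const unsigned char *payload;
    size_t len;
} Packet;

typedef enum {
    RELEASE_PRIMARY,            /* payloads match: send PVM's packet */
    CHECKPOINT_THEN_RELEASE     /* payloads differ: checkpoint first */
} Verdict;

/* Compare one PVM packet with the corresponding SVM packet. */
Verdict compare_packets(const Packet *pvm, const Packet *svm)
{
    bool same;

    assert(pvm != NULL && svm != NULL);
    same = pvm->len == svm->len &&
           memcmp(pvm->payload, svm->payload, pvm->len) == 0;
    return same ? RELEASE_PRIMARY : CHECKPOINT_THEN_RELEASE;
}
```

In both outcomes the SVM's packet is dropped and only the PVM's packet reaches the client; the verdict only decides whether a checkpoint is requested first.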



Hi:

I just have the following questions in mind (some have been raised in
previous rounds of discussion without a conclusion):

- What's the plan for the management layer? The setup seems complicated, so
we cannot simply depend on the user to do each step. (And for security
reasons, qemu is usually run as an unprivileged user.)


We will do most of the se

Re: [Qemu-devel] [PATCH v6 4/4] hmp: add monitor command to add/remove a child

2015-11-10 Thread Wen Congyang

On 11/09/2015 10:54 PM, Alberto Garcia wrote:
> On Fri 16 Oct 2015 10:57:46 AM CEST, Wen Congyang wrote:
> 
>> +.name   = "blockdev_change",
>> +.args_type  = "op:s,parent:B,child:B?,node:?",
>> +.params = "operation parent [child] [node]",
>   [...]
>> +/*
>> + * FIXME: we must specify the parameter child, otherwise,
>> + * we can't specify the parameter node.
>> + */
>> +if (op == CHANGE_OPERATION_ADD) {
>> +has_child = false;
>> +}
> 
> So if you want to perform the 'add' operation you must pass both 'child'
> and 'node' but the former will be discarded.
> 
> I don't think you really need to do this for the HMP interface, but it's
> anyway one more good reason to merge 'child' and 'node'.

Do you mean there is no need to implement the HMP interface?

Thanks
Wen Congyang

> 
> Berto
> .
>

[Qemu-devel] [PATCH v3 0/2] mirror: Improve zero write and discard

2015-11-10 Thread Fam Zheng

The first patch adds a lock between bdrv_set_dirty{,_bitmap} and non-atomic
(coroutine) readers.

The second patch makes use of it and fixes mirror thin writing.

Fam Zheng (2):
  block: Introduce coroutine lock to dirty bitmap
  mirror: Improve zero-write and discard with fragmented image

 block.c   |  26 ++--
 block/mirror.c| 160 --
 include/block/block.h |   6 +-
 include/block/block_int.h |   4 +-
 4 files changed, 156 insertions(+), 40 deletions(-)

-- 
2.4.3

[Qemu-devel] [PATCH v3 1/2] block: Introduce coroutine lock to dirty bitmap

2015-11-10 Thread Fam Zheng

Typically, what a dirty bit consumer does is 1) get the next dirty
sectors; 2) do something with the sectors; 3) clear the dirty bits; 4)
goto 1). This works as long as 2) is simple and atomic in the coroutine
sense.  Anything sophisticated requires either moving 3) before 2) or
using locks, because the dirty bits may get cleared in the middle when
the coroutine yields.

This will be the case for mirror.c in following patches, so introduce
CoMutex in BdrvDirtyBitmap to allow blocking the producer.

Also mark all involved dirty bitmap functions as coroutine_fn.
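
To make the ordering problem concrete, here is a small sketch in C. It models the bitmap as a single 64-bit word and the "yield" as a callback during which a producer dirties the same sector again; all names are hypothetical and none of this is the block layer's actual API.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*WorkFn)(uint64_t *bitmap, int bit);

/*
 * Order 1) 2) 3): work first, clear afterwards. A write that lands
 * while the work is in progress is wiped out by the later clear.
 */
void consume_clear_after(uint64_t *bitmap, int bit, WorkFn work)
{
    assert(bit >= 0 && bit < 64);
    work(bitmap, bit);                  /* may yield; producer may run */
    *bitmap &= ~(UINT64_C(1) << bit);
}

/*
 * Order 1) 3) 2): clear first, then work. A write that lands while
 * the work is in progress leaves the bit set, so it is visited again.
 */
void consume_clear_before(uint64_t *bitmap, int bit, WorkFn work)
{
    assert(bit >= 0 && bit < 64);
    *bitmap &= ~(UINT64_C(1) << bit);
    work(bitmap, bit);
}

/* Simulates a guest write to the same sector during the yield. */
void work_with_concurrent_write(uint64_t *bitmap, int bit)
{
    *bitmap |= UINT64_C(1) << bit;
}
```

With consume_clear_after() the concurrent write is lost (the bitmap ends up empty), while consume_clear_before() keeps the bit set. A lock, as introduced by this patch, is the other way to avoid the lost update.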

Signed-off-by: Fam Zheng 
---
 block.c   | 26 +-
 include/block/block.h |  6 --
 include/block/block_int.h |  4 +++-
 3 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/block.c b/block.c
index e9f40dc..34a4109 100644
--- a/block.c
+++ b/block.c
@@ -69,6 +69,7 @@ struct BdrvDirtyBitmap {
 int64_t size;   /* Size of the bitmap (Number of sectors) */
 bool disabled;  /* Bitmap is read-only */
 QLIST_ENTRY(BdrvDirtyBitmap) list;
+CoMutex lock;
 };
 
 #define NOT_DONE 0x7fff /* used while emulated sync operation in progress */
@@ -3173,6 +3174,7 @@ BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
 bitmap->size = bitmap_size;
 bitmap->name = g_strdup(name);
 bitmap->disabled = false;
+qemu_co_mutex_init(&bitmap->lock);
 QLIST_INSERT_HEAD(&bs->dirty_bitmaps, bitmap, list);
 return bitmap;
 }
@@ -3385,11 +3387,24 @@ void bdrv_dirty_iter_init(BdrvDirtyBitmap *bitmap, HBitmapIter *hbi)
 hbitmap_iter_init(hbi, bitmap->bitmap, 0);
 }
 
-void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
-   int64_t cur_sector, int nr_sectors)
+void coroutine_fn bdrv_lock_dirty_bitmap(BdrvDirtyBitmap *bitmap)
+{
+qemu_co_mutex_lock(&bitmap->lock);
+}
+
+void coroutine_fn bdrv_unlock_dirty_bitmap(BdrvDirtyBitmap *bitmap)
+{
+qemu_co_mutex_unlock(&bitmap->lock);
+}
+
+void coroutine_fn bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
+int64_t cur_sector,
+int nr_sectors)
 {
 assert(bdrv_dirty_bitmap_enabled(bitmap));
+bdrv_lock_dirty_bitmap(bitmap);
 hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
+bdrv_unlock_dirty_bitmap(bitmap);
 }
 
 void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
@@ -3405,15 +3420,16 @@ void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap)
 hbitmap_reset_all(bitmap->bitmap);
 }
 
-void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
-int nr_sectors)
+void coroutine_fn bdrv_set_dirty(BlockDriverState *bs,
+ int64_t cur_sector,
+ int nr_sectors)
 {
 BdrvDirtyBitmap *bitmap;
 QLIST_FOREACH(bitmap, &bs->dirty_bitmaps, list) {
 if (!bdrv_dirty_bitmap_enabled(bitmap)) {
 continue;
 }
-hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
+bdrv_set_dirty_bitmap(bitmap, cur_sector, nr_sectors);
 }
 }
 
diff --git a/include/block/block.h b/include/block/block.h
index 610db92..592f317 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -488,9 +488,11 @@ uint32_t bdrv_dirty_bitmap_granularity(BdrvDirtyBitmap *bitmap);
 bool bdrv_dirty_bitmap_enabled(BdrvDirtyBitmap *bitmap);
 bool bdrv_dirty_bitmap_frozen(BdrvDirtyBitmap *bitmap);
 DirtyBitmapStatus bdrv_dirty_bitmap_status(BdrvDirtyBitmap *bitmap);
+void coroutine_fn bdrv_lock_dirty_bitmap(BdrvDirtyBitmap *bitmap);
+void coroutine_fn bdrv_unlock_dirty_bitmap(BdrvDirtyBitmap *bitmap);
 int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap, int64_t sector);
-void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
-   int64_t cur_sector, int nr_sectors);
+void coroutine_fn bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
+int64_t cur_sector, int nr_sectors);
 void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
  int64_t cur_sector, int nr_sectors);
 void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 3ceeb5a..e17712c 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -672,7 +672,9 @@ bool blk_dev_is_tray_open(BlockBackend *blk);
 bool blk_dev_is_medium_locked(BlockBackend *blk);
 void blk_dev_resize_cb(BlockBackend *blk);
 
-void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector, int nr_sectors);
+void coroutine_fn bdrv_set_dirty(BlockDriverState *bs,
+ int64_t cur_sector,
+ int nr_sectors);
 bool bdrv_requests_pending(BlockDriverState *bs);
 
 #endif /* BLOCK_INT_H */
-- 
2.4.3

[Qemu-devel] [PATCH v3 2/2] mirror: Improve zero-write and discard with fragmented image

2015-11-10 Thread Fam Zheng

The "pnum < nb_sectors" condition in deciding whether to actually copy
data is unnecessarily strict, and the qiov initialization is
unnecessary too, for both bdrv_aio_write_zeroes and bdrv_aio_discard
branches.

Reorganize mirror_iteration flow so that we:

1) Find the contiguous zero/discarded sectors with
bdrv_get_block_status_above() before deciding what to do. We query
s->buf_size sized blocks at a time.

2) If the sectors in question are zeroed/discarded and aligned to
target cluster, issue zero write or discard accordingly. It's done
in mirror_do_zero_or_discard, where we don't add buffer to qiov.

3) Otherwise, do the same loop as before in mirror_do_read.
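
The decision described in steps 1) to 3) could look roughly like the sketch below. It is an illustration only: the SKETCH_BLOCK_* flags stand in for QEMU's BDRV_BLOCK_DATA and BDRV_BLOCK_ZERO, and choose_action() with its parameters is a hypothetical helper, not a function in block/mirror.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in flag values for this sketch only. */
#define SKETCH_BLOCK_DATA 0x01
#define SKETCH_BLOCK_ZERO 0x02

typedef enum { DO_READ, DO_ZERO_WRITE, DO_DISCARD } MirrorAction;

/*
 * status: result of the block status query (negative on error).
 * The range is only eligible for zero write or discard when it is
 * aligned to the target cluster size.
 */
MirrorAction choose_action(int64_t status, int64_t sector_num,
                           int nb_sectors, int sectors_per_cluster)
{
    bool aligned;

    assert(sectors_per_cluster > 0);
    aligned = nb_sectors > 0 &&
              sector_num % sectors_per_cluster == 0 &&
              nb_sectors % sectors_per_cluster == 0;

    if (status < 0 || !aligned) {
        return DO_READ;             /* fall back to read and write */
    }
    if (status & SKETCH_BLOCK_ZERO) {
        return DO_ZERO_WRITE;       /* reads as zeroes */
    }
    if (!(status & SKETCH_BLOCK_DATA)) {
        return DO_DISCARD;          /* unallocated in the source */
    }
    return DO_READ;                 /* real data: copy it */
}
```

Neither of the zero write and discard outcomes needs a data buffer, which is why no buffer is added to the qiov on that path.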

Signed-off-by: Fam Zheng 
---
 block/mirror.c | 160 +
 1 file changed, 128 insertions(+), 32 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index b1252a1..ade0412 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -157,23 +157,13 @@ static void mirror_read_complete(void *opaque, int ret)
 mirror_write_complete, op);
 }
 
-static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
+static uint64_t mirror_do_read(MirrorBlockJob *s)
 {
 BlockDriverState *source = s->common.bs;
-int nb_sectors, sectors_per_chunk, nb_chunks;
-int64_t end, sector_num, next_chunk, next_sector, hbitmap_next_sector;
+int sectors_per_chunk, nb_sectors, nb_chunks;
+int64_t end, next_chunk, next_sector, hbitmap_next_sector, sector_num;
 uint64_t delay_ns = 0;
 MirrorOp *op;
-int pnum;
-int64_t ret;
-
-s->sector_num = hbitmap_iter_next(&s->hbi);
-if (s->sector_num < 0) {
-bdrv_dirty_iter_init(s->dirty_bitmap, &s->hbi);
-s->sector_num = hbitmap_iter_next(&s->hbi);
-trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap));
-assert(s->sector_num >= 0);
-}
 
 hbitmap_next_sector = s->sector_num;
 sector_num = s->sector_num;
@@ -198,14 +188,6 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 next_sector = sector_num;
 next_chunk = sector_num / sectors_per_chunk;
 
-/* Wait for I/O to this cluster (from a previous iteration) to be done.  */
-while (test_bit(next_chunk, s->in_flight_bitmap)) {
-trace_mirror_yield_in_flight(s, sector_num, s->in_flight);
-s->waiting_for_io = true;
-qemu_coroutine_yield();
-s->waiting_for_io = false;
-}
-
 do {
 int added_sectors, added_chunks;
 
@@ -301,24 +283,138 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 s->sectors_in_flight += nb_sectors;
 trace_mirror_one_iteration(s, sector_num, nb_sectors);
 
-ret = bdrv_get_block_status_above(source, NULL, sector_num,
-  nb_sectors, &pnum);
-if (ret < 0 || pnum < nb_sectors ||
-(ret & BDRV_BLOCK_DATA && !(ret & BDRV_BLOCK_ZERO))) {
-bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors,
-   mirror_read_complete, op);
-} else if (ret & BDRV_BLOCK_ZERO) {
+bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors,
+   mirror_read_complete, op);
+return delay_ns;
+}
+
+static uint64_t mirror_do_zero_or_discard(MirrorBlockJob *s,
+  int64_t sector_num,
+  int nb_sectors,
+  bool is_discard)
+{
+int sectors_per_chunk, nb_chunks;
+int64_t next_chunk, next_sector, hbitmap_next_sector;
+uint64_t delay_ns = 0;
+MirrorOp *op;
+
+sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
+assert(nb_sectors >= sectors_per_chunk);
+next_chunk = sector_num / sectors_per_chunk;
+nb_chunks = DIV_ROUND_UP(nb_sectors, sectors_per_chunk);
+bitmap_set(s->in_flight_bitmap, next_chunk, nb_chunks);
+delay_ns = ratelimit_calculate_delay(&s->limit, nb_sectors);
+
+/* Allocate a MirrorOp that is used as an AIO callback. The qiov is zeroed
+ * out so the freeing in iteration is nop. */
+op = g_new0(MirrorOp, 1);
+op->s = s;
+op->sector_num = sector_num;
+op->nb_sectors = nb_sectors;
+
+/* Advance the HBitmapIter in parallel, so that we do not examine
+ * the same sector twice.
+ */
+hbitmap_next_sector = sector_num;
+next_sector = sector_num + nb_sectors;
+while (next_sector > hbitmap_next_sector) {
+hbitmap_next_sector = hbitmap_iter_next(&s->hbi);
+if (hbitmap_next_sector < 0) {
+break;
+}
+}
+
+bdrv_reset_dirty_bitmap(s->dirty_bitmap, sector_num, nb_sectors);
+s->in_flight++;
+s->sectors_in_flight += nb_sectors;
+if (is_discard) {
+bdrv_aio_discard(s->target, sector_num, op->nb_sectors,
+ mirror_write_complete, op);
+} else {
 bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors,

Re: [Qemu-devel] [PATCH for-2.5] hw/timer/hpet.c: Avoid signed integer overflow which results in bugs on OSX

2015-11-10 Thread Laszlo Ersek

On 11/09/15 23:25, Laszlo Ersek wrote:
> On 11/09/15 15:56, Peter Maydell wrote:
>> Signed integer overflow in C is undefined behaviour, and the compiler
>> is at liberty to assume it can never happen and optimize accordingly.
>> In particular, the subtractions in hpet_time_after() and hpet_time_after64()
>> were causing OSX clang to optimize the code such that it was prone to
>> hangs and complaints about the main loop stalling (presumably because
>> we were spending all our time trying to service very high frequency
>> HPET timer callbacks). The clang sanitizer confirms the UB:
>>
>> hw/timer/hpet.c:119:26: runtime error: signed integer overflow: -2146967296 
>> - 2147003978 cannot be represented in type 'int'
>>
>> Fix this by doing the subtraction as an unsigned operation and then
>> converting to signed for the comparison.
>>
>> Reported-by: Aaron Elkins 
>> Signed-off-by: Peter Maydell 
>> ---
>>  hw/timer/hpet.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/timer/hpet.c b/hw/timer/hpet.c
>> index 3037bef..7f0391c 100644
>> --- a/hw/timer/hpet.c
>> +++ b/hw/timer/hpet.c
>> @@ -116,12 +116,12 @@ static uint32_t timer_enabled(HPETTimer *t)
>>  
>>  static uint32_t hpet_time_after(uint64_t a, uint64_t b)
>>  {
>> -return ((int32_t)(b) - (int32_t)(a) < 0);
>> +return ((int32_t)(b - a) < 0);
>>  }
>>  
>>  static uint32_t hpet_time_after64(uint64_t a, uint64_t b)
>>  {
>> -return ((int64_t)(b) - (int64_t)(a) < 0);
>> +return ((int64_t)(b - a) < 0);
>>  }
>>  
>>  static uint64_t ticks_to_ns(uint64_t value)
>>
> 
> I'm late to the discussion, but I cannot imagine what would speak against:
> 
> return (b < a);
> 
> The post-patch code still converts a uint64_t difference to int32_t.
> According to the C standard(s), such a conversion (i.e., when the
> integer value being converted doesn't fit in the target signed integer)
> results in an implementation-defined value, or an implementation-defined
> signal is raised.
> 
> On our platforms, the impl-def value is determined by "truncate to 32
> bits, then reinterpret the bit pattern as two's complement signed
> int32_t". Meaning, if:
> 
> (b > a) && ((b - a) & (1u << 31))
> 
> (that is, "b" is so much larger than "a" that bit#31 is set in the (b-a)
> difference), then hpet_time_after() will now incorrectly return 1.
> (Because bit#31 will be interpreted as the sign bit, turned on.)
> 
> Again, what speaks against
> 
> return (b < a);
> 
> ?
> 
> (The pre-patch code dates back to commit 16b29ae1 (year 2008), which
> offers precious little justification for the formula.)

An hour or so after sending this email, I think I got an idea about the
code's intent. (Knowing practically nothing about HPET.) I guess the
HPET provides counters that can wrap around, so if you don't look
frequently enough, you won't know if the value is actually smaller or
greater (because you can't use raw magnitude to tell that).

So I *guess* this code implemented the following idea: assume you have a
"last value", and a reading (?) from "just a bit later". You take the
neighborhood (with radius 2^31, or 2^63) of the "last value", and if the
new reading falls into the upper half of that neighborhood, you say "the
value has grown".

This idea is actually very well suited for uintN_t modular arithmetic,
because the (x - y) difference expresses the number of times you have to
increment y to make it fall into the same remainder class as x, modulo 2^N.

Hence, ((x - y) < 2^(N-1)) expresses "x is later than or equal to y"
(with both x and y being uintN_t variables). Equivalently, we have ((x -
y) >= 2^(N-1)) meaning "x is strictly earlier than y", which can also be
said as "y is strictly after x".

And I think that's exactly what these functions implement:

- Their names say "time after".

- The condition

  (x - y) >= 2^(N-1)

  tests exactly whether the most significant bit is set in the
  difference.

  When the bit pattern of the difference is reinterpreted as intN_t,
  that in turn means

  (intN_t)(x - y) < 0

So the functions seem to check if "a is strictly after b".

... The call sites seem to confirm this:

if (t->config & HPET_TN_32BIT) {
while (hpet_time_after(cur_tick, t->cmp)) {
t->cmp = (uint32_t)(t->cmp + t->period);
}
} else {
while (hpet_time_after64(cur_tick, t->cmp)) {
t->cmp += period;
}
}

The loops increment "t->cmp" as long as "cur_tick is strictly after
t->cmp"; in other words, the loops make "t->cmp" catch up with "cur_tick".

... I think the functions are right after all, it's just that the
following would have matched my personal taste more:

  b - a >= 1u << 31

and

  b - a >= 1ull << 63

(Because they don't have any impl-def parts in them, plus to me they
make the intent, with the modular arithmetic and the "neighborhoods",
clearer.)

I guess for others it's the opposite... :)
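
For illustration, here is a small self-contained sketch of the unsigned formulation, next to the naive "b < a" comparison. The function names are made up for this example (they are not the ones in hw/timer/hpet.c), and catch_up32() merely models the call site quoted above, under the assumption that the period is nonzero and smaller than 2^31.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* "a is strictly after b" in 32-bit modular time. */
bool time_after32(uint32_t a, uint32_t b)
{
    return (uint32_t)(b - a) >= (UINT32_C(1) << 31);
}

/* Same idea for 64-bit counters. */
bool time_after64(uint64_t a, uint64_t b)
{
    return b - a >= (UINT64_C(1) << 63);
}

/* The naive comparison, which ignores wraparound. */
bool naive_after32(uint32_t a, uint32_t b)
{
    return b < a;
}

/* Advance cmp by whole periods until cur_tick is no longer after it. */
uint32_t catch_up32(uint32_t cmp, uint32_t cur_tick, uint32_t period)
{
    assert(period > 0 && period < (UINT32_C(1) << 31));
    while (time_after32(cur_tick, cmp)) {
        cmp = (uint32_t)(cmp + period);
    }
    return cmp;
}
```

Take a counter that has just wrapped: a = 5 and b = 0xFFFFFFF0. The modular test reports that a is after b, while the naive test does not, which is the case the raw magnitude comparison cannot handle.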

Cheers
Laszlo

Re: [Qemu-devel] [v2 RESEND 2/2] configure: add options to config avx2

2015-11-10 Thread Juan Quintela

Liang Li  wrote:
> Add the '--enable-avx2' & '--disable-avx2' option so as to config
> the AVX2 instruction optimization.
>
> By default, avx2 optimization is enabled, if '--disable-avx2' is not
> set, configure will detect if the compiler can support AVX2 option,
> if yes, AVX2 optimization is enabled, else disabled.
>
> Signed-off-by: Liang Li 
> ---
>  configure | 29 +
>  1 file changed, 29 insertions(+)
>
> diff --git a/configure b/configure
> index 42e57c0..4d81be2 100755
> --- a/configure
> +++ b/configure
> @@ -310,6 +310,7 @@ smartcard=""
>  libusb=""
>  usb_redir=""
>  opengl=""
> +avx2="yes"
>  zlib="yes"
>  lzo=""
>  snappy=""
> @@ -1057,6 +1058,10 @@ for opt do
>;;
>--enable-usb-redir) usb_redir="yes"
>;;
> +  --disable-avx2) avx2="no"
> +  ;;
> +  --enable-avx2) avx2="yes"
> +  ;;
>--disable-zlib-test) zlib="no"
>;;
>--disable-lzo) lzo="no"
> @@ -1373,6 +1378,7 @@ disabled with --disable-FEATURE, default is enabled if 
> available:
>    smartcard       smartcard support (libcacard)
>    libusb          libusb (for usb passthrough)
>    usb-redir       usb network redirection support
> +  avx2            support of avx2 instruction
>    lzo             support of lzo compression library
>    snappy          support of snappy compression library
>    bzip2           support of bzip2 compression library
> @@ -1809,6 +1815,24 @@ EOF
>fi
>  fi
>  
> +
> +# avx2 check
> +
> +if test "$avx2" != "no" ; then
> +cat > $TMPC << EOF
> +int main(void) { return 0; }
> +EOF
> +if compile_prog "" "-mavx2" ; then
> +avx2="yes"
> +else
> +avx2="no"

Shouldn't the else bit be:

  if test "$avx2" = "yes"; then
  feature_not_found "avx2" "Your compiler doesn't support avx2"
  fi
  avx2="no"

??
> +fi
> +fi
> +
> +if test "$avx2" = "yes" ; then
> +avx2_cflags=" -mavx2"
> +fi
> +
>  ##
>  # zlib check
>  
> @@ -4782,6 +4806,7 @@ echo "libssh2 support   $libssh2"
>  echo "TPM passthrough   $tpm_passthrough"
>  echo "QOM debugging $qom_cast_debug"
>  echo "vhdx  $vhdx"
> +echo "avx2 support  $avx2"
>  echo "lzo support   $lzo"
>  echo "snappy support$snappy"
>  echo "bzip2 support $bzip2"
> @@ -5166,6 +5191,10 @@ if test "$opengl" = "yes" ; then
>echo "OPENGL_LIBS=$opengl_libs" >> $config_host_mak
>  fi
>  
> +if test "$avx2" = "yes" ; then
> +  echo "AVX2_CFLAGS=$avx2_cflags" >> $config_host_mak
> +fi
> +
>  if test "$lzo" = "yes" ; then
>echo "CONFIG_LZO=y" >> $config_host_mak
>  fi

Re: [Qemu-devel] [PATCH] mirror: Improve zero-write and discard with fragmented image

2015-11-10 Thread Paolo Bonzini



On 10/11/2015 07:14, Fam Zheng wrote:
> On Mon, 11/09 17:29, Kevin Wolf wrote:
>> On 09.11.2015 at 17:18, Paolo Bonzini wrote:
>>>
>>>
>>> On 09/11/2015 17:04, Kevin Wolf wrote:
>>>> On 06.11.2015 at 11:22, Fam Zheng wrote:
>>>>> The "pnum < nb_sectors" condition in deciding whether to actually copy
>>>>> data is unnecessarily strict, and the qiov initialization is
>>>>> unnecessary too, for both bdrv_aio_write_zeroes and bdrv_aio_discard
>>>>> branches.
>>>>>
>>>>> Reorganize mirror_iteration flow so that we:
>>>>>
>>>>> 1) Find the contiguous zero/discarded sectors with
>>>>> bdrv_get_block_status_above() before deciding what to do. We query
>>>>> s->buf_size sized blocks at a time.
>>>>>
>>>>> 2) If the sectors in question are zeroed/discarded and aligned to
>>>>> target cluster, issue zero write or discard accordingly. It's done
>>>>> in mirror_do_zero_or_discard, where we don't add buffer to qiov.
>>>>>
>>>>> 3) Otherwise, do the same loop as before in mirror_do_read.
>>>>>
>>>>> Signed-off-by: Fam Zheng
>>>>
>>>> I'm not sure where in the patch to comment on this, so I'll just do it
>>>> here right in the beginning.
>>>>
>>>> I'm concerned that we need to be more careful about races in this patch,
>>>> in particular regarding the bitmaps. I think the conditions for the two
>>>> bitmaps are:
>>>>
>>>> * Dirty bitmap: We must clear the bit after finding the next piece of
>>>>   data to be mirrored, but before we yield after getting information
>>>>   that we use for the decision which kind of operation we need.
>>>>
>>>>   In other words, we need to clear the dirty bitmap bit before calling
>>>>   bdrv_get_block_status_above(), because that's both the function that
>>>>   retrieves information about the next chunk and also a function that
>>>>   can yield.
>>>>
>>>>   If after this point the data is written to, we need to mirror it
>>>>   again.
>>>
>>> With Fam's patch, that's not trivial for two reasons:
>>>
>>> 1) bdrv_get_block_status_above() can return a smaller amount than what
>>> is asked.
>>>
>>> 2) the "read and write" case can handle s->granularity sectors per
>>> iteration (many of them can be coalesced, but still that's how the
>>> iteration works).
>>>
>>> The simplest solution is to perform the query with s->granularity size
>>> rather than s->buf_size.
>>
>> Then we end up with many small operations, that's not what we want.
>>
>> Why can't we mark up to s->buf_size dirty clusters as clean first, then
>> query the status, and mark all of those that we can't handle dirty
>> again?
> 
> Then we may end up marking more clusters as dirty than it should be.

You're both right.

> Because all bdrv_set_dirty() and bdrv_set_dirty_bitmap() callers are 
> coroutine,
> we can introduce a CoMutex to let bitmap reader block bdrv_set_dirty and
> bdrv_set_dirty_bitmap.

I think this is not necessary.

I think the following is safe:

1) before calling bdrv_get_block_status_above(), find out how many
consecutive bits in the dirty bitmap are 1

2) zero all those bits in the dirty bitmap

3) call bdrv_get_block_status_above() with a size equivalent to the
number of dirty bits

4) if bdrv_get_block_status_above() only returns a partial result, loop
step (3) until all the dirty bits are processed

For full mirroring, this strategy will probably make the first
incremental iteration more expensive.
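
The four steps above can be sketched as follows. This is a toy model under assumptions: one bool per chunk stands in for the dirty bitmap, and the StatusFn callback stands in for bdrv_get_block_status_above(), which may report on fewer chunks than requested. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical status query: reports on at most `count` chunks
 * starting at `start`; returns how many it covered (at least 1).
 */
typedef size_t (*StatusFn)(size_t start, size_t count, void *opaque);

/* Process the run of dirty chunks at `start`; returns its length. */
size_t process_dirty_run(bool *dirty, size_t nchunks, size_t start,
                         StatusFn query, void *opaque)
{
    size_t run = 0, done = 0;

    /* 1) find out how many consecutive chunks are dirty */
    while (start + run < nchunks && dirty[start + run]) {
        run++;
    }
    /* 2) clear all of them before anything can yield */
    for (size_t i = 0; i < run; i++) {
        dirty[start + i] = false;
    }
    /* 3) and 4) query the status, looping over partial results */
    while (done < run) {
        size_t got = query(start + done, run - done, opaque);
        assert(got >= 1 && got <= run - done);
        done += got;
    }
    return run;
}

/* Example stub: covers at most two chunks per call, counts calls. */
size_t query_at_most_two(size_t start, size_t count, void *opaque)
{
    (void)start;
    (*(size_t *)opaque)++;
    return count < 2 ? count : 2;
}
```

Because the bits are cleared before the first query, a write that arrives while a query yields sets its bit again and is picked up by a later iteration, without any extra lock.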

Paolo

Re: [Qemu-devel] [PATCH 0/7] int128: reparing broken 128 bit memory calculations

2015-11-10 Thread Pierre Morel




On 11/09/2015 01:20 PM, Paolo Bonzini wrote:


On 09/11/2015 13:01, Pierre Morel wrote:

This leads to UINT64_MAX being represented as {1, 0} instead of
{0, UINT64_MAX}, while {1, 0} is 2^64. This in turn leads to
unnecessary and obfuscating transformations with int128_2_64(), which
tests for UINT64_MAX and returns {1, 0} in memory_region_init(), while
the inverse translation tests for {1, 0} and returns UINT64_MAX
in memory_region_size().

Yes, the use of UINT64_MAX for 2^64 is a hack, but it is unrelated to
the signedness of Int128.

OK, we agree it is a hack,
but sorry, I must have missed something,
because I do not understand what this hack is useful for.

It's used in the size argument of memory_region_init*, so that it can
remain an uint64_t.  The size is usually small (up to 2^40, say) unless
it is 2^64 meaning "the whole address space".  The latter case is
covered by UINT64_MAX.
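
A minimal sketch of that convention, assuming a simple two-word unsigned 128-bit struct (the U128 type and both helpers are hypothetical names for this example, not QEMU's Int128 API):

```c
#include <assert.h>
#include <stdint.h>

/* Minimal unsigned 128-bit value for this sketch only. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} U128;

/* Size argument to 128-bit size: UINT64_MAX is the marker for 2^64. */
U128 size_to_u128(uint64_t size)
{
    U128 r = { size, 0 };

    if (size == UINT64_MAX) {
        r.lo = 0;
        r.hi = 1;               /* {hi, lo} = {1, 0} is 2^64 */
    }
    return r;
}

/* Inverse direction: 2^64 is reported back as UINT64_MAX. */
uint64_t u128_to_size(U128 v)
{
    /* A size never exceeds 2^64 in this model. */
    assert(v.hi == 0 || (v.hi == 1 && v.lo == 0));
    if (v.hi == 1) {
        return UINT64_MAX;
    }
    return v.lo;
}
```

The price of the convention is that a size of exactly 2^64 - 1 cannot be expressed, since that value is reserved for the marker.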

Paolo



OK, I understand, thanks for taking the time for me.

To sum up, size is a size :-) and not an offset in memory.

A size of UINT64_MAX does not exist but we can live without it, whereas
having a description for the "whole address space", 2^64, can be useful.

There may be other solutions, like taking 0 for 2^64
if a memory size of 0 has no meaning,
but that could be misleading too.

So I do not see a better solution for this interesting problem.

[Qemu-devel] [PATCH 0/3] block/gluster: add support for multiple gluster servers

2015-11-10 Thread Prasanna Kumar Kalever

This release is rebased on qemu master branch.
In this series of patches 1/3 and 2/3 are unchanged.

Prasanna Kumar Kalever (3):
  block/gluster: rename [server, volname, image] -> [host, volume, path]
  block/gluster: code cleanup
  block/gluster: add support for multiple gluster servers

 block/gluster.c  | 597 ---
 qapi/block-core.json |  60 +-
 2 files changed, 529 insertions(+), 128 deletions(-)

-- 
2.1.0

[Qemu-devel] [PATCH v2 1/3] block/gluster: rename [server, volname, image] -> [host, volume, path]

2015-11-10 Thread Prasanna Kumar Kalever

This patch becomes more meaningful after the next patch, which adds support
for multiple gluster servers.

For example, in the 'servers' tuple values we use the 'server' variable for
the key 'host' in the code, and it would be quite messy to have colliding
names for variables. So, to improve readability and stay consistent with
other existing code as well as the input keys/options, this patch renames
the following variables:
'server'  -> 'host'
'image'   -> 'path'
'volname' -> 'volume'

Signed-off-by: Prasanna Kumar Kalever 
---
 block/gluster.c | 54 +++---
 1 file changed, 27 insertions(+), 27 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index 1eb3a8c..513a774 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -25,19 +25,19 @@ typedef struct BDRVGlusterState {
 } BDRVGlusterState;
 
 typedef struct GlusterConf {
-char *server;
+char *host;
 int port;
-char *volname;
-char *image;
+char *volume;
+char *path;
 char *transport;
 } GlusterConf;
 
 static void qemu_gluster_gconf_free(GlusterConf *gconf)
 {
 if (gconf) {
-g_free(gconf->server);
-g_free(gconf->volname);
-g_free(gconf->image);
+g_free(gconf->host);
+g_free(gconf->volume);
+g_free(gconf->path);
 g_free(gconf->transport);
 g_free(gconf);
 }
@@ -57,19 +57,19 @@ static int parse_volume_options(GlusterConf *gconf, char *path)
 if (*p == '\0') {
 return -EINVAL;
 }
-gconf->volname = g_strndup(q, p - q);
+gconf->volume = g_strndup(q, p - q);
 
-/* image */
+/* path */
 p += strspn(p, "/");
 if (*p == '\0') {
 return -EINVAL;
 }
-gconf->image = g_strdup(p);
+gconf->path = g_strdup(p);
 return 0;
 }
 
 /*
- * file=gluster[+transport]://[server[:port]]/volname/image[?socket=...]
+ * file=gluster[+transport]://[host[:port]]/volume/path[?socket=...]
  *
  * 'gluster' is the protocol.
  *
@@ -78,10 +78,10 @@ static int parse_volume_options(GlusterConf *gconf, char *path)
  * tcp, unix and rdma. If a transport type isn't specified, then tcp
  * type is assumed.
  *
- * 'server' specifies the server where the volume file specification for
+ * 'host' specifies the host where the volume file specification for
  * the given volume resides. This can be either hostname, ipv4 address
  * or ipv6 address. ipv6 address needs to be within square brackets [ ].
- * If transport type is 'unix', then 'server' field should not be specified.
+ * If transport type is 'unix', then 'host' field should not be specified.
  * The 'socket' field needs to be populated with the path to unix domain
  * socket.
  *
@@ -90,9 +90,9 @@ static int parse_volume_options(GlusterConf *gconf, char *path)
  * default port. If the transport type is unix, then 'port' should not be
  * specified.
  *
- * 'volname' is the name of the gluster volume which contains the VM image.
+ * 'volume' is the name of the gluster volume which contains the VM image.
  *
- * 'image' is the path to the actual VM image that resides on gluster volume.
+ * 'path' is the path to the actual VM image that resides on gluster volume.
  *
  * Examples:
  *
@@ -101,7 +101,7 @@ static int parse_volume_options(GlusterConf *gconf, char *path)
  * file=gluster+tcp://1.2.3.4:24007/testvol/dir/a.img
  * file=gluster+tcp://[1:2:3:4:5:6:7:8]/testvol/dir/a.img
  * file=gluster+tcp://[1:2:3:4:5:6:7:8]:24007/testvol/dir/a.img
- * file=gluster+tcp://server.domain.com:24007/testvol/dir/a.img
+ * file=gluster+tcp://host.domain.com:24007/testvol/dir/a.img
  * file=gluster+unix:///testvol/dir/a.img?socket=/tmp/glusterd.socket
  * file=gluster+rdma://1.2.3.4:24007/testvol/a.img
  */
@@ -152,9 +152,9 @@ static int qemu_gluster_parseuri(GlusterConf *gconf, const char *filename)
 ret = -EINVAL;
 goto out;
 }
-gconf->server = g_strdup(qp->p[0].value);
+gconf->host = g_strdup(qp->p[0].value);
 } else {
-gconf->server = g_strdup(uri->server ? uri->server : "localhost");
+gconf->host = g_strdup(uri->server ? uri->server : "localhost");
 gconf->port = uri->port;
 }
 
@@ -175,18 +175,18 @@ static struct glfs *qemu_gluster_init(GlusterConf *gconf, const char *filename,
 
 ret = qemu_gluster_parseuri(gconf, filename);
 if (ret < 0) {
-error_setg(errp, "Usage: file=gluster[+transport]://[server[:port]]/"
-   "volname/image[?socket=...]");
+error_setg(errp, "Usage: file=gluster[+transport]://[host[:port]]/"
+   "volume/path[?socket=...]");
 errno = -ret;
 goto out;
 }
 
-glfs = glfs_new(gconf->volname);
+glfs = glfs_new(gconf->volume);
 if (!glfs) {
 goto out;
 }
 
-ret = glfs_set_volfile_server(glfs, gconf->transport, gconf->server,
+ret = glfs_set_volfile_server(glfs, gconf->transport, gconf->host,
 gconf->port);
 i

[Qemu-devel] [PATCH v2 2/3] block/gluster: code cleanup

2015-11-10 Thread Prasanna Kumar Kalever

Unify the coding style of multiline function arguments and of error function
calls, and move the scattered declarations of structures and option lists
together.

Signed-off-by: Prasanna Kumar Kalever 
---
 block/gluster.c | 113 ++--
 1 file changed, 60 insertions(+), 53 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index 513a774..ededda2 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -24,6 +24,11 @@ typedef struct BDRVGlusterState {
 struct glfs_fd *fd;
 } BDRVGlusterState;
 
+typedef struct BDRVGlusterReopenState {
+struct glfs *glfs;
+struct glfs_fd *fd;
+} BDRVGlusterReopenState;
+
 typedef struct GlusterConf {
 char *host;
 int port;
@@ -32,6 +37,39 @@ typedef struct GlusterConf {
 char *transport;
 } GlusterConf;
 
+
+static QemuOptsList qemu_gluster_create_opts = {
+.name = "qemu-gluster-create-opts",
+.head = QTAILQ_HEAD_INITIALIZER(qemu_gluster_create_opts.head),
+.desc = {
+{
+.name = BLOCK_OPT_SIZE,
+.type = QEMU_OPT_SIZE,
+.help = "Virtual disk size"
+},
+{
+.name = BLOCK_OPT_PREALLOC,
+.type = QEMU_OPT_STRING,
+.help = "Preallocation mode (allowed values: off, full)"
+},
+{ /* end of list */ }
+}
+};
+
+static QemuOptsList runtime_opts = {
+.name = "gluster",
+.head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
+.desc = {
+{
+.name = "filename",
+.type = QEMU_OPT_STRING,
+.help = "URL to the gluster image",
+},
+{ /* end of list */ }
+},
+};
+
+
 static void qemu_gluster_gconf_free(GlusterConf *gconf)
 {
 if (gconf) {
@@ -176,7 +214,7 @@ static struct glfs *qemu_gluster_init(GlusterConf *gconf, const char *filename,
 ret = qemu_gluster_parseuri(gconf, filename);
 if (ret < 0) {
 error_setg(errp, "Usage: file=gluster[+transport]://[host[:port]]/"
-   "volume/path[?socket=...]");
+ "volume/path[?socket=...]");
 errno = -ret;
 goto out;
 }
@@ -254,20 +292,6 @@ static void gluster_finish_aiocb(struct glfs_fd *fd, ssize_t ret, void *arg)
 qemu_bh_schedule(acb->bh);
 }
 
-/* TODO Convert to fine grained options */
-static QemuOptsList runtime_opts = {
-.name = "gluster",
-.head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
-.desc = {
-{
-.name = "filename",
-.type = QEMU_OPT_STRING,
-.help = "URL to the gluster image",
-},
-{ /* end of list */ }
-},
-};
-
 static void qemu_gluster_parse_flags(int bdrv_flags, int *open_flags)
 {
 assert(open_flags != NULL);
@@ -285,7 +309,7 @@ static void qemu_gluster_parse_flags(int bdrv_flags, int *open_flags)
 }
 }
 
-static int qemu_gluster_open(BlockDriverState *bs,  QDict *options,
+static int qemu_gluster_open(BlockDriverState *bs, QDict *options,
  int bdrv_flags, Error **errp)
 {
 BDRVGlusterState *s = bs->opaque;
@@ -334,12 +358,6 @@ out:
 return ret;
 }
 
-typedef struct BDRVGlusterReopenState {
-struct glfs *glfs;
-struct glfs_fd *fd;
-} BDRVGlusterReopenState;
-
-
 static int qemu_gluster_reopen_prepare(BDRVReopenState *state,
BlockReopenQueue *queue, Error **errp)
 {
@@ -426,7 +444,9 @@ static void qemu_gluster_reopen_abort(BDRVReopenState *state)
 
 #ifdef CONFIG_GLUSTERFS_ZEROFILL
 static coroutine_fn int qemu_gluster_co_write_zeroes(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, BdrvRequestFlags flags)
+ int64_t sector_num,
+ int nb_sectors,
+ BdrvRequestFlags flags)
 {
 int ret;
 GlusterAIOCB *acb = g_slice_new(GlusterAIOCB);
@@ -459,7 +479,7 @@ static inline bool gluster_supports_zerofill(void)
 }
 
 static inline int qemu_gluster_zerofill(struct glfs_fd *fd, int64_t offset,
-int64_t size)
+int64_t size)
 {
 return glfs_zerofill(fd, offset, size);
 }
@@ -471,7 +491,7 @@ static inline bool gluster_supports_zerofill(void)
 }
 
 static inline int qemu_gluster_zerofill(struct glfs_fd *fd, int64_t offset,
-int64_t size)
+int64_t size)
 {
 return 0;
 }
@@ -500,19 +520,17 @@ static int qemu_gluster_create(const char *filename,
 tmp = qemu_opt_get_del(opts, BLOCK_OPT_PREALLOC);
 if (!tmp || !strcmp(tmp, "off")) {
 prealloc = 0;
-} else if (!strcmp(tmp, "full") &&
-   gluster_supports_zerofill()) {
+} else if (!strcmp(tmp, "full") && gluster_supports_zerofill()) {
 prealloc = 1;
 } else {
 error_setg(errp, "Invalid preallocation mode: '%s'"
-" or GlusterFS doesn't support zerof

[Qemu-devel] [PATCH v13 3/3] block/gluster: add support for multiple gluster servers

2015-11-10 Thread Prasanna Kumar Kalever

This patch adds a way to specify multiple volfile servers to the gluster
block backend of QEMU with tcp|rdma transport types and their port numbers.

Problem:

Currently a VM image on a gluster volume is specified like this:

file=gluster[+tcp]://host[:port]/testvol/a.img

Assume we have three hosts in a trusted pool with a replica 3 volume in
action, and the host mentioned in the command above goes down for some
reason. Since the volume is replica 3, two other hosts are still active
from which we could boot the VM.

But currently there is no mechanism to pass the other two gluster host
addresses to qemu.

Solution:

New way of specifying VM Image on gluster volume with volfile servers:
(We still support old syntax to maintain backward compatibility)

Basic command line syntax looks like:

Pattern I:
 -drive driver=gluster,
volume=testvol,path=/path/a.raw,
servers.0.host=1.2.3.4,
   [servers.0.port=24007,]
   [servers.0.transport=tcp,]
servers.1.host=5.6.7.8,
   [servers.1.port=24008,]
   [servers.1.transport=rdma,] ...

Pattern II:
 'json:{"driver":"qcow2","file":{"driver":"gluster",
   "volume":"testvol","path":"/path/a.qcow2",
   "servers":[{tuple0},{tuple1}, ...{tupleN}]}}'

   driver      => 'gluster' (protocol name)
   volume      => name of gluster volume where our VM image resides
   path        => absolute path of image in gluster volume

   {tuple}     => {"host":"1.2.3.4"[,"port":"24007","transport":"tcp"]}

   host        => host address (hostname/ipv4/ipv6 addresses)
   port        => port number on which glusterd is listening (default 24007)
   transport   => transport type used to connect to gluster management daemon,
                  it can be tcp|rdma (default 'tcp')

Examples:
1.
 -drive driver=qcow2,file.driver=gluster,
file.volume=testvol,file.path=/path/a.qcow2,
file.servers.0.host=1.2.3.4,
file.servers.0.port=24007,
file.servers.0.transport=tcp,
file.servers.1.host=5.6.7.8,
file.servers.1.port=24008,
file.servers.1.transport=rdma
2.
 'json:{"driver":"qcow2","file":{"driver":"gluster","volume":"testvol",
 "path":"/path/a.qcow2","servers":
 [{"host":"1.2.3.4","port":"24007","transport":"tcp"},
  {"host":"4.5.6.7","port":"24008","transport":"rdma"}] } }'

This patch provides a mechanism to pass all the server addresses in the
replica set, so if host1 is down the VM can still boot from any of the
active hosts.

This is equivalent to the backup-volfile-servers option supported by
mount.glusterfs (the FUSE way of mounting a gluster volume).

Credits: Sincere thanks to Kevin Wolf  and
"Deepak C Shetty"  for inputs and all their support

Signed-off-by: Prasanna Kumar Kalever 
---
v1:
multiple host addresses but common port number and transport type
pattern: URI syntax with query (?) delimiter
syntax:
file=gluster[+transport-type]://host1:24007/testvol/a.img\
 ?servers=host2&servers=host3

v2:
multiple host addresses each have their own port number, but all use
 common transport type
pattern: URI syntax  with query (?) delimiter
syntax:
file=gluster[+transport-type]://[host[:port]]/testvol/a.img\
 [?servers=host1[:port]\
  &servers=host2[:port]]

v3:
multiple host addresses each have their own port number and transport type
pattern: changed to json
syntax:
'json:{"driver":"qcow2","file":{"driver":"gluster","volume":"testvol",
   "path":"/path/a.qcow2","servers":
 [{"host":"1.2.3.4","port":"24007","transport":"tcp"},
  {"host":"4.5.6.7","port":"24008","transport":"rdma"}] } }'

v4, v5:
address comments from "Eric Blake" 
renamed:
'backup-volfile-servers' -> 'volfile-servers'

v6:
address comments from Peter Krempa 
renamed:
 'volname'->  'volume'
 'image-path' ->  'path'
 'server' ->  'host'

v7:
fix for v6 (initialize num_servers to 1 and other typos)

v8:
split patch set v7 into series of 3 as per Peter Krempa 
review comments

v9:
reorder the series of patches addressing "Eric Blake" 
review comments

v10:
fix mem-leak as per Peter Krempa  review comments

v11:
using qapi-types* defined structures as per "Eric Blake" 
review comments.

v12:
fix crash caused in qapi_free_BlockdevOptionsGluster

v13:
address comments from "Jeff Cody" 
---
 block/gluster.c  | 468 ---
 qapi/block-core.json |  60 ++-
 2 files changed, 461 insertions(+), 67 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index ededda2..8939072 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -11,6 +11,19 @@
 #include "block/block_int.h"
 #include "qemu/uri.h"
 
+#define GLUSTER_OPT_FILENAME    "filename"
+#define GLUSTER_OPT_VOLUME      "volume"
+#define GLUSTER_OPT_PATH        "path"
+#define GLUSTER_OPT_HOST        "host"
+#define GLUSTER_OPT_PORT        "port"
+#define GLUSTER_OPT_TRA

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-10 Thread Juan Quintela

"Li, Liang Z"  wrote:
>> Rather than trying to cater to multiple assembly instruction implementations
>> ourselves, have you tried taking the ideas in this earlier thread?
>> https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg05298.html
>> 
>> Ideally, libc's memcmp() will already be using the most efficient assembly
>> instructions without us having to reproduce the work of picking the
>> instructions that work best.
>> 
>
> Eric, thanks for your information. I didn't notice that discussion before.
>
>
> I rewrote buffer_find_nonzero_offset() with the 'bool memeqzero4_paolo
> length', then wrote a test program to check a large number of zero
> pages, and used 'time' to record the time taken by each optimization.
> The test results are:
>
>                   | test 1 | test 2
> ------------------+--------+-------
> SSE2              | 13.696 | 13.533
> AVX2              | 10.583 | 10.306
> memeqzero4_paolo  |  9.718 |  9.817
>
> (Time in seconds.)
>
> Paolo's implementation has the best performance. It seems that we can
> remove the SSE2 related Intrinsics.

How should I understand that comment?  That you are about to send an
email to remove the sse2 support and that I can forget about this patch?

Thanks, Juan.


>
> Liang
>> --
>> Eric Blake   eblake redhat com+1-919-301-3266
>> Libvirt virtualization library http://libvirt.org

Re: [Qemu-devel] [PATCH v10 24/30] qapi: Factor out QAPISchemaObjectType.check_clash()

2015-11-10 Thread Markus Armbruster

Eric Blake  writes:

> On 11/09/2015 07:49 AM, Markus Armbruster wrote:
>> Eric Blake  writes:
>> 
>>> Consolidate two common sequences of clash detection into a
>>> new QAPISchemaObjectType.check_clash() helper method.
>>>
>>> No change to generated code.
>>>
>>> Signed-off-by: Eric Blake 
>>>
>
>>> @@ -980,11 +980,7 @@ class QAPISchemaObjectType(QAPISchemaType):
>>>          seen = OrderedDict()
>>>          if self._base_name:
>>>              self.base = schema.lookup_type(self._base_name)
>>> -            assert isinstance(self.base, QAPISchemaObjectType)
>>> -            assert not self.base.variants       # not implemented
>>> -            self.base.check(schema)
>>> -            for m in self.base.members:
>>> -                m.check_clash(seen)
>>> +            self.base.check_clash(schema, seen)
>>>          for m in self.local_members:
>>>              m.check(schema)
>>>              m.check_clash(seen)
>>> @@ -994,6 +990,12 @@ class QAPISchemaObjectType(QAPISchemaType):
>>>              assert self.variants.tag_member in self.members
>>>              self.variants.check_clash(schema, seen)
>>>
>>> +    def check_clash(self, schema, seen):
>>> +        self.check(schema)
>> 
>> Do we want to hide this .check() inside .check_clash()?
>> 
>> QAPISchemaObjectTypeMember.check() doesn't.  I think the two better
>> behave the same.
>> 
>>> +        assert not self.variants       # not implemented
>>> +        for m in self.members:
>>> +            m.check_clash(seen)
>
> The self.check(schema) call is necessary to get self.members populated.
>  We cannot iterate over self.members if the type has not had check()
> called; this is true for both callers of type.check_clash()
> (ObjectType.check(), and Variants.check_clash()).

Yes.

We have a common protocol for QAPISchemaFOO objects, namely that certain
instance variables and methods are only valid after .check().

> You are correct that neither Member.check() nor Member.check_clash()
> call a form of type.check() - but that's because at that level, there is
> no need to populate a type.members list.
>
> On the other hand, we've been arguing that check() should populate
> everything after construction prior to anything else being run; and not
> running Variant.type.check() during Variants.check() of flat unions
> feels like we may have a hole (a flat union will have to inline its
> types to the overall JSON object, and inlining types requires access to
> type.members - but as written, we aren't populating them until
> Variants.check_clash()).  I can play with hoisting the type.check() out
> of type.check_clash() and instead keep base.check() in type.check(), and
> add variant.type.check() in Variants.check() (but only for unions, not
> for alternates), if you are interested.

My "qapi: Factor out QAPISchemaObjectTypeMember.check_clash()" added
QAPISchemaObjectTypeMember.check_clash() without changing the common
protocol.  The new QAPISchemaObjectTypeMember.check_clash() is merely a
helper for QAPISchemaObjectType.check().

Your 

Re: [Qemu-devel] [PATCH v6 3/4] qmp: add monitor command to add/remove a child

2015-11-10 Thread Markus Armbruster

Wen Congyang  writes:

> On 11/09/2015 10:42 PM, Alberto Garcia wrote:
>> Sorry again for the late review, here are my comments:
>> 
>> On Fri 16 Oct 2015 10:57:45 AM CEST, Wen Congyang wrote:
>>> +void qmp_x_blockdev_change(ChangeOperation op, const char *parent,
>>> +                           bool has_child, const char *child,
>>> +                           bool has_new_node, const char *new_node,
>>> +                           Error **errp)
>> 
>> You are using different names for the parameters here: 'op', 'parent',
>> 'child', 'new_node'; in the JSON file the first and last one are named
>> 'operation' and 'node'.
>
> OK, I will fix it in the next version
>
>> 
>>> +    parent_bs = bdrv_lookup_bs(parent, parent, &local_err);
>>> +    if (!parent_bs) {
>>> +        error_propagate(errp, local_err);
>>> +        return;
>>> +    }
>> 
>> You don't need to change it if you don't want but you can use errp
>> directly here and spare the error_propagate() call.
>
> A lot of code in qemu uses local_err and error_propagate(). I think
> errp can be NOT NULL here (in which case?).

It's usually advisable not to rely on "all callers pass non-null value
to parameter errp" arguments, because they're non-local and tend to be
brittle.

error.h attempts to provide guidance:

 * Receive an error and pass it on to the caller:
 *     Error *err = NULL;
 *     foo(arg, &err);
 *     if (err) {
 *         handle the error...
 *         error_propagate(errp, err);
 *     }
 * where Error **errp is a parameter, by convention the last one.
 *
 * Do *not* "optimize" this to
 *     foo(arg, errp);
 *     if (*errp) { // WRONG!
 *         handle the error...
 *     }
 * because errp may be NULL!
 *
 * But when all you do with the error is pass it on, please use
 *     foo(arg, errp);
 * for readability.

Since all you do with local_err in the quoted code snippet is pass it
on, the last paragraph applies, and you can simplify to:

    parent_bs = bdrv_lookup_bs(parent, parent, errp);
    if (!parent_bs) {
        return;
    }

Whether errp can be null doesn't matter.
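
The pass-through pattern can be tried out in isolation. What follows is only a
minimal sketch, not QEMU code: the Error struct and every sketch_* function
are hypothetical stand-ins that mimic the convention described in error.h
(a NULL errp means "the caller ignores errors"). It shows that a function
which merely passes errp on, and detects failure from the callee's return
value, behaves correctly for both NULL and non-NULL errp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Error object; only a message is kept. */
typedef struct Error {
    char *msg;
} Error;

/* Simplified analogue of error_setg(): a NULL errp means the caller
 * does not want the error, so nothing is stored. */
static void sketch_error_set(Error **errp, const char *msg)
{
    Error *err;
    size_t len = strlen(msg) + 1;

    if (!errp) {
        return;
    }
    assert(*errp == NULL);          /* never overwrite a pending error */
    err = malloc(sizeof(*err));
    assert(err != NULL);
    err->msg = malloc(len);
    assert(err->msg != NULL);
    memcpy(err->msg, msg, len);
    *errp = err;
}

/* Release an error created by sketch_error_set(); NULL is accepted. */
static void sketch_error_free(Error *err)
{
    if (err) {
        free(err->msg);
        free(err);
    }
}

/* Hypothetical lookup: fails with NULL plus an error for unknown names. */
static const char *sketch_lookup(const char *name, Error **errp)
{
    if (strcmp(name, "known-node") == 0) {
        return name;
    }
    sketch_error_set(errp, "node not found");
    return NULL;
}

/* Pass-through style: errp goes straight to the callee, and failure is
 * detected from the return value, so *errp is never dereferenced here. */
static int sketch_change(const char *parent, Error **errp)
{
    const char *bs = sketch_lookup(parent, errp);

    if (!bs) {
        return -1;
    }
    return 0;
}
```

A caller that wants the message passes the address of an Error pointer
initialized to NULL; a caller that does not care passes NULL, and
sketch_change() needs no special case for either.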

[...]

Re: [Qemu-devel] [PATCH for-2.5] hw/timer/hpet.c: Avoid signed integer overflow which results in bugs on OSX

2015-11-10 Thread Paolo Bonzini



On 10/11/2015 09:57, Laszlo Ersek wrote:
> On 11/09/15 23:25, Laszlo Ersek wrote:
>> On 11/09/15 15:56, Peter Maydell wrote:
>>> Signed integer overflow in C is undefined behaviour, and the compiler
>>> is at liberty to assume it can never happen and optimize accordingly.
>>> In particular, the subtractions in hpet_time_after() and hpet_time_after64()
>>> were causing OSX clang to optimize the code such that it was prone to
>>> hangs and complaints about the main loop stalling (presumably because
>>> we were spending all our time trying to service very high frequency
>>> HPET timer callbacks). The clang sanitizer confirms the UB:
>>>
>>> hw/timer/hpet.c:119:26: runtime error: signed integer overflow: -2146967296 - 2147003978 cannot be represented in type 'int'
>>>
>>> Fix this by doing the subtraction as an unsigned operation and then
>>> converting to signed for the comparison.
>>>
>>> Reported-by: Aaron Elkins 
>>> Signed-off-by: Peter Maydell 
>>> ---
>>>  hw/timer/hpet.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/timer/hpet.c b/hw/timer/hpet.c
>>> index 3037bef..7f0391c 100644
>>> --- a/hw/timer/hpet.c
>>> +++ b/hw/timer/hpet.c
>>> @@ -116,12 +116,12 @@ static uint32_t timer_enabled(HPETTimer *t)
>>>  
>>>  static uint32_t hpet_time_after(uint64_t a, uint64_t b)
>>>  {
>>> -    return ((int32_t)(b) - (int32_t)(a) < 0);
>>> +    return ((int32_t)(b - a) < 0);
>>>  }
>>>  
>>>  static uint32_t hpet_time_after64(uint64_t a, uint64_t b)
>>>  {
>>> -    return ((int64_t)(b) - (int64_t)(a) < 0);
>>> +    return ((int64_t)(b - a) < 0);
>>>  }
>>>  
>>>  static uint64_t ticks_to_ns(uint64_t value)
>>>
>>
>> I'm late to the discussion, but I cannot imagine what would speak against:
>>
>> return (b < a);

With uint32_t, b < a is wrong if b has just overflowed and a is just
below 2^32.

With int32_t, b < a is wrong if b is just above 2^31 and a is just below
2^31.

Basically you want to consider a sliding window around (a+b)/2 (where
a+b is computed with "infinite" precision), and see whether it's a or b
that comes before the average.

For int64_t/uint64_t it is indeed moot, because it takes centuries
before you get close to 2^63 ticks (QEMU's emulated HPET has a 100 MHz
frequency; one year is 86400*365.25*10^8 ticks, or about 2^51.5).
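
To make the 32-bit case concrete, here is a small sketch. It is not the code
from the patch: the sketch_* names are made up for illustration, and instead
of casting the difference to int32_t it tests the top bit of the unsigned
difference, which expresses the same "sliding window" idea while staying
entirely in unsigned arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Wrap-safe "a is strictly after b" for a free-running 32-bit counter:
 * a counts as later when (b - a), taken modulo 2^32, has its top bit set. */
static int sketch_time_after32(uint32_t a, uint32_t b)
{
    uint32_t diff = (uint32_t)(b - a);      /* wraps modulo 2^32 */

    return (diff & UINT32_C(0x80000000)) != 0;
}

/* The plain comparison under discussion. */
static int sketch_naive_after32(uint32_t a, uint32_t b)
{
    return b < a;
}

static void sketch_self_check(void)
{
    /* No wrap nearby: both versions agree that 100 is after 50. */
    assert(sketch_time_after32(100, 50) == 1);
    assert(sketch_naive_after32(100, 50) == 1);

    /* b has just wrapped (5) while a is just below 2^32: b is the later
     * value, so "a after b" must be false. */
    assert(sketch_time_after32(UINT32_C(0xFFFFFFF0), 5) == 0);
    assert(sketch_naive_after32(UINT32_C(0xFFFFFFF0), 5) == 1); /* wrong */

    /* The mirrored case and the equal case. */
    assert(sketch_time_after32(5, UINT32_C(0xFFFFFFF0)) == 1);
    assert(sketch_time_after32(7, 7) == 0);
}
```

The second pair of assertions is the failure mode described above: the plain
comparison reports the pre-wrap value as the later one.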

Paolo

>> The post-patch code still converts a uint64_t difference to int32_t.
>> According to the C standard(s), such a conversion (i.e., when the
>> integer value being converted doesn't fit in the target signed integer)
>> results in an implementation-defined value, or an implementation-defined
>> signal is raised.
>>
>> On our platforms, the impl-def value is determined by "truncate to 32
>> bits, then reinterpret the bit pattern as two's complement signed
>> int32_t". Meaning, if:
>>
>> (b > a) && ((b - a) & (1u << 31))
>>
>> (that is, "b" is so much larger than "a" that bit#31 is set in the (b-a)
>> difference), then hpet_time_after() will now incorrectly return 1.
>> (Because bit#31 will be interpreted as the sign bit, turned on.)
>>
>> Again, what speaks against
>>
>> return (b < a);
>>
>> ?
>>
>> (The pre-patch code dates back to commit 16b29ae1 (year 2008), which
>> offers precious little justification for the formula.)
> 
> An hour or so after sending this email, I think I got an idea about the
> code's intent. (Knowing practically nothing about HPET.) I guess the
> HPET provides counters that can wrap around, so if you don't look
> frequently enough, you won't know if the value is actually smaller or
> greater (because you can't use raw magnitude to tell that).
> 
> So I *guess* this code implemented the following idea: assume you have a
> "last value", and a reading (?) from "just a bit later". You take the
> neighborhood (with radius 2^31, or 2^63) of the "last value", and if the
> new reading falls into the upper half of that neighborhood, you say "the
> value has grown".
> 
> This idea is actually very well suited for uintN_t modular arithmetic,
> because the (x - y) difference expresses the number of times you have to
> increment y to make it fall into the same remainder class as x, modulo 2^N.
> 
> Hence, ((x - y) < 2^(N-1)) expresses "x is later than or equal to y"
> (with both x and y being uintN_t variables). Equivalently, we have ((x -
> y) >= 2^(N-1)) meaning "x is strictly earlier than y", which can also be
> said as "y is strictly after x".
> 
> And I think that's exactly what these functions implement:
> 
> - Their names say "time after".
> 
> - The condition
> 
>   (x - y) >= 2^(N-1)
> 
>   tests exactly whether the most significant bit is set in the
>   difference.
> 
>   When the bit pattern of the difference is reinterpreted as intN_t,
>   that in turn means
> 
>   (intN_t)(x - y) < 0
> 
> So the functions seem to check if "a is strictly after b".
> 
> ... The call sites seem to confirm this:
> 
> if (t->config & HPET_TN_32BIT) {
> while (hpet_time_after(cur_tick, t->cmp

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-10 Thread Paolo Bonzini



On 10/11/2015 10:13, Juan Quintela wrote:
>> > I rewrote buffer_find_nonzero_offset() with the 'bool memeqzero4_paolo
>> > length', then wrote a test program to check a large number of zero
>> > pages, and used 'time' to record the time taken by each optimization.
>> > The test results are:
>> >
>> >                   | test 1 | test 2
>> > ------------------+--------+-------
>> > SSE2              | 13.696 | 13.533
>> > AVX2              | 10.583 | 10.306
>> > memeqzero4_paolo  |  9.718 |  9.817
>> >
>> > (Time in seconds.)
>> >
>> > Paolo's implementation has the best performance. It seems that we can
>> > remove the SSE2 related Intrinsics.

Note that you can simplify my implementation a lot, because
buffer_find_nonzero_offset already assumes that the buffer is aligned to
sizeof(VECTYPE), i.e. 16 bytes.  For example you can just check the
first 4 unsigned longs against zero and then call memcmp.
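
As a rough illustration of that simplification (a sketch only, with a made-up
name; it answers just "is the whole buffer zero?" rather than returning an
offset as buffer_find_nonzero_offset does, and it assumes the buffer is at
least four unsigned longs long):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if all len bytes of buf are zero.
 * Assumption for this sketch: len >= 4 * sizeof(unsigned long). */
static bool sketch_buffer_is_zero(const void *buf, size_t len)
{
    unsigned long head[4];

    assert(len >= sizeof(head));

    /* Step 1: check the first 4 unsigned longs directly.  memcpy keeps
     * the load well-defined whatever the buffer's declared type is. */
    memcpy(head, buf, sizeof(head));
    if (head[0] | head[1] | head[2] | head[3]) {
        return false;
    }

    /* Step 2: compare the buffer with itself shifted by sizeof(head).
     * If every byte equals the byte sizeof(head) positions before it,
     * and the head is zero, then by induction every byte is zero. */
    return memcmp(buf, (const char *)buf + sizeof(head),
                  len - sizeof(head)) == 0;
}
```

The overlapping memcmp() is safe because both regions are only read, and it
leaves the choice of vector instructions to libc.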

Paolo

> How should I understand that comment?  That you are about to send an
> email to remove the sse2 support and that I can forget about this patch?

Re: [Qemu-devel] [PATCH] hw/arm/virt: error_report cleanups

2015-11-10 Thread Peter Maydell

On 9 November 2015 at 18:52, Markus Armbruster  wrote:
> Peter Maydell  writes:
>> Thanks, I had missed this useful improvement to the API.
>> How does it work in cases like this where we don't have
>> an Error* to fill in?
>
> You do what error_report_err() would do had you had an Error *err to
> fill in:

> In other words, you print the error message proper with error_report(),
> and the additional information with error_printf().

...so in conclusion Andrew's patch is correct as it stands
and I should just apply it? :-)

thanks
-- PMM

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-10 Thread Li, Liang Z

> > Eric, thanks for your information. I didn't notice that discussion before.
> >
> >
> > I rewrote buffer_find_nonzero_offset() with the 'bool memeqzero4_paolo
> > length', then wrote a test program to check a large number of zero
> > pages, and used 'time' to record the time taken by each optimization.
> > The test results are:
> >
> >                   | test 1 | test 2
> > ------------------+--------+-------
> > SSE2              | 13.696 | 13.533
> > AVX2              | 10.583 | 10.306
> > memeqzero4_paolo  |  9.718 |  9.817
> >
> > (Time in seconds.)
> >
> > Paolo's implementation has the best performance. It seems that we can
> > remove the SSE2 related Intrinsics.
> 
> How should I understand that comment?  That you are about to send an email
> to remove the sse2 support and that I can forget about this patch?
> 
> Thanks, Juan.
> 

I don't know Paolo's opinion about how to deal with the SSE2 intrinsics; he is
the author. From my personal view, now that we have found a better way, why use
such low-level SSE2/AVX2 intrinsics? I don't know if someone else is working on
this. If not, and the related maintainer agrees to remove them, I am happy to
send out a new patch.

Let's forget my patch for the moment.

Liang

Re: [Qemu-devel] [POC]colo-proxy in qemu

2015-11-10 Thread Tkid




On 11/10/2015 03:35 PM, Jason Wang wrote:

On 11/10/2015 01:26 PM, Tkid wrote:

Hi all,

We are planning to reimplement the colo proxy in userspace (here, in qemu)
to cache and compare network packets. This module is one of the important
components of the COLO project and it is still at an early stage, so any
comments and feedback are warmly welcomed. Thanks in advance.

## Background
The COLO FT/HA (COarse-grain LOck-stepping Virtual Machines for Non-stop
Service) project is a high availability solution. Both the Primary VM (PVM)
and the Secondary VM (SVM) run in parallel. They receive the same requests
from the client and generate responses in parallel too. If the response
packets from PVM and SVM are identical, they are released immediately.
Otherwise, a VM checkpoint (on demand) is conducted.
Paper:
http://www.socc2013.org/home/program/a3-dong.pdf?attredirects=0
COLO on Xen:
http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
COLO on Qemu/KVM:
http://wiki.qemu.org/Features/COLO

To capture the response packets from PVM and SVM and find out whether they
are identical, we introduce a new module to qemu networking called
colo-proxy.

This document describes the design of the colo-proxy module.

## Glossary
   PVM - Primary VM, which provides services to clients.
   SVM - Secondary VM, a hot standby and replication of PVM.
   PN - Primary Node, the host which PVM runs on
   SN - Secondary Node, the host which SVM runs on

## Our Idea ##

COLO-Proxy
COLO-Proxy is a part of COLO. It is based on the qemu net filter and is a
plugin for it. Its function is to keep the SVM's connection working
normally with the PVM and to compare PVM's packets with SVM's packets. If
they differ, it notifies COLO to do a checkpoint.

== Workflow ==

[Workflow diagram: the Primary Node (PN) and the Secondary Node (SN) each
run Qemu with a VM (PVM on PN, SVM on SN), a COLO CheckPoint module and a
COLO Proxy. The PN proxy forwards client packets to the SN proxy over a
socket (1); the SN proxy does the seq & ack adjustment (2) and forwards
SVM's packets back over a socket (3) to the compare step in the PN proxy
(4); the compare step notifies the PN COLO CheckPoint module (5), which
talks to the SN COLO CheckPoint module over a socket (6). The client
exchanges packets with the PN.]


(1) When PN receives client packets, the PN COLO-Proxy copies them and
forwards them to the SN COLO-Proxy.
(2) The SN COLO-Proxy records the initial seq of PVM's packets, adjusts the
client's ack, and sends the adjusted packets to SVM.
(3) The SN Qemu COLO-Proxy receives SVM's packets and forwards them to the
PN Qemu COLO-Proxy.
(4) The PN Qemu COLO-Proxy enqueues SVM's packets and PVM's packets, then
compares PVM's packet data with SVM's packet data. If the packets differ,
the compare module notifies the COLO CheckPoint module to do a checkpoint,
then sends PVM's packets to the client and drops SVM's packets. Otherwise,
it just sends PVM's packets to the client and drops SVM's packets.
(5) Notify the COLO-Checkpoint module that a checkpoint is needed.
(6) Do the COLO checkpoint.

### QEMU space TCP/IP stack (based on SLIRP) ###
We need a QEMU space TCP/IP stack to help us analyze packets. After
looking into QEMU, we found that SLIRP

http://wiki.qemu.org/Documentation/Networking#User_Networking_.28SLIRP.29

is a good choice for us. SLIRP provides a full TCP/IP stack within
QEMU; it can help us handle the packets written to/read from the
backend (tap) device, which are just like link layer (L2) packets.

### Packet enqueue and compare ###
Together with the QEMU space TCP/IP stack, we enqueue all packets sent
by the PVM and SVM on the Primary QEMU, and then compare the packet
payload for each connection.
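
The comparison step could look roughly like the sketch below. It is only
an illustration of "compare the payload, checkpoint on mismatch": the
Packet type and both function names are hypothetical and are not taken
from any posted colo-proxy code, and it assumes the two queues of one
connection have already been paired up packet by packet.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical packet record for illustration; not a QEMU structure. */
typedef struct {
    const unsigned char *payload;
    size_t len;
} Packet;

/* True when the two payloads differ in length or content. */
static bool payload_differs(const Packet *pvm, const Packet *svm)
{
    if (pvm->len != svm->len) {
        return true;
    }
    return pvm->len != 0 && memcmp(pvm->payload, svm->payload, pvm->len) != 0;
}

/*
 * Compare n already-paired packets of one connection.
 * Returns true as soon as one pair differs, meaning a checkpoint
 * would have to be requested before releasing the PVM's packets.
 */
static bool connection_needs_checkpoint(const Packet *pvm_q,
                                        const Packet *svm_q, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (payload_differs(&pvm_q[i], &svm_q[i])) {
            return true;
        }
    }
    return false;
}
```

In either outcome the PVM's packets are what the client receives; the
return value only decides whether a checkpoint is requested first.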


Thanks for review ~

Hi:

I just have the following questions in mind (some have been raised in
the previous rounds of discussion without a conclusion):

- What's the plan for the management layer? The setup seems complicated,
so we cannot simply depend on the user to do each step. (And for security
reasons, qemu is usually run as an unprivileged user.)
-We don

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-10 Thread Paolo Bonzini



On 10/11/2015 10:26, Li, Liang Z wrote:
> I don't know Paolo's opinion about how to deal with the SSE2
> Intrinsics, he is the author. From my personal view, now that we have
> found a better way, why to use such low level SSE2/AVX2 Intrinsics.

I totally agree. :)

Paolo

Re: [Qemu-devel] [PATCH] hw/arm/virt: error_report cleanups

2015-11-10 Thread Markus Armbruster

Peter Maydell  writes:

> On 9 November 2015 at 18:52, Markus Armbruster  wrote:
>> Peter Maydell  writes:
>>> Thanks, I had missed this useful improvement to the API.
>>> How does it work in cases like this where we don't have
>>> an Error* to fill in?
>>
>> You do what error_report_err() would do had you had an Error *err to
>> fill in:
>
>> In other words, you print the error message proper with error_report(),
>> and the additional information with error_printf().
>
> ...so in conclusion Andrew's patch is correct as it stands
> and I should just apply it? :-)

Yes.  It got my R-by :)

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-10 Thread Li, Liang Z

> On 10/11/2015 10:26, Li, Liang Z wrote:
> > I don't know Paolo's opinion about how to deal with the SSE2
> > Intrinsics, he is the author. From my personal view, now that we have
> > found a better way, why to use such low level SSE2/AVX2 Intrinsics.
> 
> I totally agree. :)
> 
> Paolo

Hi Paolo,

It seems you are the right person to remove them, since you are the
author of both the 'SSE2 Intrinsics' and 'memeqzero4_paolo'.
Please disregard my patch entirely.

Liang

Re: [Qemu-devel] [POC]colo-proxy in qemu

2015-11-10 Thread Dr. David Alan Gilbert

* Jason Wang (jasow...@redhat.com) wrote:
> 
> 
> On 11/10/2015 01:26 PM, Tkid wrote:
> > Hi,all
> >
> > We are planning to reimplement colo proxy in userspace (Here is in
> > qemu) to
> > cache and compare net packets.This module is one of the important
> > components
> > of COLO project and now it is still in early stage, so any comments and
> > feedback are warmly welcomed,thanks in advance.
> >
> > ## Background
> > COLO FT/HA (COarse-grain LOck-stepping Virtual Machines for Non-stop
> > Service)
> > project is a high availability solution. Both Primary VM (PVM) and
> > Secondary VM
> > (SVM) run in parallel. They receive the same request from client, and
> > generate
> > responses in parallel too. If the response packets from PVM and SVM are
> > identical, they are released immediately. Otherwise, a VM checkpoint
> > (on demand)
> > is conducted.
> > Paper:
> > http://www.socc2013.org/home/program/a3-dong.pdf?attredirects=0
> > COLO on Xen:
> > http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
> > COLO on Qemu/KVM:
> > http://wiki.qemu.org/Features/COLO
> >
> > By the needs of capturing response packets from PVM and SVM and
> > finding out
> > whether they are identical, we introduce a new module to qemu
> > networking called
> > colo-proxy.
> >
> > This document describes the design of the colo-proxy module
> >
> > ## Glossary
> >   PVM - Primary VM, which provides services to clients.
> >   SVM - Secondary VM, a hot standby and replication of PVM.
> >   PN - Primary Node, the host which PVM runs on
> >   SN - Secondary Node, the host which SVM runs on
> >
> > ## Our Idea ##
> >
> > COLO-Proxy
> > COLO-Proxy is a part of COLO,based on qemu net filter and it's a
> > plugin for
> > qemu net filter.the function keep SVM connect normal to PVM and compare
> > PVM's packets to SVM's packets.if difference,notify COLO do checkpoint.
> >
> > == Workflow ==
> >
> >
> > [Workflow diagram: two hosts, PN and SN, each running Qemu. PN contains
> > PVM, a COLO CheckPoint module and a COLO Proxy with a Compare(4) stage.
> > SN contains SVM, a COLO CheckPoint module and a COLO Proxy with a
> > seq&ack adjust(2) stage. The proxies are linked by Forward(socket)
> > paths, (1) from PN to SN and (3) from SN to the PN Compare stage.
> > Compare signals the PN CheckPoint module (5), the two CheckPoint
> > modules are linked by a socket (6), and the Client connects to PN.]
> >
> >
> >
> >
> > (1)When PN receive client packets,PN COLO-Proxy copy and forward
> > packets to
> > SN COLO-Proxy.
> > (2)SN COLO-Proxy record PVM's packet inital seq & adjust client's ack,send
> > adjusted packets to SVM
> > (3)SN Qemu COLO-Proxy recieve SVM's packets and forward to PN Qemu
> > COLO-Proxy.
> > (4)PN Qemu COLO-Proxy enqueue SVM's packets and enqueue PVM's packets,then
> > compare PVM's packets data with SVM's packets data. If packets is
> > different, compare
> > module notify COLO CheckPoint module to do a checkpoint then send
> > PVM's packets to
> > client and drop SVM's packets, otherwise, just send PVM's packets to
> > client and
> > drop SVM's packets.
> > (5)notify COLO-Checkpoint module checkpoint is needed
> > (6)Do COLO-Checkpoint
> >
> > ### QEMU space TCP/IP stack(Based on SLIRP) ###
> > We need a QEMU space TCP/IP stack to help us to analysis packet. After
> > looking
> > into QEMU, we found that SLIRP
> >
> > http://wiki.qemu.org/Documentation/Networking#User_Networking_.28SLIRP.29
> >
> > is a good choice for us. SLIRP proivdes a full TCP/IP stack within
> > QEMU, it can
> > help use to handle the packet written to/read from backend(tap) device
> > which is
> > just like a link layer(L2) packet.
> >
> > ### Packet enqueue and compare ###
> > Together with QEMU space TCP/IP stack, we enqueue all packe

Re: [Qemu-devel] [PATCH for 2.5 v6 0/10] dataplane snapshot fixes

2015-11-10 Thread Stefan Hajnoczi

On Mon, Nov 09, 2015 at 08:57:26PM +0300, Denis V. Lunev wrote:
> On 11/09/2015 08:37 PM, Stefan Hajnoczi wrote:
> >On Sat, Nov 07, 2015 at 06:54:50PM +0300, Denis V. Lunev wrote:
> >>with test
> >> while /bin/true ; do
> >> virsh snapshot-create rhel7
> >> sleep 10
> >> virsh snapshot-delete rhel7 --current
> >> done
> >>with enabled iothreads on a running VM leads to a lot of troubles: hangs,
> >>asserts, errors.
> >>
> >>Anyway, I think that the construction like
> >> assert(aio_context_is_locked(aio_context));
> >>should be widely used to ensure proper locking.
> >>
> >>Changes from v5:
> >>- dropped already merged patch 11
> >>- fixed spelling in patch 1
> >>- changed order of condition in loops in all patches. Thank you Stefan :)
> >>- dropped patch 9
> >>- aio_context is not acquired any more in bdrv_all_find_vmstate_bs by 
> >>request
> >>   of Stefan
> >>- patch 10 is implemented in completely different way
> >I left comments on specific patches.  Besides that, I'm happy.
> OK. that sounds good enough to me. These changes
> are not a problem at all.
> 
> Should we ask Juan that this is good for him?

I think the requested changes won't be controversial.  Sending the next
revision should be fine.

Stefan



[Qemu-devel] [PATCH v3 1/2] net: netmap: Fix compilation issue

2015-11-10 Thread Vincenzo Maffione

Reorganization of struct NetClientOptions (commit e4ba22b) caused a
compilation failure of the netmap backend. This patch fixes the issue
by properly accessing the union field.

Signed-off-by: Vincenzo Maffione 
---
 net/netmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netmap.c b/net/netmap.c
index 508b829..4197a9c 100644
--- a/net/netmap.c
+++ b/net/netmap.c
@@ -439,7 +439,7 @@ int net_init_netmap(const NetClientOptions *opts,
 const char *name, NetClientState *peer, Error **errp)
 {
 /* FIXME error_setg(errp, ...) on failure */
-const NetdevNetmapOptions *netmap_opts = opts->netmap;
+const NetdevNetmapOptions *netmap_opts = opts->u.netmap;
 NetClientState *nc;
 NetmapPriv me;
 NetmapState *s;
-- 
2.6.2

[Qemu-devel] [PATCH v3 0/2] Fix compilation of netmap backend

2015-11-10 Thread Vincenzo Maffione

This patch series adds some fixes to the netmap net backend. It contains
two changes:
(1) Fix compilation issue of netmap.c introduced by the reorganization
of struct NetClientOptions
(2) Address the FIXME comment that was asking to use error_setg()
variants in place of error_report()

CHANGELOG:
- removed dead return and use error_setg_file_open() in place
  of error_setg_errno()
- I noticed that net_init_netmap() has to return int, so I restored
  the return statements in that function

Vincenzo Maffione (2):
  net: netmap: Fix compilation issue
  net: netmap: use error_setg() helpers in place of error_report()

 net/netmap.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

-- 
2.6.2

Re: [Qemu-devel] [PATCH 0/2] checkpatch: Fixing two cases of false positives in checkpatch.pl

2015-11-10 Thread Leonid Bloch

ping

http://patchwork.ozlabs.org/patch/537763
http://patchwork.ozlabs.org/patch/537762

On Thu, Oct 29, 2015 at 11:48 AM, Leonid Bloch  wrote:
> This series addresses two cases where errors were printed if whitespaces
> appeared in front of a square bracket in places where there should be no
> problem with such placements (please see messages of individual commits).
>
> Leonid Bloch (2):
>   checkpatch: Eliminate false positive in case of comma-space-square
> bracket
>   checkpatch: Eliminate false positive in case of space before square
> bracket in a definition
>
>  scripts/checkpatch.pl | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> --
> 2.4.3
>

[Qemu-devel] [PATCH v3 2/2] net: netmap: use error_setg() helpers in place of error_report()

2015-11-10 Thread Vincenzo Maffione

This update was required to align error reporting of netmap backend
initialization to the modifications introduced by commit a30ecde.

Signed-off-by: Vincenzo Maffione 
---
 net/netmap.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/net/netmap.c b/net/netmap.c
index 4197a9c..5558368 100644
--- a/net/netmap.c
+++ b/net/netmap.c
@@ -90,7 +90,7 @@ pkt_copy(const void *_src, void *_dst, int l)
  * Open a netmap device. We assume there is only one queue
  * (which is the case for the VALE bridge).
  */
-static int netmap_open(NetmapPriv *me)
+static void netmap_open(NetmapPriv *me, Error **errp)
 {
 int fd;
 int err;
@@ -99,9 +99,8 @@ static int netmap_open(NetmapPriv *me)
 
 me->fd = fd = open(me->fdname, O_RDWR);
 if (fd < 0) {
-error_report("Unable to open netmap device '%s' (%s)",
-me->fdname, strerror(errno));
-return -1;
+error_setg_file_open(errp, errno, me->fdname);
+return;
 }
 memset(&req, 0, sizeof(req));
 pstrcpy(req.nr_name, sizeof(req.nr_name), me->ifname);
@@ -109,15 +108,14 @@ static int netmap_open(NetmapPriv *me)
 req.nr_version = NETMAP_API;
 err = ioctl(fd, NIOCREGIF, &req);
 if (err) {
-error_report("Unable to register %s: %s", me->ifname, strerror(errno));
+error_setg_errno(errp, errno, "Unable to register %s", me->ifname);
 goto error;
 }
 l = me->memsize = req.nr_memsize;
 
 me->mem = mmap(0, l, PROT_WRITE | PROT_READ, MAP_SHARED, fd, 0);
 if (me->mem == MAP_FAILED) {
-error_report("Unable to mmap netmap shared memory: %s",
-strerror(errno));
+error_setg_errno(errp, errno, "Unable to mmap netmap shared memory");
 me->mem = NULL;
 goto error;
 }
@@ -125,11 +123,11 @@ static int netmap_open(NetmapPriv *me)
 me->nifp = NETMAP_IF(me->mem, req.nr_offset);
 me->tx = NETMAP_TXRING(me->nifp, 0);
 me->rx = NETMAP_RXRING(me->nifp, 0);
-return 0;
+
+return;
 
 error:
 close(me->fd);
-return -1;
 }
 
 static void netmap_send(void *opaque);
@@ -438,9 +436,9 @@ static NetClientInfo net_netmap_info = {
 int net_init_netmap(const NetClientOptions *opts,
 const char *name, NetClientState *peer, Error **errp)
 {
-/* FIXME error_setg(errp, ...) on failure */
 const NetdevNetmapOptions *netmap_opts = opts->u.netmap;
 NetClientState *nc;
+Error *err = NULL;
 NetmapPriv me;
 NetmapState *s;
 
@@ -448,7 +446,9 @@ int net_init_netmap(const NetClientOptions *opts,
 netmap_opts->has_devname ? netmap_opts->devname : "/dev/netmap");
 /* Set default name for the port if not supplied. */
 pstrcpy(me.ifname, sizeof(me.ifname), netmap_opts->ifname);
-if (netmap_open(&me)) {
+netmap_open(&me, &err);
+if (err) {
+error_propagate(errp, err);
 return -1;
 }
 /* Create the object. */
-- 
2.6.2

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-10 Thread Paolo Bonzini



On 10/11/2015 10:41, Li, Liang Z wrote:
>> On 10/11/2015 10:26, Li, Liang Z wrote:
>>> I don't know Paolo's opinion about how to deal with the SSE2 
>>> Intrinsics, he is the author. From my personal view, now that we
>>> have found a better way, why to use such low level SSE2/AVX2
>>> Intrinsics.
>> 
>> I totally agree. :)
> 
> It seems you are the right person to remove them, you are the author
> for both the 'SSE2 Intrinsics' and 'memeqzero4_paolo'. Please forget
> my patch totally.

I agree that your patch can be dropped, but go ahead and submit your
improvements!

Paolo

Re: [Qemu-devel] [PATCH for-2.5] hw/timer/hpet.c: Avoid signed integer overflow which results in bugs on OSX

2015-11-10 Thread Laszlo Ersek

On 11/10/15 10:26, Paolo Bonzini wrote:
> 
> 
> On 10/11/2015 09:57, Laszlo Ersek wrote:
>> On 11/09/15 23:25, Laszlo Ersek wrote:
>>> On 11/09/15 15:56, Peter Maydell wrote:
>>>> Signed integer overflow in C is undefined behaviour, and the compiler
>>>> is at liberty to assume it can never happen and optimize accordingly.
>>>> In particular, the subtractions in hpet_time_after() and
>>>> hpet_time_after64()
>>>> were causing OSX clang to optimize the code such that it was prone to
>>>> hangs and complaints about the main loop stalling (presumably because
>>>> we were spending all our time trying to service very high frequency
>>>> HPET timer callbacks). The clang sanitizer confirms the UB:
>>>>
>>>> hw/timer/hpet.c:119:26: runtime error: signed integer overflow:
>>>> -2146967296 - 2147003978 cannot be represented in type 'int'
>>>>
>>>> Fix this by doing the subtraction as an unsigned operation and then
>>>> converting to signed for the comparison.
>>>>
>>>> Reported-by: Aaron Elkins
>>>> Signed-off-by: Peter Maydell
>>>> ---
>>>>  hw/timer/hpet.c | 4 ++--
>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/hw/timer/hpet.c b/hw/timer/hpet.c
>>>> index 3037bef..7f0391c 100644
>>>> --- a/hw/timer/hpet.c
>>>> +++ b/hw/timer/hpet.c
>>>> @@ -116,12 +116,12 @@ static uint32_t timer_enabled(HPETTimer *t)
>>>>
>>>>  static uint32_t hpet_time_after(uint64_t a, uint64_t b)
>>>>  {
>>>> -    return ((int32_t)(b) - (int32_t)(a) < 0);
>>>> +    return ((int32_t)(b - a) < 0);
>>>>  }
>>>>
>>>>  static uint32_t hpet_time_after64(uint64_t a, uint64_t b)
>>>>  {
>>>> -    return ((int64_t)(b) - (int64_t)(a) < 0);
>>>> +    return ((int64_t)(b - a) < 0);
>>>>  }
>>>>
>>>>  static uint64_t ticks_to_ns(uint64_t value)
>>>>
>>>
>>> I'm late to the discussion, but I cannot imagine what would speak against:
>>>
>>> return (b < a);
> 
> With uint32_t, b < a is wrong if b has just overflowed and a is just
> below 2^32.
> 
> With int32_t, b < a is wrong if b is just above 2^31 and a is just below
> 2^31.
> 
> Basically you want to consider a sliding window around (a+b)/2 (where
> a+b is computed with "infinite" precision), and see whether it's a or b
> that comes before the average.

Thanks!

(I guess / hope this is about the same that I managed to realize on my
own in my other email :))

> For int64_t/uint64_t it is indeed moot, because it takes centuries
> before you get close to 2^63 ticks (QEMU's emulated HPET has a 100 MHz
> frequency; one year is 86400*365.25*10^8 ticks, or about 2^51.5).

Finally! I resisted the urge to write "yet another hardware clock /
counter that overflows within a humanly observable interval, *groan*".
But, now that you say that the 64-bit HPET fixes (or may fix) that, I
don't have to hold back. :)

Thanks
Laszlo
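
For reference, the tick arithmetic quoted above can be checked with a
few lines of integer C. This is only a worked example of the numbers in
this thread (100 MHz, 86400*365.25*10^8 ticks per year); the helper
names are made up for the illustration and are not from hpet.c.

```c
#include <stdint.h>

#define HPET_HZ UINT64_C(100000000)   /* 100 MHz, as stated in the thread */

/* Whole seconds until a counter running at 100 MHz reaches 2^bits ticks. */
static uint64_t seconds_until(unsigned bits)
{
    return (UINT64_C(1) << bits) / HPET_HZ;
}

/* 86400 * 365.25 * 10^8, written as 86400 * 36525 * 10^6 to stay integral. */
static uint64_t ticks_per_year(void)
{
    return UINT64_C(86400) * 36525 * 1000000;
}

/* Whole years until a 100 MHz counter reaches 2^63 ticks. */
static uint64_t years_until_2_63(void)
{
    return (UINT64_C(1) << 63) / ticks_per_year();
}
```

seconds_until(31) is 21 and seconds_until(32) is 42, so the 32-bit
comparison window is crossed within seconds, while years_until_2_63()
is 2922, which matches the "centuries" estimate for the 64-bit case.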

> 
> Paolo
> 
>>> The post-patch code still converts a uint64_t difference to int32_t.
>>> According to the C standard(s), such a conversion (i.e., when the
>>> integer value being converted doesn't fit in the target signed integer)
>>> results in an implementation-defined value, or an implementation-defined
>>> signal is raised.
>>>
>>> On our platforms, the impl-def value is determined by "truncate to 32
>>> bits, then reinterpret the bit pattern as two's complement signed
>>> int32_t". Meaning, if:
>>>
>>> (b > a) && ((b - a) & (1u << 31))
>>>
>>> (that is, "b" is so much larger than "a" that bit#31 is set in the (b-a)
>>> difference), then hpet_time_after() will now incorrectly return 1.
>>> (Because bit#31 will be interpreted as the sign bit, turned on.)
>>>
>>> Again, what speaks against
>>>
>>> return (b < a);
>>>
>>> ?
>>>
>>> (The pre-patch code dates back to commit 16b29ae1 (year 2008), which
>>> offers precious little justification for the formula.)
>>
>> An hour or so after sending this email, I think I got an idea about the
>> code's intent. (Knowing practically nothing about HPET.) I guess the
>> HPET provides counters that can wrap around, so if you don't look
>> frequently enough, you won't know if the value is actually smaller or
>> greater (because you can't use raw magnitude to tell that).
>>
>> So I *guess* this code implemented the following idea: assume you have a
>> "last value", and a reading (?) from "just a bit later". You take the
>> neighborhood (with radius 2^31, or 2^63) of the "last value", and if the
>> new reading falls into the upper half of that neighborhood, you say "the
>> value has grown".
>>
>> This idea is actually very well suited for uintN_t modular arithmetic,
>> because the (x - y) difference expresses the number of times you have to
>> increment y to make it fall into the same remainder class as x, modulo 2^N.
>>
>> Hence, ((x - y) < 2^(N-1)) expresses "x is later than or equal to y"
>> (with both x and y being uintN_t variables). Equivalently, we have ((x -
>> y) >= 2^(N-1)) meaning "x is strictly earlier than y", which can also be
>> said as "y is strictly after x".
>>
>> And I think that's exactly

Re: [Qemu-devel] [PATCH v3] virtio-blk: trivial code optimization

2015-11-10 Thread Paolo Bonzini



On 10/11/2015 07:35, Gonglei wrote:
>> > nb_sectors - int
>> > max_xfer_len - int
>> > req->qiov.size - size_t
>> > BDRV_SECTOR_SIZE - unsigned long long
>> > 
>> > Therefore this expression is an int > unsigned long long comparison.
>> > 
> Sorry, I'm confused.
> max_xfer_len is int,
> "req->qiov.size / BDRV_SECTOR_SIZE" is  unsigned long long,
> so, "max_xfer_len - req->qiov.size / BDRV_SECTOR_SIZE" will be int,

No, the result will be unsigned long long, and the comparison is wrong
if max_xfer_len < req->qiov.size / BDRV_SECTOR_SIZE.

Paolo

> and nb_sectors is int too, so this comparison is right. Am I wrong?
>

Re: [Qemu-devel] [PATCH v2 1/1] target-ppc: Implement rtas_get_sysparm(PROCESSOR_MODULE_INFO)

2015-11-10 Thread Thomas Huth

On 10/11/15 05:22, Sukadev Bhattiprolu wrote:
[...]
> | > +static int file_read_buf(char *file_name, char *buf, int len)
> | > +{
> | > +int rc;
> | > +FILE *fp;
> | > +
> | > +fp = fopen(file_name, "r");
> | > +if (!fp) {
> | > +error_report("%s: Error opening %s\n", __func__, file_name);
> | > +return -1;
> | > +}
> | > +
> | > +rc = fread(buf, 1, len, fp);
> | > +fclose(fp);
> | > +
> | > +if (rc != len) {
> | > +return -1;
> | > +}
> | > +
> | > +return 0;
> | > +}

Could you maybe use g_file_get_contents() instead?

> | > +/*
> | > + * Each core in the system is represented by a directory with the
> | > + * prefix 'PowerPC,POWER' in the directory /proc/device-tree/cpus/.
> | > + * Process that directory and count the number of cores in the system.
> | 
> | True on IBM POWER systems, but not necessarily everywhere - e.g. PR
> | KVM on an embedded PowerPC host.
> 
> What is PR KVM?

On PPC, there are multiple kinds of KVM kernel modules, e.g. KVM-HV and
KVM-PR (and further implementations for embedded PPCs, too). KVM-HV is
using the hypervisor hardware feature of the current POWER7 and POWER8
chips, while KVM-PR is using the PRoblem state to emulate a virtual
machine. KVM-PR thus also works on older PPC hardware.
So there are multiple PPC environments that QEMU can run on, and you
must not assume that you always have nodes like "PowerPC,POWER" in the
device tree.
(BTW, you can also build kernels without the /proc/device-tree file
system as far as I know ... so you should never fully rely on that
without a fallback strategy)

 Thomas
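
To illustrate what I mean by a fallback strategy, here is a rough,
untested-in-QEMU sketch using only POSIX directory calls. The function
name and the "fallback" parameter are hypothetical, not part of the
patch; the point is only that a missing directory or a device tree
without matching nodes yields a caller-supplied default instead of an
error.

```c
#include <dirent.h>
#include <stddef.h>
#include <string.h>

/*
 * Count the entries in dir_path whose names start with prefix.
 * Returns fallback when the directory cannot be opened or when no
 * entry matches, so callers never depend on the directory existing.
 */
static int count_nodes_or_fallback(const char *dir_path, const char *prefix,
                                   int fallback)
{
    DIR *dir = opendir(dir_path);
    struct dirent *entry;
    size_t prefix_len = strlen(prefix);
    int count = 0;

    if (!dir) {
        return fallback;
    }
    while ((entry = readdir(dir)) != NULL) {
        if (strncmp(entry->d_name, prefix, prefix_len) == 0) {
            count++;
        }
    }
    closedir(dir);
    return count > 0 ? count : fallback;
}
```

A caller could pass "/proc/device-tree/cpus" and "PowerPC,POWER" and use
whatever core count QEMU already knows about as the fallback value.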

Re: [Qemu-devel] [PATCH v3] virtio-blk: trivial code optimization

2015-11-10 Thread Stefan Hajnoczi

On Tue, Nov 10, 2015 at 02:35:19PM +0800, Gonglei wrote:
> On 2015/11/9 21:57, Stefan Hajnoczi wrote:
> > On Mon, Nov 09, 2015 at 05:03:30PM +0800, arei.gong...@huawei.com wrote:
> >> From: Gonglei 
> >>
> >> 1. avoid possible superflous checking
> >> 2. make code more robustness
> >>
> >> Signed-off-by: Gonglei 
> >> Reviewed-by: Fam Zheng 
> >> ---
> >> v3: change the third condition too [Paolo]
> >> add Fam's R-by
> >> ---
> >>  hw/block/virtio-blk.c | 27 +--
> >>  1 file changed, 9 insertions(+), 18 deletions(-)
> >>
> >> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> >> index 093e475..9124358 100644
> >> --- a/hw/block/virtio-blk.c
> >> +++ b/hw/block/virtio-blk.c
> >> @@ -404,24 +404,15 @@ void virtio_blk_submit_multireq(BlockBackend *blk, 
> >> MultiReqBuffer *mrb)
> >>  for (i = 0; i < mrb->num_reqs; i++) {
> >>  VirtIOBlockReq *req = mrb->reqs[i];
> >>  if (num_reqs > 0) {
> >> -bool merge = true;
> >> -
> >> -/* merge would exceed maximum number of IOVs */
> >> -if (niov + req->qiov.niov > IOV_MAX) {
> >> -merge = false;
> >> -}
> >> -
> >> -/* merge would exceed maximum transfer length of backend 
> >> device */
> >> -if (req->qiov.size / BDRV_SECTOR_SIZE + nb_sectors > 
> >> max_xfer_len) {
> >> -merge = false;
> >> -}
> >> -
> >> -/* requests are not sequential */
> >> -if (sector_num + nb_sectors != req->sector_num) {
> >> -merge = false;
> >> -}
> >> -
> >> -if (!merge) {
> >> +/*
> >> + * NOTE: We cannot merge the requests in below situations:
> >> + * 1. requests are not sequential
> >> + * 2. merge would exceed maximum number of IOVs
> >> + * 3. merge would exceed maximum transfer length of backend 
> >> device
> >> + */
> >> +if (sector_num + nb_sectors != req->sector_num ||
> >> +niov > IOV_MAX - req->qiov.niov ||
> >> +nb_sectors > max_xfer_len - req->qiov.size / 
> >> BDRV_SECTOR_SIZE) {
> > 
> > nb_sectors - int
> > max_xfer_len - int
> > req->qiov.size - size_t
> > BDRV_SECTOR_SIZE - unsigned long long
> > 
> > Therefore this expression is an int > unsigned long long comparison.
> > 
> Sorry, I'm confused.
> max_xfer_len is int,
> "req->qiov.size / BDRV_SECTOR_SIZE" is  unsigned long long,
> so, "max_xfer_len - req->qiov.size / BDRV_SECTOR_SIZE" will be int,

The type of "max_xfer_len - req->qiov.size / BDRV_SECTOR_SIZE" cannot be
int because you said "req->qiov.size / BDRV_SECTOR_SIZE" is unsigned
long long.

The C99 standard says:

6.3.1.1 Boolean, characters, and integers

- The rank of a signed integer type shall be greater than the rank of
any signed integer type with less precision.

...

- The rank of any unsigned integer type shall equal the rank of the
corresponding signed integer type, if any.

6.3.1.8 Usual arithmetic conversions

Otherwise, if the operand that has unsigned integer type has rank
greater or equal to the rank of the type of the other operand, then the
operand with signed integer type is converted to the type of the operand
with unsigned integer type.

So the max_xfer_len int operand must be converted to the higher ranking
unsigned long long.

Stefan
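
A small standalone sketch of the effect, for illustration only.
SECTOR_SIZE stands in for BDRV_SECTOR_SIZE, both function names are made
up, and the casts in the first function merely spell out the conversion
the compiler applies implicitly. The second function is just one
possible signed formulation, assuming size / SECTOR_SIZE fits in an int;
it is not a proposal for the patch.

```c
#include <stdbool.h>
#include <stddef.h>

#define SECTOR_SIZE 512ULL   /* unsigned long long, like BDRV_SECTOR_SIZE */

/*
 * The third condition as the compiler evaluates it: both int operands
 * are converted to unsigned long long, so the subtraction wraps around
 * instead of going negative.
 */
static bool exceeds_as_evaluated(int nb_sectors, int max_xfer_len, size_t size)
{
    return (unsigned long long)nb_sectors >
           (unsigned long long)max_xfer_len - size / SECTOR_SIZE;
}

/* The same check done in int arithmetic, where the result may be negative. */
static bool exceeds_signed(int nb_sectors, int max_xfer_len, size_t size)
{
    return nb_sectors > max_xfer_len - (int)(size / SECTOR_SIZE);
}
```

With nb_sectors = 8, max_xfer_len = 16 and a 32-sector request,
exceeds_as_evaluated() returns false although the merge would exceed the
limit, while exceeds_signed() returns true.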



Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-10 Thread Li, Liang Z

> On 10/11/2015 10:41, Li, Liang Z wrote:
> >> On 10/11/2015 10:26, Li, Liang Z wrote:
> >>> I don't know Paolo's opinion about how to deal with the SSE2
> >>> Intrinsics, he is the author. From my personal view, now that we
> >>> have found a better way, why to use such low level SSE2/AVX2
> >>> Intrinsics.
> >>
> >> I totally agree. :)
> >
> > It seems you are the right person to remove them, you are the author
> > for both the 'SSE2 Intrinsics' and 'memeqzero4_paolo'. Please forget
> > my patch totally.
> 
> I agree that your patch can be dropped, but go ahead and submit your
> improvements!
> 
> Paolo

You mean I do this work? 
If you are busy, I can do this. I really hope the related improvement can be 
merged into QEMU 2.5.0.

Liang

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-10 Thread Paolo Bonzini

On 10/11/2015 10:56, Li, Liang Z wrote:
> > I agree that your patch can be dropped, but go ahead and submit your
> > improvements!
> 
> You mean I do this work? 
> If you are busy, I can do this.

It's not that I'm busy, it's that it's your idea.  It doesn't matter if
I (and Peter Lieven too, actually) originally did the optimizations.

You also have the infrastructure to benchmark the improvements.

Paolo

Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization

2015-11-10 Thread Li, Liang Z

> On 10/11/2015 10:56, Li, Liang Z wrote:
> > > I agree that your patch can be dropped, but go ahead and submit your
> > > improvements!
> >
> > You mean I do this work?
> > If you are busy, I can do this.
> 
> It's not that I'm busy, it's that it's your idea.  It doesn't matter if I 
> (and Peter
> Lieven too, actually) originally did the optimizations.
> 
> You also have the infrastructure to benchmark the improvements.
> 
> Paolo

OK. I will rework and send a new patch.

Liang

Re: [Qemu-devel] [PATCH for-2.5] hw/timer/hpet.c: Avoid signed integer overflow which results in bugs on OSX

2015-11-10 Thread Peter Maydell

On 9 November 2015 at 20:17, Michael S. Tsirkin  wrote:
> On Mon, Nov 09, 2015 at 02:56:31PM +, Peter Maydell wrote:
>> Signed integer overflow in C is undefined behaviour, and the compiler
>> is at liberty to assume it can never happen and optimize accordingly.
>> In particular, the subtractions in hpet_time_after() and hpet_time_after64()
>> were causing OSX clang to optimize the code such that it was prone to
>> hangs and complaints about the main loop stalling (presumably because
>> we were spending all our time trying to service very high frequency
>> HPET timer callbacks). The clang sanitizer confirms the UB:
>>
>> hw/timer/hpet.c:119:26: runtime error: signed integer overflow: -2146967296 
>> - 2147003978 cannot be represented in type 'int'
>>
>> Fix this by doing the subtraction as an unsigned operation and then
>> converting to signed for the comparison.
>>
>> Reported-by: Aaron Elkins 
>> Signed-off-by: Peter Maydell 
>
> Agree, this makes no sense the way it's written.
>
> Reviewed-by: Michael S. Tsirkin 
>
> I'll pick this up in the next pull if Paolo doesn't
> beat me to it.

I went ahead and committed it to master yesterday; sorry
if that was a bit hasty of me.

thanks
-- PMM
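
For anyone following along, a minimal standalone sketch of the
difference between the committed comparison and a plain "b < a". The
first function follows the shape of the patched hpet_time_after(); the
names here are chosen for the example and naive_after32() is purely a
strawman. The (int32_t) conversion of an out-of-range value is
implementation-defined; the sketch assumes the usual two's complement
truncation that GCC and Clang provide.

```c
#include <stdint.h>

/*
 * Nonzero when a is "after" b, judged within a 2^31 window.
 * The subtraction is done unsigned (well defined, wraps modulo 2^64),
 * and only the result is converted to a signed 32-bit value.
 */
static uint32_t time_after32(uint64_t a, uint64_t b)
{
    return (int32_t)(b - a) < 0;
}

/* Plain magnitude comparison; wrong once the 32-bit counter has wrapped. */
static uint32_t naive_after32(uint32_t a, uint32_t b)
{
    return b < a;
}
```

Take a = 0xFFFFFFF0 (just below 2^32) and b = 5 (just wrapped): b is
really the later value, and time_after32(a, b) returns 0 as expected,
whereas naive_after32(a, b) returns 1.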

Re: [Qemu-devel] [PATCH] mirror: Improve zero-write and discard with fragmented image

2015-11-10 Thread Kevin Wolf

Am 10.11.2015 um 10:01 hat Paolo Bonzini geschrieben:
> 
> 
> On 10/11/2015 07:14, Fam Zheng wrote:
> > On Mon, 11/09 17:29, Kevin Wolf wrote:
> >> Am 09.11.2015 um 17:18 hat Paolo Bonzini geschrieben:
> >>>
> >>>
> >>> On 09/11/2015 17:04, Kevin Wolf wrote:
> >>>> Am 06.11.2015 um 11:22 hat Fam Zheng geschrieben:
> >>>>> The "pnum < nb_sectors" condition in deciding whether to actually copy
> >>>>> data is unnecessarily strict, and the qiov initialization is
> >>>>> unnecessarily too, for both bdrv_aio_write_zeroes and bdrv_aio_discard
> >>>>> branches.
> >>>>>
> >>>>> Reorganize mirror_iteration flow so that we:
> >>>>>
> >>>>> 1) Find the contiguous zero/discarded sectors with
> >>>>> bdrv_get_block_status_above() before deciding what to do. We query
> >>>>> s->buf_size sized blocks at a time.
> >>>>>
> >>>>> 2) If the sectors in question are zeroed/discarded and aligned to
> >>>>> target cluster, issue zero write or discard accordingly. It's done
> >>>>> in mirror_do_zero_or_discard, where we don't add buffer to qiov.
> >>>>>
> >>>>> 3) Otherwise, do the same loop as before in mirror_do_read.
> >>>>>
> >>>>> Signed-off-by: Fam Zheng
> >>>>
> >>>> I'm not sure where in the patch to comment on this, so I'll just do it
> >>>> here right in the beginning.
> >>>>
> >>>> I'm concerned that we need to be more careful about races in this patch,
> >>>> in particular regarding the bitmaps. I think the conditions for the two
> >>>> bitmaps are:
> >>>>
> >>>> * Dirty bitmap: We must clear the bit after finding the next piece of
> >>>>   data to be mirrored, but before we yield after getting information
> >>>>   that we use for the decision which kind of operation we need.
> >>>>
> >>>>   In other words, we need to clear the dirty bitmap bit before calling
> >>>>   bdrv_get_block_status_above(), because that's both the function that
> >>>>   retrieves information about the next chunk and also a function that
> >>>>   can yield.
> >>>>
> >>>>   If after this point the data is written to, we need to mirror it
> >>>>   again.
> >>>
> >>> With Fam's patch, that's not trivial for two reasons:
> >>>
> >>> 1) bdrv_get_block_status_above() can return a smaller amount than what
> >>> is asked.
> >>>
> >>> 2) the "read and write" case can handle s->granularity sectors per
> >>> iteration (many of them can be coalesced, but still that's how the
> >>> iteration works).
> >>>
> >>> The simplest solution is to perform the query with s->granularity size
> >>> rather than s->buf_size.
> >>
> >> Then we end up with many small operations, that's not what we want.
> >>
> >> Why can't we mark up to s->buf_size dirty clusters as clean first, then
> >> query the status, and mark all of those that we can't handle dirty
> >> again?
> > 
> > Then we may end up marking more clusters as dirty than it should be.
> 
> You're both right.
> 
> > Because all bdrv_set_dirty() and bdrv_set_dirty_bitmap() callers are 
> > coroutine,
> > we can introduce a CoMutex to let bitmap reader block bdrv_set_dirty and
> > bdrv_set_dirty_bitmap.
> 
> I think this is not necessary.
> 
> I think the following is safe:
> 
> 1) before calling bdrv_get_block_status_above(), find out how many
> consecutive bits in the dirty bitmap are 1
> 
> 2) zero all those bits in the dirty bitmap
> 
> 3) call bdrv_get_block_status_above() with a size equivalent to the
> number of dirty bits
> 
> 4) if bdrv_get_block_status_above() only returns a partial result, loop
> step (3) until all the dirty bits are processed

Right, you can always implement one iteration with more than one I/O
request. And maybe that would be the time to start a coroutine for the
requests already in the mirror code instead of complicating the AIO
state machine and letting block.c start coroutines.

> For full mirroring, this strategy will probably make the first
> incremental iteration more expensive.

You mean because we issue smaller, interleaved write and write_zeroes
requests now instead of only large writes? That's probably right, but
getting the right result should be more important than speed. :-)

Kevin

Re: [Qemu-devel] [PATCH v6 04/15] iotests: Move _filter_nbd into common.filter

2015-11-10 Thread Kevin Wolf

Am 09.11.2015 um 19:17 hat Max Reitz geschrieben:
> On 09.11.2015 17:04, Kevin Wolf wrote:
> > Am 04.11.2015 um 19:57 hat Max Reitz geschrieben:
> >> _filter_nbd can be useful for other NBD tests, too, therefore it should
> >> reside in common.filter, and it should support URLs of the "nbd://"
> >> format and export names.
> >>
> >> The NBD log lines ("/your/source/dir/nbd.c:function():line: error")
> >> should not be converted to empty lines but removed altogether.
> >>
> >> Signed-off-by: Max Reitz 
> > 
> > Code motion and modification in the same patch is bad style. The changes
> > look good, though.
> 
> Considering splitting this into two patches will result basically in
> both of them each changing just as much as this single patch does
> (because test 083 uses tabs instead of spaces) I'm inclined to just
> change the commit title to "Remove filter_nbd and add _filter_nbd" instead.

You're confusing "changing much" with "touching many lines". What I'm
asking is to split this in two: One mostly mechanical patch that doesn't
change semantics, and one patch for the functional change. This makes it
much easier to spot the functional changes that are actually made.

For example, whitespace changes during code motion are not a problem at
all. I use git show -w routinely to review those. I can also cope with
other minor things like style changes during code motion.

The hard part during review is just finding the 10% of actual functional
change in the middle of the 90% that change nothing semantically,
especially if multiple hunks are involved in the functional change.

Kevin

Re: [Qemu-devel] [PATCH] mirror: Improve zero-write and discard with fragmented image

2015-11-10 Thread Paolo Bonzini

On 10/11/2015 11:12, Kevin Wolf wrote:
> > For full mirroring, this strategy will probably make the first
> > incremental iteration more expensive.
>
> You mean because we issue smaller, interleaved write and write_zeroes
> requests now instead of only large writes? That's probably right, but
> getting the right result should be more important than speed. :-)

No, because you might end up clearing the whole dirty bitmap before
issuing the first bdrv_get_block_status_above().  Blocks are actually
read much later; if someone sets the dirty bitmap in between, you will
re-write those blocks unnecessarily during the first incremental
iteration.  It's not specific to the first iteration, it's just more likely.

However, it may be enough to clamp the number of dirty bitmap bits that
you process in one go (e.g. to 100 MB or so).
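
A minimal sketch of such a clamp, assuming the bitmap granularity is known in bytes. The function name and parameters are hypothetical and only illustrate the arithmetic.

```c
#include <stdint.h>

/* Limits a run of dirty bits so that one pass covers at most max_bytes.
 * granularity is the number of bytes per bitmap bit and must be non-zero. */
static uint64_t clamp_dirty_bits(uint64_t run_bits, uint64_t granularity,
                                 uint64_t max_bytes)
{
    uint64_t max_bits = max_bytes / granularity;

    if (max_bits == 0) {
        max_bits = 1;   /* always process at least one bit to make progress */
    }
    return run_bits < max_bits ? run_bits : max_bits;
}
```

With a 64 KiB granularity and a 100 MiB limit this caps a run at 1600 bits, so at most that many bits are cleared ahead of the reads.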

Paolo

Re: [Qemu-devel] [PULL 29/57] migrate_start_postcopy: Command to trigger transition to postcopy

2015-11-10 Thread Dr. David Alan Gilbert

* Eric Blake (ebl...@redhat.com) wrote:
> [adding Markus for a qapi question]
> 
> On 11/09/2015 10:28 AM, Juan Quintela wrote:
> > From: "Dr. David Alan Gilbert" 
> > 
> > Once postcopy is enabled (with migrate_set_capability), the migration
> > will still start on precopy mode.  To cause a transition into postcopy
> > the:
> > 
> >   migrate_start_postcopy
> > 
> > command must be issued.  Postcopy will start sometime after this
> > (when it's next checked in the migration loop).
> > 
> > Issuing the command before migration has started will error,
> > and issuing after it has finished is ignored.
> > 
> > Signed-off-by: Dr. David Alan Gilbert 
> > Reviewed-by: Eric Blake 
> > Reviewed-by: Juan Quintela 
> > Reviewed-by: Amit Shah 
> > Signed-off-by: Juan Quintela 
> > ---
> 
> I know I reviewed an earlier version of this patch, but that was
> probably before 24/57 of this pull request spelled the migration
> capability bit as x-postcopy-ram.

It's been x-postcopy-ram since my first post, before we had
migrate_set_capability 
( https://lists.nongnu.org/archive/html/qemu-devel/2014-07/msg00869.html )

> > +++ b/qapi-schema.json
> > @@ -702,6 +702,14 @@
> >  '*tls-port': 'int', '*cert-subject': 'str' } }
> > 
> >  ##
> > +# @migrate-start-postcopy
> > +#
> > +# Switch migration to postcopy mode
> 
> No documentation on the relation to the [x-]postcopy-ram capability bit?

docs/migration.txt does have an explanation, but I'm happy to expand
this if you think it would be helpful.

>  Will this command always fail if that bit is not set?

Yes:
    if (!migrate_postcopy_ram()) {
        error_setg(errp, "Enable postcopy with migration_set_capability before"
                   " the start of migration");
        return;
    }


One alternative piece of text would be 
'Switch current migration to postcopy mode; the x-postcopy-ram capability must
be set before issuing this command.'
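
To summarise the behaviour described in this thread (error without the capability, error before migration has started, ignored after it has finished), here is a small standalone sketch. The enum names and the start_postcopy() helper are hypothetical and are not the QEMU implementation.

```c
#include <stdbool.h>

/* Hypothetical migration phases, for illustration only. */
typedef enum { MIG_NOT_STARTED, MIG_ACTIVE, MIG_COMPLETED } MigPhase;

typedef enum {
    START_OK,               /* request recorded */
    START_ERR_NO_CAP,       /* x-postcopy-ram capability not set */
    START_ERR_NOT_STARTED,  /* issued before migration started */
    START_IGNORED           /* issued after migration finished */
} StartResult;

static StartResult start_postcopy(bool cap_enabled, MigPhase phase,
                                  bool *start_requested)
{
    if (!cap_enabled) {
        return START_ERR_NO_CAP;
    }
    if (phase == MIG_NOT_STARTED) {
        return START_ERR_NOT_STARTED;
    }
    if (phase == MIG_COMPLETED) {
        return START_IGNORED;
    }
    /* The switch itself happens later, when the migration loop next
     * checks this flag. */
    *start_requested = true;
    return START_OK;
}
```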

> > +#
> > +# Since: 2.5
> > +{ 'command': 'migrate-start-postcopy' }
> 
> Should we rename this command to 'x-migrate-start-postcopy' until we are
> ready to rename the entire feature to the stable namespace?

If you think it's best we could; however I took the 'x-' on the capability
just to be a flag to indicate it wasn't yet marked as stable;  I don't
think we're actually worrying about changes to naming.

> If so, I'm okay with that as a followup patch (so as not to delay the
> pull request), but we should really make up our minds what 2.5 will
> provide on this front.


Thanks,

Dave

> 
> 
> -- 
> Eric Blake   eblake redhat com+1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 


--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PULL v2 00/12] QAPI patches

2015-11-10 Thread Peter Maydell

On 10 November 2015 at 07:16, Markus Armbruster  wrote:
> v2:
> * PATCH 07: fix a comment typo [Eric]
> * PATCH 12: tweak commit message [Eric]
>
> The following changes since commit 9d5c1dc117d1ad881bbc76f6990ee1f9e9f8ef7f:
>
>   Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' 
> into staging (2015-11-09 11:20:51 +)
>
> are available in the git repository at:
>
>   git://repo.or.cz/qemu/armbru.git tags/pull-qapi-2015-11-10
>
> for you to fetch changes up to f5455044201747fd72531f5e8c1b1e9c56573d9c:
>
>   qapi-introspect: Document lack of sorting (2015-11-10 08:10:28 +0100)
>
> 
> QAPI patches

Applied, thanks.

-- PMM

Re: [Qemu-devel] [PULL 00/57] Migration pull

2015-11-10 Thread Peter Maydell

On 9 November 2015 at 22:36, Eric Blake  wrote:
> The only POSIX-ly correct portable way to print ssize_t is via casts
> (yes, quite ugly), as in:
>
> printf("%zu", (size_t)(ssize_t_value));

I'm running a test build using this approach.

> I wish %zd were portably useful for printing ssize_t, but POSIX hasn't
> yet made that requirement.  And while I argue that mingw headers are
> broken (because they aren't doing the obvious implementation of size_t
> and ssize_t based on the same underlying type), it's also hard to argue
> that it is violating POSIX (since POSIX doesn't yet require the same
> underlying type).
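
For reference, a small sketch of the cast approach quoted above, next to one possible alternative. The helper names are made up for illustration. The intmax_t variant is an assumption and not something proposed in this thread; it also depends on the C library supporting the "j" length modifier.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Cast approach from the mail: correct for non-negative values; a negative
 * value would be printed as a large unsigned number. */
static int fmt_ssize_as_size(char *buf, size_t len, ssize_t v)
{
    return snprintf(buf, len, "%zu", (size_t)v);
}

/* Alternative (assumption, not from the thread): widen to intmax_t so that
 * negative values keep their sign. */
static int fmt_ssize_signed(char *buf, size_t len, ssize_t v)
{
    return snprintf(buf, len, "%jd", (intmax_t)v);
}
```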

I think at some point after 2.5 releases I will look at updating
my now-very-ancient w32 build setup to something slightly less
terrible (in particular, moving to mingw-w64; I'm pretty sure the
mingw32 toolchain I have now produces binaries that wouldn't run
due to TLS bugs if we tried them, and at some point it would be
neat to see if 'make check' can be made to work via WINE.) But I'd
rather not do that mid-release process.

thanks
-- PMM

Re: [Qemu-devel] [POC]colo-proxy in qemu

2015-11-10 Thread Dr. David Alan Gilbert

* Tkid (zhangchen.f...@cn.fujitsu.com) wrote:
> Hi,all
> 
> We are planning to reimplement the colo proxy in userspace (here, in qemu) to
> cache and compare net packets. This module is one of the important components
> of the COLO project and it is still at an early stage, so any comments and
> feedback are warmly welcomed. Thanks in advance.
> 
> ## Background
> COLO FT/HA (COarse-grain LOck-stepping Virtual Machines for Non-stop Service)
> project is a high availability solution. Both Primary VM (PVM) and Secondary VM
> (SVM) run in parallel. They receive the same request from client, and generate
> responses in parallel too. If the response packets from PVM and SVM are
> identical, they are released immediately. Otherwise, a VM checkpoint (on demand)
> is conducted.
> Paper:
> http://www.socc2013.org/home/program/a3-dong.pdf?attredirects=0
> COLO on Xen:
> http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
> COLO on Qemu/KVM:
> http://wiki.qemu.org/Features/COLO
> 
> Because we need to capture response packets from PVM and SVM and find out
> whether they are identical, we introduce a new module to qemu networking called
> colo-proxy.
> 
> This document describes the design of the colo-proxy module
> 
> ## Glossary
>   PVM - Primary VM, which provides services to clients.
>   SVM - Secondary VM, a hot standby and replication of PVM.
>   PN - Primary Node, the host which PVM runs on
>   SN - Secondary Node, the host which SVM runs on
> 
> ## Our Idea ##
> 
> COLO-Proxy
> COLO-Proxy is a part of COLO. It is based on the qemu net filter and works as
> a plugin for it. Its function is to keep the SVM connected normally to the PVM
> and to compare the PVM's packets with the SVM's packets. If they differ, it
> notifies COLO to do a checkpoint.
> 
> == Workflow ==
> 
> 
> [ASCII diagram, garbled in this archive. It shows two Qemu processes: the
> Primary Node (PN) with PVM, COLO CheckPoint and COLO Proxy, and the Secondary
> Node (SN) with SVM, COLO CheckPoint and COLO Proxy. The Client talks to the PN.
> The PN proxy forwards packets to the SN proxy over a socket (1), the SN proxy
> performs the seq&ack adjust (2) and forwards SVM packets back over a socket
> (3), the PN proxy runs Compare (4) and notifies COLO CheckPoint (5), which
> talks to the SN COLO CheckPoint over a socket (6).]
> 
> 
> (1)When PN receives client packets, PN COLO-Proxy copies and forwards the
> packets to SN COLO-Proxy.
> (2)SN COLO-Proxy records the initial seq of PVM's packets, adjusts the client's
> ack, and sends the adjusted packets to SVM.
> (3)SN Qemu COLO-Proxy receives SVM's packets and forwards them to PN Qemu
> COLO-Proxy.

What protocol are you using for the data carried over the Forward(socket)?
I'm just wondering if there's an existing layer2 tunneling protocol that
it would be best to use.

> (4)PN Qemu COLO-Proxy enqueues SVM's packets and PVM's packets, then
> compares PVM's packet data with SVM's packet data. If the packets differ, the
> compare module notifies the COLO CheckPoint module to do a checkpoint, then
> sends PVM's packets to the client and drops SVM's packets. Otherwise, it just
> sends PVM's packets to the client and drops SVM's packets.
> (5)Notify the COLO-Checkpoint module that a checkpoint is needed.
> (6)Do COLO-Checkpoint.
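
Step (4) can be illustrated with a short sketch. The Packet type and colo_compare() are hypothetical names, and the comparison is a plain byte-wise one; the proposal does not say how the real module compares packets.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical packet representation; data must not be NULL. */
typedef struct {
    const unsigned char *data;
    size_t len;
} Packet;

/* Returns true when the PVM and SVM packets differ, i.e. when the compare
 * module would ask COLO CheckPoint for a checkpoint. In both cases the PVM
 * packet goes to the client and the SVM packet is dropped. */
static bool colo_compare(const Packet *pvm, const Packet *svm)
{
    bool same = pvm->len == svm->len &&
                memcmp(pvm->data, svm->data, pvm->len) == 0;

    return !same;
}
```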
> 
> ### QEMU space TCP/IP stack(Based on SLIRP) ###
> We need a QEMU space TCP/IP stack to help us analyze packets. After looking
> into QEMU, we found that SLIRP
> 
> http://wiki.qemu.org/Documentation/Networking#User_Networking_.28SLIRP.29
> 
> is a good choice for us. SLIRP provides a full TCP/IP stack within QEMU; it can
> help us handle the packets written to/read from the backend (tap) device, which
> are just like link layer (L2) packets.

I still think SLIRP might be painful; but it might be an easy one to start
with.

> ### Packet enqueue and compare ###
> Together with QEMU space TCP/IP stack, we enqueue all packets sent by PVM
> and

Re: [Qemu-devel] assert during internal snapshot

2015-11-10 Thread Stefan Hajnoczi

On Mon, Nov 09, 2015 at 03:29:13AM +, Li, Liang Z wrote:
> > -Original Message-
> > From: Denis V. Lunev [mailto:d...@openvz.org]
> > Sent: Saturday, November 07, 2015 11:20 PM
> > To: Li, Liang Z; Paolo Bonzini; Juan Quintela; Amit Shah
> > Cc: QEMU
> > Subject: assert during internal snapshot
> > 
> > Hello, All!
> > 
> > This commit
> > 
> > commit 94f5a43704129ca4995aa3385303c5ae225bde42
> > Author: Liang Li 
> > Date:   Mon Nov 2 15:37:00 2015 +0800
> > 
> >  migration: defer migration_end & blk_mig_cleanup
> > 
> >  Because of the patch 3ea3b7fa9af067982f34b of kvm, which introduces a
> >  lazy collapsing of small sptes into large sptes mechanism, now
> >  migration_end() is a time consuming operation because it calls
> >  memroy_global_dirty_log_stop(), which will trigger the dropping of 
> > small
> >  sptes operation and takes about dozens of milliseconds, so call
> >  migration_end() before all the vmsate data has already been transferred
> >  to the destination will prolong VM downtime. This operation should be
> >  deferred after all the data has been transferred to the destination.
> > 
> >  blk_mig_cleanup() can be deferred too.
> > 
> >  For a VM with 8G RAM, this patch can reduce the VM downtime about
> > 30 ms.
> > 
> >  Signed-off-by: Liang Li 
> >  Reviewed-by: Paolo Bonzini 
> >  Reviewed-by: Juan Quintela
> >  Reviewed-by: Amit Shah
> >  Signed-off-by: Juan Quintela
> > 
> > introduces the following regression
> > 
> > (gdb) bt
> > #0  0x7fd5d314a267 in __GI_raise (sig=sig@entry=6)
> >  at ../sysdeps/unix/sysv/linux/raise.c:55
> > #1  0x7fd5d314beca in __GI_abort () at abort.c:89
> > #2  0x7fd5d314303d in __assert_fail_base (
> >  fmt=0x7fd5d32a5028 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
> >  assertion=assertion@entry=0x557288ed5b69 "i != mr->ioeventfd_nb",
> >  file=file@entry=0x557288ed5a36 "/home/den/src/qemu/memory.c",
> >  line=line@entry=1731,
> >  function=function@entry=0x557288ed5fb0 <__PRETTY_FUNCTION__.32545>
> > "memory_region_del_eventfd") at assert.c:92
> > #3  0x7fd5d31430f2 in __GI___assert_fail (
> >  assertion=0x557288ed5b69 "i != mr->ioeventfd_nb",
> >  file=0x557288ed5a36 "/home/den/src/qemu/memory.c", line=1731,
> >  function=0x557288ed5fb0 <__PRETTY_FUNCTION__.32545>
> > "memory_region_del_eventfd") at assert.c:101
> > #4  0x557288b108fa in memory_region_del_eventfd
> > (mr=0x55728ad83700,
> >  addr=16, size=2, match_data=true, data=0, e=0x55728b21ff40)
> >  at /home/den/src/qemu/memory.c:1731
> > #5  0x557288d9fc18 in virtio_pci_set_host_notifier_internal (
> >  proxy=0x55728ad82e80, n=0, assign=false, set_handler=false)
> >  at hw/virtio/virtio-pci.c:178
> > #6  0x557288da19a9 in virtio_pci_set_host_notifier (d=0x55728ad82e80,
> > n=0,
> >  assign=false) at hw/virtio/virtio-pci.c:984
> > #7  0x557288b523df in virtio_scsi_dataplane_start (s=0x55728ad8afa0)
> >  at /home/den/src/qemu/hw/scsi/virtio-scsi-dataplane.c:268
> > #8  0x557288b50210 in virtio_scsi_handle_cmd (vdev=0x55728ad8afa0,
> >  vq=0x55728b21ffc0) at /home/den/src/qemu/hw/scsi/virtio-scsi.c:574
> > #9  0x557288b65cb7 in virtio_queue_notify_vq (vq=0x55728b21ffc0)
> >  at /home/den/src/qemu/hw/virtio/virtio.c:966
> > #10 0x557288b67bbf in virtio_queue_host_notifier_read
> > (n=0x55728b220010)
> >  at /home/den/src/qemu/hw/virtio/virtio.c:1643
> > #11 0x557288e12a2b in aio_dispatch (ctx=0x55728acaeab0) at
> > aio-posix.c:160
> > #12 0x557288e03194 in aio_ctx_dispatch (source=0x55728acaeab0,
> >  callback=0x0, user_data=0x0) at async.c:226
> > #13 0x7fd5d409fff7 in g_main_context_dispatch ()
> > from /lib/x86_64-linux-gnu/libglib-2.0.so.0
> > ---Type <return> to continue, or q <return> to quit---
> > #14 0x557288e1110d in glib_pollfds_poll () at main-loop.c:211
> > #15 0x557288e111e8 in os_host_main_loop_wait (timeout=0) at
> > main-loop.c:256
> > #16 0x557288e11295 in main_loop_wait (nonblocking=0) at main-
> > loop.c:504
> > #17 0x557288c1c31c in main_loop () at vl.c:1890
> > #18 0x557288c23dec in main (argc=105, argv=0x7ffca9a6fa08,
> >  envp=0x7ffca9a6fd58) at vl.c:4644
> > (gdb)
> > 
> > during 'virsh create-snapshot' operation over alive VM.
> > It happens 100% of time when VM is run using the following command line:
> > 
> >   7498 ?tl 0:37 qemu-system-x86_64 -enable-kvm -name rhel7
> > -S -machine pc-i440fx-2.2,accel=kvm,usb=off -cpu SandyBridge -m 1024 -
> > realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -object
> > iothread,id=iothread1 -uuid 456af4d3-5d67-41c6-a229-c55ded6098e9
> > -no-user-config -nodefaults -chardev
> > socket,id=charmonitor,path=/var/lib/libvirt/qemu/rhel7.monitor,server,nowait
> > -mon chardev=charmonitor,id=monitor,mode=control -rtc
> > base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet 
> > -no-

Re: [Qemu-devel] [PATCH v5 3/8] e1000: Introduced an array to control the access to the MAC registers

2015-11-10 Thread Leonid Bloch

On Tue, Nov 10, 2015 at 7:37 AM, Jason Wang  wrote:
>
>
> On 11/09/2015 10:59 PM, Leonid Bloch wrote:
>> The array of uint8_t's which is introduced here, contains useful metadata
>> about the MAC registers: if a register should be always accessible, or if
>> it is accessible, but partly implemented, or if the register requires a
>> certain compatibility flag to be accessed. Currently, 5 hypothetical flags
>> are supported (3 exist for e1000 so far) but if in the future more than 5
>> flags will be needed, the datatype of this array can simply be swapped for
>> a larger one.
>>
>> This patch is intended to solve the following current problems:
>>
>> 1) On migration between different versions of QEMU, which differ by the
>> MAC registers implemented in them, some registers need not to be active if
>> a compatibility flag is set, in order to preserve the machine's state
>> perfectly for the older version. Checking this for each register
>> individually, would create a lot of clutter in the code.
>>
>> 2) Some registers are (or may be) only partly implemented (e.g.
>> placeholders that allow reading and writing, but lack other functions).
>> In such cases it is better to print a debug warning on read/write attempts.
>> As above, dealing with this functionality on a per-register level, would
>> require longer and more messy code.
>>
>> Signed-off-by: Leonid Bloch 
>> Signed-off-by: Dmitry Fleytman 
>> ---
>>  hw/net/e1000.c | 85 
>> +-
>>  1 file changed, 73 insertions(+), 12 deletions(-)
>>
>> diff --git a/hw/net/e1000.c b/hw/net/e1000.c
>> index 0e00afa..2bc533f 100644
>> --- a/hw/net/e1000.c
>> +++ b/hw/net/e1000.c
>> @@ -142,6 +142,8 @@ typedef struct E1000State_st {
>>  uint32_t compat_flags;
>>  } E1000State;
>>
>> +#define chkflag(x) (s->compat_flags & E1000_FLAG_##x)
>> +
>>  typedef struct E1000BaseClass {
>>  PCIDeviceClass parent_class;
>>  uint16_t phy_id2;
>> @@ -195,8 +197,7 @@ e1000_link_up(E1000State *s)
>>  static bool
>>  have_autoneg(E1000State *s)
>>  {
>> -return (s->compat_flags & E1000_FLAG_AUTONEG) &&
>> -   (s->phy_reg[PHY_CTRL] & MII_CR_AUTO_NEG_EN);
>> +return chkflag(AUTONEG) && (s->phy_reg[PHY_CTRL] & MII_CR_AUTO_NEG_EN);
>>  }
>>
>>  static void
>> @@ -321,7 +322,7 @@ set_interrupt_cause(E1000State *s, int index, uint32_t 
>> val)
>>  if (s->mit_timer_on) {
>>  return;
>>  }
>> -if (s->compat_flags & E1000_FLAG_MIT) {
>> +if (chkflag(MIT)) {
>>  /* Compute the next mitigation delay according to pending
>>   * interrupts and the current values of RADV (provided
>>   * RDTR!=0), TADV and ITR.
>> @@ -1258,6 +1259,43 @@ static void (*macreg_writeops[])(E1000State *, int, 
>> uint32_t) = {
>>
>>  enum { NWRITEOPS = ARRAY_SIZE(macreg_writeops) };
>>
>> +enum { MAC_ACCESS_ALWAYS = 1, MAC_ACCESS_PARTIAL = 2,
>> +   MAC_ACCESS_FLAG_NEEDED = 4 };
>> +
>> +#define markflag(x)((E1000_FLAG_##x << 3) | MAC_ACCESS_FLAG_NEEDED)
>> +/* In the array below the meaning of the bits is: [f|f|f|f|f|n|p|a]
>> + * f - flag bits (up to 5 possible flags)
>> + * n - flag needed
>> + * p - partially implenented
>> + * a - access enabled always
>> + * n=p=a=0 - not implemented or unknown */
>
> Looks like n=p=0 implies a=0? If yes we can probably get rid of bit 'a'
> and save lots of lines below?

n=p=0 does not quite imply that a=0: a counter example would be a
register that is fully implemented, and does not require a special
flag to be accessed.
But that "a" bit is redundant indeed - the check if the register is
implemented is already performed by looking in
macreg_readops/macreg_writeops. I included it just so that the
information on which registers are implemented will be present in
mac_reg_chk, if the need for it arises in the future.

But when thinking of it now, you are right - I will remove it, and
rename the new array to "mac_reg_access", to better represent its
purpose.
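
As a rough illustration of the v5 layout under discussion (with the "a" bit still present), here is a standalone sketch. The FLAG_* values and the reg_accessible() helper are assumptions made for this example and are not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed compatibility flag values, for illustration only. */
enum { FLAG_AUTONEG = 1 << 0, FLAG_MIT = 1 << 1, FLAG_MAC = 1 << 2 };

/* Low three bits of each entry: [f|f|f|f|f|n|p|a] */
enum { MAC_ACCESS_ALWAYS = 1, MAC_ACCESS_PARTIAL = 2,
       MAC_ACCESS_FLAG_NEEDED = 4 };

/* Stores the required flag in the upper five bits and sets the "n" bit. */
#define MARKFLAG(f) ((uint8_t)(((f) << 3) | MAC_ACCESS_FLAG_NEEDED))

/* One possible reading of the table: a register that needs a flag is
 * accessible only when that flag is set; otherwise it is accessible when
 * it is marked as always accessible or partially implemented. */
static bool reg_accessible(uint8_t chk, uint32_t compat_flags)
{
    if (chk & MAC_ACCESS_FLAG_NEEDED) {
        return (compat_flags & (uint32_t)(chk >> 3)) != 0;
    }
    return (chk & (MAC_ACCESS_ALWAYS | MAC_ACCESS_PARTIAL)) != 0;
}
```

Five flag bits plus three access bits fill a uint8_t exactly, which is why the commit message mentions switching to a larger type if more flags are ever needed.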
>
> [...]

Re: [Qemu-devel] [PATCH v5 0/8] e1000: Various fixes and registers' implementation

2015-11-10 Thread Leonid Bloch

On Tue, Nov 10, 2015 at 8:21 AM, Jason Wang  wrote:
>
>
> On 11/09/2015 10:59 PM, Leonid Bloch wrote:
>> This series fixes issues with packet/octet counting in e1000's Statistic
>> registers, fixes a bug in the packet address filtering procedure, and
>> implements many MAC registers that were absent before, some Statistic
>> counters among them.
>>
>> Besides this, the series introduces a parameter which, if set to "on"
>> (default), will cause the entire MAC registers' array to migrate during
>> live migration (please see patch #2 for details). The rationale behind
>> this is the ability to implement additional MAC registers in the future,
>> without worrying about migration compatibility between future versions.
>> For compatibility with previous versions, the above mentioned parameter
>> can be set to "off".
>>
>> Also, a new array is introduced to control the access to the various MAC
>> registers. This takes care of situations when a MAC register requires a
>> certain parameter to be accessed, or is partially implemented, and
>> requires a debug warning to be printed on access attempts.
>>
>> Additionally, several cosmetic changes are made.
>>
>> Differences v1-2:
>> 
>> * Wording of several commit messages corrected.
>> * For trivially implemented Diagnostic registers, a debug message is
>>   added on read/write attempts, alerting of incomplete implementation.
>> * Following testing on a physical device, only the lower 16 bits can now
>>   be read from AIT, and only the lower 4 - from FFMT*.
>> * The grow_8reg_if_not_full function is rewritten.
>> * inc_tx_bcast_or_mcast_count and increase_size_stats are now called
>>   from within e1000_send_packet, to avoid code duplication.
>>
>> Differences v2-3:
>> 
>> * Minor rewordings of some commit messages (0002, 0003).
>> * Live migration capability is added to the newly implemented registers.
>>
>> Differences v3-4:
>> 
>> * Introduction of the "full_mac_registers" parameter (see above).
>> * Reversion of the live migration handling introduced in v3.
>> * Small alignment changes in patch #1 to correspond with the following
>>   patches.
>>
>> Differences v4-v5:
>> 
>> * Introduction of an array to control the access to the MAC registers.
>> * Removal of the specific functions that warned of partial
>>   implementation on read/write from patch 4.
>> * Adequate changes to patches 4 and 8: mainly adding the registers
>>   introduced there to the new array.
>>
>> The majority of these changes result from Jason Wang's review - thank
>> you, Jason!
>
> Thanks a lot for the patches. Almost done with two minor concerns:
>
> 1) to unbreak bisection we'd better enable the extra_mac_registers (and
> compatibility stuffs) in patch 8 or patch 9

Do you mean by that changing patch 2, so that the compatibility would
be "on" by default, and setting it to "off" by default only in patch
8, or an additional patch 9?
> 2) looks like we could save some lines of codes in patch 3, see the
> comment in that patch
>
> Since we're near to soft freeze (12th), want to ask whether or not you
> want to send a v6 or I can fix 1 myself. (if 2 is correct, we can do
> optimizations on top).

Will send a v6 with a fix to 2 today. Regarding 1 - awaiting your answer.

Thanks,
Leonid.
>
>>
>> Leonid Bloch (8):
>>   e1000: Cosmetic and alignment fixes
>>   e1000: Add support for migrating the entire MAC registers' array
>>   e1000: Introduced an array to control the access to the MAC registers
>>   e1000: Trivial implementation of various MAC registers
>>   e1000: Fixing the received/transmitted packets' counters
>>   e1000: Fixing the received/transmitted octets' counters
>>   e1000: Fixing the packet address filtering procedure
>>   e1000: Implementing various counters
>>
>>  hw/net/e1000.c  | 503 
>> +---
>>  hw/net/e1000_regs.h |   8 +-
>>  include/hw/compat.h |   4 +
>>  3 files changed, 406 insertions(+), 109 deletions(-)
>>
>

[Qemu-devel] [PATCH V5] hw/virtio: Add PCIe capability to virtio devices

2015-11-10 Thread Marcel Apfelbaum

The virtio devices are converted to PCI-Express
if they are plugged into a PCI-Express bus and
the 'modern' protocol is enabled.

Devices plugged directly into the Root Complex as
Integrated Endpoints remain PCI.

Signed-off-by: Marcel Apfelbaum 
---
v4 -> v5:
 - Addressed Michael S. Tsirkin's comments:
   - Renamed disable-pcie => x-disable-pcie
 - Rebased on master
 
v3 -> v4:
 - Addressed Eduardo Habkost's comments:
  - used a single virtio-pci.disable-pcie=on entry for HW_COMPAT,
instead of one entry for each subclass

v2 -> v3:
 - Addressed Michael S. Tsirkin's comments:
   - enable pcie only for 2.5+ machines.

v1 -> v2:
 - Addressed Michael S. Tsirkin's comments:
   - Added the minimum required capabilities for PCIe devices
   - Integrated Endpoints remain PCI

 - Use pcie_endpoint_cap_init instead of manually creating the pcie capability.

 - Regarding Gerd Hoffman's comments:
   - Creating virtio-pcie devices:
 For the moment I prefer not to duplicate the virtio definitions,
 at least until we have a consensus (Personally I don't like it)
   - Removing the IO bar:
 This would be my next patch on the "virtio to express" series, I plan
 to remove it only for "modern" devices.

Thanks,
Marcel

 hw/virtio/virtio-pci.c | 22 ++
 hw/virtio/virtio-pci.h |  2 ++
 include/hw/compat.h|  4 
 3 files changed, 28 insertions(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index f55dd2b..ba02e25 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1592,6 +1592,26 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error **errp)
 
     address_space_init(&proxy->modern_as, &proxy->modern_cfg, "virtio-pci-cfg-as");
 
+    if (!(proxy->flags & VIRTIO_PCI_FLAG_DISABLE_PCIE)
+        && !(proxy->flags & VIRTIO_PCI_FLAG_DISABLE_MODERN)
+        && pci_bus_is_express(pci_dev->bus)
+        && !pci_bus_is_root(pci_dev->bus)) {
+        int pos;
+
+        pci_dev->cap_present |= QEMU_PCI_CAP_EXPRESS;
+        pos = pcie_endpoint_cap_init(pci_dev, 0);
+        assert(pos > 0);
+
+        pos = pci_add_capability(pci_dev, PCI_CAP_ID_PM, 0, PCI_PM_SIZEOF);
+        assert(pos > 0);
+
+        /*
+         * Indicates that this function complies with revision 1.2 of the
+         * PCI Power Management Interface Specification.
+         */
+        pci_set_word(pci_dev->config + pos + PCI_PM_PMC, 0x3);
+    }
+
     virtio_pci_bus_new(&proxy->bus, sizeof(proxy->bus), proxy);
     if (k->realize) {
         k->realize(proxy, errp);
@@ -1622,6 +1642,8 @@ static Property virtio_pci_properties[] = {
                     VIRTIO_PCI_FLAG_DISABLE_LEGACY_BIT, false),
     DEFINE_PROP_BIT("disable-modern", VirtIOPCIProxy, flags,
                     VIRTIO_PCI_FLAG_DISABLE_MODERN_BIT, true),
+    DEFINE_PROP_BIT("x-disable-pcie", VirtIOPCIProxy, flags,
+                    VIRTIO_PCI_FLAG_DISABLE_PCIE_BIT, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
index 801c23a..1a487fc 100644
--- a/hw/virtio/virtio-pci.h
+++ b/hw/virtio/virtio-pci.h
@@ -72,8 +72,10 @@ typedef struct VirtioBusClass VirtioPCIBusClass;
 /* virtio version flags */
 #define VIRTIO_PCI_FLAG_DISABLE_LEGACY_BIT 2
 #define VIRTIO_PCI_FLAG_DISABLE_MODERN_BIT 3
+#define VIRTIO_PCI_FLAG_DISABLE_PCIE_BIT 4
 #define VIRTIO_PCI_FLAG_DISABLE_LEGACY (1 << VIRTIO_PCI_FLAG_DISABLE_LEGACY_BIT)
 #define VIRTIO_PCI_FLAG_DISABLE_MODERN (1 << VIRTIO_PCI_FLAG_DISABLE_MODERN_BIT)
+#define VIRTIO_PCI_FLAG_DISABLE_PCIE (1 << VIRTIO_PCI_FLAG_DISABLE_PCIE_BIT)
 
 typedef struct {
 MSIMessage msg;
diff --git a/include/hw/compat.h b/include/hw/compat.h
index 93e71af..3dfdb60 100644
--- a/include/hw/compat.h
+++ b/include/hw/compat.h
@@ -6,6 +6,10 @@
 .driver   = "virtio-blk-device",\
 .property = "scsi",\
 .value= "true",\
+},{\
+.driver   = "virtio-pci",\
+.property = "x-disable-pcie",\
+.value= "on",\
 },
 
 #define HW_COMPAT_2_3 \
-- 
2.1.0

Re: [Qemu-devel] assert during internal snapshot

2015-11-10 Thread Denis V. Lunev


On 11/10/2015 02:00 PM, Stefan Hajnoczi wrote:

On Mon, Nov 09, 2015 at 03:29:13AM +, Li, Liang Z wrote:

-Original Message-
From: Denis V. Lunev [mailto:d...@openvz.org]
Sent: Saturday, November 07, 2015 11:20 PM
To: Li, Liang Z; Paolo Bonzini; Juan Quintela; Amit Shah
Cc: QEMU
Subject: assert during internal snapshot

Hello, All!

This commit

commit 94f5a43704129ca4995aa3385303c5ae225bde42
Author: Liang Li 
Date:   Mon Nov 2 15:37:00 2015 +0800

  migration: defer migration_end & blk_mig_cleanup

  Because of the patch 3ea3b7fa9af067982f34b of kvm, which introduces a
  lazy collapsing of small sptes into large sptes mechanism, now
  migration_end() is a time consuming operation because it calls
  memroy_global_dirty_log_stop(), which will trigger the dropping of small
  sptes operation and takes about dozens of milliseconds, so call
  migration_end() before all the vmsate data has already been transferred
  to the destination will prolong VM downtime. This operation should be
  deferred after all the data has been transferred to the destination.

  blk_mig_cleanup() can be deferred too.

  For a VM with 8G RAM, this patch can reduce the VM downtime about
30 ms.

  Signed-off-by: Liang Li 
  Reviewed-by: Paolo Bonzini 
  Reviewed-by: Juan Quintela
  Reviewed-by: Amit Shah
  Signed-off-by: Juan Quintela

introduces the following regression

(gdb) bt
#0  0x7fd5d314a267 in __GI_raise (sig=sig@entry=6)
  at ../sysdeps/unix/sysv/linux/raise.c:55
#1  0x7fd5d314beca in __GI_abort () at abort.c:89
#2  0x7fd5d314303d in __assert_fail_base (
  fmt=0x7fd5d32a5028 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
  assertion=assertion@entry=0x557288ed5b69 "i != mr->ioeventfd_nb",
  file=file@entry=0x557288ed5a36 "/home/den/src/qemu/memory.c",
  line=line@entry=1731,
  function=function@entry=0x557288ed5fb0 <__PRETTY_FUNCTION__.32545>
"memory_region_del_eventfd") at assert.c:92
#3  0x7fd5d31430f2 in __GI___assert_fail (
  assertion=0x557288ed5b69 "i != mr->ioeventfd_nb",
  file=0x557288ed5a36 "/home/den/src/qemu/memory.c", line=1731,
  function=0x557288ed5fb0 <__PRETTY_FUNCTION__.32545>
"memory_region_del_eventfd") at assert.c:101
#4  0x557288b108fa in memory_region_del_eventfd
(mr=0x55728ad83700,
  addr=16, size=2, match_data=true, data=0, e=0x55728b21ff40)
  at /home/den/src/qemu/memory.c:1731
#5  0x557288d9fc18 in virtio_pci_set_host_notifier_internal (
  proxy=0x55728ad82e80, n=0, assign=false, set_handler=false)
  at hw/virtio/virtio-pci.c:178
#6  0x557288da19a9 in virtio_pci_set_host_notifier (d=0x55728ad82e80,
n=0,
  assign=false) at hw/virtio/virtio-pci.c:984
#7  0x557288b523df in virtio_scsi_dataplane_start (s=0x55728ad8afa0)
  at /home/den/src/qemu/hw/scsi/virtio-scsi-dataplane.c:268
#8  0x557288b50210 in virtio_scsi_handle_cmd (vdev=0x55728ad8afa0,
  vq=0x55728b21ffc0) at /home/den/src/qemu/hw/scsi/virtio-scsi.c:574
#9  0x557288b65cb7 in virtio_queue_notify_vq (vq=0x55728b21ffc0)
  at /home/den/src/qemu/hw/virtio/virtio.c:966
#10 0x557288b67bbf in virtio_queue_host_notifier_read
(n=0x55728b220010)
  at /home/den/src/qemu/hw/virtio/virtio.c:1643
#11 0x557288e12a2b in aio_dispatch (ctx=0x55728acaeab0) at
aio-posix.c:160
#12 0x557288e03194 in aio_ctx_dispatch (source=0x55728acaeab0,
  callback=0x0, user_data=0x0) at async.c:226
#13 0x7fd5d409fff7 in g_main_context_dispatch ()
 from /lib/x86_64-linux-gnu/libglib-2.0.so.0
---Type <return> to continue, or q <return> to quit---
#14 0x557288e1110d in glib_pollfds_poll () at main-loop.c:211
#15 0x557288e111e8 in os_host_main_loop_wait (timeout=0) at
main-loop.c:256
#16 0x557288e11295 in main_loop_wait (nonblocking=0) at main-
loop.c:504
#17 0x557288c1c31c in main_loop () at vl.c:1890
#18 0x557288c23dec in main (argc=105, argv=0x7ffca9a6fa08,
  envp=0x7ffca9a6fd58) at vl.c:4644
(gdb)

during a 'virsh create-snapshot' operation on a live VM.
It happens 100% of the time when the VM is run using the following command line:

   7498 ?tl 0:37 qemu-system-x86_64 -enable-kvm -name rhel7
-S -machine pc-i440fx-2.2,accel=kvm,usb=off -cpu SandyBridge -m 1024
-realtime mlock=off -smp 1,sockets=1,cores=1,threads=1
-object iothread,id=iothread1 -uuid 456af4d3-5d67-41c6-a229-c55ded6098e9
-no-user-config -nodefaults
-chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/rhel7.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control
-rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard
-no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1
-global PIIX4_PM.disable_s4=1 -boot strict=on
-device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7
-device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6
-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1
-device ich9-usb-uhci3,masterb
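
For context on the assertion in frame #4: `i != mr->ioeventfd_nb` reads like the "entry must exist" check that follows a linear search. The block below is a minimal sketch of that pattern under that assumption, not QEMU's actual code. The `Registry` type and both function names are hypothetical, and the delete path returns an error where an assertion would abort.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 8  /* hypothetical fixed capacity */

typedef struct {
    uint64_t addr;
    unsigned size;
} Entry;

typedef struct {
    Entry entries[MAX_ENTRIES];
    size_t nb;  /* number of valid entries, playing the role of ioeventfd_nb */
} Registry;

/* Appends an entry; returns false when the table is full. */
static bool registry_add(Registry *r, uint64_t addr, unsigned size)
{
    if (r->nb == MAX_ENTRIES) {
        return false;
    }
    r->entries[r->nb++] = (Entry){ .addr = addr, .size = size };
    return true;
}

/* Returns false in the situation where assert(i != r->nb) would fire. */
static bool registry_del(Registry *r, uint64_t addr, unsigned size)
{
    size_t i;

    for (i = 0; i < r->nb; i++) {
        if (r->entries[i].addr == addr && r->entries[i].size == size) {
            break;
        }
    }
    if (i == r->nb) {
        return false;  /* asked to delete something that is not registered */
    }
    /* Close the gap left by the removed entry. */
    for (; i + 1 < r->nb; i++) {
        r->entries[i] = r->entries[i + 1];
    }
    r->nb--;
    return true;
}
```

In this model, deleting the same entry twice fails the second time, which is the kind of unbalanced add/delete sequence such an assertion is meant to catch.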

Re: [Qemu-devel] [PATCH] hw/arm/virt: error_report cleanups

2015-11-10 Thread Peter Maydell

On 10 November 2015 at 09:39, Markus Armbruster  wrote:
> Peter Maydell  writes:
>> ...so in conclusion Andrew's patch is correct as it stands
>> and I should just apply it? :-)
>
> Yes.  It got my R-by :)

OK, applied to target-arm.next. Thanks for walking me through this.

-- PMM

Re: [Qemu-devel] [PATCH v10 00/30] qapi member collision (post-introspection cleanups, subset C')

2015-11-10 Thread Markus Armbruster

Markus Armbruster  writes:

> Eric Blake  writes:
>
>> On 11/09/2015 02:59 AM, Markus Armbruster wrote:
>>> Eric Blake  writes:
>>> 
>>>> On 11/06/2015 09:03 AM, Markus Armbruster wrote:
>>>>> Eric Blake  writes:
>>> [...]
>>>>>> Hopefully, we are converging on something that will be ready
>>>>>> for a pull request, especially for the earlier patches of this
>>>>>> subset.
>>>>>
>>>>> I guess you mean PATCH 01-12.  I had a few questions, but the most
>>>>> likely outcome seems to be minor touchups I could apply in my tree.
>>>>>
>>>>> I'm okay with trying to get more patches in, but let's get these out of
>>>>> the way meanwhile.
>>>>
>>>> Yes, 01-12 seems like a good first set, if you want to make those
>>>> touchups (I've supplied some potential text improvements in reply to
>>>> some of your comments); I'm happy, as always, to take a peek over your
>>>> staging repo to double check what you are prepping for the pull request.
>>> 
>>> Please have a look at my qapi-next branch.
>>
>> Close, but commit 5f72bb85 on that branch is missing the one-line summary:
>>
>> qapi: Simplify error cleanup in test-qmp-*
>
> Fixed & sent pull request, thanks!

Pulled.

I think I'll close the QAPI floodgates for 2.5 now.  Bug fixes are of
course exempted, and if we find something that impacts ABI, I'm willing
to consider patches.  Work on our backlog can continue uninterrupted;
I'm happy to collect patches that are ready, and will take care of
getting them into master once 2.6 opens.

Re: [Qemu-devel] [PATCH for-2.5] hw/timer/hpet.c: Avoid signed integer overflow which results in bugs on OSX

2015-11-10 Thread Michael S. Tsirkin

On Tue, Nov 10, 2015 at 10:04:40AM +, Peter Maydell wrote:
> On 9 November 2015 at 20:17, Michael S. Tsirkin  wrote:
> > On Mon, Nov 09, 2015 at 02:56:31PM +, Peter Maydell wrote:
> >> Signed integer overflow in C is undefined behaviour, and the compiler
> >> is at liberty to assume it can never happen and optimize accordingly.
> >> In particular, the subtractions in hpet_time_after() and 
> >> hpet_time_after64()
> >> were causing OSX clang to optimize the code such that it was prone to
> >> hangs and complaints about the main loop stalling (presumably because
> >> we were spending all our time trying to service very high frequency
> >> HPET timer callbacks). The clang sanitizer confirms the UB:
> >>
> >> hw/timer/hpet.c:119:26: runtime error: signed integer overflow: 
> >> -2146967296 - 2147003978 cannot be represented in type 'int'
> >>
> >> Fix this by doing the subtraction as an unsigned operation and then
> >> converting to signed for the comparison.
> >>
> >> Reported-by: Aaron Elkins 
> >> Signed-off-by: Peter Maydell 
> >
> > Agree, this makes no sense the way it's written.
> >
> > Reviewed-by: Michael S. Tsirkin 
> >
> > I'll pick this up in the next pull if Paolo doesn't
> > beat me to it.
> 
> I went ahead and committed it to master yesterday; sorry
> if that was a bit hasty of me.


That's fine too.

> thanks
> -- PMM
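
The fix described in the quoted commit message (subtract as unsigned, then convert to signed for the comparison) can be sketched as follows. This is an illustration of the technique rather than the committed code; the function names `time_after32` and `time_after64` are placeholders chosen for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if 'a' is later than 'b' on a wrapping 32-bit counter. */
static bool time_after32(uint32_t a, uint32_t b)
{
    /* Unsigned subtraction wraps modulo 2^32, so it is always defined. */
    uint32_t diff = b - a;

    /* Converting to signed only reinterprets the wrapped difference. */
    return (int32_t)diff < 0;
}

/* Same idea for a wrapping 64-bit counter. */
static bool time_after64(uint64_t a, uint64_t b)
{
    uint64_t diff = b - a;

    return (int64_t)diff < 0;
}
```

The conversion of an out-of-range unsigned value to a signed type is implementation-defined rather than undefined, which is why the compiler can no longer assume the overflow away.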

Re: [Qemu-devel] [PATCH 0/7] int128: reparing broken 128 bit memory calculations

2015-11-10 Thread Pierre Morel

On 11/10/2015 10:08 AM, Pierre Morel wrote:

On 11/09/2015 01:20 PM, Paolo Bonzini wrote:

On 09/11/2015 13:01, Pierre Morel wrote:

This leads to UINT64_MAX being represented as {1, 0} instead of
{0, UINT64_MAX}, while {1, 0} is 2^64. This in turn leads to
unnecessary and obfuscating transformations with int128_2_64() to
test for UINT64_MAX and return {1, 0} in memory_region_init(),
while using the inverse translation (test for {1, 0} and return
UINT64_MAX) in memory_region_size().

Yes, the use of UINT64_MAX for 2^64 is a hack, but it is unrelated to
the signedness of Int128.

OK, we agree it is a hack,
but sorry, I must have missed something,
because I do not understand what this hack is useful for.

It's used in the size argument of memory_region_init*, so that it can
remain an uint64_t.  The size is usually small (up to 2^40, say) unless
it is 2^64 meaning "the whole address space".  The latter case is
covered by UINT64_MAX.

Paolo

OK, I understand, thanks for taking the time to explain.

To sum up: size is a size :-) and not an offset in memory.

A size of UINT64_MAX does not exist, but we can live without it; having
a description for "whole address space", 2^64, can be useful.

There may be other solutions, like taking 0 to mean 2^64
if a memory size of 0 has no meaning,
but that could be misleading too.

So I do not see a better solution for this interesting problem.

My problem with this arose because, on hardware, regions are usually
more easily described by start/end than by start/size.
But changing this in the current implementation would be too much.
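
The convention Paolo describes (a `uint64_t` size argument where UINT64_MAX stands for 2^64, "the whole address space") can be sketched like this. It is a minimal model under stated assumptions: the `Size128` struct and both helper names are hypothetical and are not QEMU's Int128 API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit size: value = hi * 2^64 + lo. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} Size128;

/* Widen a 64-bit size argument; UINT64_MAX is the sentinel for 2^64. */
static Size128 size_from_u64(uint64_t size)
{
    if (size == UINT64_MAX) {
        return (Size128){ .lo = 0, .hi = 1 };
    }
    return (Size128){ .lo = size, .hi = 0 };
}

/* Inverse translation: 2^64 maps back to the UINT64_MAX sentinel. */
static uint64_t size_to_u64(Size128 s)
{
    if (s.hi == 1 && s.lo == 0) {
        return UINT64_MAX;
    }
    assert(s.hi == 0);  /* nothing else above 64 bits is representable */
    return s.lo;
}
```

The cost of the scheme is the one discussed above: a region of exactly UINT64_MAX bytes cannot be expressed, which is acceptable as long as no caller needs it.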

Re: [Qemu-devel] [PATCH v2] target-arm: Clean up DISAS_UPDATE usage in AArch32 translation code

2015-11-10 Thread Peter Maydell

On 9 November 2015 at 19:37, Sergey Fedorov  wrote:
> Though I don't clearly understand how singlestepping is done here, I just do
> what Peter suggested in his comments for v1 and send this patch for review.
> I'm going to get into this while the patch is in review process...

So the way the 32-bit code works for singlestep is complicated
because of the need to handle the conditional instructions,
which means you get a lot more cases like "this is a conditional
SWI" that need to be handled. A quick summary of some of the
possible cases:

 * unconditional normal instruction:
   -- need to write the PC and condexec bits back to the CPU state
   -- then take a singlestep exception (either the architectural one
      or the EXCP_DEBUG one depending on which sort of step we are doing)
 * unconditional exception-generating instruction:
   -- for architectural step of SWI/HVC/SMC we need to advance the
      singlestep state machine so that they behave correctly
   -- generate the relevant exception and then no point writing the
      code to take EXCP_DEBUG &c because we won't get to it
 * conditional instruction (including cond. branches):
   -- earlier code has already written back the PC for the
      "condition passed" case
   -- write out the code which takes the singlestep exception for
      the "condition passed" case
   -- then do gen_set_label(dc->condlabel)
   -- then the code to take the single step exception after
      executing for the "condition failed" case

In particular in this bit:
    if (dc->condjmp || !dc->is_jmp) {
        gen_set_pc_im(dc, dc->pc);
        dc->condjmp = 0;
    }
the cases when we need to update the PC are
(a) for the condition-failed codepath of a conditional insn
(the condition-passed codepath will already have written PC)
(b) for a non-conditional insn that hasn't already written PC
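
Cases (a) and (b) can be restated as a small predicate. This is only a sketch: `StepCtx` is a hypothetical stand-in for the two DisasContext fields used in the snippet above, and the field comments reflect the reading given in this mail, not the full semantics in the translator.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two fields tested in the snippet. */
typedef struct {
    bool condjmp;  /* set while inside a conditional insn */
    int is_jmp;    /* zero if the insn has not already written the PC */
} StepCtx;

/* True when the PC still has to be written before taking the step. */
static bool needs_pc_writeback(const StepCtx *dc)
{
    /* (a) condition-failed path of a conditional insn, or
     * (b) non-conditional insn that has not written the PC yet. */
    return dc->condjmp || !dc->is_jmp;
}
```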

The A64 equivalent is much simpler because the only cases we
need to handle are:
 * exception already generated (no point writing anything)
 * jumps (PC already written, just write code to take the step exception)
 * everything else (write PC then take step exception)
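
The three A64 cases listed above map onto a simple decision table. The sketch below models only that table; the enum, the struct and the function name are hypothetical and do not correspond to identifiers in the translator.

```c
#include <stdbool.h>

/* Hypothetical classification of how the translated insn ended. */
typedef enum {
    END_EXCEPTION,  /* exception already generated */
    END_JUMP,       /* jump: PC already written */
    END_OTHER       /* everything else */
} InsnEnd;

typedef struct {
    bool write_pc;            /* emit code to write the PC */
    bool gen_step_exception;  /* emit code to take the step exception */
} StepPlan;

static StepPlan a64_step_plan(InsnEnd end)
{
    switch (end) {
    case END_EXCEPTION:
        return (StepPlan){ false, false };  /* no point writing anything */
    case END_JUMP:
        return (StepPlan){ false, true };
    default:
        return (StepPlan){ true, true };
    }
}
```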

I'll review the patch after lunch.

thanks
-- PMM

Re: [Qemu-devel] [PULL 00/57] Migration pull

2015-11-10 Thread Peter Maydell

On 10 November 2015 at 10:53, Peter Maydell  wrote:
> On 9 November 2015 at 22:36, Eric Blake  wrote:
>> The only POSIX-ly correct portable way to print ssize_t is via casts
>> (yes, quite ugly), as in:
>>
>> printf("%zu", (size_t)(ssize_t_value));
>
> I'm running a test build using this approach.

The following fixup patch was sufficient to get the pull through
my tests.

Signed-off-by: Peter Maydell 

diff --git a/migration/migration.c b/migration/migration.c
index 58eb099..c5c977e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1284,9 +1284,9 @@ static void *source_return_path_thread(void *opaque)
 header_len != rp_cmd_args[header_type].len) ||
 header_len > max_len) {
 error_report("RP: Received '%s' message (0x%04x) with"
-"incorrect length %d expecting %zd",
+"incorrect length %d expecting %zu",
 rp_cmd_args[header_type].name, header_type, header_len,
-rp_cmd_args[header_type].len);
+(size_t)rp_cmd_args[header_type].len);
 mark_source_rp_bad(ms);
 goto out;
 }
diff --git a/migration/qemu-file-unix.c b/migration/qemu-file-unix.c
index 7ccdf69..c503b02 100644
--- a/migration/qemu-file-unix.c
+++ b/migration/qemu-file-unix.c
@@ -55,8 +55,8 @@ static ssize_t socket_writev_buffer(void *opaque,
struct iovec *iov, int iovcnt,
 err = socket_error();

 if (err != EAGAIN && err != EWOULDBLOCK) {
-error_report("socket_writev_buffer: Got err=%d for (%zd/%zd)",
- err, size, len);
+error_report("socket_writev_buffer: Got err=%d for (%zu/%zu)",
+ err, (size_t)size, (size_t)len);
 /*
  * If I've already sent some but only just got the error, I
  * could return the amount validly sent so far and wait for the
diff --git a/migration/savevm.c b/migration/savevm.c
index fad34b8..be52314 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1602,8 +1602,9 @@ static int loadvm_process_command(QEMUFile *f)
 }

 if (mig_cmd_args[cmd].len != -1 && mig_cmd_args[cmd].len != len) {
-error_report("%s received with bad length - expecting %zd, got %d",
-  mig_cmd_args[cmd].name, mig_cmd_args[cmd].len, len);
+error_report("%s received with bad length - expecting %zu, got %d",
+ mig_cmd_args[cmd].name,
+ (size_t)mig_cmd_args[cmd].len, len);
 return -ERANGE;
 }

thanks
-- PMM
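
The cast idiom Eric suggests, and which the fixup above applies, can be shown in isolation. The helper name `format_ssize` is made up for this sketch; `snprintf` and the `%zu` conversion are standard.

```c
#include <stdio.h>
#include <sys/types.h>

/* Formats an ssize_t using %zu plus an explicit cast to size_t.
 * Returns whatever snprintf returns. */
static int format_ssize(char *buf, size_t buflen, ssize_t value)
{
    /* %zu expects a size_t argument, hence the cast.  A negative
     * value is converted and prints as a large unsigned number. */
    return snprintf(buf, buflen, "%zu", (size_t)value);
}
```

The trade-off is that negative values lose their sign in the output, which matters little for diagnostics but would be wrong for anything parsed later.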

Re: [Qemu-devel] [PULL 00/57] Migration pull

2015-11-10 Thread Dr. David Alan Gilbert

* Peter Maydell (peter.mayd...@linaro.org) wrote:
> On 10 November 2015 at 10:53, Peter Maydell  wrote:
> > On 9 November 2015 at 22:36, Eric Blake  wrote:
> >> The only POSIX-ly correct portable way to print ssize_t is via casts
> >> (yes, quite ugly), as in:
> >>
> >> printf("%zu", (size_t)(ssize_t_value));
> >
> > I'm running a test build using this approach.
> 
> The following fixup patch was sufficient to get the pull through
> my tests.
> 
> Signed-off-by: Peter Maydell 

OK, it's just error messages anyway (we'll probably get a few
odd big values in there in the -ve cases).

Reviewed-by: Dr. David Alan Gilbert 

Is this OK or are you expecting Juan to send you a new pull?

Dave

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH 0/4] s390: Allow hotplug of s390 CPUs

2015-11-10 Thread Bharata B Rao

On Mon, Nov 09, 2015 at 09:04:27PM +0100, Christian Borntraeger wrote:
> 
> > Bharata did implement device_add for pseries, I thought.

Yes. For pseries, my patchset did CPU hotplug via device_add and
device_del commands. And that's what I am planning to stick to
going forward.

> 
> Seems that the patches did not make it into upstream yet.
> Bharata, is cpu hotplug on pseries still missing?

Yes, the last version I posted is at
https://lists.gnu.org/archive/html/qemu-devel/2015-08/msg00650.html

Andreas mentioned that he would like to see some generic changes for
supporting device_add for CPU, socket vs core level hotplug, defining the
semantics etc. There was some discussion towards that and this is the last I
heard about it:

https://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05347.html

Regards,
Bharata.

Re: [Qemu-devel] [PULL 00/57] Migration pull

2015-11-10 Thread Peter Maydell

On 10 November 2015 at 12:22, Dr. David Alan Gilbert
 wrote:
> * Peter Maydell (peter.mayd...@linaro.org) wrote:
>> On 10 November 2015 at 10:53, Peter Maydell  wrote:
>> > On 9 November 2015 at 22:36, Eric Blake  wrote:
>> >> The only POSIX-ly correct portable way to print ssize_t is via casts
>> >> (yes, quite ugly), as in:
>> >>
>> >> printf("%zu", (size_t)(ssize_t_value));
>> >
>> > I'm running a test build using this approach.
>>
>> The following fixup patch was sufficient to get the pull through
>> my tests.
>>
>> Signed-off-by: Peter Maydell 
>
> OK, it's just error messages anyway (we'll probably get a few
> odd big values in there in the -ve cases).
>
> Reviewed-by: Dr. David Alan Gilbert 
>
> Is this OK or are you expecting Juan to send you a new pull?

I need a new pull, yes.

-- PMM

Re: [Qemu-devel] [PATCH v9 11/15] qmp: Introduce blockdev-change-medium

2015-11-10 Thread Kevin Wolf

Am 06.11.2015 um 16:27 hat Max Reitz geschrieben:
> Introduce a new QMP command 'blockdev-change-medium' which is intended
> to replace the 'change' command for block devices. The existing function
> qmp_change_blockdev() is accordingly renamed to
> qmp_blockdev_change_medium().
> 
> Signed-off-by: Max Reitz 

Thanks, updated the queued patch with this one.

Kevin

Re: [Qemu-devel] [PATCH QEMU-XEN v5 9/9] xen: make it possible to build without the Xen PV domain builder

2015-11-10 Thread Stefano Stabellini

On Mon, 9 Nov 2015, Ian Campbell wrote:
> Until the previous patch this relied on xc_fd(), which was only
> implemented for Xen 4.0 and earlier.
> 
> Given this wasn't working since Xen 4.0 I have marked this as disabled
> by default.
> 
> Removing this support drops the use of a bunch of symbols from
> libxenctrl, specifically:
> 
>   - xc_domain_create
>   - xc_domain_destroy
>   - xc_domain_getinfo
>   - xc_domain_max_vcpus
>   - xc_domain_setmaxmem
>   - xc_domain_unpause
>   - xc_evtchn_alloc_unbound
>   - xc_linux_build
> 
> This is another step towards only using Xen libraries which provide a
> stable interface.
> 
> Signed-off-by: Ian Campbell 

Reviewed-by: Stefano Stabellini 


> v5: XEN_CREATE entirely within CONFIG_XEN_PV_DOMAIN_BUILD ifdef.
> Simplify configure'ry.
> 
> v4: Fixed all checkpatch errors.
> Disabled by default.
> ---
>  configure | 15 +++
>  hw/xenpv/Makefile.objs|  4 +++-
>  hw/xenpv/xen_machine_pv.c | 15 +++
>  3 files changed, 29 insertions(+), 5 deletions(-)
> 
> diff --git a/configure b/configure
> index 0253cc9..4ab6d9c 100755
> --- a/configure
> +++ b/configure
> @@ -247,6 +247,7 @@ vnc_jpeg=""
>  vnc_png=""
>  xen=""
>  xen_ctrl_version=""
> +xen_pv_domain_build="no"
>  xen_pci_passthrough=""
>  linux_aio=""
>  cap_ng=""
> @@ -917,6 +918,10 @@ for opt do
>;;
>--enable-xen-pci-passthrough) xen_pci_passthrough="yes"
>;;
> +  --disable-xen-pv-domain-build) xen_pv_domain_build="no"
> +  ;;
> +  --enable-xen-pv-domain-build) xen_pv_domain_build="yes"
> +  ;;
>--disable-brlapi) brlapi="no"
>;;
>--enable-brlapi) brlapi="yes"
> @@ -2172,6 +2177,12 @@ if test "$xen_pci_passthrough" != "no"; then
>fi
>  fi
>  
> +if test "$xen_pv_domain_build" = "yes" &&
> +   test "$xen" != "yes"; then
> +error_exit "User requested Xen PV domain builder support" \
> +"which requires Xen support."
> +fi
> +
>  ##
>  # libtool probe
>  
> @@ -4762,6 +4773,7 @@ fi
>  echo "xen support   $xen"
>  if test "$xen" = "yes" ; then
>echo "xen ctrl version  $xen_ctrl_version"
> +  echo "pv dom build  $xen_pv_domain_build"
>  fi
>  echo "brlapi support$brlapi"
>  echo "bluez  support$bluez"
> @@ -5130,6 +5142,9 @@ fi
>  if test "$xen" = "yes" ; then
>echo "CONFIG_XEN_BACKEND=y" >> $config_host_mak
>echo "CONFIG_XEN_CTRL_INTERFACE_VERSION=$xen_ctrl_version" >> 
> $config_host_mak
> +  if test "$xen_pv_domain_build" = "yes" ; then
> +echo "CONFIG_XEN_PV_DOMAIN_BUILD=y" >> $config_host_mak
> +  fi
>  fi
>  if test "$linux_aio" = "yes" ; then
>echo "CONFIG_LINUX_AIO=y" >> $config_host_mak
> diff --git a/hw/xenpv/Makefile.objs b/hw/xenpv/Makefile.objs
> index 49f6e9e..bbf5873 100644
> --- a/hw/xenpv/Makefile.objs
> +++ b/hw/xenpv/Makefile.objs
> @@ -1,2 +1,4 @@
>  # Xen PV machine support
> -obj-$(CONFIG_XEN) += xen_domainbuild.o xen_machine_pv.o
> +obj-$(CONFIG_XEN) += xen_machine_pv.o
> +# Xen PV machine builder support
> +obj-$(CONFIG_XEN_PV_DOMAIN_BUILD) += xen_domainbuild.o
> diff --git a/hw/xenpv/xen_machine_pv.c b/hw/xenpv/xen_machine_pv.c
> index 23d6ef0..3250b94 100644
> --- a/hw/xenpv/xen_machine_pv.c
> +++ b/hw/xenpv/xen_machine_pv.c
> @@ -30,9 +30,6 @@
>  
>  static void xen_init_pv(MachineState *machine)
>  {
> -const char *kernel_filename = machine->kernel_filename;
> -const char *kernel_cmdline = machine->kernel_cmdline;
> -const char *initrd_filename = machine->initrd_filename;
>  DriveInfo *dinfo;
>  int i;
>  
> @@ -46,17 +43,27 @@ static void xen_init_pv(MachineState *machine)
>  case XEN_ATTACH:
>  /* nothing to do, xend handles everything */
>  break;
> -case XEN_CREATE:
> +#ifdef CONFIG_XEN_PV_DOMAIN_BUILD
> +case XEN_CREATE: {
> +const char *kernel_filename = machine->kernel_filename;
> +const char *kernel_cmdline = machine->kernel_cmdline;
> +const char *initrd_filename = machine->initrd_filename;
>  if (xen_domain_build_pv(kernel_filename, initrd_filename,
>  kernel_cmdline) < 0) {
>  fprintf(stderr, "xen pv domain creation failed\n");
>  exit(1);
>  }
>  break;
> +}
> +#endif
>  case XEN_EMULATE:
>  fprintf(stderr, "xen emulation not implemented (yet)\n");
>  exit(1);
>  break;
> +default:
> +fprintf(stderr, "unhandled xen_mode %d\n", xen_mode);
> +exit(1);
> +break;
>  }
>  
>  xen_be_register("console", &xen_console_ops);
> -- 
> 2.1.4
>
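
The shape of the xen_init_pv() change in the quoted patch (a case that exists only when a build option is set, with its locals moved into a braced block, plus a default branch for whatever is left unhandled) can be sketched on its own. All names here, including the `HAVE_DOMAIN_BUILDER` macro, are hypothetical stand-ins for the ones in the patch.

```c
#include <stdio.h>

/* Hypothetical modes; define HAVE_DOMAIN_BUILDER to compile in CREATE. */
enum mode { MODE_ATTACH, MODE_CREATE, MODE_EMULATE };

/* Returns 0 if the mode is handled in this build, -1 otherwise. */
static int init_for_mode(enum mode m)
{
    switch (m) {
    case MODE_ATTACH:
        return 0;  /* nothing to do */
#ifdef HAVE_DOMAIN_BUILDER
    case MODE_CREATE: {
        /* Locals live inside the braces so they vanish with the case. */
        const char *kernel = "vmlinuz";
        printf("building domain from %s\n", kernel);
        return 0;
    }
#endif
    default:
        /* Catches MODE_CREATE when the builder is compiled out. */
        fprintf(stderr, "unhandled mode %d\n", (int)m);
        return -1;
    }
}
```

Without the default branch, a build lacking the optional case would silently fall out of the switch, so the explicit error is what keeps the disabled configuration diagnosable.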

Re: [Qemu-devel] [PATCH 2.5 v5 0/11] dataplane snapshot fixes

2015-11-10 Thread Denis V. Lunev


On 11/10/2015 12:05 AM, Eric Blake wrote:

On 11/06/2015 02:13 PM, Denis V. Lunev wrote:


That is a case of using libvirt to trigger internal snapshots...


The HMP monitor is legacy and also not used by modern libvirt.

...and libvirt is forced to use HMP for internal snapshots, since we
_still_ haven't exposed internal snapshots as a QMP command.

Eric,

by the way, there is a user visible bug with this too :))

An EFI-based VM with pflash storage for NVRAM cannot
be snapshotted, as libvirt configures the storage as 'raw'.
OK, this is a libvirt bug. A patch switching the storage
type from 'raw' to 'qcow2' will be sent by my colleague
in the next week or so.

Not necessarily a bug in libvirt, so much as a limitation that the
current qemu implementation of internal snapshots requires qcow2 for all
storage devices (even though it might not be technically necessary, if
there were an easy way to snapshot state of one non-qcow2 storage
alongside the rest of the machine state stored elsewhere).  But a
libvirt patch would certainly be useful.


For QEMU this results in the following:
- QEMU receives a command via HMP to make a snapshot
- it fails, but QEMU does not see that fact (the error code
   is not delivered to libvirt in HMP AFAIK)

Libvirt is still using QMP to deliver the HMP command (via the QMP
human-monitor-command); if I'm understanding your complaint correctly,
you are saying that qemu doesn't do error reporting correctly for that?
If so, fix that in qemu, although libvirt should also be able to work
around it when dealing with broken qemu.


- on a request to switch to the snapshot, the commands
   just do nothing, and from libvirt's point of view the command
   was successful

We should have these commands, even in their current
rudimentary, non-ideal form, even if just as wrappers
around HMP functions.

Do you have an opinion about the importance of the last
issue? Should it be considered for 2.6?

We've gone since 0.14 without anyone writing the remaining few QMP
commands to completely obsolete the need for human-monitor-command
backdoors into HMP.  Volunteers are welcome to submit code to complete
the conversion, but the length of time it has taken so far may be an
indication that it is not as easy as you think.


I have one for snapshot operations :)

Den

Re: [Qemu-devel] [PATCH v2] hw/misc: Add support for ADC controller in Xilinx Zynq 7000

2015-11-10 Thread Peter Maydell

On 2 November 2015 at 06:33, Peter Crosthwaite
 wrote:
> I've made a v3 of this, some comments on changes below.

The v3 seems to have never hit the list?

thanks
-- PMM

Re: [Qemu-devel] [PATCH v5 0/8] e1000: Various fixes and registers' implementation

2015-11-10 Thread Jason Wang



On 11/10/2015 07:39 PM, Leonid Bloch wrote:
> On Tue, Nov 10, 2015 at 8:21 AM, Jason Wang  wrote:
>>
>> On 11/09/2015 10:59 PM, Leonid Bloch wrote:
>>> This series fixes issues with packet/octet counting in e1000's Statistic
>>> registers, fixes a bug in the packet address filtering procedure, and
>>> implements many MAC registers that were absent before, some Statistic
>>> counters among them.
>>>
>>> Besides this, the series introduces a parameter which, if set to "on"
>>> (default), will cause the entire MAC registers' array to migrate during
>>> live migration (please see patch #2 for details). The rationale behind
>>> this is the ability to implement additional MAC registers in the future,
>>> without worrying about migration compatibility between future versions.
>>> For compatibility with previous versions, the above mentioned parameter
>>> can be set to "off".
>>>
>>> Also, a new array is introduced to control the access to the various MAC
>>> registers. This takes care of situations when a MAC register requires a
>>> certain parameter to be accessed, or is partially implemented, and
>>> requires a debug warning to be printed on access attempts.
>>>
>>> Additionally, several cosmetic changes are made.
>>>
>>> Differences v1-2:
>>> 
>>> * Wording of several commit messages corrected.
>>> * For trivially implemented Diagnostic registers, a debug message is
>>>   added on read/write attempts, alerting of incomplete implementation.
>>> * Following testing on a physical device, only the lower 16 bits can now
>>>   be read from AIT, and only the lower 4 - from FFMT*.
>>> * The grow_8reg_if_not_full function is rewritten.
>>> * inc_tx_bcast_or_mcast_count and increase_size_stats are now called
>>>   from within e1000_send_packet, to avoid code duplication.
>>>
>>> Differences v2-3:
>>> 
>>> * Minor rewordings of some commit messages (0002, 0003).
>>> * Live migration capability is added to the newly implemented registers.
>>>
>>> Differences v3-4:
>>> 
>>> * Introduction of the "full_mac_registers" parameter (see above).
>>> * Reversion of the live migration handling introduced in v3.
>>> * Small alignment changes in patch #1 to correspond with the following
>>>   patches.
>>>
>>> Differences v4-v5:
>>> 
>>> * Introduction of an array to control the access to the MAC registers.
>>> * Removal of the specific functions that warned of partial
>>>   implementation on read/write from patch 4.
>>> * Adequate changes to patches 4 and 8: mainly adding the registers
>>>   introduced there to the new array.
>>>
>>> The majority of these changes result from Jason Wang's review - thank
>>> you, Jason!
>> Thanks a lot for the patches. Almost done with two minor concerns:
>>
>> 1) to unbreak bisection we'd better enable the extra_mac_registers (and
>> compatibility stuffs) in patch 8 or patch 9
> Do you mean by that changing patch 2, so that the compatibility would
> be "on" by default, and setting it to "off" by default only in patch
> 8, or an additional patch 9?

I mean do not introduce the property "extra_mac_registers" until patch 8
and 9. In this case all functionality will be enabled completely at that
time instead of partially, patch by patch, in this series.

>> 2) looks like we could save some lines of code in patch 3, see the
>> comment in that patch
>>
>> Since we're near to soft freeze (12th), want to ask whether or not you
>> want to send a v6 or I can fix 1 myself. (if 2 is correct, we can do
>> optimizations on top).
> Will send a v6 with a fix to 2 today. Regarding 1 - awaiting your answer.
>
> Thanks,
> Leonid.
>>> Leonid Bloch (8):
>>>   e1000: Cosmetic and alignment fixes
>>>   e1000: Add support for migrating the entire MAC registers' array
>>>   e1000: Introduced an array to control the access to the MAC registers
>>>   e1000: Trivial implementation of various MAC registers
>>>   e1000: Fixing the received/transmitted packets' counters
>>>   e1000: Fixing the received/transmitted octets' counters
>>>   e1000: Fixing the packet address filtering procedure
>>>   e1000: Implementing various counters
>>>
>>>  hw/net/e1000.c  | 503 
>>> +---
>>>  hw/net/e1000_regs.h |   8 +-
>>>  include/hw/compat.h |   4 +
>>>  3 files changed, 406 insertions(+), 109 deletions(-)
>>>
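
The access-control array mentioned in the cover letter quoted above (one entry per MAC register, saying whether access needs an opt-in parameter or should trigger a "partially implemented" warning) could look roughly like this. It is a sketch under loudly stated assumptions: the register count, flag names, indices and function name are all invented for illustration and are not taken from the e1000 patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { REG_COUNT = 8 };  /* hypothetical tiny register file */

enum {
    MAC_ACCESS_PARTIAL     = 1u << 0,  /* partially implemented: warn */
    MAC_ACCESS_FLAG_NEEDED = 1u << 1   /* usable only with an opt-in flag */
};

/* One entry per register; unlisted registers are fully accessible. */
static const uint8_t mac_reg_access[REG_COUNT] = {
    [2] = MAC_ACCESS_PARTIAL,
    [5] = MAC_ACCESS_FLAG_NEEDED,
};

typedef enum { ACCESS_OK, ACCESS_OK_WARN, ACCESS_DENIED } AccessResult;

/* Decide what a read or write of register 'index' should do. */
static AccessResult check_mac_reg_access(size_t index, bool flag_enabled)
{
    uint8_t access;

    assert(index < REG_COUNT);
    access = mac_reg_access[index];
    if ((access & MAC_ACCESS_FLAG_NEEDED) && !flag_enabled) {
        return ACCESS_DENIED;
    }
    if (access & MAC_ACCESS_PARTIAL) {
        return ACCESS_OK_WARN;  /* caller prints a debug warning */
    }
    return ACCESS_OK;
}
```

Keeping the policy in a table rather than in per-register functions means a newly implemented register needs one array entry instead of another pair of read/write wrappers.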

Re: [Qemu-devel] [PATCH] hw/arm/virt: error_report cleanups

2015-11-10 Thread Andrew Jones

On Tue, Nov 10, 2015 at 11:55:55AM +, Peter Maydell wrote:
> On 10 November 2015 at 09:39, Markus Armbruster  wrote:
> > Peter Maydell  writes:
> >> ...so in conclusion Andrew's patch is correct as it stands
> >> and I should just apply it? :-)
> >
> > Yes.  It got my R-by :)
> 
> OK, applied to target-arm.next. Thanks for walking me through this.
>

Thanks guys! And I'm glad you saw this patch Markus! I'll definitely
remember to CC you on all error_* patches in the future :-)

drew

Re: [Qemu-devel] [PATCH v2 2/5] target-i386/kvm: Hyper-V SynIC MSR's support

2015-11-10 Thread Paolo Bonzini



On 10/11/2015 13:52, Andrey Smetanin wrote:
> This patch does Hyper-V Synthetic interrupt
> controller(Hyper-V SynIC) MSR's support and
> migration. Hyper-V SynIC is enabled by cpu's
> 'hv-synic' option.
> 
> This patch does not allow cpu creation if
> 'hv-synic' option specified but kernel
> doesn't support Hyper-V SynIC.
> 
> Changes v2:
> * activate Hyper-V SynIC by enabling corresponding vcpu cap
> * reject cpu initialization if user requested Hyper-V SynIC
>   but kernel does not support Hyper-V SynIC
> 
> Signed-off-by: Andrey Smetanin 
> Reviewed-by: Roman Kagan 
> Signed-off-by: Denis V. Lunev 
> CC: Paolo Bonzini 
> CC: Richard Henderson 
> CC: Eduardo Habkost 
> CC: "Andreas Färber" 
> CC: Marcelo Tosatti 
> CC: Roman Kagan 
> CC: Denis V. Lunev 
> CC: k...@vger.kernel.org
> 
> ---
>  target-i386/cpu-qom.h |  1 +
>  target-i386/cpu.c |  1 +
>  target-i386/cpu.h |  5 
>  target-i386/kvm.c | 67 
> ++-
>  target-i386/machine.c | 39 ++
>  5 files changed, 112 insertions(+), 1 deletion(-)
> 
> diff --git a/target-i386/cpu-qom.h b/target-i386/cpu-qom.h
> index e3bfe9d..7ea5b34 100644
> --- a/target-i386/cpu-qom.h
> +++ b/target-i386/cpu-qom.h
> @@ -94,6 +94,7 @@ typedef struct X86CPU {
>  bool hyperv_reset;
>  bool hyperv_vpindex;
>  bool hyperv_runtime;
> +bool hyperv_synic;
>  bool check_cpuid;
>  bool enforce_cpuid;
>  bool expose_kvm;
> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> index e5f1c5b..1462e19 100644
> --- a/target-i386/cpu.c
> +++ b/target-i386/cpu.c
> @@ -3142,6 +3142,7 @@ static Property x86_cpu_properties[] = {
>  DEFINE_PROP_BOOL("hv-reset", X86CPU, hyperv_reset, false),
>  DEFINE_PROP_BOOL("hv-vpindex", X86CPU, hyperv_vpindex, false),
>  DEFINE_PROP_BOOL("hv-runtime", X86CPU, hyperv_runtime, false),
> +DEFINE_PROP_BOOL("hv-synic", X86CPU, hyperv_synic, false),
>  DEFINE_PROP_BOOL("check", X86CPU, check_cpuid, true),
>  DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false),
>  DEFINE_PROP_BOOL("kvm", X86CPU, expose_kvm, true),
> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
> index fc4a605..8cf33df 100644
> --- a/target-i386/cpu.h
> +++ b/target-i386/cpu.h
> @@ -918,6 +918,11 @@ typedef struct CPUX86State {
>  uint64_t msr_hv_tsc;
>  uint64_t msr_hv_crash_params[HV_X64_MSR_CRASH_PARAMS];
>  uint64_t msr_hv_runtime;
> +uint64_t msr_hv_synic_control;
> +uint64_t msr_hv_synic_version;
> +uint64_t msr_hv_synic_evt_page;
> +uint64_t msr_hv_synic_msg_page;
> +uint64_t msr_hv_synic_sint[HV_SYNIC_SINT_COUNT];
>  
>  /* exception/interrupt handling */
>  int error_code;
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index 2a9953b..cfcd01d 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -86,6 +86,7 @@ static bool has_msr_hv_crash;
>  static bool has_msr_hv_reset;
>  static bool has_msr_hv_vpindex;
>  static bool has_msr_hv_runtime;
> +static bool has_msr_hv_synic;
>  static bool has_msr_mtrr;
>  static bool has_msr_xss;
>  
> @@ -521,7 +522,8 @@ static bool hyperv_enabled(X86CPU *cpu)
>  cpu->hyperv_crash ||
>  cpu->hyperv_reset ||
>  cpu->hyperv_vpindex ||
> -cpu->hyperv_runtime);
> +cpu->hyperv_runtime ||
> +cpu->hyperv_synic);
>  }
>  
>  static Error *invtsc_mig_blocker;
> @@ -610,6 +612,14 @@ int kvm_arch_init_vcpu(CPUState *cs)
>  if (cpu->hyperv_runtime && has_msr_hv_runtime) {
>  c->eax |= HV_X64_MSR_VP_RUNTIME_AVAILABLE;
>  }
> +if (cpu->hyperv_synic) {
> +if (!has_msr_hv_synic ||
> +kvm_vcpu_enable_cap(cs, KVM_CAP_HYPERV_SYNIC, 0)) {
> +fprintf(stderr, "Hyper-V SynIC is not supported by 
> kernel\n");
> +return -ENOSYS;
> +}
> +c->eax |= HV_X64_MSR_SYNIC_AVAILABLE;
> +}
>  c = &cpuid_data.entries[cpuid_i++];
>  c->function = HYPERV_CPUID_ENLIGHTMENT_INFO;
>  if (cpu->hyperv_relaxed_timing) {
> @@ -950,6 +960,10 @@ static int kvm_get_supported_msrs(KVMState *s)
>  has_msr_hv_runtime = true;
>  continue;
>  }
> +if (kvm_msr_list->indices[i] == HV_X64_MSR_SCONTROL) {
> +has_msr_hv_synic = true;
> +continue;
> +}
>  }
>  }
>  
> @@ -1511,6 +1525,31 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
>  kvm_msr_entry_set(&msrs[n++], HV_X64_MSR_VP_RUNTIME,
>env->msr_hv_runtime);
>  }
> +if (cpu->hyperv_synic) {
> +int j;
> +
> +if (!env->msr_hv_synic_version) {
> +/* First time initialization */
> +env->msr_hv_synic_version = HV_SYNIC_VERSION_1;
> +for (j = 0; j < ARRAY

Re: [Qemu-devel] [PATCH v10 24/30] qapi: Factor out QAPISchemaObjectType.check_clash()

2015-11-10 Thread Eric Blake

On 11/10/2015 02:15 AM, Markus Armbruster wrote:

>> On the other hand, we've been arguing that check() should populate
>> everything after construction prior to anything else being run; and not
>> running Variant.type.check() during Variants.check() of flat unions
>> feels like we may have a hole (a flat union will have to inline its
>> types to the overall JSON object, and inlining types requires access to
>> type.members - but as written, we aren't populating them until
>> Variants.check_clash()).  I can play with hoisting the type.check() out
>> of type.check_clash() and instead keep base.check() in type.check(), and
>> add variant.type.check() in Variants.check() (but only for unions, not
>> for alternates), if you are interested.
> 
> My "qapi: Factor out QAPISchemaObjectTypeMember.check_clash()" adds
> QAPISchemaObjectTypeMember.check_clash() without changing the common
> protocol.  The new QAPISchemaObjectTypeMember.check_clash() is merely a
> helper for QAPISchemaObjectType.check().
> 
> The two .check_clash() you add (one in this patch, one in the previous
> one) are different: both contain calls of QAPISchemaObjectType.check().
> 
> I feel the .check() calls are too important to be buried deep like that.
> I'd stick to prior practice and put the .check() calls right into
> .check().  Obviously, the .check_clash() methods may only be called after
> .check() then, but that's nothing new.
> 
> Fixup for your previous patch:
> 
> diff --git a/scripts/qapi.py b/scripts/qapi.py
> index 4c56935..357127d 100644
> --- a/scripts/qapi.py
> +++ b/scripts/qapi.py
> @@ -1065,7 +1065,6 @@ class QAPISchemaObjectTypeVariants(object):
>  vseen = dict(seen)
>  assert isinstance(v.type, QAPISchemaObjectType)
>  assert not v.type.variants   # not implemented
> -v.type.check(schema)
>  for m in v.type.members:
>  m.check_clash(vseen)
>  
> @@ -1077,6 +1076,7 @@ class QAPISchemaObjectTypeVariant(QAPISchemaObjectTypeMember):
>  def check(self, schema, tag_type):
>  QAPISchemaObjectTypeMember.check(self, schema)
>  assert self.name in tag_type.values
> +self.type.check(schema)
>  

Won't quite work.  You are right that we must call
self.type.check(schema) for variants used by a union; but calling it for
ALL variants used by an alternate is wrong, because self.type for at
least one branch of an alternate will not be an instance of
QAPISchemaObjectType.  However, I'm currently testing whether it is safe
to just blindly check an object branch of an alternate, if
present (and that should not lead to cycles, since alternates have no
base class and since we don't allow one alternate type as a variant of
another alternate), in which case the fixup for 23/30 is more like:

diff --git i/scripts/qapi.py w/scripts/qapi.py
index a005c87..25fa642 100644
--- i/scripts/qapi.py
+++ w/scripts/qapi.py
@@ -1065,7 +1065,6 @@ class QAPISchemaObjectTypeVariants(object):
 vseen = dict(seen)
 assert isinstance(v.type, QAPISchemaObjectType)
 assert not v.type.variants   # not implemented
-v.type.check(schema)
 for m in v.type.members:
 m.check_clash(vseen)

@@ -1077,6 +1076,8 @@ class QAPISchemaObjectTypeVariant(QAPISchemaObjectTypeMember):
 def check(self, schema, tag_type):
 QAPISchemaObjectTypeMember.check(self, schema)
 assert self.name in tag_type.values
+if isinstance(self.type, QAPISchemaObjectType):
+self.type.check(schema)

 # This function exists to support ugly simple union special cases
 # TODO get rid of them, and drop the function
@@ -1098,6 +1099,8 @@ class QAPISchemaAlternateType(QAPISchemaType):

 def check(self, schema):
 self.variants.tag_member.check(schema)
+# Not calling self.variants.check_clash(), because there's
+# nothing to clash with
 self.variants.check(schema, {})

 def json_type(self):
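
A minimal sketch of the ordering discussed above, using heavily simplified
stand-ins rather than the real scripts/qapi.py classes (BuiltinType,
ObjectType and Variant here are hypothetical, and there is no schema
lookup): check() populates members, including for object branches of a
variant, and check_clash() only consumes what check() already populated.

```python
class BuiltinType:
    """Hypothetical stand-in for a non-object type such as 'int' or 'str'."""
    def __init__(self, name):
        self.name = name

    def check(self):
        pass  # nothing to populate


class ObjectType:
    """Simplified stand-in for QAPISchemaObjectType (no schema lookup)."""
    def __init__(self, name, local_members, base=None):
        self.name = name
        self.local_members = list(local_members)
        self.base = base
        self.members = None  # populated by check()

    def check(self):
        if self.members is not None:
            return  # already checked; repeated calls are harmless
        inherited = []
        if self.base is not None:
            self.base.check()
            inherited = list(self.base.members)
        self.members = inherited + self.local_members


class Variant:
    """One branch of a union or alternate."""
    def __init__(self, name, type_):
        self.name = name
        self.type = type_

    def check(self):
        # Only object branches have members to populate; an alternate
        # may also carry non-object branches.
        if isinstance(self.type, ObjectType):
            self.type.check()

    def check_clash(self, seen):
        # Relies on check() having run, so self.type.members is populated.
        vseen = dict(seen)  # branches do not clash with each other
        for m in self.type.members:
            if m in vseen:
                raise ValueError("member '%s' of branch '%s' clashes with %s"
                                 % (m, self.name, vseen[m]))
            vseen[m] = "branch '%s'" % self.name
```

With this split, a caller runs check() on everything first and only then
calls check_clash() on union branches; an alternate never needs the clash
pass because it has no base members to clash with.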




-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




Re: [Qemu-devel] [PATCH v5 0/8] e1000: Various fixes and registers' implementation

2015-11-10 Thread Leonid Bloch

On Tue, Nov 10, 2015 at 3:01 PM, Jason Wang  wrote:
>
>
> On 11/10/2015 07:39 PM, Leonid Bloch wrote:
>> On Tue, Nov 10, 2015 at 8:21 AM, Jason Wang  wrote:
>>>
>>> On 11/09/2015 10:59 PM, Leonid Bloch wrote:
 This series fixes issues with packet/octet counting in e1000's Statistic
 registers, fixes a bug in the packet address filtering procedure, and
 implements many MAC registers that were absent before, some Statistic
 counters among them.

 Besides this, the series introduces a parameter which, if set to "on"
 (default), will cause the entire MAC registers' array to migrate during
 live migration (please see patch #2 for details). The rationale behind
 this is the ability to implement additional MAC registers in the future,
 without worrying about migration compatibility between future versions.
 For compatibility with previous versions, the above mentioned parameter
 can be set to "off".

 Also, a new array is introduced to control the access to the various MAC
 registers. This takes care of situations when a MAC register requires a
 certain parameter to be accessed, or is partially implemented, and
 requires a debug warning to be printed on access attempts.
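
 As a rough illustration of the access-control array idea, here is a
 minimal sketch. Every name in it (the flags, the register names, the
 table and check_access()) is hypothetical and chosen for illustration;
 the actual patch may encode the access rules differently.

```python
# Hypothetical flags; the real patch may encode access rules differently.
MAC_ACCESS_PARTIAL = 1 << 0      # register is only partially implemented
MAC_ACCESS_FLAG_NEEDED = 1 << 1  # register needs a device flag to be accessible

# Hypothetical register names and table contents, for illustration only.
mac_reg_access = {
    "REG_FULL": 0,
    "REG_PARTIAL": MAC_ACCESS_PARTIAL,
    "REG_GATED": MAC_ACCESS_FLAG_NEEDED,
}


def check_access(reg, flag_enabled, warn=print):
    """Return True if the register may be accessed, warning when partial."""
    access = mac_reg_access.get(reg)
    if access is None:
        return False  # unknown register
    if access & MAC_ACCESS_FLAG_NEEDED and not flag_enabled:
        return False  # gated register, and the gating flag is off
    if access & MAC_ACCESS_PARTIAL:
        warn("register %s is not fully implemented" % reg)
    return True
```

 Keeping the rules in one table means the read and write paths can share
 a single check instead of carrying per-register warning helpers.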

 Additionally, several cosmetic changes are made.

 Differences v1-2:

 * Wording of several commit messages corrected.
 * For trivially implemented Diagnostic registers, a debug message is
   added on read/write attempts, alerting of incomplete implementation.
 * Following testing on a physical device, only the lower 16 bits can now
   be read from AIT, and only the lower 4 - from FFMT*.
 * The grow_8reg_if_not_full function is rewritten.
 * inc_tx_bcast_or_mcast_count and increase_size_stats are now called
   from within e1000_send_packet, to avoid code duplication.

 Differences v2-3:

 * Minor rewordings of some commit messages (0002, 0003).
 * Live migration capability is added to the newly implemented registers.

 Differences v3-4:

 * Introduction of the "full_mac_registers" parameter (see above).
 * Reversion of the live migration handling introduced in v3.
 * Small alignment changes in patch #1 to correspond with the following
   patches.

 Differences v4-v5:

 * Introduction of an array to control the access to the MAC registers.
 * Removal of the specific functions that warned of partial
   implementation on read/write from patch 4.
 * Adequate changes to patches 4 and 8: mainly adding the registers
   introduced there to the new array.

 The majority of these changes result from Jason Wang's review - thank
 you, Jason!
>>> Thanks a lot for the patches. Almost done with two minor concerns:
>>>
>>> 1) to unbreak bisection we'd better enable the extra_mac_registers (and
>>> compatibility stuffs) in patch 8 or patch 9
>> Do you mean by that changing patch 2, so that the compatibility would
>> be "on" by default, and setting it to "off" by default only in patch
>> 8, or an additional patch 9?
>
> I mean do not introduce the property "extra_mac_registers" until patch 8
> and 9. In this case all functions will be enabled completely at that time
> instead of partially patch by patch in this series.

But won't there be compatibility issues between the patches if done
like that? Why not prepare the ground for compatibility, and only
then introduce the new registers (as it is done now)?
>
>>> 2) looks like we could save some lines of code in patch 3, see the
>>> comment in that patch
>>>
>>> Since we're near to soft freeze (12th), want to ask whether or not you
>>> want to send a v6 or I can fix 1 myself. (if 2 is correct, we can do
>>> optimizations on top).
>> Will send a v6 with a fix to 2 today. Regarding 1 - awaiting your answer.
>>
>> Thanks,
>> Leonid.
 Leonid Bloch (8):
   e1000: Cosmetic and alignment fixes
   e1000: Add support for migrating the entire MAC registers' array
   e1000: Introduced an array to control the access to the MAC registers
   e1000: Trivial implementation of various MAC registers
   e1000: Fixing the received/transmitted packets' counters
   e1000: Fixing the received/transmitted octets' counters
   e1000: Fixing the packet address filtering procedure
   e1000: Implementing various counters

  hw/net/e1000.c  | 503 +---
  hw/net/e1000_regs.h |   8 +-
  include/hw/compat.h |   4 +
  3 files changed, 406 insertions(+), 109 deletions(-)

>

[Qemu-devel] [PATCH v4 5/5] kvm/x86: Hyper-V kvm exit

2015-11-10 Thread Andrey Smetanin

A new vcpu exit is introduced to notify the userspace of the
changes in Hyper-V SynIC configuration triggered by guest writing to the
corresponding MSRs.

Changes v4:
* exit into userspace only if guest writes into SynIC MSR's

Changes v3:
* added KVM_EXIT_HYPERV types and structs notes into docs

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Gleb Natapov 
CC: Paolo Bonzini 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: qemu-devel@nongnu.org

---
 Documentation/virtual/kvm/api.txt | 22 ++
 arch/x86/include/asm/kvm_host.h   |  1 +
 arch/x86/kvm/hyperv.c | 20 
 arch/x86/kvm/x86.c|  6 ++
 include/linux/kvm_host.h  |  1 +
 include/uapi/linux/kvm.h  | 17 +
 6 files changed, 67 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 16096a2..abc4f48 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3337,6 +3337,28 @@ the userspace IOAPIC should process the EOI and retrigger the interrupt if
 it is still asserted.  Vector is the LAPIC interrupt vector for which the
 EOI was received.
 
+   struct kvm_hyperv_exit {
+#define KVM_EXIT_HYPERV_SYNIC  1
+   __u32 type;
+   union {
+   struct {
+   __u32 msr;
+   __u64 control;
+   __u64 evt_page;
+   __u64 msg_page;
+   } synic;
+   } u;
+   };
+   /* KVM_EXIT_HYPERV */
+struct kvm_hyperv_exit hyperv;
+Indicates that the VCPU exits into userspace to process some tasks
+related to Hyper-V emulation.
+Valid values for 'type' are:
+   KVM_EXIT_HYPERV_SYNIC -- synchronously notify user-space about
+Hyper-V SynIC state change. Notification is used to remap SynIC
+event/message pages and to enable/disable SynIC messages/events processing
+in userspace.
+
/* Fix the size of the union. */
char padding[256];
};
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ad29e89..1cefa1e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -393,6 +393,7 @@ struct kvm_vcpu_hv {
u64 hv_vapic;
s64 runtime_offset;
struct kvm_vcpu_hv_synic synic;
+   struct kvm_hyperv_exit exit;
 };
 
 struct kvm_vcpu_arch {
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 83a3c0c..41869a9 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -130,6 +130,20 @@ static void kvm_hv_notify_acked_sint(struct kvm_vcpu *vcpu, u32 sint)
srcu_read_unlock(&kvm->irq_srcu, idx);
 }
 
+static void synic_exit(struct kvm_vcpu_hv_synic *synic, u32 msr)
+{
+   struct kvm_vcpu *vcpu = synic_to_vcpu(synic);
+   struct kvm_vcpu_hv *hv_vcpu = &vcpu->arch.hyperv;
+
+   hv_vcpu->exit.type = KVM_EXIT_HYPERV_SYNIC;
+   hv_vcpu->exit.u.synic.msr = msr;
+   hv_vcpu->exit.u.synic.control = synic->control;
+   hv_vcpu->exit.u.synic.evt_page = synic->evt_page;
+   hv_vcpu->exit.u.synic.msg_page = synic->msg_page;
+
+   kvm_make_request(KVM_REQ_HV_EXIT, vcpu);
+}
+
 static int synic_set_msr(struct kvm_vcpu_hv_synic *synic,
 u32 msr, u64 data, bool host)
 {
@@ -145,6 +159,8 @@ static int synic_set_msr(struct kvm_vcpu_hv_synic *synic,
switch (msr) {
case HV_X64_MSR_SCONTROL:
synic->control = data;
+   if (!host)
+   synic_exit(synic, msr);
break;
case HV_X64_MSR_SVERSION:
if (!host) {
@@ -161,6 +177,8 @@ static int synic_set_msr(struct kvm_vcpu_hv_synic *synic,
break;
}
synic->evt_page = data;
+   if (!host)
+   synic_exit(synic, msr);
break;
case HV_X64_MSR_SIMP:
if (data & HV_SYNIC_SIMP_ENABLE)
@@ -170,6 +188,8 @@ static int synic_set_msr(struct kvm_vcpu_hv_synic *synic,
break;
}
synic->msg_page = data;
+   if (!host)
+   synic_exit(synic, msr);
break;
case HV_X64_MSR_EOM: {
int i;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 41f3030..04daf32 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6377,6 +6377,12 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
r = 0;
goto out;
}
+   if (kvm_check_request(KVM_REQ_HV_EXIT, vcpu)) {
+   vcpu->run->exit_reason = KVM_

[Qemu-devel] [PATCH v2 3/5] kvm: Hyper-V SynIC irq routing support

2015-11-10 Thread Andrey Smetanin

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Paolo Bonzini 
CC: Richard Henderson 
CC: Eduardo Habkost 
CC: "Andreas Färber" 
CC: Marcelo Tosatti 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: k...@vger.kernel.org

---
 include/sysemu/kvm.h |  1 +
 kvm-all.c| 33 +
 2 files changed, 34 insertions(+)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 4ac6176..92ccb35 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -447,6 +447,7 @@ int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg,
 void kvm_irqchip_release_virq(KVMState *s, int virq);
 
 int kvm_irqchip_add_adapter_route(KVMState *s, AdapterInfo *adapter);
+int kvm_irqchip_add_hv_sint_route(KVMState *s, uint32_t vcpu, uint32_t sint);
 
 int kvm_irqchip_add_irqfd_notifier_gsi(KVMState *s, EventNotifier *n,
EventNotifier *rn, int virq);
diff --git a/kvm-all.c b/kvm-all.c
index 1bc1273..d36b494 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1297,6 +1297,34 @@ int kvm_irqchip_add_adapter_route(KVMState *s, AdapterInfo *adapter)
 return virq;
 }
 
+int kvm_irqchip_add_hv_sint_route(KVMState *s, uint32_t vcpu, uint32_t sint)
+{
+struct kvm_irq_routing_entry kroute = {};
+int virq;
+
+if (!kvm_gsi_routing_enabled()) {
+return -ENOSYS;
+}
+if (!kvm_check_extension(s, KVM_CAP_HYPERV_SYNIC)) {
+return -ENOSYS;
+}
+virq = kvm_irqchip_get_virq(s);
+if (virq < 0) {
+return virq;
+}
+
+kroute.gsi = virq;
+kroute.type = KVM_IRQ_ROUTING_HV_SINT;
+kroute.flags = 0;
+kroute.u.hv_sint.vcpu = vcpu;
+kroute.u.hv_sint.sint = sint;
+
+kvm_add_routing_entry(s, &kroute);
+kvm_irqchip_commit_routes(s);
+
+return virq;
+}
+
 #else /* !KVM_CAP_IRQ_ROUTING */
 
 void kvm_init_irq_routing(KVMState *s)
@@ -1322,6 +1350,11 @@ int kvm_irqchip_add_adapter_route(KVMState *s, AdapterInfo *adapter)
 return -ENOSYS;
 }
 
+int kvm_irqchip_add_hv_sint_route(KVMState *s, uint32_t vcpu, uint32_t sint)
+{
+return -ENOSYS;
+}
+
 static int kvm_irqchip_assign_irqfd(KVMState *s, int fd, int virq, bool assign)
 {
 abort();
-- 
2.4.3

[Qemu-devel] [PATCH v2 0/5] QEMU: Hyper-V SynIC support

2015-11-10 Thread Andrey Smetanin

Hyper-V SynIC (synthetic interrupt controller) support:
* msr's support
* irq routing setup
* irq injection
* irq ack's callbacks
* event/message pages changes tracking at Hyper-V exit
* Hyper-V test device to test SynIC by kvm-unit-tests

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Paolo Bonzini 
CC: Richard Henderson 
CC: Eduardo Habkost 
CC: "Andreas Färber" 
CC: Marcelo Tosatti 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: k...@vger.kernel.org

Changes v2:
* linux headers update by scripts moved into separate patch
* activate Hyper-V SynIC by enabling corresponding vcpu cap
* reject cpu initialization if user requested Hyper-V SynIC
  but kernel does not support Hyper-V SynIC

Andrey Smetanin (5):
  headers: Linux kernel Hyper-V SynIC defines
  target-i386/kvm: Hyper-V SynIC MSR's support
  kvm: Hyper-V SynIC irq routing support
  target-i386/hyperv: Hyper-V SynIC SINT routing and vcpu exit
  hw/misc: Hyper-V test device 'hyperv-testdev'

 default-configs/i386-softmmu.mak  |   1 +
 default-configs/x86_64-softmmu.mak|   1 +
 hw/misc/Makefile.objs |   1 +
 hw/misc/hyperv_testdev.c  | 164 ++
 include/standard-headers/asm-x86/hyperv.h |  12 +++
 include/sysemu/kvm.h  |   1 +
 kvm-all.c |  33 ++
 linux-headers/linux/kvm.h |  25 +
 target-i386/Makefile.objs |   2 +-
 target-i386/cpu-qom.h |   1 +
 target-i386/cpu.c |   1 +
 target-i386/cpu.h |   5 +
 target-i386/hyperv.c  | 127 +++
 target-i386/hyperv.h  |  42 
 target-i386/kvm.c |  73 -
 target-i386/machine.c |  39 +++
 16 files changed, 526 insertions(+), 2 deletions(-)
 create mode 100644 hw/misc/hyperv_testdev.c
 create mode 100644 target-i386/hyperv.c
 create mode 100644 target-i386/hyperv.h

-- 
2.4.3

[Qemu-devel] [PATCH v4 0/5] KVM: Hyper-V synthetic interrupt controller

2015-11-10 Thread Andrey Smetanin

This patchset implements the KVM part of the synthetic interrupt
controller (SynIC) which is a building block of the Hyper-V
paravirtualized device bus (vmbus).

SynIC is a lapic extension, which is controlled via MSRs and maintains
for each vCPU
 - 16 synthetic interrupt "lines" (SINT's); each can be configured to
   trigger a specific interrupt vector optionally with auto-EOI
   semantics
 - a message page in the guest memory with 16 256-byte per-SINT message
   slots
 - an event flag page in the guest memory with 16 2048-bit per-SINT
   event flag areas
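
The per-SINT layout in the list above can be checked with a little
arithmetic. This is a minimal sketch: the bit constants are the ones from
the "[PATCH v2 1/5] headers" patch in this thread, the slot and area sizes
come from the list above, and the helper names (msg_slot_offset(),
evt_area_offset(), decode_sint()) are hypothetical and not part of KVM or
QEMU.

```python
# Constants as defined in the headers patch in this thread.
HV_SYNIC_SINT_COUNT = 16
HV_SYNIC_SINT_MASKED = 1 << 16
HV_SYNIC_SINT_AUTO_EOI = 1 << 17
HV_SYNIC_SINT_VECTOR_MASK = 0xFF

# Sizes taken from the description above.
MSG_SLOT_BYTES = 256        # one message slot per SINT
EVT_AREA_BYTES = 2048 // 8  # one 2048-bit event flag area per SINT


def _check_sint(sint):
    if not 0 <= sint < HV_SYNIC_SINT_COUNT:
        raise ValueError("SINT index out of range: %r" % (sint,))


def msg_slot_offset(sint):
    """Byte offset of a SINT's message slot within the message page."""
    _check_sint(sint)
    return sint * MSG_SLOT_BYTES


def evt_area_offset(sint):
    """Byte offset of a SINT's event flag area within the event flag page."""
    _check_sint(sint)
    return sint * EVT_AREA_BYTES


def decode_sint(value):
    """Split a SINT register value into vector, masked and auto-EOI fields."""
    return {
        "vector": value & HV_SYNIC_SINT_VECTOR_MASK,
        "masked": bool(value & HV_SYNIC_SINT_MASKED),
        "auto_eoi": bool(value & HV_SYNIC_SINT_AUTO_EOI),
    }
```

Both layouts come to 4096 bytes in total: 16 slots of 256 bytes for
messages, and 16 areas of 2048 bits (256 bytes) for event flags.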

The host triggers a SINT whenever it delivers a new message to the
corresponding slot or flips an event flag bit in the corresponding area.
The guest informs the host that it can try delivering a message by
explicitly asserting EOI in lapic or writing to End-Of-Message (EOM)
MSR.

The userspace (qemu) triggers interrupts and receives EOM notifications
via irqfd with resampler; for that, a GSI is allocated for each
configured SINT, and irq_routing api is extended to support GSI-SINT
mapping.

Besides, a new vcpu exit is introduced to notify the userspace of the
changes in SynIC configuration triggered by guest writing to the
corresponding MSRs.

Since auto-EOI behavior of SynIC cannot be made compatible with APIC
hardware virtualization, the latter is disabled using a newly
introduced flag, when SynIC is activated.

This patch series has been tested by running kvm-unit-tests
(which also include the previously sent 'hyperv_synic' test) on a
host CPU which supports APICv (Intel(R) Xeon(R) CPU E5-2407 v2 @ 2.40GHz).

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Gleb Natapov 
CC: Paolo Bonzini 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: qemu-devel@nongnu.org

Changes v4:
* disable APICv in case Hyper-V SynIC enabled
* patchset rebase into latest kvm/queue (10 Nov 2015)
* do Hyper-V SynIC exit only at !host(guest) msr's writes

Changes v3:
* Hyper-V SynIC KVM API documentation fixes

Changes v2:
* irqchip/eventfd preparation improvements to support
arch specific routing entries like Hyper-V SynIC.
* add Hyper-V SynIC vectors into EOI exit bitmap.
* do not use posted interrupts in case of Hyper-V SynIC
AutoEOI vectors

Andrey Smetanin (5):
  kvm/irqchip: kvm_arch_irq_routing_update renaming split
  kvm/x86: split ioapic-handled and EOI exit bitmaps
  kvm/x86: per-vcpu apicv deactivation support
  kvm/x86: Hyper-V synthetic interrupt controller
  kvm/x86: Hyper-V kvm exit

 Documentation/virtual/kvm/api.txt |  41 +
 arch/x86/include/asm/kvm_host.h   |  26 ++-
 arch/x86/kvm/hyperv.c | 335 ++
 arch/x86/kvm/hyperv.h |  23 +++
 arch/x86/kvm/ioapic.c |   4 +-
 arch/x86/kvm/ioapic.h |   7 +-
 arch/x86/kvm/irq.c|   2 +-
 arch/x86/kvm/irq_comm.c   |  41 -
 arch/x86/kvm/lapic.c  |  40 +++--
 arch/x86/kvm/lapic.h  |   9 +-
 arch/x86/kvm/svm.c|  13 +-
 arch/x86/kvm/vmx.c|  48 +++---
 arch/x86/kvm/x86.c|  66 +++-
 include/linux/kvm_host.h  |  12 +-
 include/uapi/linux/kvm.h  |  25 +++
 virt/kvm/irqchip.c|   7 +-
 16 files changed, 625 insertions(+), 74 deletions(-)

-- 
2.4.3

[Qemu-devel] [PATCH v4 1/5] kvm/irqchip: kvm_arch_irq_routing_update renaming split

2015-11-10 Thread Andrey Smetanin

Actually kvm_arch_irq_routing_update() should be
kvm_arch_post_irq_routing_update() as it's called at the end
of irq routing update.

This renaming frees the kvm_arch_irq_routing_update() function name.
kvm_arch_irq_routing_update() becomes a weak function which will be used
to update mappings for arch-specific irq routing entries
(in particular, the upcoming Hyper-V synthetic interrupts).

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Gleb Natapov 
CC: Paolo Bonzini 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: qemu-devel@nongnu.org

---
 arch/x86/kvm/irq_comm.c  | 2 +-
 include/linux/kvm_host.h | 5 +++--
 virt/kvm/irqchip.c   | 7 ++-
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 84b96d3..e39768c 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -332,7 +332,7 @@ int kvm_setup_empty_irq_routing(struct kvm *kvm)
return kvm_set_irq_routing(kvm, empty_routing, 0, 0);
 }
 
-void kvm_arch_irq_routing_update(struct kvm *kvm)
+void kvm_arch_post_irq_routing_update(struct kvm *kvm)
 {
if (ioapic_in_kernel(kvm) || !irqchip_in_kernel(kvm))
return;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 242a6d2..dbe2a2f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -473,12 +473,12 @@ void vcpu_put(struct kvm_vcpu *vcpu);
 
 #ifdef __KVM_HAVE_IOAPIC
 void kvm_vcpu_request_scan_ioapic(struct kvm *kvm);
-void kvm_arch_irq_routing_update(struct kvm *kvm);
+void kvm_arch_post_irq_routing_update(struct kvm *kvm);
 #else
 static inline void kvm_vcpu_request_scan_ioapic(struct kvm *kvm)
 {
 }
-static inline void kvm_arch_irq_routing_update(struct kvm *kvm)
+static inline void kvm_arch_post_irq_routing_update(struct kvm *kvm)
 {
 }
 #endif
@@ -1080,6 +1080,7 @@ static inline void kvm_irq_routing_update(struct kvm *kvm)
 {
 }
 #endif
+void kvm_arch_irq_routing_update(struct kvm *kvm);
 
 static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
 {
diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
index f0b08a2..fe84e1a 100644
--- a/virt/kvm/irqchip.c
+++ b/virt/kvm/irqchip.c
@@ -166,6 +166,10 @@ out:
return r;
 }
 
+void __attribute__((weak)) kvm_arch_irq_routing_update(struct kvm *kvm)
+{
+}
+
 int kvm_set_irq_routing(struct kvm *kvm,
const struct kvm_irq_routing_entry *ue,
unsigned nr,
@@ -219,9 +223,10 @@ int kvm_set_irq_routing(struct kvm *kvm,
old = kvm->irq_routing;
rcu_assign_pointer(kvm->irq_routing, new);
kvm_irq_routing_update(kvm);
+   kvm_arch_irq_routing_update(kvm);
mutex_unlock(&kvm->irq_lock);
 
-   kvm_arch_irq_routing_update(kvm);
+   kvm_arch_post_irq_routing_update(kvm);
 
synchronize_srcu_expedited(&kvm->irq_srcu);
 
-- 
2.4.3

[Qemu-devel] [PATCH v2 1/5] headers: Linux kernel Hyper-V SynIC defines

2015-11-10 Thread Andrey Smetanin

This patch brings in the necessary changes from the corresponding kernel
patchset.  It's included only for completeness; ideally these changes
should arrive via the standard kernel header pull.

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Paolo Bonzini 
CC: Richard Henderson 
CC: Eduardo Habkost 
CC: "Andreas Färber" 
CC: Marcelo Tosatti 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: k...@vger.kernel.org

---
 include/standard-headers/asm-x86/hyperv.h | 12 
 linux-headers/linux/kvm.h | 25 +
 2 files changed, 37 insertions(+)

diff --git a/include/standard-headers/asm-x86/hyperv.h b/include/standard-headers/asm-x86/hyperv.h
index c37c14e..f9780f1 100644
--- a/include/standard-headers/asm-x86/hyperv.h
+++ b/include/standard-headers/asm-x86/hyperv.h
@@ -257,4 +257,16 @@ typedef struct _HV_REFERENCE_TSC_PAGE {
int64_t tsc_offset;
 } HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE;
 
+/* Define the number of synthetic interrupt sources. */
+#define HV_SYNIC_SINT_COUNT(16)
+/* Define the expected SynIC version. */
+#define HV_SYNIC_VERSION_1 (0x1)
+
+#define HV_SYNIC_CONTROL_ENABLE(1ULL << 0)
+#define HV_SYNIC_SIMP_ENABLE   (1ULL << 0)
+#define HV_SYNIC_SIEFP_ENABLE  (1ULL << 0)
+#define HV_SYNIC_SINT_MASKED   (1ULL << 16)
+#define HV_SYNIC_SINT_AUTO_EOI (1ULL << 17)
+#define HV_SYNIC_SINT_VECTOR_MASK  (0xFF)
+
 #endif
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index dcc410e..4e20262 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -154,6 +154,20 @@ struct kvm_s390_skeys {
__u32 flags;
__u32 reserved[9];
 };
+
+struct kvm_hyperv_exit {
+#define KVM_EXIT_HYPERV_SYNIC  1
+   __u32 type;
+   union {
+   struct {
+   __u32 msr;
+   __u64 control;
+   __u64 evt_page;
+   __u64 msg_page;
+   } synic;
+   } u;
+};
+
 #define KVM_S390_GET_SKEYS_NONE   1
 #define KVM_S390_SKEYS_MAX1048576
 
@@ -184,6 +198,7 @@ struct kvm_s390_skeys {
 #define KVM_EXIT_SYSTEM_EVENT 24
 #define KVM_EXIT_S390_STSI25
 #define KVM_EXIT_IOAPIC_EOI   26
+#define KVM_EXIT_HYPERV   27
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -338,6 +353,8 @@ struct kvm_run {
struct {
__u8 vector;
} eoi;
+   /* KVM_EXIT_HYPERV */
+   struct kvm_hyperv_exit hyperv;
/* Fix the size of the union. */
char padding[256];
};
@@ -831,6 +848,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_GUEST_DEBUG_HW_WPS 120
 #define KVM_CAP_SPLIT_IRQCHIP 121
 #define KVM_CAP_IOEVENTFD_ANY_LENGTH 122
+#define KVM_CAP_HYPERV_SYNIC 123
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -854,10 +872,16 @@ struct kvm_irq_routing_s390_adapter {
__u32 adapter_id;
 };
 
+struct kvm_irq_routing_hv_sint {
+   __u32 vcpu;
+   __u32 sint;
+};
+
 /* gsi routing entry types */
 #define KVM_IRQ_ROUTING_IRQCHIP 1
 #define KVM_IRQ_ROUTING_MSI 2
 #define KVM_IRQ_ROUTING_S390_ADAPTER 3
+#define KVM_IRQ_ROUTING_HV_SINT 4
 
 struct kvm_irq_routing_entry {
__u32 gsi;
@@ -868,6 +892,7 @@ struct kvm_irq_routing_entry {
struct kvm_irq_routing_irqchip irqchip;
struct kvm_irq_routing_msi msi;
struct kvm_irq_routing_s390_adapter adapter;
+   struct kvm_irq_routing_hv_sint hv_sint;
__u32 pad[8];
} u;
 };
-- 
2.4.3

[Qemu-devel] [PATCH v2 4/5] target-i386/hyperv: Hyper-V SynIC SINT routing and vcpu exit

2015-11-10 Thread Andrey Smetanin

Hyper-V SynIC (synthetic interrupt controller) helpers for
Hyper-V SynIC irq routing setup, irq injection, irq ack
notifications, and event/message page change tracking, for future use.

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Paolo Bonzini 
CC: Richard Henderson 
CC: Eduardo Habkost 
CC: "Andreas Färber" 
CC: Marcelo Tosatti 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: k...@vger.kernel.org

---
 target-i386/Makefile.objs |   2 +-
 target-i386/hyperv.c  | 127 ++
 target-i386/hyperv.h  |  42 +++
 target-i386/kvm.c |   6 +++
 4 files changed, 176 insertions(+), 1 deletion(-)
 create mode 100644 target-i386/hyperv.c
 create mode 100644 target-i386/hyperv.h

diff --git a/target-i386/Makefile.objs b/target-i386/Makefile.objs
index 437d997..2255f46 100644
--- a/target-i386/Makefile.objs
+++ b/target-i386/Makefile.objs
@@ -3,5 +3,5 @@ obj-y += excp_helper.o fpu_helper.o cc_helper.o int_helper.o svm_helper.o
 obj-y += smm_helper.o misc_helper.o mem_helper.o seg_helper.o
 obj-y += gdbstub.o
 obj-$(CONFIG_SOFTMMU) += machine.o arch_memory_mapping.o arch_dump.o monitor.o
-obj-$(CONFIG_KVM) += kvm.o
+obj-$(CONFIG_KVM) += kvm.o hyperv.o
 obj-$(call lnot,$(CONFIG_KVM)) += kvm-stub.o
diff --git a/target-i386/hyperv.c b/target-i386/hyperv.c
new file mode 100644
index 000..e79b173
--- /dev/null
+++ b/target-i386/hyperv.c
@@ -0,0 +1,127 @@
+/*
+ * QEMU KVM Hyper-V support
+ *
+ * Copyright (C) 2015 Andrey Smetanin 
+ *
+ * Authors:
+ *  Andrey Smetanin 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "hyperv.h"
+#include "standard-headers/asm-x86/hyperv.h"
+
+int kvm_hv_handle_exit(X86CPU *cpu, struct kvm_hyperv_exit *exit)
+{
+CPUX86State *env = &cpu->env;
+
+switch (exit->type) {
+case KVM_EXIT_HYPERV_SYNIC:
+if (!cpu->hyperv_synic) {
+return -1;
+}
+
+/*
+ * For now just track changes in SynIC control and msg/evt pages msr's.
+ * When SynIC messaging/events processing will be added in future
+ * here we will do messages queues flushing and pages remapping.
+ */
+switch (exit->u.synic.msr) {
+case HV_X64_MSR_SCONTROL:
+env->msr_hv_synic_control = exit->u.synic.control;
+break;
+case HV_X64_MSR_SIMP:
+env->msr_hv_synic_msg_page = exit->u.synic.msg_page;
+break;
+case HV_X64_MSR_SIEFP:
+env->msr_hv_synic_evt_page = exit->u.synic.evt_page;
+break;
+default:
+return -1;
+}
+return 0;
+default:
+return -1;
+}
+}
+
+static void kvm_hv_sint_ack_handler(EventNotifier *notifier)
+{
+HvSintRoute *sint_route = container_of(notifier, HvSintRoute,
+   sint_ack_notifier);
+event_notifier_test_and_clear(notifier);
+if (sint_route->sint_ack_clb) {
+sint_route->sint_ack_clb(sint_route);
+}
+}
+
+HvSintRoute *kvm_hv_sint_route_create(uint32_t vcpu_id, uint32_t sint,
+  HvSintAckClb sint_ack_clb)
+{
+HvSintRoute *sint_route;
+int r, gsi;
+
+sint_route = g_malloc0(sizeof(*sint_route));
+r = event_notifier_init(&sint_route->sint_set_notifier, false);
+if (r) {
+goto err;
+}
+
+r = event_notifier_init(&sint_route->sint_ack_notifier, false);
+if (r) {
+goto err_sint_set_notifier;
+}
+
+event_notifier_set_handler(&sint_route->sint_ack_notifier,
+   kvm_hv_sint_ack_handler);
+
+gsi = kvm_irqchip_add_hv_sint_route(kvm_state, vcpu_id, sint);
+if (gsi < 0) {
+goto err_gsi;
+}
+
+r = kvm_irqchip_add_irqfd_notifier_gsi(kvm_state,
+   &sint_route->sint_set_notifier,
+                                           &sint_route->sint_ack_notifier, gsi);
+if (r) {
+goto err_irqfd;
+}
+sint_route->gsi = gsi;
+sint_route->sint_ack_clb = sint_ack_clb;
+sint_route->vcpu_id = vcpu_id;
+sint_route->sint = sint;
+
+return sint_route;
+
+err_irqfd:
+kvm_irqchip_release_virq(kvm_state, gsi);
+err_gsi:
+event_notifier_set_handler(&sint_route->sint_ack_notifier, NULL);
+event_notifier_cleanup(&sint_route->sint_ack_notifier);
+err_sint_set_notifier:
+event_notifier_cleanup(&sint_route->sint_set_notifier);
+err:
+g_free(sint_route);
+
+return NULL;
+}
+
+void kvm_hv_sint_route_destroy(HvSintRoute *sint_route)
+{
+kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state,
+  &sint_route->sint_set_notifier,
+  sint_route->gsi);
+kvm_irqchip_release_virq(kvm_state, sint_route->gsi);
+event_notifier_set_handler(&sint_route->sint_ack_not

Re: [Qemu-devel] [PATCH v10 23/30] qapi: Check for qapi collisions of flat union branches

2015-11-10 Thread Eric Blake

On 11/10/2015 01:30 AM, Markus Armbruster wrote:
> Eric Blake  writes:
> 
>> On 11/09/2015 05:56 AM, Markus Armbruster wrote:
>>> Eric Blake  writes:
>>>
 Right now, our ad hoc parser ensures that we cannot have a
 flat union that introduces any qapi member names that would
 conflict with the non-variant qapi members already present
 from the union's base type (see flat-union-clash-member.json).
 We want QAPISchemaObjectType.check() to make the same check,
 so we can later reduce some of the ad hoc checks.

>>
 In general, a type used as a branch of a flat union cannot
 also be the base type of the flat union, so even though we are
 adding a call to variant.type.check() in order to populate
 variant.type.members, this is merely a case of gaining
 topological sorting of how types are visited (and type.check()
 is already set up to allow multiple calls due to base types).
>>>
>>> Yes, a type cannot contain itself, neither as base nor as variant.
>>>
>>> We have tests covering attempts to do the former
>>> (struct-cycle-direct.json, struct-cycle-indirect.json).  As far as I can
> 
> Actually, these are just local, unpublished tests.  They both make
> check_member_clash() recurse infinitely.
> 
> # Direct inheritance loop
> # FIXME triggers infinite recursion
> { 'struct': 'Loopy', 'base': 'Loopy',
>   'data': {} }

Okay, I should add that into my pending patch that cleans up base loops,

> 
> # we reject a loop in base classes
> { 'struct': 'Base1', 'base': 'Base2', 'data': {} }
> { 'struct': 'Base2', 'base': 'Base1', 'data': {} }
> 
> The latter is actually yours, proposed as base-cycle.json in
> Subject: qapi: Detect collisions in C member names
> Message-Id: <1442872682-6523-17-git-send-email-ebl...@redhat.com>

and yes, that one is still in my queue for subset D:
https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg07001.html

and I may indeed add a test for reusing the base type of a flat union as
one of the branches of the same union, depending on whether it uncovers
anything different.


> 
> If I disable the recursive call, the cycle detection in
> QAPISchemaObjectType.check() is reached, and works.
> 
> Completing the move of clash detection to check() methods should improve
> things from "accidental infinite recursion" to "intentional assertion
> failure", because it should get rid of check_member_clash() and should
> not break the cycle detection.
> 
> Then we can turn the assertion into a proper error message, and add the
> tests.

Yep, that's what is pending in my queue, just further out than subset C.
 Doesn't matter if it misses 2.5 (the bug is real, but is only triggered
by bad .json code, and we aren't going to add any bad .json code between
now and 2.5).
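
For illustration only, the "intentional" cycle detection discussed above can be sketched in a few lines of C. This is not the Python code of QAPISchemaObjectType.check(); the names type_sketch, check_state and type_check are hypothetical, and the sketch reports a loop through a return value instead of an assertion or error message. A type is marked as "being checked" before its base is visited, so re-entering it means the base chain loops back, while a type that was already checked can be visited again safely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states; the real check() tracks this differently. */
enum check_state { UNCHECKED, CHECKING, CHECKED };

struct type_sketch {
    const char *name;
    struct type_sketch *base;   /* NULL when the type has no base */
    enum check_state state;
};

/* Returns true if the base chain is acyclic, false if it loops. */
static bool type_check(struct type_sketch *t)
{
    assert(t != NULL);
    if (t->state == CHECKED) {
        return true;            /* repeated calls are harmless */
    }
    if (t->state == CHECKING) {
        return false;           /* re-entered while in progress: a loop */
    }
    t->state = CHECKING;
    if (t->base && !type_check(t->base)) {
        t->state = UNCHECKED;   /* leave no half-checked state behind */
        return false;
    }
    t->state = CHECKED;
    return true;
}
```

With this shape, both the direct 'Loopy' case and the 'Base1'/'Base2' case terminate with a failure result instead of recursing forever.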


>> But you have me curious if this collision is still caught when the ad
>> hoc tests are gone.  If so, great; if not, I'll add a test here.  (I'll
>> know later when I get through rebasing to all of your comments.)

Still true - I'm still plowing through earlier patches before deciding
if my 'qapi: Detect base class loops' also needs to detect flat union loops.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




[Qemu-devel] [PATCH v4 2/5] kvm/x86: split ioapic-handled and EOI exit bitmaps

2015-11-10 Thread Andrey Smetanin

The function to determine if the vector is handled by ioapic used to
rely on the fact that only ioapic-handled vectors were set up to
cause vmexits when virtual apic was in use.

We're going to break this assumption when introducing Hyper-V
synthetic interrupts: they may need to cause vmexits too.

To achieve that, introduce a new bitmap dedicated specifically for
ioapic-handled vectors, and populate EOI exit bitmap from it for now.
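
A minimal user-space sketch of the idea, not the kernel code: one 256-bit bitmap records which vectors the ioapic handles, and the EOI exit bitmap is, for now, a plain copy of it. The names ioapic_vcpu_sketch, vec_set, vec_test and load_eoi_exit_bitmap are hypothetical, and uint64_t words are used here for portability where the patch uses DECLARE_BITMAP and __set_bit on unsigned long.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NR_VECTORS 256
#define NR_WORDS   (NR_VECTORS / 64)

/* Hypothetical stand-in for the per-vcpu state touched by the patch. */
struct ioapic_vcpu_sketch {
    uint64_t ioapic_handled_vectors[NR_WORDS];
};

/* Mark one vector in a 256-bit map. */
static void vec_set(uint64_t *map, unsigned vector)
{
    assert(vector < NR_VECTORS);
    map[vector / 64] |= UINT64_C(1) << (vector % 64);
}

/* Query one vector in a 256-bit map. */
static bool vec_test(const uint64_t *map, unsigned vector)
{
    assert(vector < NR_VECTORS);
    return (map[vector / 64] >> (vector % 64)) & 1;
}

/* For now the EOI exit bitmap is just a copy of the ioapic-handled set. */
static void load_eoi_exit_bitmap(const struct ioapic_vcpu_sketch *v,
                                 uint64_t eoi_exit_bitmap[NR_WORDS])
{
    memcpy(eoi_exit_bitmap, v->ioapic_handled_vectors,
           sizeof(v->ioapic_handled_vectors));
}
```

Keeping the two sets separate is what later allows other sources to add vectors to the EOI exit bitmap without those vectors being mistaken for ioapic-handled ones.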

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Gleb Natapov 
CC: Paolo Bonzini 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: qemu-devel@nongnu.org

---
 arch/x86/include/asm/kvm_host.h |  4 ++--
 arch/x86/kvm/ioapic.c   |  4 ++--
 arch/x86/kvm/ioapic.h   |  7 ---
 arch/x86/kvm/irq_comm.c |  5 +++--
 arch/x86/kvm/lapic.c|  2 +-
 arch/x86/kvm/svm.c  |  2 +-
 arch/x86/kvm/vmx.c  |  3 +--
 arch/x86/kvm/x86.c  | 11 ++-
 8 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9265196..d51a7e1d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -400,7 +400,7 @@ struct kvm_vcpu_arch {
u64 efer;
u64 apic_base;
struct kvm_lapic *apic;/* kernel irqchip context */
-   u64 eoi_exit_bitmap[4];
+   DECLARE_BITMAP(ioapic_handled_vectors, 256);
unsigned long apic_attention;
int32_t apic_arb_prio;
int mp_state;
@@ -833,7 +833,7 @@ struct kvm_x86_ops {
int (*cpu_uses_apicv)(struct kvm_vcpu *vcpu);
void (*hwapic_irr_update)(struct kvm_vcpu *vcpu, int max_irr);
void (*hwapic_isr_update)(struct kvm *kvm, int isr);
-   void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu);
+   void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
void (*set_apic_access_page_addr)(struct kvm_vcpu *vcpu, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 88d0a92..1facfd6 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -233,7 +233,7 @@ static void kvm_ioapic_inject_all(struct kvm_ioapic 
*ioapic, unsigned long irr)
 }
 
 
-void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
+void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, ulong 
*ioapic_handled_vectors)
 {
struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic;
union kvm_ioapic_redirect_entry *e;
@@ -250,7 +250,7 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 
*eoi_exit_bitmap)
(e->fields.trig_mode == IOAPIC_EDGE_TRIG &&
 kvm_apic_pending_eoi(vcpu, e->fields.vector)))
__set_bit(e->fields.vector,
-   (unsigned long *)eoi_exit_bitmap);
+ ioapic_handled_vectors);
}
}
spin_unlock(&ioapic->lock);
diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
index 084617d..2d16dc2 100644
--- a/arch/x86/kvm/ioapic.h
+++ b/arch/x86/kvm/ioapic.h
@@ -121,7 +121,8 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
kvm_lapic *src,
struct kvm_lapic_irq *irq, unsigned long *dest_map);
 int kvm_get_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
 int kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
-void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
-void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
-
+void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu,
+  ulong *ioapic_handled_vectors);
+void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
+   ulong *ioapic_handled_vectors);
 #endif
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index e39768c..ece901c 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -339,7 +339,8 @@ void kvm_arch_post_irq_routing_update(struct kvm *kvm)
kvm_make_scan_ioapic_request(kvm);
 }
 
-void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
+void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
+   ulong *ioapic_handled_vectors)
 {
struct kvm *kvm = vcpu->kvm;
struct kvm_kernel_irq_routing_entry *entry;
@@ -369,7 +370,7 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu, u64 
*eoi_exit_bitmap)
u32 vector = entry->msi.data & 0xff;
 
__set_bit(vector,
- (unsigned long *) eoi_exit_bitmap);
+ ioapic_handled_vectors);
}
}
}
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kv

[Qemu-devel] [PATCH v2 5/5] hw/misc: Hyper-V test device 'hyperv-testdev'

2015-11-10 Thread Andrey Smetanin

'hyperv-testdev' will be used by kvm-unit-tests
to set up Hyper-V SynIC SINT routing and to inject
Hyper-V SynIC SINTs.

The Hyper-V test device is an ISA device that creates a 0x3000
IO memory region and catches write accesses to it. The data of
every write operation is decoded into a ctl code and parameters
for the Hyper-V test device.
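
A small sketch of how such a write could be decoded. Only the position of the ctl code (bits 16..23) is taken from the patch below; the positions of vcpu_id and sint are ASSUMPTIONS made for illustration, and hv_test_cmd, hv_test_decode and hv_test_encode are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SINT_COUNT 16    /* assumed number of SINTs per vCPU */

/* Decoded form of one write to the test device (hypothetical struct). */
struct hv_test_cmd {
    uint8_t ctl;        /* control code, bits 16..23 as in the patch */
    uint8_t vcpu_id;    /* ASSUMED layout: bits 8..15 */
    uint8_t sint;       /* ASSUMED layout: bits 0..7 */
};

/* Split the written value into its fields. */
static struct hv_test_cmd hv_test_decode(uint64_t data)
{
    struct hv_test_cmd cmd;

    cmd.ctl = (data >> 16) & 0xFF;
    cmd.vcpu_id = (data >> 8) & 0xFF;
    cmd.sint = data & 0xFF;
    return cmd;
}

/* Inverse helper a guest-side test could use to build the value to write. */
static uint32_t hv_test_encode(uint8_t ctl, uint8_t vcpu_id, uint8_t sint)
{
    assert(sint < SKETCH_SINT_COUNT);
    return ((uint32_t)ctl << 16) | ((uint32_t)vcpu_id << 8) | sint;
}
```

Under these assumptions, a guest would write hv_test_encode(1, 0, 2) to ask for a route to be created for SINT 2 on vcpu 0, since 1 is the value of HV_TEST_DEV_SINT_ROUTE_CREATE in the enum below.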

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Paolo Bonzini 
CC: Richard Henderson 
CC: Eduardo Habkost 
CC: "Andreas Färber" 
CC: Marcelo Tosatti 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: k...@vger.kernel.org

---
 default-configs/i386-softmmu.mak   |   1 +
 default-configs/x86_64-softmmu.mak |   1 +
 hw/misc/Makefile.objs  |   1 +
 hw/misc/hyperv_testdev.c   | 164 +
 4 files changed, 167 insertions(+)
 create mode 100644 hw/misc/hyperv_testdev.c

diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index 43c96d1..7f3c850 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -50,3 +50,4 @@ CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
 CONFIG_SMBIOS=y
+CONFIG_HYPERV_TESTDEV=y
diff --git a/default-configs/x86_64-softmmu.mak 
b/default-configs/x86_64-softmmu.mak
index dfb8095..e494d79 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -50,3 +50,4 @@ CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
 CONFIG_SMBIOS=y
+CONFIG_HYPERV_TESTDEV=y
diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index 4aa76ff..fafc80a 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -40,3 +40,4 @@ obj-$(CONFIG_STM32F2XX_SYSCFG) += stm32f2xx_syscfg.o
 
 obj-$(CONFIG_PVPANIC) += pvpanic.o
 obj-$(CONFIG_EDU) += edu.o
+obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
diff --git a/hw/misc/hyperv_testdev.c b/hw/misc/hyperv_testdev.c
new file mode 100644
index 000..f0e4e35
--- /dev/null
+++ b/hw/misc/hyperv_testdev.c
@@ -0,0 +1,164 @@
+/*
+ * QEMU KVM Hyper-V test device to support Hyper-V kvm-unit-tests
+ *
+ * Copyright (C) 2015 Andrey Smetanin 
+ *
+ * Authors:
+ *  Andrey Smetanin 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "hw/hw.h"
+#include "hw/qdev.h"
+#include "hw/isa/isa.h"
+#include "target-i386/hyperv.h"
+
+#define HV_TEST_DEV_MAX_SINT_ROUTES 64
+
+struct HypervTestDev {
+ISADevice parent_obj;
+MemoryRegion sint_control;
+HvSintRoute *sint_route[HV_TEST_DEV_MAX_SINT_ROUTES];
+};
+typedef struct HypervTestDev HypervTestDev;
+
+#define TYPE_HYPERV_TEST_DEV "hyperv-testdev"
+#define HYPERV_TEST_DEV(obj) \
+OBJECT_CHECK(HypervTestDev, (obj), TYPE_HYPERV_TEST_DEV)
+
+enum {
+HV_TEST_DEV_SINT_ROUTE_CREATE = 1,
+HV_TEST_DEV_SINT_ROUTE_DESTROY,
+HV_TEST_DEV_SINT_ROUTE_SET_SINT
+};
+
+static int alloc_sint_route_index(HypervTestDev *dev)
+{
+int i;
+
+for (i = 0; i < ARRAY_SIZE(dev->sint_route); i++) {
+if (dev->sint_route[i] == NULL) {
+return i;
+}
+}
+return -1;
+}
+
+static void free_sint_route_index(HypervTestDev *dev, int i)
+{
+assert(i >= 0 && i < ARRAY_SIZE(dev->sint_route));
+dev->sint_route[i] = NULL;
+}
+
+static int find_sint_route_index(HypervTestDev *dev, uint32_t vcpu_id,
+ uint32_t sint)
+{
+HvSintRoute *sint_route;
+int i;
+
+for (i = 0; i < ARRAY_SIZE(dev->sint_route); i++) {
+sint_route = dev->sint_route[i];
+if (sint_route && sint_route->vcpu_id == vcpu_id &&
+sint_route->sint == sint) {
+return i;
+}
+}
+return -1;
+}
+
+static void hv_synic_test_dev_control(HypervTestDev *dev, uint32_t ctl,
+  uint32_t vcpu_id, uint32_t sint)
+{
+int i;
+HvSintRoute *sint_route;
+
+switch (ctl) {
+case HV_TEST_DEV_SINT_ROUTE_CREATE:
+i = alloc_sint_route_index(dev);
+assert(i >= 0);
+sint_route = kvm_hv_sint_route_create(vcpu_id, sint, NULL);
+assert(sint_route);
+dev->sint_route[i] = sint_route;
+break;
+case HV_TEST_DEV_SINT_ROUTE_DESTROY:
+i = find_sint_route_index(dev, vcpu_id, sint);
+assert(i >= 0);
+sint_route = dev->sint_route[i];
+kvm_hv_sint_route_destroy(sint_route);
+free_sint_route_index(dev, i);
+break;
+case HV_TEST_DEV_SINT_ROUTE_SET_SINT:
+i = find_sint_route_index(dev, vcpu_id, sint);
+assert(i >= 0);
+sint_route = dev->sint_route[i];
+kvm_hv_sint_route_set_sint(sint_route);
+break;
+default:
+break;
+}
+}
+
+static void hv_test_dev_control(void *opaque, hwaddr addr, uint64_t data,
+uint32_t len)
+{
+HypervTestDev *dev = HYPERV_TEST_DEV(opaque);
+uint8_t ctl;
+
+ctl = (data >> 16ULL) & 0xFF;
+switch (ct

[Qemu-devel] [PATCH v4 3/5] kvm/x86: per-vcpu apicv deactivation support

2015-11-10 Thread Andrey Smetanin

The decision on whether to use hardware APIC virtualization used to be
taken globally, based on the availability of the feature in the CPU
and the value of a module parameter.

However, under certain circumstances we want to control it on a per-vcpu
basis.  In particular, when userspace activates the Hyper-V synthetic
interrupt controller (SynIC), APICv has to be disabled as it's
incompatible with SynIC auto-EOI behavior.

To achieve that, introduce 'apicv_active' flag on struct
kvm_vcpu_arch, and kvm_vcpu_deactivate_apicv() function to turn APICv
off.  The flag is initialized based on the module parameter and CPU
capability, and consulted whenever an APICv-specific action is
performed.
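
The pattern described above, reduced to a stand-alone sketch: the flag is seeded once from the global decision and can then only be turned off for an individual vcpu. All names here (host_caps, apicv_vcpu, vcpu_init, vcpu_deactivate_apicv, vcpu_sync_pir_to_irr) are hypothetical stand-ins, not the kernel's structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the module parameter and the CPU capability. */
struct host_caps {
    bool enable_apicv_param;
    bool cpu_has_apicv;
};

struct apicv_vcpu {
    bool apicv_active;
    int pir_syncs;      /* counts APICv-specific actions actually performed */
};

/* The flag starts out as the old global decision. */
static void vcpu_init(struct apicv_vcpu *v, const struct host_caps *caps)
{
    assert(v != NULL && caps != NULL);
    v->apicv_active = caps->enable_apicv_param && caps->cpu_has_apicv;
    v->pir_syncs = 0;
}

/* Turn APICv off for this vcpu only, e.g. when SynIC is activated. */
static void vcpu_deactivate_apicv(struct apicv_vcpu *v)
{
    v->apicv_active = false;
}

/* Every APICv-specific action consults the per-vcpu flag first. */
static void vcpu_sync_pir_to_irr(struct apicv_vcpu *v)
{
    if (v->apicv_active) {
        v->pir_syncs++;
    }
}
```

Deactivating one vcpu leaves any other vcpu initialized from the same host_caps unaffected, which is the point of moving the decision out of global state.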

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Gleb Natapov 
CC: Paolo Bonzini 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: qemu-devel@nongnu.org

---
 arch/x86/include/asm/kvm_host.h |  6 +-
 arch/x86/kvm/irq.c  |  2 +-
 arch/x86/kvm/lapic.c| 23 +++--
 arch/x86/kvm/lapic.h|  4 ++--
 arch/x86/kvm/svm.c  | 11 +++---
 arch/x86/kvm/vmx.c  | 45 +
 arch/x86/kvm/x86.c  | 19 ++---
 7 files changed, 63 insertions(+), 47 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d51a7e1d..a60a461 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -400,6 +400,7 @@ struct kvm_vcpu_arch {
u64 efer;
u64 apic_base;
struct kvm_lapic *apic;/* kernel irqchip context */
+   bool apicv_active;
DECLARE_BITMAP(ioapic_handled_vectors, 256);
unsigned long apic_attention;
int32_t apic_arb_prio;
@@ -830,7 +831,8 @@ struct kvm_x86_ops {
void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
void (*enable_irq_window)(struct kvm_vcpu *vcpu);
void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
-   int (*cpu_uses_apicv)(struct kvm_vcpu *vcpu);
+   bool (*get_enable_apicv)(void);
+   void (*refresh_apicv_exec_ctrl)(struct kvm_vcpu *vcpu);
void (*hwapic_irr_update)(struct kvm_vcpu *vcpu, int max_irr);
void (*hwapic_isr_update)(struct kvm *kvm, int isr);
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
@@ -1096,6 +1098,8 @@ gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, 
gva_t gva,
 gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva,
struct x86_exception *exception);
 
+void kvm_vcpu_deactivate_apicv(struct kvm_vcpu *vcpu);
+
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
 
 int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva, u32 error_code,
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 097060e..3982b47 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -76,7 +76,7 @@ int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v)
if (kvm_cpu_has_extint(v))
return 1;
 
-   if (kvm_vcpu_apic_vid_enabled(v))
+   if (kvm_vcpu_apicv_active(v))
return 0;
 
return kvm_apic_has_interrupt(v) != -1; /* LAPIC */
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index b14436d..14d6fcc 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -379,7 +379,8 @@ static inline int apic_find_highest_irr(struct kvm_lapic 
*apic)
if (!apic->irr_pending)
return -1;
 
-   kvm_x86_ops->sync_pir_to_irr(apic->vcpu);
+   if (apic->vcpu->arch.apicv_active)
+   kvm_x86_ops->sync_pir_to_irr(apic->vcpu);
result = apic_search_irr(apic);
ASSERT(result == -1 || result >= 16);
 
@@ -392,7 +393,7 @@ static inline void apic_clear_irr(int vec, struct kvm_lapic 
*apic)
 
vcpu = apic->vcpu;
 
-   if (unlikely(kvm_vcpu_apic_vid_enabled(vcpu))) {
+   if (unlikely(vcpu->arch.apicv_active)) {
/* try to update RVI */
apic_clear_vector(vec, apic->regs + APIC_IRR);
kvm_make_request(KVM_REQ_EVENT, vcpu);
@@ -418,7 +419,7 @@ static inline void apic_set_isr(int vec, struct kvm_lapic 
*apic)
 * because the processor can modify ISR under the hood.  Instead
 * just set SVI.
 */
-   if (unlikely(kvm_x86_ops->hwapic_isr_update))
+   if (unlikely(vcpu->arch.apicv_active))
kvm_x86_ops->hwapic_isr_update(vcpu->kvm, vec);
else {
++apic->isr_count;
@@ -466,7 +467,7 @@ static inline void apic_clear_isr(int vec, struct kvm_lapic 
*apic)
 * on the other hand isr_count and highest_isr_cache are unused
 * and must be left alone.
 */
-   if (unlikely(kvm_x86_ops->hwapic_isr_update))
+   if (unlikely(vcpu->arch.apicv_active))
kvm_x86_ops->hwapic_isr_update(vcpu->kvm,
   apic_find_highest_isr

[Qemu-devel] [PATCH v2 2/5] target-i386/kvm: Hyper-V SynIC MSR's support

2015-11-10 Thread Andrey Smetanin

This patch adds support for the Hyper-V synthetic interrupt
controller (Hyper-V SynIC) MSRs and their migration. Hyper-V
SynIC is enabled by the cpu's 'hv-synic' option.

This patch does not allow cpu creation if the
'hv-synic' option is specified but the kernel
doesn't support Hyper-V SynIC.

Changes v2:
* activate Hyper-V SynIC by enabling corresponding vcpu cap
* reject cpu initialization if user requested Hyper-V SynIC
  but kernel does not support Hyper-V SynIC

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Paolo Bonzini 
CC: Richard Henderson 
CC: Eduardo Habkost 
CC: "Andreas Färber" 
CC: Marcelo Tosatti 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: k...@vger.kernel.org

---
 target-i386/cpu-qom.h |  1 +
 target-i386/cpu.c |  1 +
 target-i386/cpu.h |  5 
 target-i386/kvm.c | 67 ++-
 target-i386/machine.c | 39 ++
 5 files changed, 112 insertions(+), 1 deletion(-)

diff --git a/target-i386/cpu-qom.h b/target-i386/cpu-qom.h
index e3bfe9d..7ea5b34 100644
--- a/target-i386/cpu-qom.h
+++ b/target-i386/cpu-qom.h
@@ -94,6 +94,7 @@ typedef struct X86CPU {
 bool hyperv_reset;
 bool hyperv_vpindex;
 bool hyperv_runtime;
+bool hyperv_synic;
 bool check_cpuid;
 bool enforce_cpuid;
 bool expose_kvm;
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index e5f1c5b..1462e19 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -3142,6 +3142,7 @@ static Property x86_cpu_properties[] = {
 DEFINE_PROP_BOOL("hv-reset", X86CPU, hyperv_reset, false),
 DEFINE_PROP_BOOL("hv-vpindex", X86CPU, hyperv_vpindex, false),
 DEFINE_PROP_BOOL("hv-runtime", X86CPU, hyperv_runtime, false),
+DEFINE_PROP_BOOL("hv-synic", X86CPU, hyperv_synic, false),
 DEFINE_PROP_BOOL("check", X86CPU, check_cpuid, true),
 DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false),
 DEFINE_PROP_BOOL("kvm", X86CPU, expose_kvm, true),
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index fc4a605..8cf33df 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -918,6 +918,11 @@ typedef struct CPUX86State {
 uint64_t msr_hv_tsc;
 uint64_t msr_hv_crash_params[HV_X64_MSR_CRASH_PARAMS];
 uint64_t msr_hv_runtime;
+uint64_t msr_hv_synic_control;
+uint64_t msr_hv_synic_version;
+uint64_t msr_hv_synic_evt_page;
+uint64_t msr_hv_synic_msg_page;
+uint64_t msr_hv_synic_sint[HV_SYNIC_SINT_COUNT];
 
 /* exception/interrupt handling */
 int error_code;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 2a9953b..cfcd01d 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -86,6 +86,7 @@ static bool has_msr_hv_crash;
 static bool has_msr_hv_reset;
 static bool has_msr_hv_vpindex;
 static bool has_msr_hv_runtime;
+static bool has_msr_hv_synic;
 static bool has_msr_mtrr;
 static bool has_msr_xss;
 
@@ -521,7 +522,8 @@ static bool hyperv_enabled(X86CPU *cpu)
 cpu->hyperv_crash ||
 cpu->hyperv_reset ||
 cpu->hyperv_vpindex ||
-cpu->hyperv_runtime);
+cpu->hyperv_runtime ||
+cpu->hyperv_synic);
 }
 
 static Error *invtsc_mig_blocker;
@@ -610,6 +612,14 @@ int kvm_arch_init_vcpu(CPUState *cs)
 if (cpu->hyperv_runtime && has_msr_hv_runtime) {
 c->eax |= HV_X64_MSR_VP_RUNTIME_AVAILABLE;
 }
+if (cpu->hyperv_synic) {
+if (!has_msr_hv_synic ||
+kvm_vcpu_enable_cap(cs, KVM_CAP_HYPERV_SYNIC, 0)) {
+fprintf(stderr, "Hyper-V SynIC is not supported by kernel\n");
+return -ENOSYS;
+}
+c->eax |= HV_X64_MSR_SYNIC_AVAILABLE;
+}
 c = &cpuid_data.entries[cpuid_i++];
 c->function = HYPERV_CPUID_ENLIGHTMENT_INFO;
 if (cpu->hyperv_relaxed_timing) {
@@ -950,6 +960,10 @@ static int kvm_get_supported_msrs(KVMState *s)
 has_msr_hv_runtime = true;
 continue;
 }
+if (kvm_msr_list->indices[i] == HV_X64_MSR_SCONTROL) {
+has_msr_hv_synic = true;
+continue;
+}
 }
 }
 
@@ -1511,6 +1525,31 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 kvm_msr_entry_set(&msrs[n++], HV_X64_MSR_VP_RUNTIME,
   env->msr_hv_runtime);
 }
+if (cpu->hyperv_synic) {
+int j;
+
+if (!env->msr_hv_synic_version) {
+/* First time initialization */
+env->msr_hv_synic_version = HV_SYNIC_VERSION_1;
+for (j = 0; j < ARRAY_SIZE(env->msr_hv_synic_sint); j++) {
+env->msr_hv_synic_sint[j] = HV_SYNIC_SINT_MASKED;
+}
+}
+
+kvm_msr_entry_set(&msrs[n++], HV_X64_MSR_SCONTROL,
+  env->msr_hv_synic_control);
+kvm_msr_entry_set(&msrs[n++], HV_X64_

[Qemu-devel] [PATCH v4 4/5] kvm/x86: Hyper-V synthetic interrupt controller

2015-11-10 Thread Andrey Smetanin

SynIC (synthetic interrupt controller) is a lapic extension,
which is controlled via MSRs and maintains for each vCPU
 - 16 synthetic interrupt "lines" (SINT's); each can be configured to
   trigger a specific interrupt vector optionally with auto-EOI
   semantics
 - a message page in the guest memory with 16 256-byte per-SINT message
   slots
 - an event flag page in the guest memory with 16 2048-bit per-SINT
   event flag areas

The host triggers a SINT whenever it delivers a new message to the
corresponding slot or flips an event flag bit in the corresponding area.
The guest informs the host that it can try delivering a message by
explicitly asserting EOI in lapic or writing to End-Of-Message (EOM)
MSR.
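
As a quick check of the sizes listed above: 16 slots of 256 bytes are 4096 bytes, and 16 areas of 2048 bits are also 16 * 256 = 4096 bytes. The sketch below only illustrates that arithmetic; the struct and helper names are hypothetical, and the bit order inside an event flag area is an assumption, not something stated in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SINT_COUNT      16      /* SINTs per vCPU */
#define MSG_SLOT_BYTES  256     /* one message slot per SINT */
#define EVT_FLAG_BITS   2048    /* one event flag area per SINT */

struct synic_msg_page {
    uint8_t slot[SINT_COUNT][MSG_SLOT_BYTES];
};

struct synic_evt_page {
    uint8_t flags[SINT_COUNT][EVT_FLAG_BITS / 8];
};

/* Byte offset of a SINT's message slot within the message page. */
static size_t msg_slot_offset(unsigned sint)
{
    assert(sint < SINT_COUNT);
    return (size_t)sint * MSG_SLOT_BYTES;
}

/* Set one event flag; least-significant-bit-first order is ASSUMED. */
static void evt_flag_set(struct synic_evt_page *page, unsigned sint,
                         unsigned flag)
{
    assert(sint < SINT_COUNT && flag < EVT_FLAG_BITS);
    page->flags[sint][flag / 8] |= (uint8_t)(1u << (flag % 8));
}

/* Test one event flag, using the same assumed bit order. */
static bool evt_flag_test(const struct synic_evt_page *page, unsigned sint,
                          unsigned flag)
{
    assert(sint < SINT_COUNT && flag < EVT_FLAG_BITS);
    return (page->flags[sint][flag / 8] >> (flag % 8)) & 1;
}
```

In this model the host would write a message into slot[sint] or call evt_flag_set() and then trigger the corresponding SINT.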

The userspace (qemu) triggers interrupts and receives EOM notifications
via irqfd with resampler; for that, a GSI is allocated for each
configured SINT, and irq_routing api is extended to support GSI-SINT
mapping.

Changes v4:
* added activation of SynIC by vcpu KVM_ENABLE_CAP
* added per SynIC active flag
* added deactivation of APICv upon SynIC activation

Changes v3:
* added KVM_CAP_HYPERV_SYNIC and KVM_IRQ_ROUTING_HV_SINT notes into
docs

Changes v2:
* do not use posted interrupts for Hyper-V SynIC AutoEOI vectors
* add Hyper-V SynIC vectors into EOI exit bitmap
* Hyper-V SynIC SINT msr write logic simplified

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Gleb Natapov 
CC: Paolo Bonzini 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: qemu-devel@nongnu.org

---
 Documentation/virtual/kvm/api.txt |  19 +++
 arch/x86/include/asm/kvm_host.h   |  15 ++
 arch/x86/kvm/hyperv.c | 315 ++
 arch/x86/kvm/hyperv.h |  23 +++
 arch/x86/kvm/irq_comm.c   |  34 
 arch/x86/kvm/lapic.c  |  15 +-
 arch/x86/kvm/lapic.h  |   5 +
 arch/x86/kvm/x86.c|  34 +++-
 include/linux/kvm_host.h  |   6 +
 include/uapi/linux/kvm.h  |   8 +
 10 files changed, 467 insertions(+), 7 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 34cc068..16096a2 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1451,6 +1451,7 @@ struct kvm_irq_routing_entry {
struct kvm_irq_routing_irqchip irqchip;
struct kvm_irq_routing_msi msi;
struct kvm_irq_routing_s390_adapter adapter;
+   struct kvm_irq_routing_hv_sint hv_sint;
__u32 pad[8];
} u;
 };
@@ -1459,6 +1460,7 @@ struct kvm_irq_routing_entry {
 #define KVM_IRQ_ROUTING_IRQCHIP 1
 #define KVM_IRQ_ROUTING_MSI 2
 #define KVM_IRQ_ROUTING_S390_ADAPTER 3
+#define KVM_IRQ_ROUTING_HV_SINT 4
 
 No flags are specified so far, the corresponding field must be set to zero.
 
@@ -1482,6 +1484,10 @@ struct kvm_irq_routing_s390_adapter {
__u32 adapter_id;
 };
 
+struct kvm_irq_routing_hv_sint {
+   __u32 vcpu;
+   __u32 sint;
+};
 
 4.53 KVM_ASSIGN_SET_MSIX_NR (deprecated)
 
@@ -3685,3 +3691,16 @@ available, means that that the kernel has an 
implementation of the
 H_RANDOM hypercall backed by a hardware random-number generator.
 If present, the kernel H_RANDOM handler can be enabled for guest use
 with the KVM_CAP_PPC_ENABLE_HCALL capability.
+
+8.2 KVM_CAP_HYPERV_SYNIC
+
+Architectures: x86
+This capability, if KVM_CHECK_EXTENSION indicates that it is
+available, means that that the kernel has an implementation of the
+Hyper-V Synthetic interrupt controller(SynIC). Hyper-V SynIC is
+used to support Windows Hyper-V based guest paravirt drivers(VMBus).
+
+In order to use SynIC, it has to be activated by setting this
+capability via KVM_ENABLE_CAP ioctl on the vcpu fd. Note that this
+will disable the use of APIC hardware virtualization even if supported
+by the CPU, as it's incompatible with SynIC auto-EOI behavior.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a60a461..ad29e89 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -374,10 +375,24 @@ struct kvm_mtrr {
struct list_head head;
 };
 
+/* Hyper-V synthetic interrupt controller (SynIC)*/
+struct kvm_vcpu_hv_synic {
+   u64 version;
+   u64 control;
+   u64 msg_page;
+   u64 evt_page;
+   atomic64_t sint[HV_SYNIC_SINT_COUNT];
+   atomic_t sint_to_gsi[HV_SYNIC_SINT_COUNT];
+   DECLARE_BITMAP(auto_eoi_bitmap, 256);
+   DECLARE_BITMAP(vec_bitmap, 256);
+   bool active;
+};
+
 /* Hyper-V per vcpu emulation context */
 struct kvm_vcpu_hv {
u64 hv_vapic;
s64 runtime_offset;
+   struct kvm_vcpu_hv_synic synic;
 };
 
 struct kvm_vcpu_arch {
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 62cf8c9..83a3c0c 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -23,13 +2

[Qemu-devel] [PULL 02/15] trace: add make dependencies on tracetool source

2015-11-10 Thread Stefan Hajnoczi

Patches that change tracetool can break the build if old build output
files are lying around.

This happens because the Makefile does not specify dependencies on
tracetool.  The build will use old object files that do not match the
current source code.

Signed-off-by: Stefan Hajnoczi 
Message-id: 1446198795-6081-3-git-send-email-stefa...@redhat.com
---
 trace/Makefile.objs | 30 +++---
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/trace/Makefile.objs b/trace/Makefile.objs
index 73bec38..5145b34 100644
--- a/trace/Makefile.objs
+++ b/trace/Makefile.objs
@@ -1,12 +1,20 @@
 # -*- mode: makefile -*-
 
 ##
+# tracetool source files
+# Every rule that invokes tracetool must depend on this so code is regenerated
+# if tracetool itself changes.
+
+tracetool-y = $(SRC_PATH)/scripts/tracetool.py
+tracetool-y += $(shell find $(SRC_PATH)/scripts/tracetool -name "*.py")
+
+##
 # Auto-generated event descriptions for LTTng ust code
 
 ifeq ($(findstring ust,$(TRACE_BACKENDS)),ust)
 $(obj)/generated-ust-provider.h: $(obj)/generated-ust-provider.h-timestamp
@cmp $< $@ >/dev/null 2>&1 || cp $< $@
-$(obj)/generated-ust-provider.h-timestamp: $(SRC_PATH)/trace-events
+$(obj)/generated-ust-provider.h-timestamp: $(SRC_PATH)/trace-events 
$(tracetool-y)
$(call quiet-command,$(TRACETOOL) \
--format=ust-events-h \
--backends=$(TRACE_BACKENDS) \
@@ -14,7 +22,7 @@ $(obj)/generated-ust-provider.h-timestamp: 
$(SRC_PATH)/trace-events
 
 $(obj)/generated-ust.c: $(obj)/generated-ust.c-timestamp 
$(BUILD_DIR)/config-host.mak
@cmp $< $@ >/dev/null 2>&1 || cp $< $@
-$(obj)/generated-ust.c-timestamp: $(SRC_PATH)/trace-events
+$(obj)/generated-ust.c-timestamp: $(SRC_PATH)/trace-events $(tracetool-y)
$(call quiet-command,$(TRACETOOL) \
--format=ust-events-c \
--backends=$(TRACE_BACKENDS) \
@@ -29,7 +37,7 @@ endif
 
 $(obj)/generated-events.h: $(obj)/generated-events.h-timestamp
@cmp $< $@ >/dev/null 2>&1 || cp $< $@
-$(obj)/generated-events.h-timestamp: $(SRC_PATH)/trace-events
+$(obj)/generated-events.h-timestamp: $(SRC_PATH)/trace-events $(tracetool-y)
$(call quiet-command,$(TRACETOOL) \
--format=events-h \
--backends=$(TRACE_BACKENDS) \
@@ -37,7 +45,7 @@ $(obj)/generated-events.h-timestamp: $(SRC_PATH)/trace-events
 
 $(obj)/generated-events.c: $(obj)/generated-events.c-timestamp 
$(BUILD_DIR)/config-host.mak
@cmp $< $@ >/dev/null 2>&1 || cp $< $@
-$(obj)/generated-events.c-timestamp: $(SRC_PATH)/trace-events
+$(obj)/generated-events.c-timestamp: $(SRC_PATH)/trace-events $(tracetool-y)
$(call quiet-command,$(TRACETOOL) \
--format=events-c \
--backends=$(TRACE_BACKENDS) \
@@ -54,7 +62,7 @@ util-obj-y += generated-events.o
 
 $(obj)/generated-tracers.h: $(obj)/generated-tracers.h-timestamp
@cmp -s $< $@ || cp $< $@
-$(obj)/generated-tracers.h-timestamp: $(SRC_PATH)/trace-events 
$(BUILD_DIR)/config-host.mak
+$(obj)/generated-tracers.h-timestamp: $(SRC_PATH)/trace-events 
$(BUILD_DIR)/config-host.mak $(tracetool-y)
$(call quiet-command,$(TRACETOOL) \
--format=h \
--backends=$(TRACE_BACKENDS) \
@@ -65,7 +73,7 @@ $(obj)/generated-tracers.h-timestamp: 
$(SRC_PATH)/trace-events $(BUILD_DIR)/conf
 
 $(obj)/generated-tracers.c: $(obj)/generated-tracers.c-timestamp
@cmp -s $< $@ || cp $< $@
-$(obj)/generated-tracers.c-timestamp: $(SRC_PATH)/trace-events 
$(BUILD_DIR)/config-host.mak
+$(obj)/generated-tracers.c-timestamp: $(SRC_PATH)/trace-events 
$(BUILD_DIR)/config-host.mak $(tracetool-y)
$(call quiet-command,$(TRACETOOL) \
--format=c \
--backends=$(TRACE_BACKENDS) \
@@ -82,7 +90,7 @@ $(obj)/generated-tracers.o: $(obj)/generated-tracers.c 
$(obj)/generated-tracers.
 ifeq ($(findstring dtrace,$(TRACE_BACKENDS)),dtrace)
 $(obj)/generated-tracers-dtrace.dtrace: 
$(obj)/generated-tracers-dtrace.dtrace-timestamp
@cmp $< $@ >/dev/null 2>&1 || cp $< $@
-$(obj)/generated-tracers-dtrace.dtrace-timestamp: $(SRC_PATH)/trace-events 
$(BUILD_DIR)/config-host.mak
+$(obj)/generated-tracers-dtrace.dtrace-timestamp: $(SRC_PATH)/trace-events 
$(BUILD_DIR)/config-host.mak $(tracetool-y)
$(call quiet-command,$(TRACETOOL) \
--format=d \
--backends=$(TRACE_BACKENDS) \
@@ -101,7 +109,7 @@ endif
 
 $(obj)/generated-helpers-wrappers.h: 
$(obj)/generated-helpers-wrappers.h-timestamp
@cmp $< $@ >/dev/null 2>&1 || cp $< $@
-$(obj)/generated-helpers-wrappers.h-timestamp: $(SRC_PATH)/trace-events 
$(BUILD_DIR)/config-host.mak
+$(obj)/generated-helpers-wrappers.h-timestamp: $(SRC_PATH)/trace-events 
$(BUILD_DIR)/config-host.mak $(tracetool-y)
$(call quiet-com

[Qemu-devel] [PULL 00/15] Tracing patches

2015-11-10 Thread Stefan Hajnoczi

The following changes since commit a8b4f9585a0bf5186fca793ce2c5d754cd8ec49a:

  Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2015-11-10' into 
staging (2015-11-10 09:39:24 +)

are available in the git repository at:

  git://github.com/stefanha/qemu.git tags/tracing-pull-request

for you to fetch changes up to bd0e34e715bcc784fe732945d011cb36645d7f12:

  log: add "-d trace:PATTERN" (2015-11-10 13:23:09 +)





Denis V. Lunev (2):
  trace: no need to call trace_backend_init in different branches now
  log: move qemu-log.c into util/ directory

Paolo Bonzini (11):
  trace: count number of enabled events
  trace: track enabled events in a separate array
  trace: fix documentation
  trace: split trace_init_events out of trace_init_backends
  trace: split trace_init_file out of trace_init_backends
  trace: add "-trace enable=..."
  trace: add "-trace help"
  log: do not unnecessarily include qom/cpu.h
  trace: convert stderr backend to log
  trace: switch default backend to "log"
  log: add "-d trace:PATTERN"

Stefan Hajnoczi (2):
  trace: fix make foo-timestamp rules
  trace: add make dependencies on tracetool source

 Makefile.objs|   1 -
 bsd-user/main.c  |   1 +
 configure|   6 +-
 cpu-exec.c   |   1 +
 exec.c   |   1 +
 hw/acpi/cpu_hotplug.c|   1 +
 hw/timer/a9gtimer.c  |   1 +
 include/exec/log.h   |  60 +++
 include/qemu/log.h   |  60 +--
 linux-user/main.c|   1 +
 qemu-io.c|   2 +-
 qemu-log.c   | 177 
 qemu-options.hx  |  22 ++--
 qom/cpu.c|   1 +
 scripts/tracetool/backend/stderr.py  |  47 -
 scripts/tracetool/format/events_c.py |   2 +-
 target-alpha/translate.c |   1 +
 target-arm/translate.c   |   1 +
 target-cris/translate.c  |   1 +
 target-i386/seg_helper.c |   1 +
 target-i386/smm_helper.c |   1 +
 target-i386/translate.c  |   1 +
 target-lm32/helper.c |   1 +
 target-lm32/translate.c  |   1 +
 target-m68k/translate.c  |   1 +
 target-microblaze/helper.c   |   1 +
 target-microblaze/translate.c|   1 +
 target-mips/helper.c |   1 +
 target-mips/translate.c  |   1 +
 target-moxie/translate.c |   1 +
 target-openrisc/translate.c  |   1 +
 target-ppc/mmu-hash32.c  |   1 +
 target-ppc/mmu-hash64.c  |   1 +
 target-ppc/mmu_helper.c  |   1 +
 target-ppc/translate.c   |   1 +
 target-s390x/translate.c |   1 +
 target-sh4/helper.c  |   1 +
 target-sh4/translate.c   |   1 +
 target-sparc/int32_helper.c  |   1 +
 target-sparc/int64_helper.c  |   1 +
 target-sparc/translate.c |   1 +
 target-tilegx/translate.c|   1 +
 target-tricore/translate.c   |   1 +
 target-unicore32/translate.c |   1 +
 target-xtensa/translate.c|   1 +
 tcg/tcg.c|   1 +
 trace/Makefile.objs  |  48 +
 trace/control-internal.h |  15 ++-
 trace/control.c  |  98 +-
 trace/control.h  |  44 +++-
 trace/event-internal.h   |   2 -
 trace/simple.c   |   6 +-
 trace/simple.h   |   4 +-
 translate-all.c  |   1 +
 util/Makefile.objs   |   1 +
 util/log.c   | 190 +++
 vl.c |  38 ---
 57 files changed, 488 insertions(+), 373 deletions(-)
 create mode 100644 include/exec/log.h
 delete mode 100644 qemu-log.c
 delete mode 100644 scripts/tracetool/backend/stderr.py
 create mode 100644 util/log.c

-- 
2.5.0

[Qemu-devel] [PULL 01/15] trace: fix make foo-timestamp rules

2015-11-10 Thread Stefan Hajnoczi

The Makefile uses intermediate timestamp files to avoid rebuilding if
tracetool output is unchanged.

Timestamps are implemented incorrectly.  This was fixed for rules.mak in
commit 4b25966ab976f3a7fd9008193b2defcc82f8f04d ("rules.mak: cleanup
config generation rules") but never fixed in trace/Makefile.objs.

The problem with the old timestamp implementation was that make doesn't
notice the updated file modification time until the next time it is run.
It was necessary to run make twice in a row to achieve a full rebuild.

Signed-off-by: Stefan Hajnoczi 
Message-id: 1446198795-6081-2-git-send-email-stefa...@redhat.com
---
 trace/Makefile.objs | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/trace/Makefile.objs b/trace/Makefile.objs
index 32f7a32..73bec38 100644
--- a/trace/Makefile.objs
+++ b/trace/Makefile.objs
@@ -5,20 +5,20 @@
 
 ifeq ($(findstring ust,$(TRACE_BACKENDS)),ust)
 $(obj)/generated-ust-provider.h: $(obj)/generated-ust-provider.h-timestamp
+   @cmp $< $@ >/dev/null 2>&1 || cp $< $@
 $(obj)/generated-ust-provider.h-timestamp: $(SRC_PATH)/trace-events
$(call quiet-command,$(TRACETOOL) \
--format=ust-events-h \
--backends=$(TRACE_BACKENDS) \
< $< > $@,"  GEN   $(patsubst %-timestamp,%,$@)")
-   @cmp -s $@ $(patsubst %-timestamp,%,$@) || cp $@ $(patsubst %-timestamp,%,$@)
 
 $(obj)/generated-ust.c: $(obj)/generated-ust.c-timestamp $(BUILD_DIR)/config-host.mak
+   @cmp $< $@ >/dev/null 2>&1 || cp $< $@
 $(obj)/generated-ust.c-timestamp: $(SRC_PATH)/trace-events
$(call quiet-command,$(TRACETOOL) \
--format=ust-events-c \
--backends=$(TRACE_BACKENDS) \
< $< > $@,"  GEN   $(patsubst %-timestamp,%,$@)")
-   @cmp -s $@ $(patsubst %-timestamp,%,$@) || cp $@ $(patsubst %-timestamp,%,$@)
 
 $(obj)/generated-events.h: $(obj)/generated-ust-provider.h
 $(obj)/generated-events.c: $(obj)/generated-ust.c
@@ -28,20 +28,20 @@ endif
 # Auto-generated event descriptions
 
 $(obj)/generated-events.h: $(obj)/generated-events.h-timestamp
+   @cmp $< $@ >/dev/null 2>&1 || cp $< $@
 $(obj)/generated-events.h-timestamp: $(SRC_PATH)/trace-events
$(call quiet-command,$(TRACETOOL) \
--format=events-h \
--backends=$(TRACE_BACKENDS) \
< $< > $@,"  GEN   $(patsubst %-timestamp,%,$@)")
-   @cmp -s $@ $(patsubst %-timestamp,%,$@) || cp $@ $(patsubst %-timestamp,%,$@)
 
 $(obj)/generated-events.c: $(obj)/generated-events.c-timestamp $(BUILD_DIR)/config-host.mak
+   @cmp $< $@ >/dev/null 2>&1 || cp $< $@
 $(obj)/generated-events.c-timestamp: $(SRC_PATH)/trace-events
$(call quiet-command,$(TRACETOOL) \
--format=events-c \
--backends=$(TRACE_BACKENDS) \
< $< > $@,"  GEN   $(patsubst %-timestamp,%,$@)")
-   @cmp -s $@ $(patsubst %-timestamp,%,$@) || cp $@ $(patsubst %-timestamp,%,$@)
 
 util-obj-y += generated-events.o
 
@@ -81,12 +81,12 @@ $(obj)/generated-tracers.o: $(obj)/generated-tracers.c $(obj)/generated-tracers.
 # rule file. So we use '.dtrace' instead
 ifeq ($(findstring dtrace,$(TRACE_BACKENDS)),dtrace)
 $(obj)/generated-tracers-dtrace.dtrace: $(obj)/generated-tracers-dtrace.dtrace-timestamp
+   @cmp $< $@ >/dev/null 2>&1 || cp $< $@
 $(obj)/generated-tracers-dtrace.dtrace-timestamp: $(SRC_PATH)/trace-events $(BUILD_DIR)/config-host.mak
$(call quiet-command,$(TRACETOOL) \
--format=d \
--backends=$(TRACE_BACKENDS) \
< $< > $@,"  GEN   $(patsubst %-timestamp,%,$@)")
-   @cmp -s $@ $(patsubst %-timestamp,%,$@) || cp $@ $(patsubst %-timestamp,%,$@)
 
 $(obj)/generated-tracers-dtrace.h: $(obj)/generated-tracers-dtrace.dtrace
$(call quiet-command,dtrace -o $@ -h -s $<, "  GEN   $@")
@@ -100,28 +100,28 @@ endif
 # Translation level
 
 $(obj)/generated-helpers-wrappers.h: $(obj)/generated-helpers-wrappers.h-timestamp
+   @cmp $< $@ >/dev/null 2>&1 || cp $< $@
 $(obj)/generated-helpers-wrappers.h-timestamp: $(SRC_PATH)/trace-events $(BUILD_DIR)/config-host.mak
$(call quiet-command,$(TRACETOOL) \
--format=tcg-helper-wrapper-h \
--backend=$(TRACE_BACKENDS) \
< $< > $@,"  GEN   $(patsubst %-timestamp,%,$@)")
-   @cmp -s $@ $(patsubst %-timestamp,%,$@) || cp $@ $(patsubst %-timestamp,%,$@)
 
 $(obj)/generated-helpers.h: $(obj)/generated-helpers.h-timestamp
+   @cmp $< $@ >/dev/null 2>&1 || cp $< $@
 $(obj)/generated-helpers.h-timestamp: $(SRC_PATH)/trace-events $(BUILD_DIR)/config-host.mak
$(call quiet-command,$(TRACETOOL) \
--format=tcg-helper-h \
--backend=$(TRACE_BACKENDS) \
< $< > $@,"  GEN   $(patsubst %-timestamp,%,$@)")
-   @cmp -s $@ $(patsubst %-timestamp,%,$@) || cp $@ $(patsubst %-timestamp,%,$@)
 
 $(obj)/generated-helpers.c:

[Qemu-devel] [PULL 04/15] trace: track enabled events in a separate array

2015-11-10 Thread Stefan Hajnoczi

From: Paolo Bonzini 

This is more cache friendly on the fast path, where we already have
the event id available.

Signed-off-by: Paolo Bonzini 
Message-id: 1446012388-9586-3-git-send-email-pbonz...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 scripts/tracetool/format/events_c.py |  2 +-
 trace/control-internal.h | 15 +++
 trace/control.c  |  1 +
 trace/control.h  |  2 +-
 trace/event-internal.h   |  2 --
 5 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/scripts/tracetool/format/events_c.py b/scripts/tracetool/format/events_c.py
index 2d97fa3..2717ea3 100644
--- a/scripts/tracetool/format/events_c.py
+++ b/scripts/tracetool/format/events_c.py
@@ -27,7 +27,7 @@ def generate(events, backend):
 out('TraceEvent trace_events[TRACE_EVENT_COUNT] = {')
 
 for e in events:
-out('{ .id = %(id)s, .name = \"%(name)s\", .sstate = %(sstate)s, .dstate = 0 },',
+out('{ .id = %(id)s, .name = \"%(name)s\", .sstate = %(sstate)s },',
 id = "TRACE_" + e.name.upper(),
 name = e.name,
 sstate = "TRACE_%s_ENABLED" % e.name.upper())
diff --git a/trace/control-internal.h b/trace/control-internal.h
index 271bddb..07cb1c1 100644
--- a/trace/control-internal.h
+++ b/trace/control-internal.h
@@ -14,6 +14,7 @@
 
 
 extern TraceEvent trace_events[];
+extern bool trace_events_dstate[];
 extern int trace_events_enabled_count;
 
 
@@ -52,18 +53,24 @@ static inline bool trace_event_get_state_static(TraceEvent *ev)
 return ev->sstate;
 }
 
+static inline bool trace_event_get_state_dynamic_by_id(int id)
+{
+return unlikely(trace_events_enabled_count) && trace_events_dstate[id];
+}
+
 static inline bool trace_event_get_state_dynamic(TraceEvent *ev)
 {
-assert(ev != NULL);
-return unlikely(trace_events_enabled_count) && ev->dstate;
+int id = trace_event_get_id(ev);
+return trace_event_get_state_dynamic_by_id(id);
 }
 
 static inline void trace_event_set_state_dynamic(TraceEvent *ev, bool state)
 {
+int id = trace_event_get_id(ev);
 assert(ev != NULL);
 assert(trace_event_get_state_static(ev));
-trace_events_enabled_count += state - ev->dstate;
-ev->dstate = state;
+trace_events_enabled_count += state - trace_events_dstate[id];
+trace_events_dstate[id] = state;
 }
 
 #endif  /* TRACE__CONTROL_INTERNAL_H */
diff --git a/trace/control.c b/trace/control.c
index 95fbc07..700440c 100644
--- a/trace/control.c
+++ b/trace/control.c
@@ -17,6 +17,7 @@
 #include "qemu/error-report.h"
 
 int trace_events_enabled_count;
+bool trace_events_dstate[TRACE_EVENT_COUNT];
 
 TraceEvent *trace_event_name(const char *name)
 {
diff --git a/trace/control.h b/trace/control.h
index da9bb6b..6af7ddc 100644
--- a/trace/control.h
+++ b/trace/control.h
@@ -104,7 +104,7 @@ static const char * trace_event_get_name(TraceEvent *ev);
  * As a down side, you must always use an immediate #TraceEventID value.
  */
 #define trace_event_get_state(id)   \
-((id ##_ENABLED) && trace_event_get_state_dynamic(trace_event_id(id)))
+((id ##_ENABLED) && trace_event_get_state_dynamic_by_id(id))
 
 /**
  * trace_event_get_state_static:
diff --git a/trace/event-internal.h b/trace/event-internal.h
index b2310d9..86f6a51 100644
--- a/trace/event-internal.h
+++ b/trace/event-internal.h
@@ -18,7 +18,6 @@
  * @id: Unique event identifier.
  * @name: Event name.
  * @sstate: Static tracing state.
- * @dstate: Dynamic tracing state.
  *
  * Opaque generic description of a tracing event.
  */
@@ -26,7 +25,6 @@ typedef struct TraceEvent {
 TraceEventID id;
 const char * name;
 const bool sstate;
-bool dstate;
 } TraceEvent;
 
 
-- 
2.5.0
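
A minimal standalone sketch of the layout change in the patch above, assuming a toy three-event table. The names EVENT_COUNT, Event, events_dstate, enabled_count and the two helpers are hypothetical stand-ins for TRACE_EVENT_COUNT, TraceEvent, trace_events_dstate, trace_events_enabled_count and the trace_event_* accessors; unlikely() is left out. It is meant only to show that the fast-path lookup needs nothing but the event id once the dynamic state sits in its own array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the generated event table size. */
enum { EVENT_COUNT = 3 };

typedef struct {
    int id;
    const char *name;
    bool sstate;            /* static (compile-time) state */
} Event;                    /* note: no dstate member any more */

static Event events[EVENT_COUNT] = {
    { 0, "ev_a", true },
    { 1, "ev_b", true },
    { 2, "ev_c", false },
};

/* Dynamic state lives in a compact array of its own, indexed by id. */
static bool events_dstate[EVENT_COUNT];
static int enabled_count;

/* Fast path: uses only the id and never touches the Event descriptors. */
static inline bool get_state_dynamic_by_id(int id)
{
    return enabled_count && events_dstate[id];
}

/* Slow path: goes through the descriptor to find the id. */
static inline void set_state_dynamic(Event *ev, bool state)
{
    assert(ev != NULL);
    assert(ev->sstate);
    enabled_count += state - events_dstate[ev->id];
    events_dstate[ev->id] = state;
}
```

With one byte per event, the flags for many events share a cache line, whereas a flag embedded in each descriptor is spread out by the size of the descriptor.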

[Qemu-devel] [PULL 07/15] trace: split trace_init_file out of trace_init_backends

2015-11-10 Thread Stefan Hajnoczi

From: Paolo Bonzini 

This is cleaner, and improves error reporting with -daemonize.

Signed-off-by: Paolo Bonzini 
Signed-off-by: Denis V. Lunev 
Acked-by: Christian Borntraeger 
Message-id: 1446151457-21157-4-git-send-email-...@openvz.org
Signed-off-by: Stefan Hajnoczi 
---
 qemu-io.c   |  2 +-
 trace/control.c | 17 -
 trace/control.h | 13 -
 trace/simple.c  |  6 ++
 trace/simple.h  |  4 ++--
 vl.c| 13 +
 6 files changed, 38 insertions(+), 17 deletions(-)

diff --git a/qemu-io.c b/qemu-io.c
index d6fa11b..fbddf82 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -440,7 +440,7 @@ int main(int argc, char **argv)
 }
 break;
 case 'T':
-if (!trace_init_backends(optarg)) {
+if (!trace_init_backends()) {
 exit(1); /* error message will have been printed */
 }
 break;
diff --git a/trace/control.c b/trace/control.c
index 931d64c..f5a497a 100644
--- a/trace/control.c
+++ b/trace/control.c
@@ -145,17 +145,24 @@ void trace_init_events(const char *fname)
 loc_pop(&loc);
 }
 
-bool trace_init_backends(const char *file)
+void trace_init_file(const char *file)
 {
 #ifdef CONFIG_TRACE_SIMPLE
-if (!st_init(file)) {
-fprintf(stderr, "failed to initialize simple tracing backend.\n");
-return false;
-}
+st_set_trace_file(file);
 #else
 if (file) {
 fprintf(stderr, "error: -trace file=...: "
 "option not supported by the selected tracing backends\n");
+exit(1);
+}
+#endif
+}
+
+bool trace_init_backends(void)
+{
+#ifdef CONFIG_TRACE_SIMPLE
+if (!st_init()) {
+fprintf(stderr, "failed to initialize simple tracing backend.\n");
 return false;
 }
 #endif
diff --git a/trace/control.h b/trace/control.h
index 7905917..d50f399 100644
--- a/trace/control.h
+++ b/trace/control.h
@@ -157,7 +157,7 @@ static void trace_event_set_state_dynamic(TraceEvent *ev, bool state);
  *
  * Returns: Whether the backends could be successfully initialized.
  */
-bool trace_init_backends(const char *file);
+bool trace_init_backends(void);
 
 /**
  * trace_init_events:
@@ -170,6 +170,17 @@ bool trace_init_backends(const char *file);
  */
 void trace_init_events(const char *file);
 
+/**
+ * trace_init_file:
+ * @file:   Name of trace output file; may be NULL.
+ *  Corresponds to commandline option "-trace file=...".
+ *
+ * Record the name of the output file for the tracing backend.
+ * Exits if no selected backend supports specifying the
+ * output file, and a non-NULL file was passed.
+ */
+void trace_init_file(const char *file);
+
 
 #include "trace/control-internal.h"
 
diff --git a/trace/simple.c b/trace/simple.c
index 11ad030..a4bc705 100644
--- a/trace/simple.c
+++ b/trace/simple.c
@@ -322,7 +322,7 @@ void st_set_trace_file_enabled(bool enable)
  * @file The trace file name or NULL for the default name- set at
  *  config time
  */
-bool st_set_trace_file(const char *file)
+void st_set_trace_file(const char *file)
 {
 st_set_trace_file_enabled(false);
 
@@ -335,7 +335,6 @@ bool st_set_trace_file(const char *file)
 }
 
 st_set_trace_file_enabled(true);
-return true;
 }
 
 void st_print_trace_file_status(FILE *stream, int (*stream_printf)(FILE *stream, const char *fmt, ...))
@@ -373,7 +372,7 @@ static GThread *trace_thread_create(GThreadFunc fn)
 return thread;
 }
 
-bool st_init(const char *file)
+bool st_init(void)
 {
 GThread *thread;
 
@@ -386,6 +385,5 @@ bool st_init(const char *file)
 }
 
 atexit(st_flush_trace_buffer);
-st_set_trace_file(file);
 return true;
 }
diff --git a/trace/simple.h b/trace/simple.h
index 6997996..8d1a32e 100644
--- a/trace/simple.h
+++ b/trace/simple.h
@@ -20,8 +20,8 @@
 
 void st_print_trace_file_status(FILE *stream, fprintf_function stream_printf);
 void st_set_trace_file_enabled(bool enable);
-bool st_set_trace_file(const char *file);
-bool st_init(const char *file);
+void st_set_trace_file(const char *file);
+bool st_init(void);
 void st_flush_trace_buffer(void);
 
 typedef struct {
diff --git a/vl.c b/vl.c
index 4df502c..b567ed9 100644
--- a/vl.c
+++ b/vl.c
@@ -2991,7 +2991,7 @@ int main(int argc, char **argv, char **envp)
 bool userconfig = true;
 const char *log_mask = NULL;
 const char *log_file = NULL;
-const char *trace_file = NULL;
+char *trace_file = NULL;
 ram_addr_t maxram_size;
 uint64_t ram_slots = 0;
 FILE *vmstate_dump_file = NULL;
@@ -3908,7 +3908,10 @@ int main(int argc, char **argv, char **envp)
 exit(1);
 }
 trace_init_events(qemu_opt_get(opts, "events"));
-trace_file = qemu_opt_get(opts, "file");
+if (trace_file) {
+g_free(trace_file);
+}
+trace_file = g_strdup(qemu_opt_get(opts, "file"));
 qemu_opts_
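
The switch from a borrowed const char * to g_strdup() in the vl.c hunk above matters once the options object is freed before the string is used. A plain-C sketch of the same ownership pattern follows. The Opts type and the helpers opts_new, opts_get, opts_del, dup_string and remember_file are hypothetical; they stand in for QemuOpts, qemu_opt_get(), qemu_opts_del() and g_strdup(), and malloc()/free() replace the GLib allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical option container that owns its value string. */
typedef struct {
    char *file;
} Opts;

/* Like g_strdup(): returns NULL for NULL input, else a heap copy. */
static char *dup_string(const char *s)
{
    size_t n;
    char *copy;

    if (s == NULL) {
        return NULL;
    }
    n = strlen(s) + 1;
    copy = malloc(n);
    assert(copy != NULL);
    memcpy(copy, s, n);
    return copy;
}

static Opts *opts_new(const char *file)
{
    Opts *o = malloc(sizeof(*o));
    assert(o != NULL);
    o->file = dup_string(file);
    return o;
}

/* Returns a pointer that is only valid while the Opts object lives. */
static const char *opts_get(const Opts *o)
{
    return o->file;
}

static void opts_del(Opts *o)
{
    free(o->file);
    free(o);
}

/* Private copy, so the value survives opts_del(). */
static char *saved_file;

static void remember_file(Opts *o)
{
    const char *val = opts_get(o);

    free(saved_file);              /* free(NULL) is a no-op */
    saved_file = dup_string(val);  /* copy before the owner goes away */
    opts_del(o);                   /* val dangles from here on */
}
```

g_free() also accepts NULL, so the if (trace_file) guard in the hunk is harmless but not strictly needed.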

[Qemu-devel] [PULL 03/15] trace: count number of enabled events

2015-11-10 Thread Stefan Hajnoczi

From: Paolo Bonzini 

This lets trace_event_get_state_dynamic quickly return false.  Right
now there is hardly any benefit because there are also many assertions
and indirections, but the next patch will streamline all of this.

Signed-off-by: Paolo Bonzini 
Message-id: 1446012388-9586-2-git-send-email-pbonz...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 trace/control-internal.h | 4 +++-
 trace/control.c  | 2 ++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/trace/control-internal.h b/trace/control-internal.h
index 5a8df28..271bddb 100644
--- a/trace/control-internal.h
+++ b/trace/control-internal.h
@@ -14,6 +14,7 @@
 
 
 extern TraceEvent trace_events[];
+extern int trace_events_enabled_count;
 
 
 static inline TraceEventID trace_event_count(void)
@@ -54,13 +55,14 @@ static inline bool trace_event_get_state_static(TraceEvent *ev)
 static inline bool trace_event_get_state_dynamic(TraceEvent *ev)
 {
 assert(ev != NULL);
-return ev->dstate;
+return unlikely(trace_events_enabled_count) && ev->dstate;
 }
 
 static inline void trace_event_set_state_dynamic(TraceEvent *ev, bool state)
 {
 assert(ev != NULL);
 assert(trace_event_get_state_static(ev));
+trace_events_enabled_count += state - ev->dstate;
 ev->dstate = state;
 }
 
diff --git a/trace/control.c b/trace/control.c
index 995beb3..95fbc07 100644
--- a/trace/control.c
+++ b/trace/control.c
@@ -16,6 +16,8 @@
 #endif
 #include "qemu/error-report.h"
 
+int trace_events_enabled_count;
+
 TraceEvent *trace_event_name(const char *name)
 {
 assert(name != NULL);
-- 
2.5.0
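
The counter update in the patch above relies on bool operands promoting to int, so state - ev->dstate is +1, 0 or -1. A small sketch of just that arithmetic, with the hypothetical names Ev, enabled_count, set_dynamic and get_dynamic:

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool dstate;   /* per-event dynamic state, as in TraceEvent here */
} Ev;

static int enabled_count;

static void set_dynamic(Ev *ev, bool state)
{
    /* bool promotes to int: true - false == 1, false - true == -1,
     * and equal values contribute 0, so repeated calls do not drift. */
    enabled_count += state - ev->dstate;
    ev->dstate = state;
}

static bool get_dynamic(const Ev *ev)
{
    /* Global check first; the per-event load is skipped entirely
     * while no event is enabled. */
    return enabled_count && ev->dstate;
}
```

Setting an event to the state it already has leaves the counter unchanged, which is what keeps the count equal to the number of enabled events.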

[Qemu-devel] [PULL 10/15] trace: add "-trace help"

2015-11-10 Thread Stefan Hajnoczi

From: Paolo Bonzini 

Print a list of trace points.

Signed-off-by: Paolo Bonzini 
Signed-off-by: Denis V. Lunev 
Acked-by: Christian Borntraeger 
Message-id: 1446151457-21157-7-git-send-email-...@openvz.org
Signed-off-by: Stefan Hajnoczi 
---
 qemu-options.hx |  2 ++
 trace/control.c | 21 -
 trace/control.h |  7 +++
 3 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 45ddd27..fff23dd 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3495,6 +3495,8 @@ available if QEMU has been compiled with the @var{simple}, @var{stderr}
 or @var{ftrace} tracing backend.  To specify multiple events or patterns,
 specify the @option{-trace} option multiple times.
 
+Use @code{-trace help} to print a list of names of trace points.
+
 @item events=@var{file}
 Immediately enable events listed in @var{file}.
 The file must contain one event name (as listed in the @file{trace-events} file)
diff --git a/trace/control.c b/trace/control.c
index af92705..bef7884 100644
--- a/trace/control.c
+++ b/trace/control.c
@@ -88,7 +88,16 @@ TraceEvent *trace_event_pattern(const char *pat, TraceEvent *ev)
 return NULL;
 }
 
-void trace_enable_events(const char *line_buf)
+void trace_list_events(void)
+{
+int i;
+for (i = 0; i < trace_event_count(); i++) {
+TraceEvent *res = trace_event_id(i);
+fprintf(stderr, "%s\n", trace_event_get_name(res));
+}
+}
+
+static void do_trace_enable_events(const char *line_buf)
 {
 const bool enable = ('-' != line_buf[0]);
 const char *line_ptr = enable ? line_buf : line_buf + 1;
@@ -114,6 +123,16 @@ void trace_enable_events(const char *line_buf)
 }
 }
 
+void trace_enable_events(const char *line_buf)
+{
+if (is_help_option(line_buf)) {
+trace_list_events();
+exit(0);
+} else {
+do_trace_enable_events(line_buf);
+}
+}
+
 void trace_init_events(const char *fname)
 {
 Location loc;
diff --git a/trace/control.h b/trace/control.h
index d5081ce..d5bc86e 100644
--- a/trace/control.h
+++ b/trace/control.h
@@ -182,6 +182,13 @@ void trace_init_events(const char *file);
 void trace_init_file(const char *file);
 
 /**
+ * trace_list_events:
+ *
+ * List all available events.
+ */
+void trace_list_events(void);
+
+/**
  * trace_enable_events:
  * @line_buf: A string with a glob pattern of events to be enabled or,
  *            if the string starts with '-', disabled.
-- 
2.5.0
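
A rough standalone sketch of the dispatch added in the patch above: a help request lists the event names, anything else falls through to the normal enable path. The event table, is_help, list_events and handle_trace_arg are hypothetical names. is_help is assumed to mirror QEMU's is_help_option(), which to my understanding accepts both "help" and "?". The real code calls exit(0) after listing, where this sketch returns true instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical event names standing in for the generated table. */
static const char *const event_names[] = {
    "bdrv_aio_readv",
    "bdrv_aio_writev",
};
enum { EVENT_COUNT = sizeof(event_names) / sizeof(event_names[0]) };

/* Assumption: same set of spellings as QEMU's is_help_option(). */
static bool is_help(const char *s)
{
    return strcmp(s, "help") == 0 || strcmp(s, "?") == 0;
}

/* Writes one event name per line; returns how many were written. */
static int list_events(FILE *out)
{
    int i;

    for (i = 0; i < EVENT_COUNT; i++) {
        fprintf(out, "%s\n", event_names[i]);
    }
    return i;
}

/* Returns true if arg was a help request (the caller would then exit),
 * false if it should be handled as an event name or pattern. */
static bool handle_trace_arg(const char *arg, FILE *out)
{
    if (is_help(arg)) {
        list_events(out);
        return true;
    }
    return false;
}
```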

[Qemu-devel] [PULL 11/15] log: do not unnecessarily include qom/cpu.h

2015-11-10 Thread Stefan Hajnoczi

From: Paolo Bonzini 

Split the bits that require it to exec/log.h.

Signed-off-by: Paolo Bonzini 
Signed-off-by: Denis V. Lunev 
Acked-by: Christian Borntraeger 
Message-id: 1446151457-21157-8-git-send-email-...@openvz.org
Signed-off-by: Stefan Hajnoczi 
---
 bsd-user/main.c   |  1 +
 cpu-exec.c|  1 +
 exec.c|  1 +
 hw/acpi/cpu_hotplug.c |  1 +
 hw/timer/a9gtimer.c   |  1 +
 include/exec/log.h| 60 +++
 include/qemu/log.h| 59 --
 linux-user/main.c |  1 +
 qom/cpu.c |  1 +
 target-alpha/translate.c  |  1 +
 target-arm/translate.c|  1 +
 target-cris/translate.c   |  1 +
 target-i386/seg_helper.c  |  1 +
 target-i386/smm_helper.c  |  1 +
 target-i386/translate.c   |  1 +
 target-lm32/helper.c  |  1 +
 target-lm32/translate.c   |  1 +
 target-m68k/translate.c   |  1 +
 target-microblaze/helper.c|  1 +
 target-microblaze/translate.c |  1 +
 target-mips/helper.c  |  1 +
 target-mips/translate.c   |  1 +
 target-moxie/translate.c  |  1 +
 target-openrisc/translate.c   |  1 +
 target-ppc/mmu-hash32.c   |  1 +
 target-ppc/mmu-hash64.c   |  1 +
 target-ppc/mmu_helper.c   |  1 +
 target-ppc/translate.c|  1 +
 target-s390x/translate.c  |  1 +
 target-sh4/helper.c   |  1 +
 target-sh4/translate.c|  1 +
 target-sparc/int32_helper.c   |  1 +
 target-sparc/int64_helper.c   |  1 +
 target-sparc/translate.c  |  1 +
 target-tilegx/translate.c |  1 +
 target-tricore/translate.c|  1 +
 target-unicore32/translate.c  |  1 +
 target-xtensa/translate.c |  1 +
 tcg/tcg.c |  1 +
 translate-all.c   |  1 +
 40 files changed, 98 insertions(+), 59 deletions(-)
 create mode 100644 include/exec/log.h

diff --git a/bsd-user/main.c b/bsd-user/main.c
index adf2de0..520ce99 100644
--- a/bsd-user/main.c
+++ b/bsd-user/main.c
@@ -33,6 +33,7 @@
 #include "tcg.h"
 #include "qemu/timer.h"
 #include "qemu/envlist.h"
+#include "exec/log.h"
 
 int singlestep;
 unsigned long mmap_min_addr;
diff --git a/cpu-exec.c b/cpu-exec.c
index c88d0ff..8e2e52b 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -27,6 +27,7 @@
 #include "exec/address-spaces.h"
 #include "qemu/rcu.h"
 #include "exec/tb-hash.h"
+#include "exec/log.h"
 #if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
 #include "hw/i386/apic.h"
 #endif
diff --git a/exec.c b/exec.c
index a028961..dec6691 100644
--- a/exec.c
+++ b/exec.c
@@ -54,6 +54,7 @@
 
 #include "exec/memory-internal.h"
 #include "exec/ram_addr.h"
+#include "exec/log.h"
 
 #include "qemu/range.h"
 #ifndef _WIN32
diff --git a/hw/acpi/cpu_hotplug.c b/hw/acpi/cpu_hotplug.c
index f5b9972..16bacfc 100644
--- a/hw/acpi/cpu_hotplug.c
+++ b/hw/acpi/cpu_hotplug.c
@@ -11,6 +11,7 @@
  */
 #include "hw/hw.h"
 #include "hw/acpi/cpu_hotplug.h"
+#include "qom/cpu.h"
 
 static uint64_t cpu_status_read(void *opaque, hwaddr addr, unsigned int size)
 {
diff --git a/hw/timer/a9gtimer.c b/hw/timer/a9gtimer.c
index dd4aae8..b38c76a 100644
--- a/hw/timer/a9gtimer.c
+++ b/hw/timer/a9gtimer.c
@@ -24,6 +24,7 @@
 #include "qemu/timer.h"
 #include "qemu/bitops.h"
 #include "qemu/log.h"
+#include "qom/cpu.h"
 
 #ifndef A9_GTIMER_ERR_DEBUG
 #define A9_GTIMER_ERR_DEBUG 0
diff --git a/include/exec/log.h b/include/exec/log.h
new file mode 100644
index 000..ba1c9b5
--- /dev/null
+++ b/include/exec/log.h
@@ -0,0 +1,60 @@
+#ifndef QEMU_EXEC_LOG_H
+#define QEMU_EXEC_LOG_H
+
+#include "qemu/log.h"
+#include "qom/cpu.h"
+#include "disas/disas.h"
+
+/* cpu_dump_state() logging functions: */
+/**
+ * log_cpu_state:
+ * @cpu: The CPU whose state is to be logged.
+ * @flags: Flags what to log.
+ *
+ * Logs the output of cpu_dump_state().
+ */
+static inline void log_cpu_state(CPUState *cpu, int flags)
+{
+if (qemu_log_enabled()) {
+cpu_dump_state(cpu, qemu_logfile, fprintf, flags);
+}
+}
+
+/**
+ * log_cpu_state_mask:
+ * @mask: Mask when to log.
+ * @cpu: The CPU whose state is to be logged.
+ * @flags: Flags what to log.
+ *
+ * Logs the output of cpu_dump_state() if loglevel includes @mask.
+ */
+static inline void log_cpu_state_mask(int mask, CPUState *cpu, int flags)
+{
+if (qemu_loglevel & mask) {
+log_cpu_state(cpu, flags);
+}
+}
+
+#ifdef NEED_CPU_H
+/* disas() and target_disas() to qemu_logfile: */
+static inline void log_target_disas(CPUState *cpu, target_ulong start,
+target_ulong len, int flags)
+{
+target_disas(qemu_logfile, cpu, start, len, flags);
+}
+
+static inline void log_disas(void *code, unsigned long size)
+{
+disas(qemu_logfile, code, size);
+}
+
+#if defined(CONFIG_USER_ONLY)
+/* page_dump() output to the log file: */
+static inline void log_page_dump(void)
+{
+page_dump(qemu_logfile);
+}
+#endif
+#endif
+
+#endif
di

[Qemu-devel] [PULL 06/15] trace: split trace_init_events out of trace_init_backends

2015-11-10 Thread Stefan Hajnoczi

From: Paolo Bonzini 

This is cleaner and has two advantages.  First, it improves error
reporting with -daemonize.  Second, multiple "-trace events" options
now cumulate.

Signed-off-by: Paolo Bonzini 
Signed-off-by: Denis V. Lunev 
Acked-by: Christian Borntraeger 
Message-id: 1446151457-21157-3-git-send-email-...@openvz.org
Signed-off-by: Stefan Hajnoczi 
---
 qemu-io.c   |  2 +-
 trace/control.c |  5 ++---
 trace/control.h | 15 ---
 vl.c|  8 
 4 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/qemu-io.c b/qemu-io.c
index 269f17c..d6fa11b 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -440,7 +440,7 @@ int main(int argc, char **argv)
 }
 break;
 case 'T':
-if (!trace_init_backends(optarg, NULL)) {
+if (!trace_init_backends(optarg)) {
 exit(1); /* error message will have been printed */
 }
 break;
diff --git a/trace/control.c b/trace/control.c
index 700440c..931d64c 100644
--- a/trace/control.c
+++ b/trace/control.c
@@ -88,7 +88,7 @@ TraceEvent *trace_event_pattern(const char *pat, TraceEvent *ev)
 return NULL;
 }
 
-static void trace_init_events(const char *fname)
+void trace_init_events(const char *fname)
 {
 Location loc;
 FILE *fp;
@@ -145,7 +145,7 @@ static void trace_init_events(const char *fname)
 loc_pop(&loc);
 }
 
-bool trace_init_backends(const char *events, const char *file)
+bool trace_init_backends(const char *file)
 {
 #ifdef CONFIG_TRACE_SIMPLE
 if (!st_init(file)) {
@@ -167,6 +167,5 @@ bool trace_init_backends(const char *events, const char *file)
 }
 #endif
 
-trace_init_events(events);
 return true;
 }
diff --git a/trace/control.h b/trace/control.h
index 6af7ddc..7905917 100644
--- a/trace/control.h
+++ b/trace/control.h
@@ -150,8 +150,6 @@ static void trace_event_set_state_dynamic(TraceEvent *ev, bool state);
 
 /**
  * trace_init_backends:
- * @events: Name of file with events to be enabled at startup; may be NULL.
- *  Corresponds to commandline option "-trace events=...".
  * @file:   Name of trace output file; may be NULL.
  *  Corresponds to commandline option "-trace file=...".
  *
@@ -159,7 +157,18 @@ static void trace_event_set_state_dynamic(TraceEvent *ev, bool state);
  *
  * Returns: Whether the backends could be successfully initialized.
  */
-bool trace_init_backends(const char *events, const char *file);
+bool trace_init_backends(const char *file);
+
+/**
+ * trace_init_events:
+ * @events: Name of file with events to be enabled at startup; may be NULL.
+ *  Corresponds to commandline option "-trace events=...".
+ *
+ * Read the list of enabled tracing events.
+ *
+ * Returns: Whether the backends could be successfully initialized.
+ */
+void trace_init_events(const char *file);
 
 
 #include "trace/control-internal.h"
diff --git a/vl.c b/vl.c
index 21e8876..4df502c 100644
--- a/vl.c
+++ b/vl.c
@@ -2991,7 +2991,6 @@ int main(int argc, char **argv, char **envp)
 bool userconfig = true;
 const char *log_mask = NULL;
 const char *log_file = NULL;
-const char *trace_events = NULL;
 const char *trace_file = NULL;
 ram_addr_t maxram_size;
 uint64_t ram_slots = 0;
@@ -3908,8 +3907,9 @@ int main(int argc, char **argv, char **envp)
 if (!opts) {
 exit(1);
 }
-trace_events = qemu_opt_get(opts, "events");
+trace_init_events(qemu_opt_get(opts, "events"));
 trace_file = qemu_opt_get(opts, "file");
+qemu_opts_del(opts);
 break;
 }
 case QEMU_OPTION_readconfig:
@@ -4109,7 +4109,7 @@ int main(int argc, char **argv, char **envp)
 }
 
 if (!is_daemonized()) {
-if (!trace_init_backends(trace_events, trace_file)) {
+if (!trace_init_backends(trace_file)) {
 exit(1);
 }
 }
@@ -4672,7 +4672,7 @@ int main(int argc, char **argv, char **envp)
 os_setup_post();
 
 if (is_daemonized()) {
-if (!trace_init_backends(trace_events, trace_file)) {
+if (!trace_init_backends(trace_file)) {
 exit(1);
 }
 }
-- 
2.5.0
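
Why multiple "-trace events" options now cumulate can be seen in a toy model of the two control flows in vl.c. Everything here is hypothetical (enabled, init_events, parse_deferred, parse_immediate), and each "file" is reduced to a single event name. The old flow stored the option value in one variable and applied it once later; the new flow applies each option as it is parsed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { MAX_ENABLED = 8 };

/* Hypothetical registry of enabled event names. */
static const char *enabled[MAX_ENABLED];
static int n_enabled;

/* Stand-in for trace_init_events(): NULL means "nothing to do". */
static void init_events(const char *name)
{
    if (name == NULL) {
        return;
    }
    assert(n_enabled < MAX_ENABLED);
    enabled[n_enabled++] = name;
}

static bool is_enabled(const char *name)
{
    int i;

    for (i = 0; i < n_enabled; i++) {
        if (strcmp(enabled[i], name) == 0) {
            return true;
        }
    }
    return false;
}

/* Old flow: each option overwrites the previous one; applied once. */
static void parse_deferred(const char *const *opts, int n)
{
    const char *last = NULL;
    int i;

    for (i = 0; i < n; i++) {
        last = opts[i];
    }
    init_events(last);
}

/* New flow: every option takes effect while parsing. */
static void parse_immediate(const char *const *opts, int n)
{
    int i;

    for (i = 0; i < n; i++) {
        init_events(opts[i]);
    }
}
```

Applying the option immediately also means the borrowed string is no longer needed after parsing, which is what allows the added qemu_opts_del(opts) call.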

[Qemu-devel] [PULL 09/15] trace: add "-trace enable=..."

2015-11-10 Thread Stefan Hajnoczi

From: Paolo Bonzini 

Allow enabling events without going through a file, for example:

   qemu-system-x86_64 -trace bdrv_aio_writev -trace bdrv_aio_readv

or with globbing too:

   qemu-system-x86_64 -trace 'bdrv_aio_*'

if an appropriate backend is enabled (simple, stderr, ftrace).

Signed-off-by: Paolo Bonzini 
Signed-off-by: Denis V. Lunev 
Acked-by: Christian Borntraeger 
Message-id: 1446151457-21157-6-git-send-email-...@openvz.org
Signed-off-by: Stefan Hajnoczi 
---
 qemu-options.hx | 10 +-
 trace/control.c | 48 +++-
 trace/control.h |  9 +
 vl.c| 11 +--
 4 files changed, 54 insertions(+), 24 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index a8fe78e..45ddd27 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3475,7 +3475,7 @@ config files on @var{sysconfdir}, but won't make it skip the QEMU-provided confi
 files from @var{datadir}.
 ETEXI
 DEF("trace", HAS_ARG, QEMU_OPTION_trace,
-"-trace [events=<file>][,file=<file>]\n"
+"-trace [[enable=]<pattern>][,events=<file>][,file=<file>]\n"
 "specify tracing options\n",
 QEMU_ARCH_ALL)
 STEXI
@@ -3487,6 +3487,14 @@ HXCOMM HX does not support conditional compilation of text.
 Specify tracing options.
 
 @table @option
+@item [enable=]@var{pattern}
+Immediately enable events matching @var{pattern}.
+The file must contain one event name (as listed in the @file{trace-events} file)
+per line; globbing patterns are accepted too.  This option is only
+available if QEMU has been compiled with the @var{simple}, @var{stderr}
+or @var{ftrace} tracing backend.  To specify multiple events or patterns,
+specify the @option{-trace} option multiple times.
+
 @item events=@var{file}
 Immediately enable events listed in @var{file}.
 The file must contain one event name (as listed in the @file{trace-events} file)
diff --git a/trace/control.c b/trace/control.c
index f5a497a..af92705 100644
--- a/trace/control.c
+++ b/trace/control.c
@@ -88,6 +88,32 @@ TraceEvent *trace_event_pattern(const char *pat, TraceEvent *ev)
 return NULL;
 }
 
+void trace_enable_events(const char *line_buf)
+{
+const bool enable = ('-' != line_buf[0]);
+const char *line_ptr = enable ? line_buf : line_buf + 1;
+
+if (trace_event_is_pattern(line_ptr)) {
+TraceEvent *ev = NULL;
+while ((ev = trace_event_pattern(line_ptr, ev)) != NULL) {
+if (trace_event_get_state_static(ev)) {
+trace_event_set_state_dynamic(ev, enable);
+}
+}
+} else {
+TraceEvent *ev = trace_event_name(line_ptr);
+if (ev == NULL) {
+error_report("WARNING: trace event '%s' does not exist",
+ line_ptr);
+} else if (!trace_event_get_state_static(ev)) {
+error_report("WARNING: trace event '%s' is not traceable",
+ line_ptr);
+} else {
+trace_event_set_state_dynamic(ev, enable);
+}
+}
+}
+
 void trace_init_events(const char *fname)
 {
 Location loc;
@@ -114,27 +140,7 @@ void trace_init_events(const char *fname)
 if ('#' == line_buf[0]) { /* skip commented lines */
 continue;
 }
-const bool enable = ('-' != line_buf[0]);
-char *line_ptr = enable ? line_buf : line_buf + 1;
-if (trace_event_is_pattern(line_ptr)) {
-TraceEvent *ev = NULL;
-while ((ev = trace_event_pattern(line_ptr, ev)) != NULL) {
-if (trace_event_get_state_static(ev)) {
-trace_event_set_state_dynamic(ev, enable);
-}
-}
-} else {
-TraceEvent *ev = trace_event_name(line_ptr);
-if (ev == NULL) {
-error_report("WARNING: trace event '%s' does not exist",
- line_ptr);
-} else if (!trace_event_get_state_static(ev)) {
-error_report("WARNING: trace event '%s' is not traceable",
- line_ptr);
-} else {
-trace_event_set_state_dynamic(ev, enable);
-}
-}
+trace_enable_events(line_buf);
 }
 }
 if (fclose(fp) != 0) {
diff --git a/trace/control.h b/trace/control.h
index d50f399..d5081ce 100644
--- a/trace/control.h
+++ b/trace/control.h
@@ -181,6 +181,15 @@ void trace_init_events(const char *file);
  */
 void trace_init_file(const char *file);
 
+/**
+ * trace_enable_events:
+ * @line_buf: A string with a glob pattern of events to be enabled or,
+ *            if the string starts with '-', disabled.
+ *
+ * Enable or disable matching events.
+ */
+void trace_enable_events(const char *line_buf);
+
 
 #include "trace/control-internal.h"
 
diff --git a/vl.c b/vl.c
index a9c3449..e391e1d 100644
--- a/vl.c
+++ b/vl.c
@@ -271,10 +271,14 @@ static QemuO
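
A minimal sketch of the argument handling that trace_enable_events() performs in this patch: a leading '-' selects "disable", and the rest is matched against event names as a glob. The matcher behind trace_event_pattern() is not shown in the patch, so POSIX fnmatch() stands in for it here as an assumption. Ev, evs and enable_events are hypothetical names, and the static-state check is omitted.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>

typedef struct {
    const char *name;
    bool enabled;
} Ev;

/* Hypothetical event table. */
static Ev evs[] = {
    { "bdrv_aio_readv", false },
    { "bdrv_aio_writev", false },
    { "virtio_notify", false },
};
enum { N_EVS = sizeof(evs) / sizeof(evs[0]) };

/* Applies one "-trace" style argument; returns the number of events
 * whose state was set. */
static int enable_events(const char *arg)
{
    const bool enable = (arg[0] != '-');
    const char *pat = enable ? arg : arg + 1;
    int matched = 0;
    int i;

    for (i = 0; i < N_EVS; i++) {
        /* fnmatch() returns 0 on a match; a pattern without
         * wildcards matches only the identical name. */
        if (fnmatch(pat, evs[i].name, 0) == 0) {
            evs[i].enabled = enable;
            matched++;
        }
    }
    return matched;
}
```

Unlike this sketch, the patch treats plain names and patterns separately so that it can warn about a name that does not exist or is not traceable; a return value of 0 is the closest equivalent here.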

Re: [Qemu-devel] [PATCH v11 00/14] block: incremental backup transactions using BlockJobTxn

2015-11-10 Thread Stefan Hajnoczi

On Thu, Nov 05, 2015 at 06:13:06PM -0500, John Snow wrote:
> Welcome to the Incremental Backup Transactions Newsletter!
> 
> What's new?
> 
> I replaced the per-action "transactional-cancel" parameter with
> a per-transaction parameter named "completion-mode" which is implemented
> as an enum in case we want to add new behaviors in the future, such
> as a "jobs only" cancel mode.
> 
> For now, it's "grouped" or "individual", and if you use it with actions
> that do not support the latent transactional cancel, you will receive
> an error for your troubles.
> 
> Version 10 primarily changed V7's patches 10-11 and replaced them
> with patches 10-12 that are cut a little differently.
> 
> This is based on top of the work by Stefan Hajnoczi and Fam Zheng.
> 
> Recap: motivation for block job transactions
> 
> If an incremental backup block job fails then we reclaim the bitmap so
> the job can be retried.  The problem comes when multiple jobs are started as
> part of a qmp 'transaction' command.  We need to group these jobs in a
> transaction so that either all jobs complete successfully or all bitmaps are
> reclaimed.
> 
> Without transactions, there is a case where some jobs complete successfully and
> throw away their bitmaps, making it impossible to retry the backup by rerunning
> the command if one of the jobs fails.
> 
> How does this implementation work?
> --
> These patches add a BlockJobTxn object with the following API:
> 
>   txn = block_job_txn_new();
>   block_job_txn_add_job(txn, job1);
>   block_job_txn_add_job(txn, job2);
> 
> The jobs either both complete successfully or they both fail/cancel.  If the
> user cancels job1 then job2 will also be cancelled and vice versa.
> 
> Job objects stay alive waiting for other jobs to complete, even if the
> coroutines have returned.  They can be cancelled by the user during this time.
> Job blockers are still in effect and no other block job can run on this device
> in the meantime (since QEMU currently only allows 1 job per device).  This is
> the main drawback to this approach but reasonable since you probably don't 
> want
> to run other jobs/operations until you're sure the backup was successful (you
> won't be able to retry a failed backup if there's a new job running).
> 
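
For readers skimming the thread, the difference between the "grouped" and
"individual" completion modes described above can be modelled roughly as
follows.  This is only a toy sketch in Python, not QEMU's BlockJobTxn code:
the Job and Txn classes, the state names and job_finished() are all
hypothetical and exist purely to illustrate the "all complete or all
fail/cancel" rule.

```python
from enum import Enum


class CompletionMode(Enum):
    INDIVIDUAL = "individual"  # each job succeeds or fails on its own
    GROUPED = "grouped"        # one failure or cancel takes every job down


class Job:
    """Hypothetical stand-in for a block job; it only tracks an outcome."""

    def __init__(self, name):
        self.name = name
        # One of: running, waiting, completed, cancelled, failed
        self.state = "running"


class Txn:
    """Toy model of a job transaction (not QEMU's BlockJobTxn)."""

    def __init__(self, mode):
        self.mode = mode
        self.jobs = []

    def add_job(self, job):
        self.jobs.append(job)

    def job_finished(self, job, ok):
        if self.mode is CompletionMode.INDIVIDUAL:
            # No coupling: the job's own result is final.
            job.state = "completed" if ok else "failed"
            return
        if not ok:
            # Grouped mode: a failure cancels every job still in flight,
            # including jobs that finished their work and are waiting.
            job.state = "failed"
            for other in self.jobs:
                if other is not job and other.state in ("running", "waiting"):
                    other.state = "cancelled"
            return
        # Grouped mode, success: stay alive until all peers are done too.
        job.state = "waiting"
        if all(j.state == "waiting" for j in self.jobs):
            for j in self.jobs:
                j.state = "completed"
```

In the grouped case a job that has finished its work sits in "waiting" until
its peers finish, which corresponds to the "job objects stay alive" remark
in the cover letter.
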
> [History]
> 
> v11: Renamed "err-cancel" to "completion-mode"
>  "none" becomes "individual"
>  "all" becomes "grouped"
> 
> v10: Took series back from Fam. (jsnow)
>  Replaced per-action parameter with per-transaction properties.
>  Patches 10,11 were split into 10-12.
> 
> v9: this version fixes a reference count problem with job->bs,
> in patch 05.
> 
> v8: Rebase on to master.
> Minor fixes addressing John Snow's comments.
> 
> v7: Add Eric's rev-by in 1, 11.
> Add Max's rev-by in 4, 5, 9, 10, 11.
> Add John's rev-by in 5, 6, 8.
> Fix wording for 6. [John]
> Fix comment of block_job_txn_add_job() in 9. [Max]
> Remove superfluous hunks, and document default value in 11. [Eric]
> Update Makefile dep in 14. [Max]
> 
> 
> 
> For convenience, this branch is available at:
> https://github.com/jnsnow/qemu.git branch block-transpop
> https://github.com/jnsnow/qemu/tree/block-transpop
> 
> This version is tagged block-transpop-v11:
> https://github.com/jnsnow/qemu/releases/tag/block-transpop-v11
> 
> Fam Zheng (6):
>   backup: Extract dirty bitmap handling as a separate function
>   blockjob: Introduce reference count and fix reference to job->bs
>   blockjob: Add .commit and .abort block job actions
>   blockjob: Add "completed" and "ret" in BlockJob
>   blockjob: Simplify block_job_finish_sync
>   block: Add block job transactions
> 
> John Snow (7):
>   qapi: Add transaction support to block-dirty-bitmap operations
>   iotests: add transactional incremental backup test
>   block: rename BlkTransactionState and BdrvActionOps
>   block/backup: Rely on commit/abort for cleanup
>   block: Add BlockJobTxn support to backup_run
>   block: add transactional properties
>   iotests: 124 - transactional failure test
> 
> Stefan Hajnoczi (1):
>   tests: add BlockJobTxn unit test
> 
>  block.c                    |  19 +-
>  block/backup.c             |  50 --
>  block/mirror.c             |   2 +-
>  blockdev.c                 | 432 ++---
>  blockjob.c                 | 189
>  docs/bitmaps.md            |   6 +-
>  include/block/block.h      |   2 +-
>  include/block/block_int.h  |   6 +-
>  include/block/blockjob.h   |  85 -
>  qapi-schema.json           |  56 +-
>  qemu-img.c                 |   3 -
>  qmp-commands.hx            |   2 +-
>  tests/Makefile             |   3 +
>  tests/qemu-iotests/124     | 182 ++-
>  tests/qemu-iotests/124.out |   4 +-
>  tests/test-blockjob-txn.c  | 250

[Qemu-devel] [PULL 05/15] trace: fix documentation

2015-11-10 Thread Stefan Hajnoczi

From: Paolo Bonzini 

Mention the ftrace backend too.

Signed-off-by: Paolo Bonzini 
Signed-off-by: Denis V. Lunev 
Acked-by: Christian Borntraeger 
Message-id: 1446151457-21157-2-git-send-email-...@openvz.org
Signed-off-by: Stefan Hajnoczi 
---
 qemu-options.hx | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 0eea4ee..a8fe78e 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3489,13 +3489,13 @@ Specify tracing options.
 @table @option
 @item events=@var{file}
 Immediately enable events listed in @var{file}.
-The file must contain one event name (as listed in the @var{trace-events} file)
-per line.
-This option is only available if QEMU has been compiled with
-either @var{simple} or @var{stderr} tracing backend.
+The file must contain one event name (as listed in the @file{trace-events} file)
+per line; globbing patterns are accepted too.  This option is only
+available if QEMU has been compiled with the @var{simple}, @var{stderr} or
+@var{ftrace} tracing backend.
+
 @item file=@var{file}
 Log output traces to @var{file}.
-
 This option is only available if QEMU has been compiled with
 the @var{simple} tracing backend.
 @end table
-- 
2.5.0
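
As a rough illustration of what "one event name per line; globbing patterns
are accepted too" means for an events file, here is a small Python sketch.
It is not QEMU's matcher (that lives in C); select_events() and the event
names in the usage note are made up for the example, and anything beyond
plain names and glob patterns is deliberately not modelled.

```python
from fnmatch import fnmatchcase


def select_events(known_events, pattern_lines):
    """Return the known event names matched by at least one pattern line.

    Each non-empty line is treated as either a literal event name or a
    shell-style glob pattern, one per line.
    """
    patterns = [line.strip() for line in pattern_lines if line.strip()]
    return [name for name in known_events
            if any(fnmatchcase(name, pat) for pat in patterns)]
```

For example, with the hypothetical events "bdrv_aio_readv", "bdrv_aio_writev"
and "kvm_ioctl", a file containing the single line "bdrv_aio_*" would select
the first two and leave the third disabled.
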

[Qemu-devel] [PULL 12/15] log: move qemu-log.c into util/ directory

2015-11-10 Thread Stefan Hajnoczi

From: "Denis V. Lunev" 

The log will become a common facility with tracepoint support in the next step.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Paolo Bonzini 
Signed-off-by: Paolo Bonzini 
Message-id: 1446151457-21157-9-git-send-email-...@openvz.org
Signed-off-by: Stefan Hajnoczi 
---
 Makefile.objs      |   1 -
 qemu-log.c         | 177 -
 util/Makefile.objs |   1 +
 util/log.c         | 177 +
 4 files changed, 178 insertions(+), 178 deletions(-)
 delete mode 100644 qemu-log.c
 create mode 100644 util/log.c

diff --git a/Makefile.objs b/Makefile.objs
index 77be052..61d2798 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -83,7 +83,6 @@ endif
 
 ###
 # Target-independent parts used in system and user emulation
-common-obj-y += qemu-log.o
 common-obj-y += tcg-runtime.o
 common-obj-y += hw/
 common-obj-y += qom/
diff --git a/qemu-log.c b/qemu-log.c
deleted file mode 100644
index 7cb01a8..000
--- a/qemu-log.c
+++ /dev/null
@@ -1,177 +0,0 @@
-/*
- * Logging support
- *
- *  Copyright (c) 2003 Fabrice Bellard
- *
- * This library is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2 of the License, or (at your option) any later version.
- *
- * This library is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with this library; if not, see .
- */
-
-#include "qemu-common.h"
-#include "qemu/log.h"
-
-static char *logfilename;
-FILE *qemu_logfile;
-int qemu_loglevel;
-static int log_append = 0;
-
-void qemu_log(const char *fmt, ...)
-{
-    va_list ap;
-
-    va_start(ap, fmt);
-    if (qemu_logfile) {
-        vfprintf(qemu_logfile, fmt, ap);
-    }
-    va_end(ap);
-}
-
-void qemu_log_mask(int mask, const char *fmt, ...)
-{
-    va_list ap;
-
-    va_start(ap, fmt);
-    if ((qemu_loglevel & mask) && qemu_logfile) {
-        vfprintf(qemu_logfile, fmt, ap);
-    }
-    va_end(ap);
-}
-
-/* enable or disable low levels log */
-void do_qemu_set_log(int log_flags, bool use_own_buffers)
-{
-    qemu_loglevel = log_flags;
-    if (qemu_loglevel && !qemu_logfile) {
-        if (logfilename) {
-            qemu_logfile = fopen(logfilename, log_append ? "a" : "w");
-            if (!qemu_logfile) {
-                perror(logfilename);
-                _exit(1);
-            }
-        } else {
-            /* Default to stderr if no log file specified */
-            qemu_logfile = stderr;
-        }
-        /* must avoid mmap() usage of glibc by setting a buffer "by hand" */
-        if (use_own_buffers) {
-            static char logfile_buf[4096];
-
-            setvbuf(qemu_logfile, logfile_buf, _IOLBF, sizeof(logfile_buf));
-        } else {
-#if defined(_WIN32)
-            /* Win32 doesn't support line-buffering, so use unbuffered output. */
-            setvbuf(qemu_logfile, NULL, _IONBF, 0);
-#else
-            setvbuf(qemu_logfile, NULL, _IOLBF, 0);
-#endif
-            log_append = 1;
-        }
-    }
-    if (!qemu_loglevel && qemu_logfile) {
-        qemu_log_close();
-    }
-}
-
-void qemu_set_log_filename(const char *filename)
-{
-    g_free(logfilename);
-    logfilename = g_strdup(filename);
-    qemu_log_close();
-    qemu_set_log(qemu_loglevel);
-}
-
-const QEMULogItem qemu_log_items[] = {
-    { CPU_LOG_TB_OUT_ASM, "out_asm",
-      "show generated host assembly code for each compiled TB" },
-    { CPU_LOG_TB_IN_ASM, "in_asm",
-      "show target assembly code for each compiled TB" },
-    { CPU_LOG_TB_OP, "op",
-      "show micro ops for each compiled TB" },
-    { CPU_LOG_TB_OP_OPT, "op_opt",
-      "show micro ops (x86 only: before eflags optimization) and\n"
-      "after liveness analysis" },
-    { CPU_LOG_INT, "int",
-      "show interrupts/exceptions in short format" },
-    { CPU_LOG_EXEC, "exec",
-      "show trace before each executed TB (lots of logs)" },
-    { CPU_LOG_TB_CPU, "cpu",
-      "show CPU state before block translation" },
-    { CPU_LOG_MMU, "mmu",
-      "log MMU-related activities" },
-    { CPU_LOG_PCALL, "pcall",
-      "x86 only: show protected mode far calls/returns/exceptions" },
-    { CPU_LOG_RESET, "cpu_reset",
-      "show CPU state before CPU resets" },
-    { LOG_UNIMP, "unimp",
-      "log unimplemented functionality" },
-    { LOG_GUEST_ERROR, "guest_errors",
-      "log when the guest OS does something invalid (eg accessing a\n"
-      "non-existent register)" },
-    { CPU_LOG_TB_NOCHAIN, "nochain",
-      "do not chain compiled TBs so that \"exe

Re: [Qemu-devel] [PULL 00/15] Tracing patches

2015-11-10 Thread Peter Maydell

On 10 November 2015 at 13:31, Stefan Hajnoczi  wrote:
> The following changes since commit a8b4f9585a0bf5186fca793ce2c5d754cd8ec49a:
>
>   Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2015-11-10' into staging (2015-11-10 09:39:24 +)
>
> are available in the git repository at:
>
>   git://github.com/stefanha/qemu.git tags/tracing-pull-request
>
> for you to fetch changes up to bd0e34e715bcc784fe732945d011cb36645d7f12:
>
>   log: add "-d trace:PATTERN" (2015-11-10 13:23:09 +)
>
> 
>
> 

Fails to build on all platforms :-(

HEAD is now at b28cb9f... Merge remote-tracking branch
'remotes/stefanha/tags/tracing-pull-request' into staging
config-host.mak is out-of-date, running configure
  GEN   qemu-options.def
  GEN   qmp-commands.h
  GEN   qapi-types.h
  GEN   qapi-visit.h
  GEN   qapi-event.h
  GEN   qmp-introspect.h
  GEN   trace/generated-events.h
  GEN   trace/generated-tracers.h
  GEN   trace/generated-tcg-tracers.h
  GEN   trace/generated-helpers-wrappers.h
  GEN   trace/generated-helpers.h

ERROR: invalid trace backends
   Please choose supported trace backends.

make: *** [config-host.mak] Error 1

thanks
-- PMM

[Qemu-devel] [PULL 14/15] trace: switch default backend to "log"

2015-11-10 Thread Stefan Hajnoczi

From: Paolo Bonzini 

This enables integration with other QEMU logging facilities.

Signed-off-by: Paolo Bonzini 
Signed-off-by: Denis V. Lunev 
Acked-by: Christian Borntraeger 
Message-id: 1446151457-21157-11-git-send-email-...@openvz.org
Signed-off-by: Stefan Hajnoczi 
---
 configure | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure b/configure
index 6bd4edf..e73d94b 100755
--- a/configure
+++ b/configure
@@ -302,7 +302,7 @@ pkgversion=""
 pie=""
 zero_malloc=""
 qom_cast_debug="yes"
-trace_backends="nop"
+trace_backends="log"
 trace_file="trace"
 spice=""
 rbd=""
-- 
2.5.0

[Qemu-devel] [PULL 08/15] trace: no need to call trace_backend_init in different branches now

2015-11-10 Thread Stefan Hajnoczi

From: "Denis V. Lunev" 

The original idea behind splitting the call sites was to spawn the tracing
thread in the final child process, according to

commit 8a745f2a9296ad2cf6bda33534ed298f2625a4ad
Author: Michael Mueller
Date:   Mon Sep 23 16:36:54 2013 +0200

os_daemonize is now on top of both locations. Drop unneeded ifs.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Paolo Bonzini 
Signed-off-by: Paolo Bonzini 
Message-id: 1446151457-21157-5-git-send-email-...@openvz.org
Signed-off-by: Stefan Hajnoczi 
---
 vl.c | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/vl.c b/vl.c
index b567ed9..a9c3449 100644
--- a/vl.c
+++ b/vl.c
@@ -4113,10 +4113,8 @@ int main(int argc, char **argv, char **envp)
         qemu_set_log(mask);
     }
 
-    if (!is_daemonized()) {
-        if (!trace_init_backends()) {
-            exit(1);
-        }
+    if (!trace_init_backends()) {
+        exit(1);
     }
 
     /* If no data_dir is specified then try to find it relative to the
@@ -4676,12 +4674,6 @@ int main(int argc, char **argv, char **envp)
 
     os_setup_post();
 
-    if (is_daemonized()) {
-        if (!trace_init_backends()) {
-            exit(1);
-        }
-    }
-
     main_loop();
     replay_disable_events();
 
-- 
2.5.0

[Qemu-devel] [PULL 13/15] trace: convert stderr backend to log

2015-11-10 Thread Stefan Hajnoczi

From: Paolo Bonzini 

Signed-off-by: Paolo Bonzini 
Signed-off-by: Denis V. Lunev 
Acked-by: Christian Borntraeger 
Message-id: 1446151457-21157-10-git-send-email-...@openvz.org
Signed-off-by: Stefan Hajnoczi 
---
 configure                           |  4 ++--
 include/qemu/log.h                  |  1 +
 scripts/tracetool/backend/stderr.py | 47 -
 trace/control.c                     | 10
 util/log.c                          |  3 +++
 vl.c                                |  2 ++
 6 files changed, 18 insertions(+), 49 deletions(-)
 delete mode 100644 scripts/tracetool/backend/stderr.py

diff --git a/configure b/configure
index 46fd8bd..6bd4edf 100755
--- a/configure
+++ b/configure
@@ -5309,8 +5309,8 @@ if have_backend "simple"; then
   # Set the appropriate trace file.
   trace_file="\"$trace_file-\" FMT_pid"
 fi
-if have_backend "stderr"; then
-  echo "CONFIG_TRACE_STDERR=y" >> $config_host_mak
+if have_backend "log"; then
+  echo "CONFIG_TRACE_LOG=y" >> $config_host_mak
 fi
 if have_backend "ust"; then
   echo "CONFIG_TRACE_UST=y" >> $config_host_mak
diff --git a/include/qemu/log.h b/include/qemu/log.h
index 0fcdba9..fdcfab0 100644
--- a/include/qemu/log.h
+++ b/include/qemu/log.h
@@ -37,6 +37,7 @@ static inline bool qemu_log_enabled(void)
 #define LOG_GUEST_ERROR    (1 << 11)
 #define CPU_LOG_MMU        (1 << 12)
 #define CPU_LOG_TB_NOCHAIN (1 << 13)
+#define LOG_TRACE          (1 << 14)
 
 /* Returns true if a bit is set in the current loglevel mask
  */
diff --git a/scripts/tracetool/backend/stderr.py b/scripts/tracetool/backend/stderr.py
deleted file mode 100644
index ca58054..000
--- a/scripts/tracetool/backend/stderr.py
+++ /dev/null
@@ -1,47 +0,0 @@
-#!/usr/bin/env python
-# -*- coding: utf-8 -*-
-
-"""
-Stderr built-in backend.
-"""
-
-__author__ = "Lluís Vilanova "
-__copyright__  = "Copyright 2012-2014, Lluís Vilanova "
-__license__= "GPL version 2 or (at your option) any later version"
-
-__maintainer__ = "Stefan Hajnoczi"
-__email__  = "stefa...@linux.vnet.ibm.com"
-
-
-from tracetool import out
-
-
-PUBLIC = True
-
-
-def generate_h_begin(events):
-    out('#include ',
-        '#include ',
-        '#include ',
-        '#include ',
-        '#include "trace/control.h"',
-        '')
-
-
-def generate_h(event):
-    argnames = ", ".join(event.args.names())
-    if len(event.args) > 0:
-        argnames = ", " + argnames
-
-    out('if (trace_event_get_state(%(event_id)s)) {',
-        'struct timeval _now;',
-        'gettimeofday(&_now, NULL);',
-        'fprintf(stderr, "%%d@%%zd.%%06zd:%(name)s " %(fmt)s "\\n",',
-        'getpid(),',
-        '(size_t)_now.tv_sec, (size_t)_now.tv_usec',
-        '%(argnames)s);',
-        '}',
-        event_id="TRACE_" + event.name.upper(),
-        name=event.name,
-        fmt=event.fmt.rstrip("\n"),
-        argnames=argnames)
diff --git a/trace/control.c b/trace/control.c
index bef7884..84ea840 100644
--- a/trace/control.c
+++ b/trace/control.c
@@ -14,6 +14,9 @@
 #ifdef CONFIG_TRACE_FTRACE
 #include "trace/ftrace.h"
 #endif
+#ifdef CONFIG_TRACE_LOG
+#include "qemu/log.h"
+#endif
 #include "qemu/error-report.h"
 
 int trace_events_enabled_count;
@@ -174,6 +177,13 @@ void trace_init_file(const char *file)
 {
 #ifdef CONFIG_TRACE_SIMPLE
     st_set_trace_file(file);
+#elif defined CONFIG_TRACE_LOG
+    /* If both the simple and the log backends are enabled, "-trace file"
+     * only applies to the simple backend; use "-D" for the log backend.
+     */
+    if (file) {
+        qemu_set_log_filename(file);
+    }
 #else
     if (file) {
         fprintf(stderr, "error: -trace file=...: "
diff --git a/util/log.c b/util/log.c
index 7cb01a8..5cc71eb 100644
--- a/util/log.c
+++ b/util/log.c
@@ -51,6 +51,9 @@ void qemu_log_mask(int mask, const char *fmt, ...)
 void do_qemu_set_log(int log_flags, bool use_own_buffers)
 {
     qemu_loglevel = log_flags;
+#ifdef CONFIG_TRACE_LOG
+    qemu_loglevel |= LOG_TRACE;
+#endif
     if (qemu_loglevel && !qemu_logfile) {
         if (logfilename) {
             qemu_logfile = fopen(logfilename, log_append ? "a" : "w");
diff --git a/vl.c b/vl.c
index e391e1d..8b602ea 100644
--- a/vl.c
+++ b/vl.c
@@ -4118,6 +4118,8 @@ int main(int argc, char **argv, char **envp)
             exit(1);
         }
         qemu_set_log(mask);
+    } else {
+        qemu_set_log(0);
     }
 
     if (!trace_init_backends()) {
-- 
2.5.0
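
The log-level handling in this patch is plain bitmask arithmetic, so a short
Python sketch may help when reading the util/log.c and vl.c hunks.  The four
constants are copied from the include/qemu/log.h hunk; effective_loglevel()
and log_enabled() are hypothetical helpers that only mimic the patched
do_qemu_set_log() and the test in qemu_log_mask(), with a boolean argument
standing in for the CONFIG_TRACE_LOG build option.

```python
# Bit values as shown in the include/qemu/log.h hunk above.
LOG_GUEST_ERROR = 1 << 11
CPU_LOG_MMU = 1 << 12
CPU_LOG_TB_NOCHAIN = 1 << 13
LOG_TRACE = 1 << 14


def effective_loglevel(log_flags, trace_log_backend):
    """Mimic the patched do_qemu_set_log(): force LOG_TRACE on when the
    "log" trace backend is compiled in (stand-in for CONFIG_TRACE_LOG)."""
    level = log_flags
    if trace_log_backend:
        level |= LOG_TRACE
    return level


def log_enabled(level, mask):
    """Same test qemu_log_mask() applies before writing a message."""
    return (level & mask) != 0
```

Under this model effective_loglevel(0, True) still has LOG_TRACE set, which
appears to be why the vl.c hunk adds a qemu_set_log(0) call for the case
where no "-d" mask was given: without that call the OR would never run.
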
