Re: [Qemu-devel] [PATCH] [RFC] aio/async: Add timed bottom-halves

2013-07-18 Thread Paolo Bonzini
Il 19/07/2013 08:38, Alex Bligh ha scritto:
> 
> However, I still don't quite see how the poll in the mainloop is
> meant to exit when a timer expires. There's now no qemu_notify_event,
> and no SIGALRM, and the timeout will still be infinite (unless I
> calculate the timeout as the minimum across all clocks, in which
> case I might as well do (b) above).

Yes, that was the idea.

Paolo



Re: [Qemu-devel] [PATCH v2 10/17] block: define get_block_status return value

2013-07-18 Thread Peter Lieven

On 16.07.2013 18:29, Paolo Bonzini wrote:

Define the return value of get_block_status.  Bits 0, 1, 2 and 9-62
are valid; bit 63 (the sign bit) is reserved for errors.  Bits 3-7
are left for future extensions.

The return code is compatible with the old is_allocated API: returning
just 0 or 1 (aka BDRV_BLOCK_DATA) will not cause any behavioral change
in clients of is_allocated.  We will return more precise information
in the next patches.

Signed-off-by: Paolo Bonzini 
---
 v1->v2: improved comment

  block.c   |  7 +--
  include/block/block.h | 26 ++
  2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 6e7a8a3..7ff0716 100644
--- a/block.c
+++ b/block.c
@@ -2990,7 +2990,7 @@ static int64_t coroutine_fn 
bdrv_co_get_block_status(BlockDriverState *bs,
  
  if (!bs->drv->bdrv_co_get_block_status) {

  *pnum = nb_sectors;
-return 1;
+return BDRV_BLOCK_DATA;
  }
  
  return bs->drv->bdrv_co_get_block_status(bs, sector_num, nb_sectors, pnum);

@@ -3040,7 +3040,10 @@ int64_t bdrv_get_block_status(BlockDriverState *bs, 
int64_t sector_num,
  int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num,
 int nb_sectors, int *pnum)
  {
-return bdrv_get_block_status(bs, sector_num, nb_sectors, pnum);
+int64_t ret = bdrv_get_block_status(bs, sector_num, nb_sectors, pnum);
+return
+(ret & BDRV_BLOCK_DATA) ||
+((ret & BDRV_BLOCK_ZERO) && !bdrv_has_zero_init(bs));


i do also not understand the "((ret & BDRV_BLOCK_ZERO) && 
!bdrv_has_zero_init(bs))";
if a block is unallocated and reads as zero, but the device lacks zero init, it
is declared as allocated with this, isn't it?

for iscsi and host_device with lbprz==1 or discardzeroes respectively all
blocks would return as allocated. is this wanted?

Peter



Re: [Qemu-devel] [PATCH] [RFC] aio/async: Add timed bottom-halves

2013-07-18 Thread Alex Bligh

Stefan,

--On 19 July 2013 09:58:50 +0800 Stefan Hajnoczi  
wrote:



  Options:

  a) restore alarm timers (at least for the time being). Make all
 alarm timers do qemu_notify_event. However, only run the AioContext
 clock's timers within aio_poll. This is the least intrusive change.

  b) calculate the timeout in aio_poll with respect to the minimum
 deadline across all clocks, not just AioContext's clock. Use the
 same logic in mainloop.

  I'd go for (b), except for the millisecond accuracy thing. So my
  temptation (sadly) is (a).


I think this is a non-issue because host_alarm_handler() can only be
called from the main loop:

main-loop.c:qemu_signal_init() sets up signalfd to monitor SIGALRM.
Therefore we do not asynchronously invoke the SIGALRM signals handler.
It is only invoked from main-loop.c:sigfd_handler() when the main loop
runs.

That's how I read the code.  I haven't checked a running process to be
sure.


Not sure that's the case on win32, but OK let's go with that.

However, I still don't quite see how the poll in the mainloop is
meant to exit when a timer expires. There's now no qemu_notify_event,
and no SIGALRM, and the timeout will still be infinite (unless I
calculate the timeout as the minimum across all clocks, in which
case I might as well do (b) above).

--
Alex Bligh



Re: [Qemu-devel] [PATCH v2 10/17] block: define get_block_status return value

2013-07-18 Thread Peter Lieven

On 16.07.2013 18:29, Paolo Bonzini wrote:

Define the return value of get_block_status.  Bits 0, 1, 2 and 9-62
are valid; bit 63 (the sign bit) is reserved for errors.  Bits 3-7
are left for future extensions.

The return code is compatible with the old is_allocated API: returning
just 0 or 1 (aka BDRV_BLOCK_DATA) will not cause any behavioral change
in clients of is_allocated.  We will return more precise information
in the next patches.

Signed-off-by: Paolo Bonzini 
---
 v1->v2: improved comment

  block.c   |  7 +--
  include/block/block.h | 26 ++
  2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 6e7a8a3..7ff0716 100644
--- a/block.c
+++ b/block.c
@@ -2990,7 +2990,7 @@ static int64_t coroutine_fn 
bdrv_co_get_block_status(BlockDriverState *bs,
  
  if (!bs->drv->bdrv_co_get_block_status) {

  *pnum = nb_sectors;
-return 1;
+return BDRV_BLOCK_DATA;
  }
  
  return bs->drv->bdrv_co_get_block_status(bs, sector_num, nb_sectors, pnum);

@@ -3040,7 +3040,10 @@ int64_t bdrv_get_block_status(BlockDriverState *bs, 
int64_t sector_num,
  int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num,
 int nb_sectors, int *pnum)
  {
-return bdrv_get_block_status(bs, sector_num, nb_sectors, pnum);
+int64_t ret = bdrv_get_block_status(bs, sector_num, nb_sectors, pnum);

just stumbled about a question i have:

isn't here a

if (ret < 0) {
 *pnum = 0;
 return 0;
}

missing?

I was told that this is the expected return behaviour in the
error case of the old API.

Peter


+return
+(ret & BDRV_BLOCK_DATA) ||
+((ret & BDRV_BLOCK_ZERO) && !bdrv_has_zero_init(bs));
  }
  
  /*

diff --git a/include/block/block.h b/include/block/block.h
index fbc0822..83c7112 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -81,6 +81,32 @@ typedef struct BlockDevOps {
  #define BDRV_SECTOR_SIZE   (1ULL << BDRV_SECTOR_BITS)
  #define BDRV_SECTOR_MASK   ~(BDRV_SECTOR_SIZE - 1)
  
+/* BDRV_BLOCK_DATA: data is read from bs->file or another file

+ * BDRV_BLOCK_ZERO: sectors read as zero
+ * BDRV_BLOCK_OFFSET_VALID: sector stored in bs->file as raw data
+ *
+ * If BDRV_BLOCK_OFFSET_VALID is set, bits 9-62 represent the offset in
+ * bs->file where sector data can be read from as raw data.
+ *
+ * DATA == 0 && ZERO == 0 means that data is read from backing_hd if present.
+ *
+ * DATA ZERO OFFSET_VALID
+ *  ttt   sectors read as zero, bs->file is zero at offset
+ *  tft   sectors read as valid from bs->file at offset
+ *  ftt   sectors preallocated, read as zero, bs->file not
+ *necessarily zero at offset
+ *  fft   sectors preallocated but read from backing_hd,
+ *bs->file contains garbage at offset
+ *  ttf   sectors preallocated, read as zero, unknown offset
+ *  tff   sectors read from unknown file or offset
+ *  ftf   not allocated or unknown offset, read as zero
+ *  fff   not allocated or unknown offset, read from backing_hd
+ */
+#define BDRV_BLOCK_DATA 1
+#define BDRV_BLOCK_ZERO 2
+#define BDRV_BLOCK_OFFSET_VALID 4
+#define BDRV_BLOCK_OFFSET_MASK  BDRV_SECTOR_MASK
+
  typedef enum {
  BDRV_ACTION_REPORT, BDRV_ACTION_IGNORE, BDRV_ACTION_STOP
  } BlockErrorAction;



--

Mit freundlichen Grüßen

Peter Lieven

...

  KAMP Netzwerkdienste GmbH
  Vestische Str. 89-91 | 46117 Oberhausen
  Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40
  p...@kamp.de | http://www.kamp.de

  Geschäftsführer: Heiner Lante | Michael Lante
  Amtsgericht Duisburg | HRB Nr. 12154
  USt-Id-Nr.: DE 120607556

...




Re: [Qemu-devel] [PATCH 0/4] export internal snapshot by qemu-nbd

2013-07-18 Thread Wenchao Xia

于 2013-7-17 22:23, Eric Blake 写道:

On 07/17/2013 08:03 AM, Wenchao Xia wrote:

This series allow user to read internal snapshot's contents without qemu-img
convert. Another purpose is that, when qemu is online and have taken an
internal snapshot, let user invoke qemu-nbd to do any thing on it except write.

This brings two interesting issues:
1 is it safe to let qemu-nbd and qemu access that file at same time?


Probably not, for the same reason we tell people to not use qemu-img
while qemu is active on a file.


  For external case, or backing chain, qemu-nbd export while qemu is
active, do you think it is OK?

base->imageA

qemu-nbd export base
qemu use imageA.



I think it is safe, since qemu-nbd is read only. The data will be correct from
qemu-nbd, if qemu does not delete that snapshot when qemu-nbd is running, and
data is flushed to storage after qemu take that snapshot so that qemu-nbd
would see the correct data.


You're making assumptions that qemu won't be touching any metadata in a
manner in which the read-only qemu-nbd could get confused; I'm not sure
we are ready to make that guarantee.  I think the export has to be from
the running qemu process itself, rather than from a second process.



2 should an nbd-server exporting internal snapshot be added in qemu?
I think no. Compared with driver-backup, the snapshot, or COW happens
in storage level, so it allows another program to read it itself. Actually
it should be OK to let another server other than qemu's host, do the
export I/O job, if data is flushed.


Unfortunately, I disagree, and think the answer to this question is yes,
we need to do the export from within the single qemu process, if we want
to guarantee safety.



Next step:
As demonstrated before, an explict API should be added, which make sure
all I/O request is flushed and sent to underlining storage, and cache
is sync if it is writeback type. So at qemu level, we can make sure
no request is left behind, before qemu-nbd start. That API should
also benefit 3rd party block snapshot solution, such as LVM2.

More:
With this patch and previous qcow2 snapshot at block device level, I think
export/import/backup solution around qcow2, is nearly complete at qemu's
level. It can do similar things as backing chain but with better performance,
Some small optimization place are left:

1 compare two snapshot's data to get the diff with help of qcow2's L1/L2 table.
2 convertion between external snapshot and internal snapshot.

This series need following series applied first:
[PATCH V5 0/8] add internal snapshot support at block device level
http://lists.nongnu.org/archive/html/qemu-devel/2013-07/msg01831.html

Wenchao Xia (4):
   1 snapshot: distinguish id and name in load_tmp
   2 qemu-nbd: support internal snapshot export
   3 qemu-nbd: add doc for internal snapshot export
   4 qemu-iotests: add 057 internal snapshot export with qemu-nbd case

  block/qcow2-snapshot.c |   16 +++-
  block/qcow2.h  |5 ++-
  block/snapshot.c   |   37 ++-
  include/block/block_int.h  |4 ++-
  include/block/snapshot.h   |4 ++-
  qemu-img.c |7 +++-
  qemu-nbd.c |   56 -
  qemu-nbd.texi  |3 ++
  tests/qemu-iotests/057 |   87 
  tests/qemu-iotests/057.out |   26 +
  tests/qemu-iotests/group   |1 +
  11 files changed, 237 insertions(+), 9 deletions(-)
  create mode 100755 tests/qemu-iotests/057
  create mode 100644 tests/qemu-iotests/057.out










--
Best Regards

Wenchao Xia




Re: [Qemu-devel] [PATCH] [RFC] aio/async: Add timed bottom-halves

2013-07-18 Thread Paolo Bonzini
Il 19/07/2013 03:58, Stefan Hajnoczi ha scritto:
>>   Options:
>>
>>   a) restore alarm timers (at least for the time being). Make all
>>  alarm timers do qemu_notify_event. However, only run the AioContext
>>  clock's timers within aio_poll. This is the least intrusive change.
>>
>>   b) calculate the timeout in aio_poll with respect to the minimum
>>  deadline across all clocks, not just AioContext's clock. Use the
>>  same logic in mainloop.
>>
>>   I'd go for (b), except for the millisecond accuracy thing. So my
>>   temptation (sadly) is (a).
> 
> I think this is a non-issue because host_alarm_handler() can only be
> called from the main loop:
> 
> main-loop.c:qemu_signal_init() sets up signalfd to monitor SIGALRM.
> Therefore we do not asynchronously invoke the SIGALRM signals handler.
> It is only invoked from main-loop.c:sigfd_handler() when the main loop
> runs.
> 
> That's how I read the code.  I haven't checked a running process to be
> sure.

I think you're right.  It was not strictly needed even with alarm
timers, because host_alarm_handler() is called always before
qemu_run_all_timers.  But it made the code more robust.

With your change to delete alarm timers and move the deadline to poll,
the next poll call will have a timeout of 0 and all will be well.

As to millisecond accuracy, as discussed earlier we can use ppoll on
Linux.  This of course should be introduced before alarm timers are
deleted, to avoid breaking bisection.

Paolo

>> 2. If we do delete alarm timers, I'll need to delete the -clock option.
> 
> I noticed this too because I think we should stub it out for
> compatibility.  Accept existing options but ignore them, update
> documentation to state that they are kept for compatibility.
> 
> Stefan
> 




Re: [Qemu-devel] [PATCH 4/4] qemu-img: conditionally discard target on convert

2013-07-18 Thread Peter Lieven

On 19.07.2013 07:58, Paolo Bonzini wrote:

Il 18/07/2013 21:28, Peter Lieven ha scritto:

thanks for the details. I think to have optimal performance and best
change for unmapping in qemu-img convert
it might be best to export the OPTIMAL UNMAP GRANULARITY

Agreed about this.


as well as the write_zeroes_w_discard capability via the BDI

But why this?!?  It is _not_ needed.  All you need is to change the
default of the "-S" option to be the OPTIMAL UNMAP GRANULARITY if it is
nonzero.

2 reasons:
a) Does this guarantee that the requests are aligned to multiple of the -S 
value?

b) If I this flag exists qemu-img convert can do the "zeroing" a priori. This
has the benefit that combined with bdrv_get_block_status requests before it 
might
not need to touch large areas of the volume. Speaking for iSCSI its likely that
the user sets a fresh volume as the destination, but its not guaranteed.
With the Patch 4 there are only done a few get_block_status requests to verify
this. If we just write zeroes with BDRV_MAY_UNMAP, we send hundreds or
thousands of writesame requests for possibly already unmapped data.

To give an example. If I take my storage with an 1TB volume. It takes about 
10-12
get_block_status requests to verify that it is completely unmapped. After this
I am safe to set has_zero_init = 1 in qemu-img convert.

If I would convert a 1TB image to this target where lets say 50% are at leat 
15MB
zero blocks (15MB is the OUG of my storage) it would take ~35000 write same
requests to achieve the same.

Peter


Paolo


and than zero
out the whole device with bdrv_write_zeroes and the BDRV_MAY_DISCARD
flag.

the logic for this is already prepared in patch4 (topic of this email). except 
that
i have to exchange bdrv_discard with bdrv_write_zeroes w/ BDRV_MAY_DISCARD.

what do you think?





On Thu, Jul 18, 2013 at 11:43 AM, Peter Lieven  wrote:

Am 18.07.2013 um 16:35 schrieb Paolo Bonzini :


Il 18/07/2013 16:32, Peter Lieven ha scritto:

(Mis)alignment and granularity can be handled later.  We can ignore them
for now.  Later, if we decide the best way to support them is a flag,
we'll add it.  Let's not put the cart before the horse.

BTW, I expect alignment!=0 to be really, really rare.

To explain my concerns:

I know that my target has internal page size of 15MB. I will check what
happens
if I deallocate this 15MB in chunks of lets say 1MB. If the page gets
unprovisioned
after the last chunk is unmapped it would be fine :-)

You're talking of granularity here, not (mis)alignment.

you are right. for the target i am talking about this is 30720 512-byte blocks 
for the granularity (pagesize) and 0 for the alignment.
i will see what happens if I write same w/unmap the whole 30720 blocks in 
smaller blocks ;-) otherwise i will have to add support for honoring this 
values in qemu-img convert
as a follow up.

Peter







--

Mit freundlichen Grüßen

Peter Lieven

...

  KAMP Netzwerkdienste GmbH
  Vestische Str. 89-91 | 46117 Oberhausen
  Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40
  p...@kamp.de | http://www.kamp.de

  Geschäftsführer: Heiner Lante | Michael Lante
  Amtsgericht Duisburg | HRB Nr. 12154
  USt-Id-Nr.: DE 120607556

...




[Qemu-devel] [PATCH] migration: don't use uninitialized variables

2013-07-18 Thread Pawit Pornkitprasan
The qmp_migrate method uses the 'blk' and 'inc' parameter without
checking if they're valid or not (they may be uninitialized if
command is received via QMP)

Signed-off-by: Pawit Pornkitprasan 
---
 migration.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration.c b/migration.c
index 9f5a423..f3d1ff7 100644
--- a/migration.c
+++ b/migration.c
@@ -385,8 +385,8 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 MigrationParams params;
 const char *p;
 
-params.blk = blk;
-params.shared = inc;
+params.blk = has_blk && blk;
+params.shared = has_inc && inc;
 
 if (s->state == MIG_STATE_ACTIVE) {
 error_set(errp, QERR_MIGRATION_ACTIVE);
-- 
1.8.1.2




[Qemu-devel] [PATCH] migration: send total time in QMP at "completed" stage

2013-07-18 Thread Pawit Pornkitprasan
The "completed" stage sets total_time but not has_total_time and
thus it is not sent via QMP reply (but sent via HMP nevertheless)

Signed-off-by: Pawit Pornkitprasan 
---
 migration.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/migration.c b/migration.c
index 9f5a423..4c16f2e 100644
--- a/migration.c
+++ b/migration.c
@@ -219,6 +219,7 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 
 info->has_status = true;
 info->status = g_strdup("completed");
+info->has_total_time = true;
 info->total_time = s->total_time;
 info->has_downtime = true;
 info->downtime = s->downtime;
-- 
1.8.1.2




Re: [Qemu-devel] [PATCH 4/4] qemu-img: conditionally discard target on convert

2013-07-18 Thread Paolo Bonzini
Il 18/07/2013 21:28, Peter Lieven ha scritto:
> thanks for the details. I think to have optimal performance and best
> change for unmapping in qemu-img convert
> it might be best to export the OPTIMAL UNMAP GRANULARITY

Agreed about this.

> as well as the write_zeroes_w_discard capability via the BDI

But why this?!?  It is _not_ needed.  All you need is to change the
default of the "-S" option to be the OPTIMAL UNMAP GRANULARITY if it is
nonzero.

Paolo

> and than zero
> out the whole device with bdrv_write_zeroes and the BDRV_MAY_DISCARD
> flag.
> 
> the logic for this is already prepared in patch4 (topic of this email). 
> except that
> i have to exchange bdrv_discard with bdrv_write_zeroes w/ BDRV_MAY_DISCARD.
> 
> what do you think?
> 
> 
> 
>>
>>
>> On Thu, Jul 18, 2013 at 11:43 AM, Peter Lieven  wrote:
>>>
>>> Am 18.07.2013 um 16:35 schrieb Paolo Bonzini :
>>>
 Il 18/07/2013 16:32, Peter Lieven ha scritto:
>>>
>> (Mis)alignment and granularity can be handled later.  We can ignore them
>> for now.  Later, if we decide the best way to support them is a flag,
>> we'll add it.  Let's not put the cart before the horse.
>>
>> BTW, I expect alignment!=0 to be really, really rare.
> To explain my concerns:
>
> I know that my target has internal page size of 15MB. I will check what
> happens
> if I deallocate this 15MB in chunks of lets say 1MB. If the page gets
> unprovisioned
> after the last chunk is unmapped it would be fine :-)

 You're talking of granularity here, not (mis)alignment.
>>>
>>> you are right. for the target i am talking about this is 30720 512-byte 
>>> blocks for the granularity (pagesize) and 0 for the alignment.
>>> i will see what happens if I write same w/unmap the whole 30720 blocks in 
>>> smaller blocks ;-) otherwise i will have to add support for honoring this 
>>> values in qemu-img convert
>>> as a follow up.
>>>
>>> Peter
>>>
> 
> 
> 




Re: [Qemu-devel] [PATCH 12/17] qemu-img: add a "map" subcommand

2013-07-18 Thread Paolo Bonzini
Il 19/07/2013 06:48, Stefan Hajnoczi ha scritto:
> On Thu, Jul 18, 2013 at 02:04:31PM -0600, Eric Blake wrote:
>> On 07/03/2013 08:34 AM, Paolo Bonzini wrote:
>>> This command dumps the metadata of an entire chain, in either tabular or 
>>> JSON
>>> format.
>>>
>>> Signed-off-by: Paolo Bonzini 
>>> ---
>>>  qemu-img-cmds.hx |   6 ++
>>>  qemu-img.c   | 186 
>>> +++
>>>  2 files changed, 192 insertions(+)
>>>
>>
>>> +case OFORMAT_JSON:
>>> +printf("%s{ 'depth': %d, 'start': %lld, 'length': %lld, "
>>> +   "'zero': %s, 'data': %s",
>>> +   (e->start == 0 ? "[" : ",\n"),
>>> +   e->depth, (long long) e->start, (long long) e->length,
>>> +   (e->flags & BDRV_BLOCK_ZERO) ? "true" : "false",
>>> +   (e->flags & BDRV_BLOCK_DATA) ? "true" : "false");
>>> +if (e->flags & BDRV_BLOCK_OFFSET_VALID) {
>>> +printf(", 'offset': %lld", (long long) e->offset);
>>> +}
>>> +   putchar('}');
>>
>> Can we please get this format documented in qapi-schema.json, even if we
>> aren't using qapi to generate it yet?
> 
> Paolo: Please send a follow-up patch documenting the json schema, I've
> already merged this series.
> 
> I was although thinking about qemu-iotests for qemu-img map, but it's
> tricky since the allocation is an internal detail of the image format.
> Perhaps a test case using raw?

If you have interesting cases with sparse images, even raw depends on
how the underlying file system splits extents.

I can do a simple test that ignores the 'offset' field.

Paolo



Re: [Qemu-devel] [PATCH V5 0/8] add internal snapshot support at block device level

2013-07-18 Thread Stefan Hajnoczi
On Thu, Jul 18, 2013 at 02:34:52PM +0200, Kevin Wolf wrote:
> Am 11.07.2013 um 07:46 hat Wenchao Xia geschrieben:
> >   This series brings internal snapshot support at block devices level, now 
> > we
> > have two three methods to do block snapshot lively: 1) backing chain,
> > 2) internal one and 3) drive-back up approach.
> > 
> > Comparation:
> >  Advantages:Disadvantages:
> > 1)delta data, taken fast, export, sizeperformance, delete slow.
> > 2)  taken fast, delete fast, performance, size   delta data, format
> > 3)  performance, export, format   taken slow, delta data, 
> > size
> > 
> >   I think in most case, saving vmstate in an standalone file is better than
> > saving it inside qcow2, So suggest treat internal snapshot as block level
> > methods and not encourage user to savevm in qcow2 any more.
> > 
> > Implemention details:
> >   To avoid trouble, this serial have hide ID in create interfaces, this make
> > sure no chaos of ID and name will be introduced by these interfaces.
> >   There is one patch may be common to Pavel's savvm transaction, patch 1/11,
> > others are not quite related. Patch 1/11 will not set errp when no snapshot
> > find, since patch 3/11 need to distinguish real error case.
> > 
> > Next steps to better full VM snapshot:
> >   Improve internal snapshot's export capability.
> >   Better vmstate saving.
> > 
> >   Thanks Kevin to give advisement about how add it in qmp_transaction, 
> > oldest
> > version comes drom Dietmar Maurer.
> > 
> > v3:
> >   General:
> >   Rebased after Stenfan's driver-backup patch V6.
> > 
> >   Address Eric's comments:
> >   4/9: grammar fix and better doc.
> >   5/9: parameter name is mandatory now. grammar fix.
> >   6/9: redesiged interface: take both id and name as optional parameter, 
> > return
> > the deleted snapshot's info.
> > 
> >   Address Stefan's comments:
> >   4/9: add '' around %s in message. drop code comments about vm_clock.
> >   9/9: better doc, refined the code and add more test case.
> > 
> > v4:
> >   Address Stefan's comments:
> >   4/9: use error_setg_errno() to show error reason for 
> > bdrv_snapshot_create(),
> > spell fix and better doc.
> >   5/9: better doc.
> >   6/9: remove spurious ';' in code, spell fix and better doc.
> > 
> > v5:
> >   Address Kevin's comments:
> >   3/8, 4/8, 8/8: remove the limit of numeric snapshot name.
> >   General change:
> >   4/8: use existing type as parameter in qapi schema.
> > 
> > Wenchao Xia (8):
> >   1 snapshot: new function bdrv_snapshot_find_by_id_and_name()
> >   2 snapshot: distinguish id and name in snapshot delete
> >   3 qmp: add internal snapshot support in qmp_transaction
> >   4 qmp: add interface blockdev-snapshot-internal-sync
> >   5 qmp: add interface blockdev-snapshot-delete-internal-sync
> >   6 hmp: add interface hmp_snapshot_blkdev_internal
> >   7 hmp: add interface hmp_snapshot_delete_blkdev_internal
> >   8 qemu-iotests: add 056 internal snapshot for block device test case
> 
> Okay, reviewed the whole series. I had some comments on two or three
> patches.
> 
> One thing that I want to add is that in order to provide transactional
> behaviour (i.e. snapshotting multiple device in one transaction should
> result in a consistent snapshot), I think we may in fact need to stop
> the VM while taking the snapshots.

If we are doing guest memory snapshots we definitely need to stop vcpus.

For disk-only snapshots I do not think it's necessary to stop the guest
because:

1. Guest I/O can only be processed from the QEMU global mutex.  But we
   currently hold it so no one else can make progress.

2. In the dataplane case the drain callback that I added stops the
   dataplane thread temporarily.  It cannot be restarted until the main
   loop iterates again (see #1).

Therefore I think stopping vcpus is not necessary for disk-only
snapshots.

Stefan



Re: [Qemu-devel] RFC [PATCH] Make bdrv_flush synchronous only and update callers

2013-07-18 Thread Stefan Hajnoczi
On Thu, Jul 18, 2013 at 11:21:42PM +0200, Charlie Shepherd wrote:
> This patch makes bdrv_flush a synchronous function and updates any callers 
> from
> a coroutine context to use bdrv_co_flush instead.
> 
> The motivation for this patch comes from the GSoC Continuation-Passing C
> project. When coroutines were introduced, synchronous functions in the block
> layer were converted to use asynchronous methods by dynamically detecting if
> they were being run from a coroutine context by calling qemu_in_coroutine(), 
> and
> yielding if so. If they were not, they would spawn a new coroutine and poll
> until the asynchronous counterpart finished.
> 
> However this approach does not work with CPC as the CPC translator converts 
> all
> functions annotated coroutine_fn to a different (continuation based) calling
> convention. This means that coroutine_fn annotated functions cannot be called
> from a non-coroutine context.
> 
> This patch is a Request For Comments on the approach of splitting these
> "dynamic" functions into synchronous and asynchronous versions. This is easy 
> for
> bdrv_flush as it already has an asynchronous counterpart - bdrv_co_flush. The
> only caller of bdrv_flush from a coroutine context is mirror_drain in
> block/mirror.c - this should be annotated as a coroutine_fn as it calls
> qemu_coroutine_yield().
> 
> If this approach meets with approval I will develop a patchset splitting the
> other "dynamic" functions in the block layer. This will allow all coroutine
> functions to have a coroutine_fn annotation that can be statically checked 
> (CPC
> can be used to verify annotations).
> 
> I have audited the other callers of bdrv_flush, they are included below:
> 
> block.c: bdrv_reopen_prepare, bdrv_close, bdrv_commit, bdrv_pwrite_sync

bdrv_pwrite_sync() is a dynamic function.  If called from coroutine
context it will yield (indirectly from bdrv_pwrite() or bdrv_flush()).

> block/qcow2-cache.c: qcow2_cache_entry_flush, qcow2_cache_flush
> block/qcow2-refcount.c: qcow2_update_snapshot_refcount
> block/qcow2-snapshot.c: qcow2_write_snapshots
> block/qcow2.c: qcow2_mark_dirty, qcow2_mark_clean

qcow2 runs in coroutine context, the coroutine_fn annotations are just
missing.  See block/qcow2.c:bdrv_qcow2 for the entry points like
qcow2_co_readv.

> block/qed-check.c: qed_check_mark_clean
> block/qed.c: bdrv_qed_open, bdrv_qed_close
> blockdev.c: external_snapshot_prepare, do_drive_del
> cpus.c: do_vm_stop
> hw/block/nvme.c: nvme_clear_ctrl
> qemu-io-cmds.c: flush_f
> savevm.c: bdrv_fclose

I think the rest are fine and your patch looks good.



Re: [Qemu-devel] [PATCH 3/4] block/raw: add .bdrv_get_info

2013-07-18 Thread Stefan Hajnoczi
On Mon, Jul 15, 2013 at 12:49:34PM +0200, Peter Lieven wrote:
> Signed-off-by: Peter Lieven 
> ---
>  block/raw.c |6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/block/raw.c b/block/raw.c
> index 8c81de9..f1682d4 100644
> --- a/block/raw.c
> +++ b/block/raw.c
> @@ -121,6 +121,11 @@ static int raw_has_zero_init(BlockDriverState *bs)
>  return bdrv_has_zero_init(bs->file);
>  }
>  
> +static int raw_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
> +{
> +return bdrv_get_info(bs->file, bdi);
> +}
> +
>  static BlockDriver bdrv_raw = {
>  .format_name= "raw",
>  
> @@ -140,6 +145,7 @@ static BlockDriver bdrv_raw = {
>  
>  .bdrv_probe = raw_probe,
>  .bdrv_getlength = raw_getlength,
> +.bdrv_get_info  = raw_get_info,

I checked BlockDriverInfo to make sure the fields still make sense for
the raw BlockDriverState.  The vm_state_offset field is questionable,
since the raw BDS doesn't know about vmstate and you certainly cannot
write to it.  If protocols start supporting vmstate we might have to
rework that anyway, so I'm happy with this patch.

Stefan



Re: [Qemu-devel] [PATCH 0/4] qemu-img: conditionally discard target on convert

2013-07-18 Thread Stefan Hajnoczi
On Mon, Jul 15, 2013 at 12:49:31PM +0200, Peter Lieven wrote:
> this adds the proposed solution to add a generic mechanism to zeroize
> a target in qemu-img if it has discard_zeroes but has_zero_init is 0.
> 
> Peter Lieven (4):
>   block: add discard_zeroes and max_unmap to BlockDriverInfo
>   iscsi: add .bdrv_get_info
>   block/raw: add .bdrv_get_info
>   qemu-img: conditionally discard target on convert
> 
>  block/iscsi.c |   19 +++--
>  block/raw.c   |6 ++
>  include/block/block.h |5 +
>  qemu-img.c|   56 
> -
>  4 files changed, 83 insertions(+), 3 deletions(-)

Thanks, applied Patch 3 to my block tree as requested:
https://github.com/stefanha/qemu/commits/block

Stefan



Re: [Qemu-devel] Buildbots out of disk space

2013-07-18 Thread Stefan Hajnoczi
On Thu, Jul 18, 2013 at 12:45:42PM +0200, Andreas Färber wrote:
> Am 15.07.2013 08:43, schrieb Stefan Hajnoczi:
> > On Mon, Jul 15, 2013 at 07:31:05AM +0200, Christian Berendt wrote:
> >> Think we should clean up the registered build slaves. Here's a list
> >> of offline slaves. Can they be removed from the bot?
> >>
> >> default_s390
> > 
> > This slave goes offline relatively often but we do try to get it back
> > periodically.
> 
> It's a z/VM instance that had some networking issues but I can SSH onto
> the machine again now - I don't see any buildbot-related process running
> nor is there anything related scheduled in cron. Please advise what to
> do to get it back online.

I have added the following crontab entry for user 'build' on
qemu-s390.opensuse.org:

@reboot cd /home/build/qemu && /usr/local/bin/buildslave start

See here for buildslave status:

http://buildbot.b1-systems.de/qemu/buildslaves

Stefan



Re: [Qemu-devel] [PATCH 12/17] qemu-img: add a "map" subcommand

2013-07-18 Thread Stefan Hajnoczi
On Thu, Jul 18, 2013 at 02:04:31PM -0600, Eric Blake wrote:
> On 07/03/2013 08:34 AM, Paolo Bonzini wrote:
> > This command dumps the metadata of an entire chain, in either tabular or 
> > JSON
> > format.
> > 
> > Signed-off-by: Paolo Bonzini 
> > ---
> >  qemu-img-cmds.hx |   6 ++
> >  qemu-img.c   | 186 
> > +++
> >  2 files changed, 192 insertions(+)
> > 
> 
> > +case OFORMAT_JSON:
> > +printf("%s{ 'depth': %d, 'start': %lld, 'length': %lld, "
> > +   "'zero': %s, 'data': %s",
> > +   (e->start == 0 ? "[" : ",\n"),
> > +   e->depth, (long long) e->start, (long long) e->length,
> > +   (e->flags & BDRV_BLOCK_ZERO) ? "true" : "false",
> > +   (e->flags & BDRV_BLOCK_DATA) ? "true" : "false");
> > +if (e->flags & BDRV_BLOCK_OFFSET_VALID) {
> > +printf(", 'offset': %lld", (long long) e->offset);
> > +}
> > +   putchar('}');
> 
> Can we please get this format documented in qapi-schema.json, even if we
> aren't using qapi to generate it yet?

Paolo: Please send a follow-up patch documenting the json schema, I've
already merged this series.

I was although thinking about qemu-iotests for qemu-img map, but it's
tricky since the allocation is an internal detail of the image format.
Perhaps a test case using raw?

Stefan



Re: [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata

2013-07-18 Thread Stefan Hajnoczi
On Tue, Jul 16, 2013 at 06:29:11PM +0200, Paolo Bonzini wrote:
> This series adds a subcommand to "map" that can dump file metadata.
> Metadata that is dumped includes:
> 
> - whether blocks are allocated in bs->file and, if so, where
> 
> - whether blocks are zero
> 
> - whether data is read from bs or bs->backing_hd
> 
> Metadata is dumped for an entire chain of images.  One example usage is
> (for non-compressed, non-encrypted images) to transform the metadata
> into a Linux device-mapper setup, and make a qcow2 image available (for
> read only) as a block device.  Another possible usage is to determine
> the used areas of a file, and convert it in place to another format.
> Alternatively, the richer data can be used by block jobs to quickly
> determine if a block is zero without reading it.
> 
> The patches implement a new operation, bdrv_co_get_block_status, that
> supersedes bdrv_co_is_allocated.  The bdrv_is_allocated API is still
> available of course, and implemented on top of the new operation.
> 
> Patches 1-3 touch cow, which uses bdrv_co_is_allocated during its reads
> and writes.  I optimized it a bit because it was unbearably slow during
> testing.  With these patches (also tested with blkverify), booting of
> a cow guest image is not particularly slow.
> 
> Patches 4 to 6 clean up the bdrv_is_allocated and bdrv_is_allocated_above
> implementation, eliminating some code duplication.
> 
> Patches 7 and 8 tweak bdrv_has_zero_init and its usage in qemu-img in
> a way that helps when implementing the new API.
> 
> Patches 9 to 11 implement bdrv_get_block_status and change the formats
> to report all the available information.
> 
> Patch 12 adds the "map" subcommand to qemu-img.
> 
> Finally, patches 13 to 17 tweak the implementation to extract more
> information from protocols, and combine this information with format
> metadata whenever possible.
> 
> Paolo
> 
> v1->v2: see patches 1, 6, 9, 10, 11, 12
> 
> Paolo Bonzini (17):
>   cow: make reads go at a decent speed
>   cow: make writes go at a less indecent speed
>   cow: do not call bdrv_co_is_allocated
>   block: make bdrv_co_is_allocated static
>   block: remove bdrv_is_allocated_above/bdrv_co_is_allocated_above
> distinction
>   block: expect errors from bdrv_co_is_allocated
>   qemu-img: always probe the input image for allocated sectors
>   block: make bdrv_has_zero_init return false for copy-on-write-images
>   block: introduce bdrv_get_block_status API
>   block: define get_block_status return value
>   block: return get_block_status data and flags for formats
>   qemu-img: add a "map" subcommand
>   block: use bdrv_has_zero_init to return BDRV_BLOCK_ZERO
>   raw-posix: return get_block_status data and flags
>   raw-posix: detect XFS unwritten extents
>   block: add default get_block_status implementation for protocols
>   block: look for zero blocks in bs->file
> 
>  block.c   | 134 +--
>  block/commit.c|   6 +-
>  block/cow.c   |  90 +--
>  block/mirror.c|   4 +-
>  block/qcow.c  |  13 ++-
>  block/qcow2.c |  24 +++--
>  block/qed.c   |  39 ++--
>  block/raw-posix.c |  24 +++--
>  block/raw.c   |   6 +-
>  block/sheepdog.c  |  14 +--
>  block/stream.c|  10 +--
>  block/vdi.c   |  17 +++-
>  block/vmdk.c  |  23 -
>  block/vvfat.c |  15 ++--
>  include/block/block.h |  34 +--
>  include/block/block_int.h |   2 +-
>  qemu-img-cmds.hx  |   6 ++
>  qemu-img.c| 225 
> +-
>  18 files changed, 503 insertions(+), 183 deletions(-)

Applied s/MAX/MIN/ fixup suggested by Fam.

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan



Re: [Qemu-devel] [PATCH] block: fix bdrv_read_unthrottled()

2013-07-18 Thread Stefan Hajnoczi
On Thu, Jul 18, 2013 at 10:37:32AM +0200, Peter Lieven wrote:
> Signed-off-by: Peter Lieven 
> ---
>  block.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan



Re: [Qemu-devel] [PATCH] cpus: Let vm_stop[_force_state]() always flush block devices

2013-07-18 Thread Stefan Hajnoczi
On Thu, Jul 18, 2013 at 02:52:19PM +0200, Kevin Wolf wrote:
> Even if the VM is already stopped, we cannot assume that all data has
> already been successfully flushed to disk. The flush during the previous
> vm_stop() could have failed.
> 
> Run bdrv_flush_all() unconditionally so that we get an error each time
> if the block device isn't really flushed.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  cpus.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan



Re: [Qemu-devel] [PATCH v2 repost 8/9] i386: generate pc guest info

2013-07-18 Thread Hu Tao
<...>

> +void ich9_lpc_set_guest_info(PcGuestInfo *guest_info)
> +{
> +guest_info->sci_int = 9;
> +guest_info->acpi_enable_cmd = ICH9_APM_ACPI_ENABLE;
> +guest_info->acpi_disable_cmd = ICH9_APM_ACPI_DISABLE;
> +}
> +

This function has to be called somewhere(ich9_lpc_pm_init?) to setup
acpi_enable_cmd and acpi_disable_cmd, or guest Linux will report:

ACPI Error: No ACPI mode transition supported in this system (enable/disable 
both zero)

and disable ACPI.




Re: [Qemu-devel] [PATCH v7 00/10] qemu-ga: fsfreeze on Windows using VSS

2013-07-18 Thread Tomoki Sekiyama
Hi Michael,

> "CoCreateInstance(VSSCoordinator) failed. (Error: 80040154) Class not 
> registered

I have seen this error when I ran 32bit qemu-ga on 64bit Windows (2008 server 
R2).
Just in case, could you confirm qemu-ga.exe and qga-vss.dll are built for the
correct architecture?

Stopping VSS service by "net stop vss" might be helpful to fix hung.
Otherwise, possibly COM+ related registry are messed up...?
("qemu-ga -s uninstall" deletes whole qemu-ga VSS related registry entries,
 that should solve the problem in usual...)

And If you have any VSS/COM related log in Event Viewer (maybe in
System or Application log) , please let me know.

Thanks,
Tomoki Sekiyama


From: fluxion [fluks...@gmail.com] on behalf of Michael Roth 
[mdr...@linux.vnet.ibm.com]
Sent: Thursday, July 18, 2013 6:19 PM
To: Tomoki Sekiyama; qemu-devel@nongnu.org
Cc: libaiq...@huawei.com; ler...@redhat.com; gham...@redhat.com; 
stefa...@gmail.com; lcapitul...@redhat.com; vroze...@redhat.com; 
pbonz...@redhat.com; Seiji Aguchi; ebl...@redhat.com; ar...@redhat.com
Subject: Re: [PATCH v7 00/10] qemu-ga: fsfreeze on Windows using VSS

Quoting Tomoki Sekiyama (2013-07-15 11:20:23)
> Hi,
>
> This is v7 of patch series to add fsfreeze for Windows qemu-guest-agent.
>
> changes from v7:
>  - Fix COM initialization issue for Windows service thread (patch 07/10)
>
> v6: http://lists.nongnu.org/archive/html/qemu-devel/2013-07/msg01788.html

Hi Tomoki,

I'm also having some issues testing this, though I think my problem is a little
different than the issue in v6:

When I initially ran qemu-ga -s install, I got some output indicating the VSS
service was registered, but then it hung. I noticed afterward that I'd already
had the service running, so figured that that might've been the problem. So I
stopped the service and unregistered it (using a qemu-ga install that was in
a separate directory).

Then I went back to install via qemu-ga.exe -s install, and it just hangs with
no output. I wasn't sure how to reset the state, so I took the Windows approach
and rebooted.

After reboot, running qemu-ga -s install gives me:

C:\Users\mdroth\Documents\qga>qemu-ga.exe -s install
Registering QEMU Guest Agent VSS Provider:
  C:\Users\mdroth\Documents\qga\qga-vss.dll
  C:\Users\mdroth\Documents\qga\qga-vss.tlb
Failed to pCatalog->InstallComponent. (Error: 8011045c) The application name is
not unique and cannot be resolved to an application id

CoCreateInstance. (Error: 80040154) Class not registered

Removing COM+ Application: QEMU Guest Agent VSS Provider
Removing COM+ Application: QEMU Guest Agent VSS Provider

[mdroth@vm5 ~]$

I'm not sure if I'm still in a bad state from earlier, but I can't seem to
recover from here.

If I try running qemu-ga -s uninstall, then qemu-ga -s install again, I get a
pop-up error saying:

"CoCreateInstance(VSSCoordinator) failed. (Error: 80040154) Class not registered

Any ideas what's going on here? I'm on a Windows 7 vm and built using a Fedora
18 mingw environment. Let me know if you need any additional information to
debug.

Thanks!

>
>
> * Description
>   In Windows, VSS (Volume Shadow Copy Service) provides a facility to
>   quiesce filesystems and applications before disk snapshots are taken.
>   This patch series implements "fsfreeze" command of qemu-ga using VSS.
>
>
> * How to build & run qemu-ga with VSS support
>
>  - Download Microsoft VSS SDK from:
>http://www.microsoft.com/en-us/download/details.aspx?id=23490
>
>  - Setup the SDK
>scripts/extract-vsssdk-headers setup.exe (on POSIX-systems)
>
>  - Specify installed SDK directory to configure option as:
>./configure -with-vss-sdk="path/to/VSS SDK" 
> --cross-prefix=i686-w64-mingw32-
>
>  - make qemu-ga.exe qga/vss-win32/qga-vss.{dll,tlb}
>
>  - Install qemu-ga.exe, qga/vss-win32/qga-vss.{dll,tlb}, and
>the other required mingw libraries into the same directory in guests
>
>  - Run `qemu-ga.exe -s install' and `net start qemu-ga' in the guests
>
> Any feedback are appreciated.
>
> ---
> Tomoki Sekiyama (10):
>   configure: Support configuring C++ compiler
>   Add c++ keywords to QAPI helper script
>   checkpatch.pl: Check .cpp files
>   Add a script to extract VSS SDK headers on POSIX system
>   qemu-ga: Add configure options to specify path to Windows/VSS SDK
>   error: Add error_set_win32 and error_setg_win32
>   qemu-ga: Add Windows VSS provider and requester as DLL
>   qemu-ga: Call Windows VSS requester in fsfreeze command handler
>   qemu-ga: Install Windows VSS provider on `qemu-ga -s install'
>   QMP/qemu-ga-client: Make timeout longer for guest-fsfreeze-freeze 
> command
>
>
>  .gitignore |1
>  Makefile   |3
>  Makefile.objs  |2
>  QMP/qemu-ga-client |4
>  configure  |   87 +++
>  hmp.c  |2
>  hw/pci/pci.c   |

Re: [Qemu-devel] [PATCH] Bug Fix:Segmentation fault when use usb-ehci device

2013-07-18 Thread Mike Qiu

于 2013/7/19 1:14, Andreas Färber 写道:

Hi,

Am 18.07.2013 17:27, schrieb Mike Qiu:

Hi all

Any comments ?

You should've CCed the USB maintainer whose file you are touching for
review rather than just ppc people, see ./MAINTAINERS.

I have CC to the usb naintainer Gerd Hoffmann, his files are hw/usb/*.



There's some typos in the commit message, but the change looks okay to
me - although there were discussions to catch this on the memory API
side of things instead.

You mean this patch: see below:

exec: Support 64-bit operations in address_s

if so it is very different.

BTW, this bug has been opened before?

Thanks
Mike


Regards,
Andreas


Thanks
Mike
2013/7/16 11:50, Mike Qiu wrote:

For usb-ehci in qemu, its caps just has read() operation,
the write() operation does not exist.

This cause a Segmentation fault when use usb-ehci device in ppc64
platform.

here is gdb output:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x3fffa7fcef20 (LWP 6793)]
0x103f5244 in memory_region_oldmmio_write_accessor
(opaque=0x113e9e78, addr=9, value=0x3fffa7fce088,
size=1, shift=0, mask=255) at /home/Mike/qemu-impreza/memory.c:384
384  mr->ops->old_mmio.write[ctz32(size)](mr->opaque, addr, tmp);
(gdb) p *mr->ops
$1 = {read = @0x10716f68: 0x1020699c , write = 0,
endianness = DEVICE_LITTLE_ENDIAN, valid = {min_access_size = 1,
max_access_size = 4, unaligned = false, accepts = 0}, impl =
{min_access_size = 1, max_access_size = 1, unaligned = false},
old_mmio = {read = {0, 0, 0}, write = {0, 0, 0}}}

Becasue function write() of mr->ops has not been implement, in
function memory_region_dispatch_write(), it call
oldmmio write accessor, but at the same time old_mmio still not
been implement by default.

That is the root cause of the Segmentation fault.

To solve this problem, add empty function: ehci_caps_write()

Signed-off-by: Mike Qiu 
---
  hw/usb/hcd-ehci.c |7 +++
  1 file changed, 7 insertions(+)

diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index 67e4b24..6c8a439 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -1072,6 +1072,12 @@ static void ehci_port_write(void *ptr, hwaddr addr,
  trace_usb_ehci_portsc_change(addr + s->portscbase, addr >> 2, *portsc, 
old);
  }

+static void ehci_caps_write(void *ptr, hwaddr addr, uint64_t val,
+   unsigned size)
+{
+/* nothing */
+}
+
  static void ehci_opreg_write(void *ptr, hwaddr addr,
   uint64_t val, unsigned size)
  {
@@ -2380,6 +2386,7 @@ static void ehci_frame_timer(void *opaque)

  static const MemoryRegionOps ehci_mmio_caps_ops = {
  .read = ehci_caps_read,
+.write = ehci_caps_write,
  .valid.min_access_size = 1,
  .valid.max_access_size = 4,
  .impl.min_access_size = 1,








Re: [Qemu-devel] [PATCH] [RFC] aio/async: Add timed bottom-halves

2013-07-18 Thread Stefan Hajnoczi
On Thu, Jul 18, 2013 at 07:48:28PM +0100, Alex Bligh wrote:
> Stefan,
> 
> --On 17 July 2013 11:02:30 +0800 Stefan Hajnoczi  wrote:
> 
> >The steps to achieving this:
> >
> >1. Drop alarm timers from qemu-timer.c and calculate g_poll() timeout
> >   instead for the main loop.
> >
> >2. Introduce a per-AioContext aio_ctx_clock that can be used with
> >   qemu_new_timer() to create a QEMUTimer that expires during
> >   aio_poll().
> >
> >3. Calculate g_poll() timeout for aio_ctx_clock in aio_poll().
> 
> I've pretty much written this.
> 
> Two issues:
> 
> 1. I very much enjoyed deleting all the alarm timers code. However, it was
>   doing something vaguely useful, which was calling qemu_notify_event
>   when the timer expired. Under the new regime above, if the AioContext
>   clock's timers expires within aio_poll, life is good as the timeout
>   to g_poll will expire. However, this won't apply if any of the other
>   three static clock's timers expire. Also, it won't work within the
>   mainloop poll. Also, we lose the ability to respond to timers in a sub
>   millisecond way.
> 
>   Options:
> 
>   a) restore alarm timers (at least for the time being). Make all
>  alarm timers do qemu_notify_event. However, only run the AioContext
>  clock's timers within aio_poll. This is the least intrusive change.
> 
>   b) calculate the timeout in aio_poll with respect to the minimum
>  deadline across all clocks, not just AioContext's clock. Use the
>  same logic in mainloop.
> 
>   I'd go for (b), except for the millisecond accuracy thing. So my
>   temptation (sadly) is (a).

I think this is a non-issue because host_alarm_handler() can only be
called from the main loop:

main-loop.c:qemu_signal_init() sets up signalfd to monitor SIGALRM.
Therefore we do not asynchronously invoke the SIGALRM signals handler.
It is only invoked from main-loop.c:sigfd_handler() when the main loop
runs.

That's how I read the code.  I haven't checked a running process to be
sure.

> 2. If we do delete alarm timers, I'll need to delete the -clock option.

I noticed this too because I think we should stub it out for
compatibility.  Accept existing options but ignore them, update
documentation to state that they are kept for compatibility.

Stefan



Re: [Qemu-devel] [PATCH V4 3/4] Add backing drive while performing backup.

2013-07-18 Thread Fam Zheng
On Thu, 07/18 15:13, Paolo Bonzini wrote:
> Il 18/07/2013 08:39, Fam Zheng ha scritto:
> > On Wed, 07/17 13:04, Ian Main wrote:
> >> This patch adds the original source drive as a backing drive to our target
> >> image so that the target image will appear complete during backup.  This
> >> is especially useful for SYNC_MODE_NONE as it allows export via NBD to
> >> have a complete point-in-time snapshot available for export.
> >>
> >> Signed-off-by: Ian Main 
> >> ---
> >>  block/backup.c | 13 +
> >>  1 file changed, 13 insertions(+)
> >>
> >> diff --git a/block/backup.c b/block/backup.c
> >> index 68abd23..e086e76 100644
> >> --- a/block/backup.c
> >> +++ b/block/backup.c
> >> @@ -323,6 +323,10 @@ static void coroutine_fn backup_run(void *opaque)
> >>  
> >>  hbitmap_free(job->bitmap);
> >>  
> >> +/* Set the target backing drive back to NULL before calling delete or
> >> + * it will also delete the underlying drive. */
> >> +target->backing_hd = NULL;
> >> +
> >>  bdrv_iostatus_disable(target);
> >>  bdrv_delete(target);
> >>  
> >> @@ -362,6 +366,15 @@ void backup_start(BlockDriverState *bs, 
> >> BlockDriverState *target,
> >>  return;
> >>  }
> >>  
> >> +/* Manually set the backing hd to be the backup source drive so
> >> + * that all reads done while we are backing up will be passed
> >> + * on to the original source drive.  This allows reading from the
> >> + * image while the backup is in progress, or in the case of
> >> + * SYNC_MODE_NONE allows a complete image to be present for export.
> >> + * Note that we do this for all modes including SYNC_MODE_TOP as
> >> + * even then it allows on-the-fly reading. */
> >> +target->backing_hd = bs;
> >> +
> > 
> > Also set target->backing_file and target->backing_format here? Paolo?
> 
> I don't think so, it is temporary while the job runs so that the NBD
> server can already return the actual data.

OK.

For NBD export, it's also going to have a name, to be used with
nbd_server_add. So should we call bdrv_set_in_use() here to protect
target from being used elsewhere?

-- 
Fam



Re: [Qemu-devel] [PATCH] linux-user: Fix target_stat and target_stat64 for OpenRISC

2013-07-18 Thread Jia Liu
Hi Peter,

On Thu, Jul 18, 2013 at 6:18 PM, Peter Maydell  wrote:
> Ping?
>

Thank you, it looks good to me, please push it.

> thanks
> -- PMM
>
> On 6 July 2013 21:44, Peter Maydell  wrote:
>> OpenRISC uses the asm-generic versions of target_stat and
>> target_stat64, but it was incorrectly using the x86/ARM/etc version
>> due to a misplaced defined(TARGET_OPENRISC).  The previously unused
>> OpenRISC section of the ifdef ladder also defined an incorrect
>> target_stat and omitted the target_stat64 definition.  Fix
>> target_stat, provide target_stat64, and add a comment noting that
>> these are the asm-generic versions for the benefit of future ports.
>>
>> Signed-off-by: Peter Maydell 
>> ---
>> This fixes basic problems like "ls -l output is nonsense" and "shell
>> thinks programs aren't executable and won't run them".
>>
>>  linux-user/syscall_defs.h |   49 
>> ++---
>>  1 file changed, 37 insertions(+), 12 deletions(-)
>>
>> diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
>> index 92c01a9..cb6341f 100644
>> --- a/linux-user/syscall_defs.h
>> +++ b/linux-user/syscall_defs.h
>> @@ -1138,8 +1138,7 @@ struct target_winsize {
>>  #endif
>>
>>  #if (defined(TARGET_I386) && defined(TARGET_ABI32)) || defined(TARGET_ARM) \
>> -|| defined(TARGET_CRIS) || defined(TARGET_UNICORE32) \
>> -|| defined(TARGET_OPENRISC)
>> +|| defined(TARGET_CRIS) || defined(TARGET_UNICORE32)
>>  struct target_stat {
>> unsigned short st_dev;
>> unsigned short __pad1;
>> @@ -1837,29 +1836,55 @@ struct target_stat {
>>  abi_ulong  __unused[3];
>>  };
>>  #elif defined(TARGET_OPENRISC)
>> +
>> +/* These are the asm-generic versions of the stat and stat64 structures */
>> +
>>  struct target_stat {
>>  abi_ulong st_dev;
>>  abi_ulong st_ino;
>> -abi_ulong st_nlink;
>> -
>>  unsigned int st_mode;
>> +unsigned int st_nlink;
>>  unsigned int st_uid;
>>  unsigned int st_gid;
>> -unsigned int __pad0;
>>  abi_ulong st_rdev;
>> +abi_ulong __pad1;
>>  abi_long st_size;
>> -abi_long st_blksize;
>> -abi_long st_blocks;/* Number 512-byte blocks allocated. */
>> -
>> -abi_ulong target_st_atime;
>> +int st_blksize;
>> +int __pad2;
>> +abi_long st_blocks;
>> +abi_long target_st_atime;
>>  abi_ulong target_st_atime_nsec;
>> -abi_ulong target_st_mtime;
>> +abi_long target_st_mtime;
>>  abi_ulong target_st_mtime_nsec;
>> -abi_ulong target_st_ctime;
>> +abi_long target_st_ctime;
>>  abi_ulong target_st_ctime_nsec;
>> +unsigned int __unused4;
>> +unsigned int __unused5;
>> +};
>>
>> -abi_long __unused[3];
>> +struct target_stat64 {
>> +uint64_t st_dev;
>> +uint64_t st_ino;
>> +unsigned int st_mode;
>> +unsigned int st_nlink;
>> +unsigned int st_uid;
>> +unsigned int st_gid;
>> +uint64_t st_rdev;
>> +uint64_t __pad1;
>> +int64_t st_size;
>> +int st_blksize;
>> +int __pad2;
>> +int64_t st_blocks;
>> +int target_st_atime;
>> +unsigned int target_st_atime_nsec;
>> +int target_st_mtime;
>> +unsigned int target_st_mtime_nsec;
>> +int target_st_ctime;
>> +unsigned int target_st_ctime_nsec;
>> +unsigned int __unused4;
>> +unsigned int __unused5;
>>  };
>> +
>>  #else
>>  #error unsupported CPU
>>  #endif
>> --
>> 1.7.9.5
>>
>>

Regards,
Jia



[Qemu-devel] Is it possible to detect GPA access through the mapped HVA

2013-07-18 Thread Hu Yaohui
Hi
I am new to QEMU. I want to know is it possible to detect the guest OS
physical memory access through QEMU? What I am doing right now is use
mprotect to set the mapped RAM as not accessible. Then register the SIGSEGV
handler to handle the segmentation fault in qemu_kvm_eat_signals. But I
always get Segmentation fault. I don't know whether this method works or I
miss something. Thanks for your time!

Best Wishes,
Yaohui Hu


[Qemu-devel] Fwd: Is it possible to detect GPA access through the mapped HVA

2013-07-18 Thread Hu Yaohui
Hi
I am new to QEMU. I want to know is it possible to detect the guest OS
physical memory access through QEMU? What I am doing right now is use
mprotect to set the mapped RAM as not accessible. Then register the SIGSEGV
handler to handle the segmentation fault in qemu_kvm_eat_signals. But I
always get Segmentation fault. I don't know whether this method works or I
miss something. Thanks for your time!

Best Wishes,
Yaohui Hu


Re: [Qemu-devel] [PATCH] PPC: dbdma: macio: Fix format specifiers (build regression)

2013-07-18 Thread Alexander Graf

On 18.07.2013, at 21:49, Stefan Weil wrote:

> Am 16.07.2013 07:54, schrieb Stefan Weil:
>> Am 12.07.2013 18:48, schrieb Stefan Weil:
>>> Fix a number of warnings for 32 bit builds (tested on MingW and Linux):
>>> 
>>>  CChw/ide/macio.o
>>> qemu/hw/ide/macio.c: In function 'pmac_ide_atapi_transfer_cb':
>>> qemu/hw/ide/macio.c:134:9: error: format '%lx' expects argument of type 
>>> 'long unsigned int', but argument 3 has type 'hwaddr' [-Werror=format]
>>> qemu/hw/ide/macio.c: In function 'pmac_ide_transfer_cb':
>>> qemu/hw/ide/macio.c:215:5: error: format '%ld' expects argument of type 
>>> 'long int', but argument 5 has type 'int64_t' [-Werror=format]
>>> qemu/hw/ide/macio.c:222:9: error: format '%lx' expects argument of type 
>>> 'long unsigned int', but argument 3 has type 'hwaddr' [-Werror=format]
>>> qemu/hw/ide/macio.c:264:9: error: format '%lx' expects argument of type 
>>> 'long unsigned int', but argument 3 has type 'hwaddr' [-Werror=format]
>>> cc1: all warnings being treated as errors
>>> make: *** [hw/ide/macio.o] Error 1
>>> 
>>> Signed-off-by: Stefan Weil 
>>> ---
>>> 
>>> 
>>> 
>>> Hi Anthony,
>>> 
>>> the patch fixes a build regression which was introduced today.
>>> Could you please apply it without waiting for the next pull requests?
>>> 
>>> Thanks,
>>> Stefan
>>> 
>>> 
>>> 
>>> hw/ide/macio.c |9 +
>>> 1 file changed, 5 insertions(+), 4 deletions(-)
>>> 
>>> diff --git a/hw/ide/macio.c b/hw/ide/macio.c
>>> index 38ad924..ef4ba2b 100644
>>> --- a/hw/ide/macio.c
>>> +++ b/hw/ide/macio.c
>>> @@ -131,7 +131,7 @@ static void pmac_ide_atapi_transfer_cb(void *opaque, 
>>> int ret)
>>> int sector_num = (s->lba << 2) + (s->io_buffer_index >> 9);
>>> int nsector = io->len >> 9;
>>> 
>>> -MACIO_DPRINTF("precopying unaligned %d bytes to %#lx\n",
>>> +MACIO_DPRINTF("precopying unaligned %d bytes to %#" HWADDR_PRIx 
>>> "\n",
>>>   unaligned, io->addr + io->len - unaligned);
>>> 
>>> bdrv_read(s->bs, sector_num + nsector, io->remainder, 1);
>>> @@ -212,14 +212,15 @@ static void pmac_ide_transfer_cb(void *opaque, int 
>>> ret)
>>> s->nsector -= n;
>>> }
>>> 
>>> -MACIO_DPRINTF("remainder: %d io->len: %d nsector: %d sector_num: 
>>> %ld\n",
>>> +MACIO_DPRINTF("remainder: %d io->len: %d nsector: %d "
>>> +  "sector_num: %" PRId64 "\n",
>>>   io->remainder_len, io->len, s->nsector, sector_num);
>>> if (io->remainder_len && io->len) {
>>> /* guest wants the rest of its previous transfer */
>>> int remainder_len = MIN(io->remainder_len, io->len);
>>> uint8_t *p = &io->remainder[0x200 - remainder_len];
>>> 
>>> -MACIO_DPRINTF("copying remainder %d bytes at %#lx\n",
>>> +MACIO_DPRINTF("copying remainder %d bytes at %#" HWADDR_PRIx "\n",
>>>   remainder_len, io->addr);
>>> 
>>> switch (s->dma_cmd) {
>>> @@ -261,7 +262,7 @@ static void pmac_ide_transfer_cb(void *opaque, int ret)
>>> if (unaligned) {
>>> int nsector = io->len >> 9;
>>> 
>>> -MACIO_DPRINTF("precopying unaligned %d bytes to %#lx\n",
>>> +MACIO_DPRINTF("precopying unaligned %d bytes to %#" HWADDR_PRIx 
>>> "\n",
>>>   unaligned, io->addr + io->len - unaligned);
>>> 
>>> switch (s->dma_cmd) {
>> 
>> Ping?
> 
> 
> Ping²?

Acked-by: Alexander Graf 

I assume this can go through the trivial tree? Or directly get applied by 
Anthony?


Alex




Re: [Qemu-devel] [PATCH v7 00/10] qemu-ga: fsfreeze on Windows using VSS

2013-07-18 Thread Michael Roth
Quoting Tomoki Sekiyama (2013-07-15 11:20:23)
> Hi,
> 
> This is v7 of patch series to add fsfreeze for Windows qemu-guest-agent.
> 
> changes from v7:
>  - Fix COM initialization issue for Windows service thread (patch 07/10)
> 
> v6: http://lists.nongnu.org/archive/html/qemu-devel/2013-07/msg01788.html

Hi Tomoki,

I'm also having some issues testing this, though I think my problem is a little
different than the issue in v6:

When I initially ran qemu-ga -s install, I got some output indicating the VSS
service was registered, but then it hung. I noticed afterward that I'd already
had the service running, so figured that that might've been the problem. So I
stopped the service and unregistered it (using a qemu-ga install that was in
a separate directory).

Then I went back to install via qemu-ga.exe -s install, and it just hangs with
no output. I wasn't sure how to reset the state, so I took the Windows approach
and rebooted.

After reboot, running qemu-ga -s install gives me:

C:\Users\mdroth\Documents\qga>qemu-ga.exe -s install
Registering QEMU Guest Agent VSS Provider:
  C:\Users\mdroth\Documents\qga\qga-vss.dll
  C:\Users\mdroth\Documents\qga\qga-vss.tlb
Failed to pCatalog->InstallComponent. (Error: 8011045c) The application name is
not unique and cannot be resolved to an application id

CoCreateInstance. (Error: 80040154) Class not registered

Removing COM+ Application: QEMU Guest Agent VSS Provider
Removing COM+ Application: QEMU Guest Agent VSS Provider

[mdroth@vm5 ~]$

I'm not sure if I'm still in a bad state from earlier, but I can't seem to
recover from here.

If I try running qemu-ga -s uninstall, then qemu-ga -s install again, I get a
pop-up error saying:

"CoCreateInstance(VSSCoordinator) failed. (Error: 80040154) Class not registered

Any ideas what's going on here? I'm on a Windows 7 vm and built using a Fedora
18 mingw environment. Let me know if you need any additional information to
debug.

Thanks!

> 
> 
> * Description
>   In Windows, VSS (Volume Shadow Copy Service) provides a facility to
>   quiesce filesystems and applications before disk snapshots are taken.
>   This patch series implements "fsfreeze" command of qemu-ga using VSS.
> 
> 
> * How to build & run qemu-ga with VSS support
> 
>  - Download Microsoft VSS SDK from:
>http://www.microsoft.com/en-us/download/details.aspx?id=23490
> 
>  - Setup the SDK
>scripts/extract-vsssdk-headers setup.exe (on POSIX-systems)
> 
>  - Specify installed SDK directory to configure option as:
>./configure -with-vss-sdk="path/to/VSS SDK" 
> --cross-prefix=i686-w64-mingw32-
> 
>  - make qemu-ga.exe qga/vss-win32/qga-vss.{dll,tlb}
> 
>  - Install qemu-ga.exe, qga/vss-win32/qga-vss.{dll,tlb}, and
>the other required mingw libraries into the same directory in guests
> 
>  - Run `qemu-ga.exe -s install' and `net start qemu-ga' in the guests
> 
> Any feedback are appreciated.
> 
> ---
> Tomoki Sekiyama (10):
>   configure: Support configuring C++ compiler
>   Add c++ keywords to QAPI helper script
>   checkpatch.pl: Check .cpp files
>   Add a script to extract VSS SDK headers on POSIX system
>   qemu-ga: Add configure options to specify path to Windows/VSS SDK
>   error: Add error_set_win32 and error_setg_win32
>   qemu-ga: Add Windows VSS provider and requester as DLL
>   qemu-ga: Call Windows VSS requester in fsfreeze command handler
>   qemu-ga: Install Windows VSS provider on `qemu-ga -s install'
>   QMP/qemu-ga-client: Make timeout longer for guest-fsfreeze-freeze 
> command
> 
> 
>  .gitignore |1 
>  Makefile   |3 
>  Makefile.objs  |2 
>  QMP/qemu-ga-client |4 
>  configure  |   87 +++
>  hmp.c  |2 
>  hw/pci/pci.c   |2 
>  include/qapi/error.h   |   13 +
>  qga/Makefile.objs  |5 
>  qga/commands-win32.c   |   82 ++
>  qga/main.c |   10 +
>  qga/vss-win32.c|  154 
>  qga/vss-win32.h|   27 ++
>  qga/vss-win32/Makefile.objs|   23 ++
>  qga/vss-win32/install.cpp  |  424 +
>  qga/vss-win32/provider.cpp |  513 
> 
>  qga/vss-win32/qga-vss.def  |   13 +
>  qga/vss-win32/qga-vss.idl  |   20 ++
>  qga/vss-win32/qga-vss.tlb  |  Bin
>  qga/vss-win32/requester.cpp|  487 ++
>  qga/vss-win32/requester.h  |   42 +++
>  qga/vss-win32/vss-common.h |  128 ++
>  rules.mak  |9 +
>  scripts/checkpatch.pl  |   37 ++-
>  scripts/extract-vsssdk-headers |   35 +++
>  scripts/qapi.py|   12 +
>  util/error.c   |   35 +++
>  27 files changed, 2146 insertions(+), 24 deletions(-)
>  create mode 100644 qga/vss-win32.c
>  create mode 100644 qga/vss-win32.h

[Qemu-devel] RFC [PATCH] Make bdrv_flush synchronous only and update callers

2013-07-18 Thread Charlie Shepherd
This patch makes bdrv_flush a synchronous function and updates any callers from
a coroutine context to use bdrv_co_flush instead.

The motivation for this patch comes from the GSoC Continuation-Passing C
project. When coroutines were introduced, synchronous functions in the block
layer were converted to use asynchronous methods by dynamically detecting if
they were being run from a coroutine context by calling qemu_in_coroutine(), and
yielding if so. If they were not, they would spawn a new coroutine and poll
until the asynchronous counterpart finished.

However this approach does not work with CPC as the CPC translator converts all
functions annotated coroutine_fn to a different (continuation based) calling
convention. This means that coroutine_fn annotated functions cannot be called
from a non-coroutine context.

This patch is a Request For Comments on the approach of splitting these
"dynamic" functions into synchronous and asynchronous versions. This is easy for
bdrv_flush as it already has an asynchronous counterpart - bdrv_co_flush. The
only caller of bdrv_flush from a coroutine context is mirror_drain in
block/mirror.c - this should be annotated as a coroutine_fn as it calls
qemu_coroutine_yield().

If this approach meets with approval I will develop a patchset splitting the
other "dynamic" functions in the block layer. This will allow all coroutine
functions to have a coroutine_fn annotation that can be statically checked (CPC
can be used to verify annotations).

I have audited the other callers of bdrv_flush, they are included below:

block.c: bdrv_reopen_prepare, bdrv_close, bdrv_commit, bdrv_pwrite_sync
block/qcow2-cache.c: qcow2_cache_entry_flush, qcow2_cache_flush
block/qcow2-refcount.c: qcow2_update_snapshot_refcount
block/qcow2-snapshot.c: qcow2_write_snapshots
block/qcow2.c: qcow2_mark_dirty, qcow2_mark_clean
block/qed-check.c: qed_check_mark_clean
block/qed.c: bdrv_qed_open, bdrv_qed_close
blockdev.c: external_snapshot_prepare, do_drive_del
cpus.c: do_vm_stop
hw/block/nvme.c: nvme_clear_ctrl
qemu-io-cmds.c: flush_f
savevm.c: bdrv_fclose

---
 block.c| 13 -
 block/mirror.c |  4 ++--
 2 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/block.c b/block.c
index 6c493ad..00d71df 100644
--- a/block.c
+++ b/block.c
@@ -4110,15 +4110,10 @@ int bdrv_flush(BlockDriverState *bs)
 .ret = NOT_DONE,
 };
 
-if (qemu_in_coroutine()) {
-/* Fast-path if already in coroutine context */
-bdrv_flush_co_entry(&rwco);
-} else {
-co = qemu_coroutine_create(bdrv_flush_co_entry);
-qemu_coroutine_enter(co, &rwco);
-while (rwco.ret == NOT_DONE) {
-qemu_aio_wait();
-}
+co = qemu_coroutine_create(bdrv_flush_co_entry);
+qemu_coroutine_enter(co, &rwco);
+while (rwco.ret == NOT_DONE) {
+qemu_aio_wait();
 }
 
 return rwco.ret;
diff --git a/block/mirror.c b/block/mirror.c
index bed4a7e..3d5da7e 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -282,7 +282,7 @@ static void mirror_free_init(MirrorBlockJob *s)
 }
 }
 
-static void mirror_drain(MirrorBlockJob *s)
+static void coroutine_fn mirror_drain(MirrorBlockJob *s)
 {
 while (s->in_flight > 0) {
 qemu_coroutine_yield();
@@ -390,7 +390,7 @@ static void coroutine_fn mirror_run(void *opaque)
 should_complete = false;
 if (s->in_flight == 0 && cnt == 0) {
 trace_mirror_before_flush(s);
-ret = bdrv_flush(s->target);
+ret = bdrv_co_flush(s->target);
 if (ret < 0) {
 if (mirror_error_action(s, false, -ret) == BDRV_ACTION_REPORT) 
{
 goto immediate_exit;
-- 
1.8.3.2




Re: [Qemu-devel] seccomp: remove unused syscalls - for 1.6

2013-07-18 Thread Paul Moore
On Thursday, July 18, 2013 10:31:46 PM Peter Maydell wrote:
> On 18 July 2013 21:05, Paul Moore  wrote:
> > On Thursday, July 18, 2013 08:48:10 PM Peter Maydell wrote:
> >> On 18 July 2013 20:39, Paul Moore  wrote:
> >> > On the plus side, I think libseccomp is very close to being pretty much
> >> > feature complete (excluding new architectures that may pop up, at
> >> > present
> >> > we are only x86, x86_64, x32, and ARM)
> >> 
> >> ...AArch64 ? :-)
> > 
> > Not yet, just 32-bit ARM EABI.
> > 
> > If you've got a working system and are willing to so some hacking or run
> > some tests we could work on it for a future libseccomp release.  An
> > emulated AArch64 VM would also work, but that route can be slow/annoying.
> 
> Simulators are all we have right now (we're juuust getting to the
> point where hardware is starting to become available). I wasn't
> being serious really, though I'm sure somebody (possibly even
> somebody at Red Hat :-)) will work around to it at some point.

Regardless, consider it a standing offer.

-- 
paul moore
security and virtualization @ redhat




Re: [Qemu-devel] seccomp: remove unused syscalls - for 1.6

2013-07-18 Thread Peter Maydell
On 18 July 2013 21:05, Paul Moore  wrote:
> On Thursday, July 18, 2013 08:48:10 PM Peter Maydell wrote:
>> On 18 July 2013 20:39, Paul Moore  wrote:
>> > On the plus side, I think libseccomp is very close to being pretty much
>> > feature complete (excluding new architectures that may pop up, at present
>> > we are only x86, x86_64, x32, and ARM)
>>
>> ...AArch64 ? :-)
>>
>
> Not yet, just 32-bit ARM EABI.
>
> If you've got a working system and are willing to so some hacking or run some
> tests we could work on it for a future libseccomp release.  An emulated
> AArch64 VM would also work, but that route can be slow/annoying.

Simulators are all we have right now (we're juuust getting to the
point where hardware is starting to become available). I wasn't
being serious really, though I'm sure somebody (possibly even
somebody at Red Hat :-)) will work around to it at some point.

-- PMM



[Qemu-devel] [Bug 1081416] Re: Qemu 1.2.0 crashes when using tcp serial console and GRUB boots

2013-07-18 Thread Ian Wells
I'm seeing this too.  If someone cares to tell me how I get a core file
from qemu-under-libvirt I will do that and report back on debugging.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1081416

Title:
  Qemu 1.2.0 crashes when using tcp serial console and GRUB boots

Status in QEMU:
  New

Bug description:
  When booting OpenWRT Attitude Adjustement ( 
http://downloads.openwrt.org/attitude_adjustment/12.09-beta2/x86/generic/openwrt-x86-generic-combined-ext4.img.gz
 ) with this command line:
  qemu-system-x86_64 -serial tcp:127.0.0.1: -hda 
openwrt-x86-generic-combined-ext4.img

  Qemu crashes as soon as GRUB starts, after network cards start.

  *** buffer overflow detected ***: /usr/bin/qemu-system-x86_64 terminated
  === Backtrace: =
  /usr/lib/libc.so.6(__fortify_fail+0x37)[0x745f2ad7]
  /usr/lib/libc.so.6(+0xf9bb0)[0x745f0bb0]
  /usr/lib/libc.so.6(+0xfba47)[0x745f2a47]
  /usr/bin/qemu-system-x86_64[0x46a628]
  /usr/bin/qemu-system-x86_64[0x4e8a14]
  /usr/bin/qemu-system-x86_64[0x4e802b]
  /usr/lib/libc.so.6(__libc_start_main+0xf5)[0x74518725]
  /usr/bin/qemu-system-x86_64[0x40d949]

  
  Here is a GDB backtrace:

  Program received signal SIGABRT, Aborted.
  0x7452bfa5 in raise () from /usr/lib/libc.so.6
  (gdb) bt
  #0  0x7452bfa5 in raise () from /usr/lib/libc.so.6
  #1  0x7452d428 in abort () from /usr/lib/libc.so.6
  #2  0x7456acfb in __libc_message () from /usr/lib/libc.so.6
  #3  0x745f2ad7 in __fortify_fail () from /usr/lib/libc.so.6
  #4  0x745f0bb0 in __chk_fail () from /usr/lib/libc.so.6
  #5  0x745f2a47 in __fdelt_warn () from /usr/lib/libc.so.6
  #6  0x0046a628 in qemu_iohandler_poll (readfds=0xdb7da0 , 
  writefds=0xdb7e20 , xfds=0x6, xfds@entry=0xdb7ea0 , ret=-1, 
  ret@entry=1) at iohandler.c:121
  #7  0x004e8a14 in main_loop_wait (nonblocking=)
  at main-loop.c:497
  #8  0x004e802b in main_loop ()
  at /usr/src/aur/qemu/src/qemu-1.2.0/vl.c:1643
  #9  main (argc=, argv=, envp=)
  at /usr/src/aur/qemu/src/qemu-1.2.0/vl.c:3755
  (gdb) 

  Here is a more useless dump...

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1081416/+subscriptions



[Qemu-devel] [Bug 1081416] Re: Qemu 1.2.0 crashes when using tcp serial console and GRUB boots

2013-07-18 Thread Ian Wells
(fairly sure it's in the iohandler based on a manual check of the
symbols, though)

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1081416

Title:
  Qemu 1.2.0 crashes when using tcp serial console and GRUB boots

Status in QEMU:
  New

Bug description:
  When booting OpenWRT Attitude Adjustement ( 
http://downloads.openwrt.org/attitude_adjustment/12.09-beta2/x86/generic/openwrt-x86-generic-combined-ext4.img.gz
 ) with this command line:
  qemu-system-x86_64 -serial tcp:127.0.0.1: -hda 
openwrt-x86-generic-combined-ext4.img

  Qemu crashes as soon as GRUB starts, after network cards start.

  *** buffer overflow detected ***: /usr/bin/qemu-system-x86_64 terminated
  === Backtrace: =
  /usr/lib/libc.so.6(__fortify_fail+0x37)[0x745f2ad7]
  /usr/lib/libc.so.6(+0xf9bb0)[0x745f0bb0]
  /usr/lib/libc.so.6(+0xfba47)[0x745f2a47]
  /usr/bin/qemu-system-x86_64[0x46a628]
  /usr/bin/qemu-system-x86_64[0x4e8a14]
  /usr/bin/qemu-system-x86_64[0x4e802b]
  /usr/lib/libc.so.6(__libc_start_main+0xf5)[0x74518725]
  /usr/bin/qemu-system-x86_64[0x40d949]

  
  Here is a GDB backtrace:

  Program received signal SIGABRT, Aborted.
  0x7452bfa5 in raise () from /usr/lib/libc.so.6
  (gdb) bt
  #0  0x7452bfa5 in raise () from /usr/lib/libc.so.6
  #1  0x7452d428 in abort () from /usr/lib/libc.so.6
  #2  0x7456acfb in __libc_message () from /usr/lib/libc.so.6
  #3  0x745f2ad7 in __fortify_fail () from /usr/lib/libc.so.6
  #4  0x745f0bb0 in __chk_fail () from /usr/lib/libc.so.6
  #5  0x745f2a47 in __fdelt_warn () from /usr/lib/libc.so.6
  #6  0x0046a628 in qemu_iohandler_poll (readfds=0xdb7da0 , 
  writefds=0xdb7e20 , xfds=0x6, xfds@entry=0xdb7ea0 , ret=-1, 
  ret@entry=1) at iohandler.c:121
  #7  0x004e8a14 in main_loop_wait (nonblocking=)
  at main-loop.c:497
  #8  0x004e802b in main_loop ()
  at /usr/src/aur/qemu/src/qemu-1.2.0/vl.c:1643
  #9  main (argc=, argv=, envp=)
  at /usr/src/aur/qemu/src/qemu-1.2.0/vl.c:3755
  (gdb) 

  Here is a more useless dump...

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1081416/+subscriptions



Re: [Qemu-devel] [Qemu-ppc] [PATCH 2/2] Add Enhanced Three-Speed Ethernet Controller (eTSEC)

2013-07-18 Thread Scott Wood

On 07/18/2013 04:27:50 AM, Fabien Chouteau wrote:

On 07/17/2013 11:02 PM, Scott Wood wrote:
> On 07/17/2013 05:17:06 AM, Fabien Chouteau wrote:
>> On 07/16/2013 07:50 PM, Scott Wood wrote:
>> > On 07/16/2013 10:28:28 AM, Fabien Chouteau wrote:
>> >> On 07/16/2013 04:06 AM, Scott Wood wrote:
>> >> > On 07/10/2013 12:10:02 PM, Fabien Chouteau wrote:
>> >> >> +if (*size == etsec->rx_padding) {
>> >> >> +/* The remaining bytes are for padding which is not  
actually allocated

>> >> >> +   in the buffer */
>> >> >> +
>> >> >> +rem = MIN(etsec->regs[MRBLR].value - bd->length,  
etsec->rx_padding);

>> >> >> +
>> >> >> +if (rem > 0) {
>> >> >> +memset(padd, 0x0, sizeof(padd));
>> >> >> +etsec->rx_padding -= rem;
>> >> >> +*size -= rem;
>> >> >> +bd->length+= rem;
>> >> >> +cpu_physical_memory_write(bufptr, padd, rem);
>> >> >> +}
>> >> >> +}
>> >> >
>> >> > What if *size > 0 && *size < etsec->rx_padding?
>> >>
>> >> I don't think it's possible...
>> >
>> > Maybe throw in an assertion, then?
>> >
>> > I can see how it might not be possible if rx_padding is being  
used for padding a short frame, since MRBLR must be a multiple of 64,  
but what if it's 4 bytes for CRC?

>> >
>>
>> Can you explain a possible error scenario?
>
> 126 byte packet, no fcb.  rx_padding is 4 for CRC.  Suppose MRBLR  
is 128.  Wouldn't *size be 2 here?

>

Yes, at the end of the function, but then rx_padding is 2 as well.


How is rx_padding 2?  rx_init_frame() will set it to 4 (since the  
packet size is not less than 60).  The only other place I see that  
modifies rx_padding is "etsec->rx_padding -= rem", but that doesn't  
happen because the "*size == etsec->rx_padding" check happens first.



value of "to_write" will be 126:

*size = etsec->rx_remaining_data + etsec->rx_padding;
  = 126 + 4;
  = 130;

to_write = MIN(etsec->rx_fcb_size + *size - etsec->rx_padding,  
etsec->regs[MRBLR].value);

 = MIN(0 + 130 - 4, 128);
 = MIN(126, 128);
 = 126;

So we write the packet in the first part of the BD, and there's 2  
bytes

left in the BD.

*size -= to_write;
   = 4;
bd->length = to_write;
   = 126;

So *size == etsec->rx_padding (This is expected as the first write
operation can only write data and no padding, I will comment this  
fact)


rem = MIN(etsec->regs[MRBLR].value - bd->length, etsec->rx_padding);
= MIN(128 - 126, 4);
= MIN(2, 4);
= 2;

We write 2 bytes of padding.

etsec->rx_padding -= rem;
   = 2;
*size -= rem;
   = 2;
bd->length+= rem;
   = 128;

The BD is full, we will have to put the rest of padding in the next  
one.


What rest of padding?  I thought you said rx_padding was 2 somehow?  If  
that were true, then it would be zero at the end.


>> > Could you at least have a way to diagnose when the guest OS  
tries to
>> > use some functionality that you don't support, rather than  
silently

>> > doing the wrong thing?
>> >
>>
>> This device is so complex, detecting unsupported features would  
take too

>> much work.
>
> I was thinking along the lines of marking registers and bits within  
registers as supported (or which are properly no-ops in QEMU) -- and  
warning the first time you see a non-default-value write to an  
unsupported field or register.  It could save a lot of debugging.

>

I think we'll spend more time implementing this than debugging.  
Another

solution is to enable debug output and see which registers are read or
write.


I don't think it'd be that hard -- you already have an array of  
registers.  The information could easily go in there, and be checked by  
common infrastructure.


-Scott



[Qemu-devel] [PATCH v2 09/11] pseries: savevm support for PCI host bridge

2013-07-18 Thread Anthony Liguori
From: David Gibson 

This adds the necessary support for saving the state of the PAPR virtual
PCI host bridge (or host bridges).

Signed-off-by: David Gibson 
Reviewed-by: Anthony Liguori 
---
 hw/ppc/spapr_pci.c  | 49 +
 include/hw/pci-host/spapr.h |  6 +++---
 2 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index dd9e74e..560d375c 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -699,6 +699,54 @@ static Property spapr_phb_properties[] = {
 DEFINE_PROP_END_OF_LIST(),
 };
 
+static const VMStateDescription vmstate_spapr_pci_lsi = {
+.name = "spapr_pci/lsi",
+.version_id = 1,
+.minimum_version_id = 1,
+.minimum_version_id_old = 1,
+.fields  = (VMStateField []) {
+VMSTATE_UINT32_EQUAL(irq, struct spapr_pci_lsi),
+
+VMSTATE_END_OF_LIST()
+},
+};
+
+static const VMStateDescription vmstate_spapr_pci_msi = {
+.name = "spapr_pci/lsi",
+.version_id = 1,
+.minimum_version_id = 1,
+.minimum_version_id_old = 1,
+.fields  = (VMStateField []) {
+VMSTATE_UINT32(config_addr, struct spapr_pci_msi),
+VMSTATE_UINT32(irq, struct spapr_pci_msi),
+VMSTATE_UINT32(nvec, struct spapr_pci_msi),
+
+VMSTATE_END_OF_LIST()
+},
+};
+
+static const VMStateDescription vmstate_spapr_pci = {
+.name = "spapr_pci",
+.version_id = 1,
+.minimum_version_id = 1,
+.minimum_version_id_old = 1,
+.fields  = (VMStateField []) {
+VMSTATE_UINT64_EQUAL(buid, sPAPRPHBState),
+VMSTATE_UINT32_EQUAL(dma_liobn, sPAPRPHBState),
+VMSTATE_UINT64_EQUAL(mem_win_addr, sPAPRPHBState),
+VMSTATE_UINT64_EQUAL(mem_win_size, sPAPRPHBState),
+VMSTATE_UINT64_EQUAL(io_win_addr, sPAPRPHBState),
+VMSTATE_UINT64_EQUAL(io_win_size, sPAPRPHBState),
+VMSTATE_UINT64_EQUAL(msi_win_addr, sPAPRPHBState),
+VMSTATE_STRUCT_ARRAY(lsi_table, sPAPRPHBState, PCI_NUM_PINS, 0,
+ vmstate_spapr_pci_lsi, struct spapr_pci_lsi),
+VMSTATE_STRUCT_ARRAY(msi_table, sPAPRPHBState, SPAPR_MSIX_MAX_DEVS, 0,
+ vmstate_spapr_pci_msi, struct spapr_pci_msi),
+
+VMSTATE_END_OF_LIST()
+},
+};
+
 static const char *spapr_phb_root_bus_path(PCIHostState *host_bridge,
PCIBus *rootbus)
 {
@@ -717,6 +765,7 @@ static void spapr_phb_class_init(ObjectClass *klass, void 
*data)
 sdc->init = spapr_phb_init;
 dc->props = spapr_phb_properties;
 dc->reset = spapr_phb_reset;
+dc->vmsd = &vmstate_spapr_pci;
 }
 
 static const TypeInfo spapr_phb_info = {
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index 1e23dbf..93f9511 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -52,14 +52,14 @@ typedef struct sPAPRPHBState {
 sPAPRTCETable *tcet;
 AddressSpace iommu_as;
 
-struct {
+struct spapr_pci_lsi {
 uint32_t irq;
 } lsi_table[PCI_NUM_PINS];
 
-struct {
+struct spapr_pci_msi {
 uint32_t config_addr;
 uint32_t irq;
-int nvec;
+uint32_t nvec;
 } msi_table[SPAPR_MSIX_MAX_DEVS];
 
 QLIST_ENTRY(sPAPRPHBState) list;
-- 
1.8.0




[Qemu-devel] [PATCH v2 04/11] pseries: savevm support for PAPR VIO logical tty

2013-07-18 Thread Anthony Liguori
From: David Gibson 

This patch adds the necessary VMStateDescription information to support
savevm/loadvm for the spapr_tty (PAPR logical serial) device.

Signed-off-by: David Gibson 
Reviewed-by: Anthony Liguori 
---
 hw/char/spapr_vty.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/hw/char/spapr_vty.c b/hw/char/spapr_vty.c
index 2993848..a799721 100644
--- a/hw/char/spapr_vty.c
+++ b/hw/char/spapr_vty.c
@@ -142,6 +142,21 @@ static Property spapr_vty_properties[] = {
 DEFINE_PROP_END_OF_LIST(),
 };
 
+static const VMStateDescription vmstate_spapr_vty = {
+.name = "spapr_vty",
+.version_id = 1,
+.minimum_version_id = 1,
+.minimum_version_id_old = 1,
+.fields  = (VMStateField []) {
+VMSTATE_SPAPR_VIO(sdev, VIOsPAPRVTYDevice),
+
+VMSTATE_UINT32(in, VIOsPAPRVTYDevice),
+VMSTATE_UINT32(out, VIOsPAPRVTYDevice),
+VMSTATE_BUFFER(buf, VIOsPAPRVTYDevice),
+VMSTATE_END_OF_LIST()
+},
+};
+
 static void spapr_vty_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
@@ -152,6 +167,7 @@ static void spapr_vty_class_init(ObjectClass *klass, void 
*data)
 k->dt_type = "serial";
 k->dt_compatible = "hvterm1";
 dc->props = spapr_vty_properties;
+dc->vmsd = &vmstate_spapr_vty;
 }
 
 static const TypeInfo spapr_vty_info = {
-- 
1.8.0




Re: [Qemu-devel] seccomp: remove unused syscalls - for 1.6

2013-07-18 Thread Paul Moore
On Thursday, July 18, 2013 08:48:10 PM Peter Maydell wrote:
> On 18 July 2013 20:39, Paul Moore  wrote:
> > On the plus side, I think libseccomp is very close to being pretty much
> > feature complete (excluding new architectures that may pop up, at present
> > we are only x86, x86_64, x32, and ARM)
> 
> ...AArch64 ? :-)
> 

Not yet, just 32-bit ARM EABI.

If you've got a working system and are willing to so some hacking or run some 
tests we could work on it for a future libseccomp release.  An emulated 
AArch64 VM would also work, but that route can be slow/annoying.

-- 
paul moore
security and virtualization @ redhat




Re: [Qemu-devel] [PATCH 12/17] qemu-img: add a "map" subcommand

2013-07-18 Thread Eric Blake
On 07/03/2013 08:34 AM, Paolo Bonzini wrote:
> This command dumps the metadata of an entire chain, in either tabular or JSON
> format.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  qemu-img-cmds.hx |   6 ++
>  qemu-img.c   | 186 
> +++
>  2 files changed, 192 insertions(+)
> 

> +case OFORMAT_JSON:
> +printf("%s{ 'depth': %d, 'start': %lld, 'length': %lld, "
> +   "'zero': %s, 'data': %s",
> +   (e->start == 0 ? "[" : ",\n"),
> +   e->depth, (long long) e->start, (long long) e->length,
> +   (e->flags & BDRV_BLOCK_ZERO) ? "true" : "false",
> +   (e->flags & BDRV_BLOCK_DATA) ? "true" : "false");
> +if (e->flags & BDRV_BLOCK_OFFSET_VALID) {
> +printf(", 'offset': %lld", (long long) e->offset);
> +}
> + putchar('}');

Can we please get this format documented in qapi-schema.json, even if we
aren't using qapi to generate it yet?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH V4 4/4] Change default to qcow2 for sync mode none.

2013-07-18 Thread Eric Blake
On 07/18/2013 01:13 PM, Ian Main wrote:
> On Thu, Jul 18, 2013 at 12:56:51PM -0600, Eric Blake wrote:
>> On 07/18/2013 12:47 PM, Ian Main wrote:
>>> qcow2 supports backing files so it makes sense to default to qcow2
>>> for MIRROR_SYNC_MODE_NONE so that we can use the source as a backing
>>> drive and export it via nbd.  Defaulting FULL and TOP to SYNC_MODE_NONE
>>> breaks tests but that could be fixed if we wanted it.
>>>
>>> Signed-off-by: Ian Main 
>>> ---
>>>  blockdev.c   | 5 -
>>>  qapi-schema.json | 1 +
>>>  2 files changed, 5 insertions(+), 1 deletion(-)
>>
>> Looks okay, but let's answer the meta-question first of whether we
>> should just make 'format' mandatory and be done with it.
>>
>> Also, I've noticed you aren't cc'ing many people; that can slow down
>> reviews.  http://wiki.qemu.org/Contribute/SubmitAPatch gives hints on
>> how to determine the right people to send your patches to, by
>> deciphering MAINTAINERS and running ./scripts/getmaintainer.pl.
> 
> Ah ok, I'll add them next rev.
> 
> My take has been to just do a patch that implements the suggestion and
> see what people think :).

But this list is so high volume that the people that matter won't check
your email unless they are cc'd :)  If you want opinions fast, it pays
to follow the directions.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH V4 1/4] Implement sync modes for drive-backup.

2013-07-18 Thread Eric Blake
On 07/18/2013 01:06 PM, Ian Main wrote:
> On Thu, Jul 18, 2013 at 11:19:43AM -0600, Eric Blake wrote:
>> On 07/17/2013 02:04 PM, Ian Main wrote:
>>> This patch adds sync-modes to the drive-backup interface and
>>> implements the FULL, NONE and TOP modes of synchronization.
>>>

>>> @@ -1807,6 +1807,10 @@
>>>  # @format: #optional the format of the new destination, default is to
>>>  #  probe if @mode is 'existing', else the format of the source
>>>  #
>>> +# @sync: what parts of the disk image should be copied to the destination
>>> +#(all the disk, only the sectors allocated in the topmost image, or
>>> +#only new I/O).
>>> +#
>>>  # @mode: #optional whether and how QEMU should create a new image, default 
>>> is
>>>  #'absolute-paths'.
>>
>> This hunk looks bogus; the 'DriveBackup' type already documents @sync as
>> of commit b53169e, which makes me suspect this hunk got misapplied
>> during rebasing.
> 
> Did you check that?  I know there was one bit of documentation missing
> that I fixed here.  I also just did a clean rebase (git am) to
> kevin/block and it all applied fine.

Yes, it _applied_ fine, because of git's insistence on applying hunks
regardless of line fuzzing.  It probably applied to 'drive-mirror',
since I see this context in the section for 'drive-mirror' when reading
the unpatched file at current qemu.git head:

> # @format: #optional the format of the new destination, default is to
> #  probe if @mode is 'existing', else the format of the source
> #
> # @mode: #optional whether and how QEMU should create a new image, default is
> #'absolute-paths'.
> #

Hence, my argument that you DON'T want this hunk.

>>>  Example:
>>>  -> { "execute": "drive-backup", "arguments": { "device": "drive0",
>>> +   "sync": "full",
>>> "target": "backup.img" } }
>>
>> Ouch - commit b53169e made 'sync' mandatory, but forgot to update the
>> example to match; perhaps this hunk ought to be applied independently?
> 
> That's up to you guys, I can split it out if needed.

I don't care either way as long as it gets fixed before 1.6; it all
depends on how long the review of the rest of your series takes.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATH 0/4] po/Makefile: Fix regression and some minor issues

2013-07-18 Thread Stefan Weil
Am 16.07.2013 07:16, schrieb Stefan Weil:
> Am 05.07.2013 22:55, schrieb Stefan Weil:
>> These patches are included:
>>
>> [PATCH 1/4] po/Makefile: Fix and improve help message
>> [PATCH 2/4] po/Makefile: Fix *.mo generation for out-of-tree builds
>> [PATCH 3/4] po/Makefile: Fix generation of messages.po
>> [PATCH 4/4] po/Makefile: Use macro quiet-command for nice looking
>>
>> Regards
>> Stefan W.
> Ping?

Ping^2




Re: [Qemu-devel] [PATCH] PPC: dbdma: macio: Fix format specifiers (build regression)

2013-07-18 Thread Stefan Weil
Am 16.07.2013 07:54, schrieb Stefan Weil:
> Am 12.07.2013 18:48, schrieb Stefan Weil:
>> Fix a number of warnings for 32 bit builds (tested on MingW and Linux):
>>
>>   CChw/ide/macio.o
>> qemu/hw/ide/macio.c: In function 'pmac_ide_atapi_transfer_cb':
>> qemu/hw/ide/macio.c:134:9: error: format '%lx' expects argument of type 
>> 'long unsigned int', but argument 3 has type 'hwaddr' [-Werror=format]
>> qemu/hw/ide/macio.c: In function 'pmac_ide_transfer_cb':
>> qemu/hw/ide/macio.c:215:5: error: format '%ld' expects argument of type 
>> 'long int', but argument 5 has type 'int64_t' [-Werror=format]
>> qemu/hw/ide/macio.c:222:9: error: format '%lx' expects argument of type 
>> 'long unsigned int', but argument 3 has type 'hwaddr' [-Werror=format]
>> qemu/hw/ide/macio.c:264:9: error: format '%lx' expects argument of type 
>> 'long unsigned int', but argument 3 has type 'hwaddr' [-Werror=format]
>> cc1: all warnings being treated as errors
>> make: *** [hw/ide/macio.o] Error 1
>>
>> Signed-off-by: Stefan Weil 
>> ---
>>
>>
>>
>> Hi Anthony,
>>
>> the patch fixes a build regression which was introduced today.
>> Could you please apply it without waiting for the next pull requests?
>>
>> Thanks,
>> Stefan
>>
>>
>>
>>  hw/ide/macio.c |9 +
>>  1 file changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/hw/ide/macio.c b/hw/ide/macio.c
>> index 38ad924..ef4ba2b 100644
>> --- a/hw/ide/macio.c
>> +++ b/hw/ide/macio.c
>> @@ -131,7 +131,7 @@ static void pmac_ide_atapi_transfer_cb(void *opaque, int 
>> ret)
>>  int sector_num = (s->lba << 2) + (s->io_buffer_index >> 9);
>>  int nsector = io->len >> 9;
>>  
>> -MACIO_DPRINTF("precopying unaligned %d bytes to %#lx\n",
>> +MACIO_DPRINTF("precopying unaligned %d bytes to %#" HWADDR_PRIx 
>> "\n",
>>unaligned, io->addr + io->len - unaligned);
>>  
>>  bdrv_read(s->bs, sector_num + nsector, io->remainder, 1);
>> @@ -212,14 +212,15 @@ static void pmac_ide_transfer_cb(void *opaque, int ret)
>>  s->nsector -= n;
>>  }
>>  
>> -MACIO_DPRINTF("remainder: %d io->len: %d nsector: %d sector_num: %ld\n",
>> +MACIO_DPRINTF("remainder: %d io->len: %d nsector: %d "
>> +  "sector_num: %" PRId64 "\n",
>>io->remainder_len, io->len, s->nsector, sector_num);
>>  if (io->remainder_len && io->len) {
>>  /* guest wants the rest of its previous transfer */
>>  int remainder_len = MIN(io->remainder_len, io->len);
>>  uint8_t *p = &io->remainder[0x200 - remainder_len];
>>  
>> -MACIO_DPRINTF("copying remainder %d bytes at %#lx\n",
>> +MACIO_DPRINTF("copying remainder %d bytes at %#" HWADDR_PRIx "\n",
>>remainder_len, io->addr);
>>  
>>  switch (s->dma_cmd) {
>> @@ -261,7 +262,7 @@ static void pmac_ide_transfer_cb(void *opaque, int ret)
>>  if (unaligned) {
>>  int nsector = io->len >> 9;
>>  
>> -MACIO_DPRINTF("precopying unaligned %d bytes to %#lx\n",
>> +MACIO_DPRINTF("precopying unaligned %d bytes to %#" HWADDR_PRIx 
>> "\n",
>>unaligned, io->addr + io->len - unaligned);
>>  
>>  switch (s->dma_cmd) {
>
> Ping?


Ping²?





Re: [Qemu-devel] seccomp: remove unused syscalls - for 1.6

2013-07-18 Thread Peter Maydell
On 18 July 2013 20:39, Paul Moore  wrote:
> On the plus side, I think libseccomp is very close to being pretty much
> feature complete (excluding new architectures that may pop up, at present we
> are only x86, x86_64, x32, and ARM)

...AArch64 ? :-)

-- PMM



[Qemu-devel] [PATCH v2 11/11] xics: rename types to be sane and follow coding style

2013-07-18 Thread Anthony Liguori
>From Ben:

Basically, in HW the layout of the interrupt network is:

 - One ICP per processor thread (the "presenter"). This contains the
registers to fetch a pending interrupt (ack), EOI, and control the
processor priority.

 - One ICS per logical source of interrupts (ie, one per PCI host
bridge, and a few others here or there). This contains the per-interrupt
source configuration (target processor(s), priority, mask) and the
per-interrupt internal state.

Under PAPR, there is a single "virtual" ICS ... somewhat (it's a bit
oddball what pHyp does here, arguably there are two but we can ignore
that distinction). There is no register level access. A pair of firmware
(RTAS) calls is used to configure each virtual interrupt.

So our model here is somewhat the same. We have one ICS in the emulated
XICS which arguably *is* the emulated XICS, there's no point making it a
separate "device", that would just be gross, and each VCPU has an
associated ICP.

Yet we call the "XICS" struct icp_state and then the ICPs
'struct icp_server_state'.  It's particularly confusing when all of the
functions have xics_prefixes yet take *icp arguments.

Rename:

  struct icp_state -> XICSState
  struct icp_server_state -> ICPState
  struct ics_state -> ICSState
  struct ics_irq_state -> ICSIRQState

Signed-off-by: David Gibson 
[aik: added ics_resend() on post_load]
Signed-off-by: Alexey Kardashevskiy 
Signed-off-by: Anthony Liguori 
---
 hw/intc/xics.c | 345 +
 hw/ppc/spapr.c |  28 
 include/hw/ppc/spapr.h |   3 +-
 include/hw/ppc/xics.h  |  74 ++-
 4 files changed, 330 insertions(+), 120 deletions(-)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 091912e..6b3c071 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -34,34 +34,19 @@
  * ICP: Presentation layer
  */
 
-struct icp_server_state {
-uint32_t xirr;
-uint8_t pending_priority;
-uint8_t mfrr;
-qemu_irq output;
-};
-
 #define XISR_MASK  0x00ff
 #define CPPR_MASK  0xff00
 
 #define XISR(ss)   (((ss)->xirr) & XISR_MASK)
 #define CPPR(ss)   (((ss)->xirr) >> 24)
 
-struct ics_state;
-
-struct icp_state {
-long nr_servers;
-struct icp_server_state *ss;
-struct ics_state *ics;
-};
-
-static void ics_reject(struct ics_state *ics, int nr);
-static void ics_resend(struct ics_state *ics);
-static void ics_eoi(struct ics_state *ics, int nr);
+static void ics_reject(ICSState *ics, int nr);
+static void ics_resend(ICSState *ics);
+static void ics_eoi(ICSState *ics, int nr);
 
-static void icp_check_ipi(struct icp_state *icp, int server)
+static void icp_check_ipi(XICSState *icp, int server)
 {
-struct icp_server_state *ss = icp->ss + server;
+ICPState *ss = icp->ss + server;
 
 if (XISR(ss) && (ss->pending_priority <= ss->mfrr)) {
 return;
@@ -78,9 +63,9 @@ static void icp_check_ipi(struct icp_state *icp, int server)
 qemu_irq_raise(ss->output);
 }
 
-static void icp_resend(struct icp_state *icp, int server)
+static void icp_resend(XICSState *icp, int server)
 {
-struct icp_server_state *ss = icp->ss + server;
+ICPState *ss = icp->ss + server;
 
 if (ss->mfrr < CPPR(ss)) {
 icp_check_ipi(icp, server);
@@ -88,9 +73,9 @@ static void icp_resend(struct icp_state *icp, int server)
 ics_resend(icp->ics);
 }
 
-static void icp_set_cppr(struct icp_state *icp, int server, uint8_t cppr)
+static void icp_set_cppr(XICSState *icp, int server, uint8_t cppr)
 {
-struct icp_server_state *ss = icp->ss + server;
+ICPState *ss = icp->ss + server;
 uint8_t old_cppr;
 uint32_t old_xisr;
 
@@ -112,9 +97,9 @@ static void icp_set_cppr(struct icp_state *icp, int server, 
uint8_t cppr)
 }
 }
 
-static void icp_set_mfrr(struct icp_state *icp, int server, uint8_t mfrr)
+static void icp_set_mfrr(XICSState *icp, int server, uint8_t mfrr)
 {
-struct icp_server_state *ss = icp->ss + server;
+ICPState *ss = icp->ss + server;
 
 ss->mfrr = mfrr;
 if (mfrr < CPPR(ss)) {
@@ -122,7 +107,7 @@ static void icp_set_mfrr(struct icp_state *icp, int server, 
uint8_t mfrr)
 }
 }
 
-static uint32_t icp_accept(struct icp_server_state *ss)
+static uint32_t icp_accept(ICPState *ss)
 {
 uint32_t xirr = ss->xirr;
 
@@ -135,9 +120,9 @@ static uint32_t icp_accept(struct icp_server_state *ss)
 return xirr;
 }
 
-static void icp_eoi(struct icp_state *icp, int server, uint32_t xirr)
+static void icp_eoi(XICSState *icp, int server, uint32_t xirr)
 {
-struct icp_server_state *ss = icp->ss + server;
+ICPState *ss = icp->ss + server;
 
 /* Send EOI -> ICS */
 ss->xirr = (ss->xirr & ~CPPR_MASK) | (xirr & CPPR_MASK);
@@ -148,9 +133,9 @@ static void icp_eoi(struct icp_state *icp, int server, 
uint32_t xirr)
 }
 }
 
-static void icp_irq(struct icp_state *icp, int server, int nr, uint8_t 
priority)
+static void icp_irq(XICSState *icp, int server, int nr, uin

[Qemu-devel] [PATCH v2 10/11] pseries: savevm support with KVM

2013-07-18 Thread Anthony Liguori
From: Alexey Kardashevskiy 

At present, the savevm / migration support for the pseries machine will not
work when KVM is enabled.  That's because KVM manages the guest's hash page
table in the host kernel, so qemu has no visibility of it.  This patch
fixes this by using new kernel interfaces to extract and reinsert the
guest's hash table during the migration process.

Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c | 106 +++--
 include/hw/ppc/spapr.h |   1 +
 target-ppc/kvm.c   |  69 
 target-ppc/kvm_ppc.h   |  22 ++
 4 files changed, 176 insertions(+), 22 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 734a782..7144829 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -735,17 +735,27 @@ static int htab_save_setup(QEMUFile *f, void *opaque)
 {
 sPAPREnvironment *spapr = opaque;
 
-spapr->htab_save_index = 0;
-spapr->htab_first_pass = true;
-
 /* "Iteration" header */
 qemu_put_be32(f, spapr->htab_shift);
 
+if (spapr->htab) {
+spapr->htab_save_index = 0;
+spapr->htab_first_pass = true;
+} else {
+assert(kvm_enabled());
+
+spapr->htab_fd = kvmppc_get_htab_fd(false);
+if (spapr->htab_fd < 0) {
+fprintf(stderr, "Unable to open fd for reading hash table from 
KVM: %s\n",
+strerror(errno));
+return -1;
+}
+}
+
+
 return 0;
 }
 
-#define MAX_ITERATION_NS500 /* 5 ms */
-
 static void htab_save_first_pass(QEMUFile *f, sPAPREnvironment *spapr,
  int64_t max_ns)
 {
@@ -796,8 +806,8 @@ static void htab_save_first_pass(QEMUFile *f, 
sPAPREnvironment *spapr,
 spapr->htab_save_index = index;
 }
 
-static bool htab_save_later_pass(QEMUFile *f, sPAPREnvironment *spapr,
- int64_t max_ns)
+static int htab_save_later_pass(QEMUFile *f, sPAPREnvironment *spapr,
+int64_t max_ns)
 {
 bool final = max_ns < 0;
 int htabslots = HTAB_SIZE(spapr) / HASH_PTE_SIZE_64;
@@ -870,21 +880,32 @@ static bool htab_save_later_pass(QEMUFile *f, 
sPAPREnvironment *spapr,
 
 spapr->htab_save_index = index;
 
-return (examined >= htabslots) && (sent == 0);
+return (examined >= htabslots) && (sent == 0) ? 1 : 0;
 }
 
+#define MAX_ITERATION_NS500 /* 5 ms */
+#define MAX_KVM_BUF_SIZE2048
+
 static int htab_save_iterate(QEMUFile *f, void *opaque)
 {
 sPAPREnvironment *spapr = opaque;
-bool nothingleft = false;;
+int rc = 0;
 
 /* Iteration header */
 qemu_put_be32(f, 0);
 
-if (spapr->htab_first_pass) {
+if (!spapr->htab) {
+assert(kvm_enabled());
+
+rc = kvmppc_save_htab(f, spapr->htab_fd,
+  MAX_KVM_BUF_SIZE, MAX_ITERATION_NS);
+if (rc < 0) {
+return rc;
+}
+} else  if (spapr->htab_first_pass) {
 htab_save_first_pass(f, spapr, MAX_ITERATION_NS);
 } else {
-nothingleft = htab_save_later_pass(f, spapr, MAX_ITERATION_NS);
+rc = htab_save_later_pass(f, spapr, MAX_ITERATION_NS);
 }
 
 /* End marker */
@@ -892,7 +913,7 @@ static int htab_save_iterate(QEMUFile *f, void *opaque)
 qemu_put_be16(f, 0);
 qemu_put_be16(f, 0);
 
-return nothingleft ? 1 : 0;
+return rc;
 }
 
 static int htab_save_complete(QEMUFile *f, void *opaque)
@@ -902,7 +923,20 @@ static int htab_save_complete(QEMUFile *f, void *opaque)
 /* Iteration header */
 qemu_put_be32(f, 0);
 
-htab_save_later_pass(f, spapr, -1);
+if (!spapr->htab) {
+int rc;
+
+assert(kvm_enabled());
+
+rc = kvmppc_save_htab(f, spapr->htab_fd, MAX_KVM_BUF_SIZE, -1);
+if (rc < 0) {
+return rc;
+}
+close(spapr->htab_fd);
+spapr->htab_fd = -1;
+} else {
+htab_save_later_pass(f, spapr, -1);
+}
 
 /* End marker */
 qemu_put_be32(f, 0);
@@ -916,6 +950,7 @@ static int htab_load(QEMUFile *f, void *opaque, int 
version_id)
 {
 sPAPREnvironment *spapr = opaque;
 uint32_t section_hdr;
+int fd = -1;
 
 if (version_id < 1 || version_id > 1) {
 fprintf(stderr, "htab_load() bad version\n");
@@ -932,6 +967,16 @@ static int htab_load(QEMUFile *f, void *opaque, int 
version_id)
 return 0;
 }
 
+if (!spapr->htab) {
+assert(kvm_enabled());
+
+fd = kvmppc_get_htab_fd(true);
+if (fd < 0) {
+fprintf(stderr, "Unable to open fd to restore KVM hash table: 
%s\n",
+strerror(errno));
+}
+}
+
 while (true) {
 uint32_t index;
 uint16_t n_valid, n_invalid;
@@ -945,24 +990,41 @@ static int htab_load(QEMUFile *f, void *opaque, int 
version_id)
 break;
 }
 
-if ((index + n_valid + n_invalid) >=
+if ((index + n_valid + n_invalid) >
 (HTAB_SIZE

[Qemu-devel] [PATCH v2 01/11] target-ppc: Convert ppc cpu savevm to VMStateDescription

2013-07-18 Thread Anthony Liguori
From: Alexey Kardashevskiy 

The savevm code for the powerpc cpu emulation is currently based around
the old register_savevm() rather than register_vmstate() method.  It's also
rather broken, missing some important state on some CPU models.

This patch completely rewrites the savevm for target-ppc, using the new
VMStateDescription approach.  Exactly what needs to be saved in what
configurations has been more carefully examined, too.  This introduces a
new version (5) of the cpu save format.  The old load function is retained
to support version 4 images.

Signed-off-by: David Gibson 
[aik: ppc cpu savevm convertion fixed to use PowerPCCPU instead of CPUPPCState]
Signed-off-by: Alexey Kardashevskiy 
---
 target-ppc/cpu-qom.h|   4 +
 target-ppc/cpu.h|   8 +-
 target-ppc/machine.c| 531 
 target-ppc/translate_init.c |   2 +
 4 files changed, 452 insertions(+), 93 deletions(-)

diff --git a/target-ppc/cpu-qom.h b/target-ppc/cpu-qom.h
index 7132599..c660e3c 100644
--- a/target-ppc/cpu-qom.h
+++ b/target-ppc/cpu-qom.h
@@ -106,4 +106,8 @@ void ppc_cpu_dump_state(CPUState *cpu, FILE *f, 
fprintf_function cpu_fprintf,
 void ppc_cpu_dump_statistics(CPUState *cpu, FILE *f,
  fprintf_function cpu_fprintf, int flags);
 
+#ifndef CONFIG_USER_ONLY
+extern const struct VMStateDescription vmstate_ppc_cpu;
+#endif
+
 #endif
diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index 7a7b1bf..454ea13 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -948,7 +948,7 @@ struct CPUPPCState {
 #if defined(TARGET_PPC64)
 /* PowerPC 64 SLB area */
 ppc_slb_t slb[64];
-int slb_nr;
+int32_t slb_nr;
 #endif
 /* segment registers */
 hwaddr htab_base;
@@ -957,11 +957,11 @@ struct CPUPPCState {
 /* externally stored hash table */
 uint8_t *external_htab;
 /* BATs */
-int nb_BATs;
+uint32_t nb_BATs;
 target_ulong DBAT[2][8];
 target_ulong IBAT[2][8];
 /* PowerPC TLB registers (for 4xx, e500 and 60x software driven TLBs) */
-int nb_tlb;  /* Total number of TLB  */
+int32_t nb_tlb;  /* Total number of TLB  */
 int tlb_per_way; /* Speed-up helper: used to avoid divisions at run time */
 int nb_ways; /* Number of ways in the TLB set*/
 int last_way;/* Last used way used to allocate TLB in a LRU way  */
@@ -1176,8 +1176,6 @@ static inline CPUPPCState *cpu_init(const char *cpu_model)
 #define cpu_signal_handler cpu_ppc_signal_handler
 #define cpu_list ppc_cpu_list
 
-#define CPU_SAVE_VERSION 4
-
 /* MMU modes definitions */
 #define MMU_MODE0_SUFFIX _user
 #define MMU_MODE1_SUFFIX _kernel
diff --git a/target-ppc/machine.c b/target-ppc/machine.c
index 2d10adb..12e1512 100644
--- a/target-ppc/machine.c
+++ b/target-ppc/machine.c
@@ -1,96 +1,12 @@
 #include "hw/hw.h"
 #include "hw/boards.h"
 #include "sysemu/kvm.h"
+#include "helper_regs.h"
 
-void cpu_save(QEMUFile *f, void *opaque)
+static int cpu_load_old(QEMUFile *f, void *opaque, int version_id)
 {
-CPUPPCState *env = (CPUPPCState *)opaque;
-unsigned int i, j;
-uint32_t fpscr;
-target_ulong xer;
-
-for (i = 0; i < 32; i++)
-qemu_put_betls(f, &env->gpr[i]);
-#if !defined(TARGET_PPC64)
-for (i = 0; i < 32; i++)
-qemu_put_betls(f, &env->gprh[i]);
-#endif
-qemu_put_betls(f, &env->lr);
-qemu_put_betls(f, &env->ctr);
-for (i = 0; i < 8; i++)
-qemu_put_be32s(f, &env->crf[i]);
-xer = cpu_read_xer(env);
-qemu_put_betls(f, &xer);
-qemu_put_betls(f, &env->reserve_addr);
-qemu_put_betls(f, &env->msr);
-for (i = 0; i < 4; i++)
-qemu_put_betls(f, &env->tgpr[i]);
-for (i = 0; i < 32; i++) {
-union {
-float64 d;
-uint64_t l;
-} u;
-u.d = env->fpr[i];
-qemu_put_be64(f, u.l);
-}
-fpscr = env->fpscr;
-qemu_put_be32s(f, &fpscr);
-qemu_put_sbe32s(f, &env->access_type);
-#if defined(TARGET_PPC64)
-qemu_put_betls(f, &env->spr[SPR_ASR]);
-qemu_put_sbe32s(f, &env->slb_nr);
-#endif
-qemu_put_betls(f, &env->spr[SPR_SDR1]);
-for (i = 0; i < 32; i++)
-qemu_put_betls(f, &env->sr[i]);
-for (i = 0; i < 2; i++)
-for (j = 0; j < 8; j++)
-qemu_put_betls(f, &env->DBAT[i][j]);
-for (i = 0; i < 2; i++)
-for (j = 0; j < 8; j++)
-qemu_put_betls(f, &env->IBAT[i][j]);
-qemu_put_sbe32s(f, &env->nb_tlb);
-qemu_put_sbe32s(f, &env->tlb_per_way);
-qemu_put_sbe32s(f, &env->nb_ways);
-qemu_put_sbe32s(f, &env->last_way);
-qemu_put_sbe32s(f, &env->id_tlbs);
-qemu_put_sbe32s(f, &env->nb_pids);
-if (env->tlb.tlb6) {
-// XXX assumes 6xx
-for (i = 0; i < env->nb_tlb; i++) {
-qemu_put_betls(f, &env->tlb.tlb6[i].pte0);
-qemu_put_betls(f, &env->tlb.tlb6[i].pte1);
-q

[Qemu-devel] [PATCH v2 07/11] pseries: savevm support for PAPR virtual SCSI

2013-07-18 Thread Anthony Liguori
From: David Gibson 

This patch adds the necessary support for saving the state of the PAPR VIO
virtual SCSI device. This also saves and restores active SCSI requests.

[aik: implemented vscsi_req save/restore]
Signed-off-by: Alexey Kardashevskiy 
Cc: David Gibson 
---
 hw/scsi/spapr_vscsi.c | 82 ++-
 1 file changed, 81 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/spapr_vscsi.c b/hw/scsi/spapr_vscsi.c
index e2740fb..a86199b 100644
--- a/hw/scsi/spapr_vscsi.c
+++ b/hw/scsi/spapr_vscsi.c
@@ -578,6 +578,69 @@ static void vscsi_request_cancelled(SCSIRequest *sreq)
 vscsi_put_req(req);
 }
 
+static const VMStateDescription vmstate_spapr_vscsi_req = {
+.name = "spapr_vscsi_req",
+.version_id = 1,
+.minimum_version_id = 1,
+.minimum_version_id_old = 1,
+.fields  = (VMStateField []) {
+VMSTATE_BUFFER(crq.raw, vscsi_req),
+VMSTATE_BUFFER(iu.srp.reserved, vscsi_req),
+VMSTATE_UINT32(qtag, vscsi_req),
+VMSTATE_BOOL(active, vscsi_req),
+VMSTATE_UINT32(data_len, vscsi_req),
+VMSTATE_BOOL(writing, vscsi_req),
+VMSTATE_UINT32(senselen, vscsi_req),
+VMSTATE_BUFFER(sense, vscsi_req),
+VMSTATE_UINT8(dma_fmt, vscsi_req),
+VMSTATE_UINT16(local_desc, vscsi_req),
+VMSTATE_UINT16(total_desc, vscsi_req),
+VMSTATE_UINT16(cdb_offset, vscsi_req),
+  /*Restart SCSI request from the beginning for now */
+  /*VMSTATE_UINT16(cur_desc_num, vscsi_req),
+VMSTATE_UINT16(cur_desc_offset, vscsi_req),*/
+VMSTATE_END_OF_LIST()
+},
+};
+
+static void vscsi_save_request(QEMUFile *f, SCSIRequest *sreq)
+{
+vscsi_req *req = sreq->hba_private;
+assert(req->active);
+
+vmstate_save_state(f, &vmstate_spapr_vscsi_req, req);
+
+dprintf("VSCSI: saving tag=%u, current desc#%d, offset=%x\n",
+req->qtag, req->cur_desc_num, req->cur_desc_offset);
+}
+
+static void *vscsi_load_request(QEMUFile *f, SCSIRequest *sreq)
+{
+SCSIBus *bus = sreq->bus;
+VSCSIState *s = VIO_SPAPR_VSCSI_DEVICE(bus->qbus.parent);
+vscsi_req *req;
+int rc;
+
+assert(sreq->tag < VSCSI_REQ_LIMIT);
+req = &s->reqs[sreq->tag];
+assert(!req->active);
+
+memset(req, 0, sizeof(*req));
+rc = vmstate_load_state(f, &vmstate_spapr_vscsi_req, req, 1);
+if (rc) {
+fprintf(stderr, "VSCSI: failed loading request tag#%u\n", sreq->tag);
+return NULL;
+}
+assert(req->active);
+
+req->sreq = scsi_req_ref(sreq);
+
+dprintf("VSCSI: restoring tag=%u, current desc#%d, offset=%x\n",
+req->qtag, req->cur_desc_num, req->cur_desc_offset);
+
+return req;
+}
+
 static void vscsi_process_login(VSCSIState *s, vscsi_req *req)
 {
 union viosrp_iu *iu = &req->iu;
@@ -932,7 +995,9 @@ static const struct SCSIBusInfo vscsi_scsi_info = {
 
 .transfer_data = vscsi_transfer_data,
 .complete = vscsi_command_complete,
-.cancel = vscsi_request_cancelled
+.cancel = vscsi_request_cancelled,
+.save_request = vscsi_save_request,
+.load_request = vscsi_load_request,
 };
 
 static void spapr_vscsi_reset(VIOsPAPRDevice *dev)
@@ -991,6 +1056,20 @@ static Property spapr_vscsi_properties[] = {
 DEFINE_PROP_END_OF_LIST(),
 };
 
+static const VMStateDescription vmstate_spapr_vscsi = {
+.name = "spapr_vscsi",
+.version_id = 1,
+.minimum_version_id = 1,
+.minimum_version_id_old = 1,
+.fields  = (VMStateField []) {
+VMSTATE_SPAPR_VIO(vdev, VSCSIState),
+/* VSCSI state */
+/*  */
+
+VMSTATE_END_OF_LIST()
+},
+};
+
 static void spapr_vscsi_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
@@ -1005,6 +1084,7 @@ static void spapr_vscsi_class_init(ObjectClass *klass, 
void *data)
 k->signal_mask = 0x0001;
 dc->props = spapr_vscsi_properties;
 k->rtce_window_size = 0x1000;
+dc->vmsd = &vmstate_spapr_vscsi;
 }
 
 static const TypeInfo spapr_vscsi_info = {
-- 
1.8.0




Re: [Qemu-devel] [PATCH v2 0/2] libqtest leak fix & cleanup

2013-07-18 Thread Anthony Liguori
Applied.  Thanks.

Regards,

Anthony Liguori




Re: [Qemu-devel] seccomp: remove unused syscalls - for 1.6

2013-07-18 Thread Paul Moore
On Thursday, July 18, 2013 06:37:15 PM Paolo Bonzini wrote:
> Il 18/07/2013 18:35, Eduardo Otubo ha scritto:
> > On 07/18/2013 01:28 PM, Anthony Liguori wrote:
> >> Eduardo Otubo  writes:
> >>> Hello all,
> >> 
> >>> In this small patch series I basically:
> >> Cover letter should be marked [PATCH 0/2].  Otherwise it defeats
> >> filtering.
> >> 
> >> Would like to see a Reviewed-by from someone before applying this.
> > 
> > I'm running some tests with qemu && xen, I'll post a v3 by the end of
> > the day. I'll format the cover letter in the correct way next time.
> 
> I feel that, at some point, grep and code review must trump experiments...
> 
> Paul, how did you guys handle this in other projects?

To the best of my knowledge QEMU currently stands alone with its complexity 
and use of seccomp filtering.  There are other applications, but they are 
either of the syscall sandboxing type where the users define the filters, or 
the rigid, smaller, well defined filter type.  QEMU is both large and has a 
huge number of options which affect the syscalls used.

At some point it would be nice to develop a mechanism to do some static 
analysis on a binary and its associated libraries to come up with a worst case 
filter (worst case because you might not want all the syscalls that a library 
uses, e.g. glibc).  Unfortunately, we don't have such a tool the moment - it's 
hard enough generating correct filters with a nice architecture agnostic 
manner :)

On the plus side, I think libseccomp is very close to being pretty much 
feature complete (excluding new architectures that may pop up, at present we 
are only x86, x86_64, x32, and ARM) so I'll be able to start turning some 
effort towards better tools and patches for existing applications.

-- 
paul moore
security and virtualization @ redhat




[Qemu-devel] [PATCH v2 05/11] spapr-tce: make sPAPRTCETable a proper device

2013-07-18 Thread Anthony Liguori
Model TCE tables as a device that's hooked up as a child object to
the owner.  Besides the code cleanup, we get a few nice benefits:

1) free actually works now (it was dead code before)

2) the TCE information is visible in the device tree

3) we can expose table information as properties such that if we
   change the window_size, we can use globals to keep migration
   working.

Signed-off-by: David Gibson 
[dwg: pseries: savevm support for PAPR TCE tables]
Signed-off-by: Alexey Kardashevskiy 
[alexey: ppc kvm: fix to compile]
Signed-off-by: Anthony Liguori 
---
 hw/ppc/spapr.c |   3 -
 hw/ppc/spapr_iommu.c   | 146 -
 hw/ppc/spapr_pci.c |   2 +-
 hw/ppc/spapr_vio.c |   2 +-
 include/hw/ppc/spapr.h |  23 +---
 target-ppc/kvm.c   |   4 +-
 6 files changed, 117 insertions(+), 63 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 48ae092..e340708 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -848,9 +848,6 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
 /* Set up EPOW events infrastructure */
 spapr_events_init(spapr);
 
-/* Set up IOMMU */
-spapr_iommu_init();
-
 /* Set up VIO bus */
 spapr->vio_bus = spapr_vio_bus_init();
 
diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index 89b33a5..3d4a1fc 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -36,17 +36,6 @@ enum sPAPRTCEAccess {
 SPAPR_TCE_RW = 3,
 };
 
-struct sPAPRTCETable {
-uint32_t liobn;
-uint32_t window_size;
-sPAPRTCE *table;
-bool bypass;
-int fd;
-MemoryRegion iommu;
-QLIST_ENTRY(sPAPRTCETable) list;
-};
-
-
 QLIST_HEAD(spapr_tce_tables, sPAPRTCETable) spapr_tce_tables;
 
 static sPAPRTCETable *spapr_tce_find_by_liobn(uint32_t liobn)
@@ -96,7 +85,7 @@ static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion 
*iommu, hwaddr addr)
 return (IOMMUTLBEntry) { .perm = IOMMU_NONE };
 }
 
-tce = tcet->table[addr >> SPAPR_TCE_PAGE_SHIFT].tce;
+tce = tcet->table[addr >> SPAPR_TCE_PAGE_SHIFT];
 
 #ifdef DEBUG_TCE
 fprintf(stderr, " ->  *paddr=0x%llx, *len=0x%llx\n",
@@ -112,55 +101,97 @@ static IOMMUTLBEntry 
spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr)
 };
 }
 
+static int spapr_tce_table_pre_load(void *opaque)
+{
+sPAPRTCETable *tcet = SPAPR_TCE_TABLE(opaque);
+
+tcet->nb_table = tcet->window_size >> SPAPR_TCE_PAGE_SHIFT;
+
+return 0;
+}
+
+static const VMStateDescription vmstate_spapr_tce_table = {
+.name = "spapr_iommu",
+.version_id = 1,
+.minimum_version_id = 1,
+.minimum_version_id_old = 1,
+.pre_load = spapr_tce_table_pre_load,
+.fields  = (VMStateField []) {
+/* Sanity check */
+VMSTATE_UINT32_EQUAL(liobn, sPAPRTCETable),
+VMSTATE_UINT32_EQUAL(window_size, sPAPRTCETable),
+
+/* IOMMU state */
+VMSTATE_BOOL(bypass, sPAPRTCETable),
+VMSTATE_VARRAY_UINT32(table, sPAPRTCETable, nb_table, 0, 
vmstate_info_uint64, uint64_t),
+
+VMSTATE_END_OF_LIST()
+},
+};
+
 static MemoryRegionIOMMUOps spapr_iommu_ops = {
 .translate = spapr_tce_translate_iommu,
 };
 
-sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn, size_t 
window_size)
+static int spapr_tce_table_realize(DeviceState *dev)
 {
-sPAPRTCETable *tcet;
-
-if (spapr_tce_find_by_liobn(liobn)) {
-fprintf(stderr, "Attempted to create TCE table with duplicate"
-" LIOBN 0x%x\n", liobn);
-return NULL;
-}
-
-if (!window_size) {
-return NULL;
-}
-
-tcet = g_malloc0(sizeof(*tcet));
-tcet->liobn = liobn;
-tcet->window_size = window_size;
+sPAPRTCETable *tcet = SPAPR_TCE_TABLE(dev);
 
 if (kvm_enabled()) {
-tcet->table = kvmppc_create_spapr_tce(liobn,
-  window_size,
+tcet->table = kvmppc_create_spapr_tce(tcet->liobn,
+  tcet->window_size,
   &tcet->fd);
 }
 
 if (!tcet->table) {
-size_t table_size = (window_size >> SPAPR_TCE_PAGE_SHIFT)
-* sizeof(sPAPRTCE);
+size_t table_size = (tcet->window_size >> SPAPR_TCE_PAGE_SHIFT)
+* sizeof(uint64_t);
 tcet->table = g_malloc0(table_size);
 }
+tcet->nb_table = tcet->window_size >> SPAPR_TCE_PAGE_SHIFT;
 
 #ifdef DEBUG_TCE
 fprintf(stderr, "spapr_iommu: New TCE table @ %p, liobn=0x%x, "
 "table @ %p, fd=%d\n", tcet, liobn, tcet->table, tcet->fd);
 #endif
 
-memory_region_init_iommu(&tcet->iommu, OBJECT(owner), &spapr_iommu_ops,
+memory_region_init_iommu(&tcet->iommu, OBJECT(dev), &spapr_iommu_ops,
  "iommu-spapr", UINT64_MAX);
 
 QLIST_INSERT_HEAD(&spapr_tce_tables, tcet, list);
 
+return 0;
+}
+
+sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn, size_t 
window_size)

[Qemu-devel] [PATCH v2 00/11] pseries: migration and QOM support

2013-07-18 Thread Anthony Liguori
This series is based on Alexey's series:

  spapr: migration, pci, msi, power8

Which in turn was based on work by David Gibson.

I've removed the bits not related to migration and made the
following changes:

 1) QOMify TCE tables and XICS

 2) Do everything in terms of VMStateDescriptions

 3) Fix endianness problem with TCE table translation
a) Drop the VMSTATE_DIVIDE thing in the process

I've tested this with a TCG pseries guest on an x86_64 host.

Since v1, I've incorporated some fixes that Alexey posted
upon testing with KVM.

Regards,

Anthony Liguori




Re: [Qemu-devel] [PATCH] ioport: remove LITTLE_ENDIAN mark for portio

2013-07-18 Thread Anthony Liguori
Applied.  Thanks.

Regards,

Anthony Liguori




Re: [Qemu-devel] [PATCH 0/2] changes related to monitor flow control

2013-07-18 Thread Anthony Liguori
Applied.  Thanks.

Regards,

Anthony Liguori




Re: [Qemu-devel] [PATCH] configure: Provide more helpful message if libvte not present

2013-07-18 Thread Anthony Liguori
Applied.  Thanks.

Regards,

Anthony Liguori




Re: [Qemu-devel] [PULL] virtio-ccw: dataplane enablement

2013-07-18 Thread Anthony Liguori
Pulled.  Thanks.

Regards,

Anthony Liguori




[Qemu-devel] [PATCH v2 03/11] pseries: savevm support for PAPR VIO logical lan

2013-07-18 Thread Anthony Liguori
From: David Gibson 

This patch adds the necessary VMStateDescription information to support
savevm/loadvm for the spapr_llan (PAPR logical lan) device.

Signed-off-by: David Gibson 
Reviewed-by: Anthony Liguori 
---
 hw/net/spapr_llan.c | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/hw/net/spapr_llan.c b/hw/net/spapr_llan.c
index 03a09f2..46f7d5f 100644
--- a/hw/net/spapr_llan.c
+++ b/hw/net/spapr_llan.c
@@ -81,9 +81,9 @@ typedef struct VIOsPAPRVLANDevice {
 VIOsPAPRDevice sdev;
 NICConf nicconf;
 NICState *nic;
-int isopen;
+bool isopen;
 target_ulong buf_list;
-int add_buf_ptr, use_buf_ptr, rx_bufs;
+uint32_t add_buf_ptr, use_buf_ptr, rx_bufs;
 target_ulong rxq_ptr;
 } VIOsPAPRVLANDevice;
 
@@ -500,6 +500,25 @@ static Property spapr_vlan_properties[] = {
 DEFINE_PROP_END_OF_LIST(),
 };
 
+static const VMStateDescription vmstate_spapr_llan = {
+.name = "spapr_llan",
+.version_id = 1,
+.minimum_version_id = 1,
+.minimum_version_id_old = 1,
+.fields  = (VMStateField []) {
+VMSTATE_SPAPR_VIO(sdev, VIOsPAPRVLANDevice),
+/* LLAN state */
+VMSTATE_BOOL(isopen, VIOsPAPRVLANDevice),
+VMSTATE_UINTTL(buf_list, VIOsPAPRVLANDevice),
+VMSTATE_UINT32(add_buf_ptr, VIOsPAPRVLANDevice),
+VMSTATE_UINT32(use_buf_ptr, VIOsPAPRVLANDevice),
+VMSTATE_UINT32(rx_bufs, VIOsPAPRVLANDevice),
+VMSTATE_UINTTL(rxq_ptr, VIOsPAPRVLANDevice),
+
+VMSTATE_END_OF_LIST()
+},
+};
+
 static void spapr_vlan_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
@@ -514,6 +533,7 @@ static void spapr_vlan_class_init(ObjectClass *klass, void 
*data)
 k->signal_mask = 0x1;
 dc->props = spapr_vlan_properties;
 k->rtce_window_size = 0x1000;
+dc->vmsd = &vmstate_spapr_llan;
 }
 
 static const TypeInfo spapr_vlan_info = {
-- 
1.8.0




[Qemu-devel] [PATCH v2 06/11] pseries: rework PAPR virtual SCSI

2013-07-18 Thread Anthony Liguori
From: Alexey Kardashevskiy 

The patch reimplements handling of indirect requests in order to
simplify upcoming live migration support.
- all pointers (except SCSIRequest*) were replaces with integer
indexes and offsets;
- DMA'ed srp_direct_buf kept untouched (ie. BE format);
- vscsi_fetch_desc() is added, now it is the only place where
descriptors are fetched and byteswapped;
- vscsi_req struct fields converted to migration-friendly types;
- many dprintf()'s fixed.

This also removed an unused field 'lun' from the spapr_vscsi device
which is assigned, but never used.  So, remove it.

[David Gibson: removed unused 'lun']
Signed-off-by: Alexey Kardashevskiy 
---
Changes:
2013/08/07:
* fixed handling of indirect requests with an additional table descriptor
---
 hw/scsi/spapr_vscsi.c | 223 +-
 1 file changed, 130 insertions(+), 93 deletions(-)

diff --git a/hw/scsi/spapr_vscsi.c b/hw/scsi/spapr_vscsi.c
index e8978bf..e2740fb 100644
--- a/hw/scsi/spapr_vscsi.c
+++ b/hw/scsi/spapr_vscsi.c
@@ -75,20 +75,19 @@ typedef struct vscsi_req {
 /* SCSI request tracking */
 SCSIRequest *sreq;
 uint32_tqtag; /* qemu tag != srp tag */
-int lun;
-int active;
-longdata_len;
-int writing;
-int senselen;
+boolactive;
+uint32_tdata_len;
+boolwriting;
+uint32_tsenselen;
 uint8_t sense[SCSI_SENSE_BUF_SIZE];
 
 /* RDMA related bits */
 uint8_t dma_fmt;
-struct srp_direct_buf   ext_desc;
-struct srp_direct_buf   *cur_desc;
-struct srp_indirect_buf *ind_desc;
-int local_desc;
-int total_desc;
+uint16_tlocal_desc;
+uint16_ttotal_desc;
+uint16_tcdb_offset;
+uint16_tcur_desc_num;
+uint16_tcur_desc_offset;
 } vscsi_req;
 
 #define TYPE_VIO_SPAPR_VSCSI_DEVICE "spapr-vscsi"
@@ -264,93 +263,138 @@ static int vscsi_send_rsp(VSCSIState *s, vscsi_req *req,
 return 0;
 }
 
-static inline void vscsi_swap_desc(struct srp_direct_buf *desc)
+static inline struct srp_direct_buf vscsi_swap_desc(struct srp_direct_buf desc)
 {
-desc->va = be64_to_cpu(desc->va);
-desc->len = be32_to_cpu(desc->len);
+desc.va = be64_to_cpu(desc.va);
+desc.len = be32_to_cpu(desc.len);
+return desc;
+}
+
+static int vscsi_fetch_desc(VSCSIState *s, struct vscsi_req *req,
+unsigned n, unsigned buf_offset,
+struct srp_direct_buf *ret)
+{
+struct srp_cmd *cmd = &req->iu.srp.cmd;
+
+switch (req->dma_fmt) {
+case SRP_NO_DATA_DESC: {
+dprintf("VSCSI: no data descriptor\n");
+return 0;
+}
+case SRP_DATA_DESC_DIRECT: {
+memcpy(ret, cmd->add_data + req->cdb_offset, sizeof(*ret));
+assert(req->cur_desc_num == 0);
+dprintf("VSCSI: direct segment\n");
+break;
+}
+case SRP_DATA_DESC_INDIRECT: {
+struct srp_indirect_buf *tmp = (struct srp_indirect_buf *)
+   (cmd->add_data + req->cdb_offset);
+if (n < req->local_desc) {
+*ret = tmp->desc_list[n];
+dprintf("VSCSI: indirect segment local tag=0x%x desc#%d/%d\n",
+req->qtag, n, req->local_desc);
+
+} else if (n < req->total_desc) {
+int rc;
+struct srp_direct_buf tbl_desc = vscsi_swap_desc(tmp->table_desc);
+unsigned desc_offset = n * sizeof(struct srp_direct_buf);
+
+if (desc_offset >= tbl_desc.len) {
+dprintf("VSCSI:   #%d is ouf of range (%d bytes)\n",
+n, desc_offset);
+return -1;
+}
+rc = spapr_vio_dma_read(&s->vdev, tbl_desc.va + desc_offset,
+ret, sizeof(struct srp_direct_buf));
+if (rc) {
+dprintf("VSCSI: spapr_vio_dma_read -> %d reading ext_desc\n",
+rc);
+return -1;
+}
+dprintf("VSCSI: indirect segment ext. tag=0x%x desc#%d/%d { 
va=%"PRIx64" len=%x }\n",
+req->qtag, n, req->total_desc, tbl_desc.va, tbl_desc.len);
+} else {
+dprintf("VSCSI:   Out of descriptors !\n");
+return 0;
+}
+break;
+}
+default:
+fprintf(stderr, "VSCSI:   Unknown format %x\n", req->dma_fmt);
+return -1;
+}
+
+*ret = vscsi_swap_desc(*ret);
+if (buf_offset > ret->len) {
+dprintf("   offset=%x is out of a descriptor #%d boundary=%x\n",
+buf_offset, req->cur_desc_num, ret->len);
+return -1;
+}
+ret->va += buf_offset;
+re

[Qemu-devel] [PATCH v2 02/11] pseries: savevm support for VIO devices

2013-07-18 Thread Anthony Liguori
From: David Gibson 

This patch adds helpers to allow PAPR VIO devices to save state common
to all VIO devices during savevm.

Signed-off-by: David Gibson 
Reviewed-by: Anthony Liguori 
---
 hw/ppc/spapr_vio.c | 20 
 include/hw/ppc/spapr_vio.h |  5 +
 2 files changed, 25 insertions(+)

diff --git a/hw/ppc/spapr_vio.c b/hw/ppc/spapr_vio.c
index 7c6f6e4..75ce19f 100644
--- a/hw/ppc/spapr_vio.c
+++ b/hw/ppc/spapr_vio.c
@@ -542,6 +542,26 @@ static const TypeInfo spapr_vio_bridge_info = {
 .class_init= spapr_vio_bridge_class_init,
 };
 
+const VMStateDescription vmstate_spapr_vio = {
+.name = "spapr_vio",
+.version_id = 1,
+.minimum_version_id = 1,
+.minimum_version_id_old = 1,
+.fields  = (VMStateField []) {
+/* Sanity check */
+VMSTATE_UINT32_EQUAL(reg, VIOsPAPRDevice),
+VMSTATE_UINT32_EQUAL(irq, VIOsPAPRDevice),
+
+/* General VIO device state */
+VMSTATE_UINTTL(signal_state, VIOsPAPRDevice),
+VMSTATE_UINT64(crq.qladdr, VIOsPAPRDevice),
+VMSTATE_UINT32(crq.qsize, VIOsPAPRDevice),
+VMSTATE_UINT32(crq.qnext, VIOsPAPRDevice),
+
+VMSTATE_END_OF_LIST()
+},
+};
+
 static void vio_spapr_device_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *k = DEVICE_CLASS(klass);
diff --git a/include/hw/ppc/spapr_vio.h b/include/hw/ppc/spapr_vio.h
index 3609327..46edc2a 100644
--- a/include/hw/ppc/spapr_vio.h
+++ b/include/hw/ppc/spapr_vio.h
@@ -134,4 +134,9 @@ VIOsPAPRDevice *spapr_vty_get_default(VIOsPAPRBus *bus);
 
 void spapr_vio_quiesce(void);
 
+extern const VMStateDescription vmstate_spapr_vio;
+
+#define VMSTATE_SPAPR_VIO(_f, _s) \
+VMSTATE_STRUCT(_f, _s, 0, vmstate_spapr_vio, VIOsPAPRDevice)
+
 #endif /* _HW_SPAPR_VIO_H */
-- 
1.8.0




[Qemu-devel] [PATCH v2 08/11] pseries: savevm support for pseries machine

2013-07-18 Thread Anthony Liguori
From: David Gibson 

This adds the necessary pieces to implement savevm / migration for the
pseries machine.  The most complex part here is migrating the hash
table - for the paravirtualized pseries machine the guest's hash page
table is not stored within guest memory, but externally and the guest
accesses it via hypercalls.

This patch uses a hypervisor reserved bit of the HPTE as a dirty bit
(tracking changes to the HPTE itself, not the page it references).
This is used to implement a live migration style incremental save and
restore of the hash table contents.

Normally a hash table is 16MB but it can get bigger depending on how
much RAM the guest has. Due to its nature, updates to it are random so
the live migration style is used for it.

In addition it adds VMStateDescription information to save and restore
the (few) remaining pieces of state information needed by the pseries
machine.

Signed-off-by: David Gibson 
Reviewed-by: Anthony Liguori 

---
Changes:
2013/09/07:
* added a comment about HPT size and migration style choice
---
 hw/ppc/spapr.c | 269 -
 hw/ppc/spapr_hcall.c   |   8 +-
 include/hw/ppc/spapr.h |  12 ++-
 3 files changed, 281 insertions(+), 8 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index e340708..734a782 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -32,6 +32,7 @@
 #include "sysemu/cpus.h"
 #include "sysemu/kvm.h"
 #include "kvm_ppc.h"
+#include "mmu-hash64.h"
 
 #include "hw/boards.h"
 #include "hw/ppc/ppc.h"
@@ -666,7 +667,7 @@ static void spapr_cpu_reset(void *opaque)
 
 env->spr[SPR_HIOR] = 0;
 
-env->external_htab = spapr->htab;
+env->external_htab = (uint8_t *)spapr->htab;
 env->htab_base = -1;
 env->htab_mask = HTAB_SIZE(spapr) - 1;
 env->spr[SPR_SDR1] = (target_ulong)(uintptr_t)spapr->htab |
@@ -710,6 +711,268 @@ static int spapr_vga_init(PCIBus *pci_bus)
 }
 }
 
+static const VMStateDescription vmstate_spapr = {
+.name = "spapr",
+.version_id = 1,
+.minimum_version_id = 1,
+.minimum_version_id_old = 1,
+.fields  = (VMStateField []) {
+VMSTATE_UINT32(next_irq, sPAPREnvironment),
+
+/* RTC offset */
+VMSTATE_UINT64(rtc_offset, sPAPREnvironment),
+
+VMSTATE_END_OF_LIST()
+},
+};
+
+#define HPTE(_table, _i)   (void *)(((uint64_t *)(_table)) + ((_i) * 2))
+#define HPTE_VALID(_hpte)  (tswap64(*((uint64_t *)(_hpte))) & HPTE64_V_VALID)
+#define HPTE_DIRTY(_hpte)  (tswap64(*((uint64_t *)(_hpte))) & 
HPTE64_V_HPTE_DIRTY)
+#define CLEAN_HPTE(_hpte)  ((*(uint64_t *)(_hpte)) &= 
tswap64(~HPTE64_V_HPTE_DIRTY))
+
+static int htab_save_setup(QEMUFile *f, void *opaque)
+{
+sPAPREnvironment *spapr = opaque;
+
+spapr->htab_save_index = 0;
+spapr->htab_first_pass = true;
+
+/* "Iteration" header */
+qemu_put_be32(f, spapr->htab_shift);
+
+return 0;
+}
+
+#define MAX_ITERATION_NS500 /* 5 ms */
+
+static void htab_save_first_pass(QEMUFile *f, sPAPREnvironment *spapr,
+ int64_t max_ns)
+{
+int htabslots = HTAB_SIZE(spapr) / HASH_PTE_SIZE_64;
+int index = spapr->htab_save_index;
+int64_t starttime = qemu_get_clock_ns(rt_clock);
+
+assert(spapr->htab_first_pass);
+
+do {
+int chunkstart;
+
+/* Consume invalid HPTEs */
+while ((index < htabslots)
+   && !HPTE_VALID(HPTE(spapr->htab, index))) {
+index++;
+CLEAN_HPTE(HPTE(spapr->htab, index));
+}
+
+/* Consume valid HPTEs */
+chunkstart = index;
+while ((index < htabslots)
+   && HPTE_VALID(HPTE(spapr->htab, index))) {
+index++;
+CLEAN_HPTE(HPTE(spapr->htab, index));
+}
+
+if (index > chunkstart) {
+int n_valid = index - chunkstart;
+
+qemu_put_be32(f, chunkstart);
+qemu_put_be16(f, n_valid);
+qemu_put_be16(f, 0);
+qemu_put_buffer(f, HPTE(spapr->htab, chunkstart),
+HASH_PTE_SIZE_64 * n_valid);
+
+if ((qemu_get_clock_ns(rt_clock) - starttime) > max_ns) {
+break;
+}
+}
+} while ((index < htabslots) && !qemu_file_rate_limit(f));
+
+if (index >= htabslots) {
+assert(index == htabslots);
+index = 0;
+spapr->htab_first_pass = false;
+}
+spapr->htab_save_index = index;
+}
+
+static bool htab_save_later_pass(QEMUFile *f, sPAPREnvironment *spapr,
+ int64_t max_ns)
+{
+bool final = max_ns < 0;
+int htabslots = HTAB_SIZE(spapr) / HASH_PTE_SIZE_64;
+int examined = 0, sent = 0;
+int index = spapr->htab_save_index;
+int64_t starttime = qemu_get_clock_ns(rt_clock);
+
+assert(!spapr->htab_first_pass);
+
+do {
+int chunkstart, invalidstart;
+
+/* Consume non-dirty HPTEs */
+while ((index < htabslots)
+   && !HPTE_

Re: [Qemu-devel] [PATCH 4/4] qemu-img: conditionally discard target on convert

2013-07-18 Thread Peter Lieven

Am 18.07.2013 um 20:54 schrieb ronnie sahlberg :

> BlockLimitsVPD  OptimalUnmapGranularity also applies to unmapping with
> writesame16 :
> 
> An OPTIMAL UNMAP GRANULARITY field set to a non-zero value indicates
> the optimal granularity in logical blocks
> for unmap requests (e.g., an UNMAP command or a WRITE SAME (16)
> command with the UNMAP bit set to
> one). An unmap request with a number of logical blocks that is not a
> multiple of this value may result in unmap
> operations on fewer LBAs than requested. An OPTIMAL UNMAP GRANULARITY
> field set to _h indicates
> that the device server does not report an optimal unmap granularity.
> 
> So if you send a writesame16+unmap-bit  that honours the "optimal
> unmap request" you have a pretty good chance
> that the blocks will be unmapped. If they are not they will at least
> be overwritten as zero.

thanks for the details. I think to have optimal performance and best
change for unmapping in qemu-img convert
it might be best to export the OPTIMAL UNMAP GRANULARITY as well
as the write_zeroes_w_discard capability via the BDI and than zero
out the whole device with bdrv_write_zeroes and the BDRV_MAY_DISCARD
flag.

the logic for this is already prepared in patch4 (topic of this email). except 
that
i have to exchange bdrv_discard with bdrv_write_zeroes w/ BDRV_MAY_DISCARD.

what do you think?



> 
> 
> On Thu, Jul 18, 2013 at 11:43 AM, Peter Lieven  wrote:
>> 
>> Am 18.07.2013 um 16:35 schrieb Paolo Bonzini :
>> 
>>> Il 18/07/2013 16:32, Peter Lieven ha scritto:
>> 
> (Mis)alignment and granularity can be handled later.  We can ignore them
> for now.  Later, if we decide the best way to support them is a flag,
> we'll add it.  Let's not put the cart before the horse.
> 
> BTW, I expect alignment!=0 to be really, really rare.
 To explain my concerns:
 
 I know that my target has internal page size of 15MB. I will check what
 happens
 if I deallocate this 15MB in chunks of lets say 1MB. If the page gets
 unprovisioned
 after the last chunk is unmapped it would be fine :-)
>>> 
>>> You're talking of granularity here, not (mis)alignment.
>> 
>> you are right. for the target i am talking about this is 30720 512-byte 
>> blocks for the granularity (pagesize) and 0 for the alignment.
>> i will see what happens if I write same w/unmap the whole 30720 blocks in 
>> smaller blocks ;-) otherwise i will have to add support for honoring this 
>> values in qemu-img convert
>> as a follow up.
>> 
>> Peter
>> 




Re: [Qemu-devel] [PATCH V4 4/4] Change default to qcow2 for sync mode none.

2013-07-18 Thread Ian Main
On Thu, Jul 18, 2013 at 12:56:51PM -0600, Eric Blake wrote:
> On 07/18/2013 12:47 PM, Ian Main wrote:
> > qcow2 supports backing files so it makes sense to default to qcow2
> > for MIRROR_SYNC_MODE_NONE so that we can use the source as a backing
> > drive and export it via nbd.  Defaulting FULL and TOP to SYNC_MODE_NONE
> > breaks tests but that could be fixed if we wanted it.
> > 
> > Signed-off-by: Ian Main 
> > ---
> >  blockdev.c   | 5 -
> >  qapi-schema.json | 1 +
> >  2 files changed, 5 insertions(+), 1 deletion(-)
> 
> Looks okay, but let's answer the meta-question first of whether we
> should just make 'format' mandatory and be done with it.
> 
> Also, I've noticed you aren't cc'ing many people; that can slow down
> reviews.  http://wiki.qemu.org/Contribute/SubmitAPatch gives hints on
> how to determine the right people to send your patches to, by
> deciphering MAINTAINERS and running ./scripts/getmaintainer.pl.

Ah ok, I'll add them next rev.

My take has been to just do a patch that implements the suggestion and
see what people think :).

Ian



Re: [Qemu-devel] [PATCH V4 1/4] Implement sync modes for drive-backup.

2013-07-18 Thread Ian Main
On Thu, Jul 18, 2013 at 11:19:43AM -0600, Eric Blake wrote:
> On 07/17/2013 02:04 PM, Ian Main wrote:
> > This patch adds sync-modes to the drive-backup interface and
> > implements the FULL, NONE and TOP modes of synchronization.
> > 
> > FULL performs as before copying the entire contents of the drive
> > while preserving the point-in-time using CoW.
> > NONE only copies new writes to the target drive.
> > TOP copies changes to the topmost drive image and preserves the
> > point-in-time using CoW.
> > 
> 
> > +++ b/qapi-schema.json
> > @@ -1807,6 +1807,10 @@
> >  # @format: #optional the format of the new destination, default is to
> >  #  probe if @mode is 'existing', else the format of the source
> >  #
> > +# @sync: what parts of the disk image should be copied to the destination
> > +#(all the disk, only the sectors allocated in the topmost image, or
> > +#only new I/O).
> > +#
> >  # @mode: #optional whether and how QEMU should create a new image, default 
> > is
> >  #'absolute-paths'.
> 
> This hunk looks bogus; the 'DriveBackup' type already documents @sync as
> of commit b53169e, which makes me suspect this hunk got misapplied
> during rebasing.

Did you check that?  I know there was one bit of documentation missing
that I fixed here.  I also just did a clean rebase (git am) to
kevin/block and it all applied fine.
 
> >  #
> > diff --git a/qmp-commands.hx b/qmp-commands.hx
> > index e075df4..f3f6b3d 100644
> > --- a/qmp-commands.hx
> > +++ b/qmp-commands.hx
> > @@ -957,6 +957,7 @@ Arguments:
> >  
> >  Example:
> >  -> { "execute": "drive-backup", "arguments": { "device": "drive0",
> > +   "sync": "full",
> > "target": "backup.img" } }
> 
> Ouch - commit b53169e made 'sync' mandatory, but forgot to update the
> example to match; perhaps this hunk ought to be applied independently?

That's up to you guys, I can split it out if needed.

Ian



Re: [Qemu-devel] [PATCH V4 4/4] Change default to qcow2 for sync mode none.

2013-07-18 Thread Eric Blake
On 07/18/2013 12:47 PM, Ian Main wrote:
> qcow2 supports backing files so it makes sense to default to qcow2
> for MIRROR_SYNC_MODE_NONE so that we can use the source as a backing
> drive and export it via nbd.  Defaulting FULL and TOP to SYNC_MODE_NONE
> breaks tests but that could be fixed if we wanted it.
> 
> Signed-off-by: Ian Main 
> ---
>  blockdev.c   | 5 -
>  qapi-schema.json | 1 +
>  2 files changed, 5 insertions(+), 1 deletion(-)

Looks okay, but let's answer the meta-question first of whether we
should just make 'format' mandatory and be done with it.

Also, I've noticed you aren't cc'ing many people; that can slow down
reviews.  http://wiki.qemu.org/Contribute/SubmitAPatch gives hints on
how to determine the right people to send your patches to, by
deciphering MAINTAINERS and running ./scripts/getmaintainer.pl.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 4/4] qemu-img: conditionally discard target on convert

2013-07-18 Thread ronnie sahlberg
BlockLimitsVPD  OptimalUnmapGranularity also applies to unmapping with
writesame16 :

An OPTIMAL UNMAP GRANULARITY field set to a non-zero value indicates
the optimal granularity in logical blocks
for unmap requests (e.g., an UNMAP command or a WRITE SAME (16)
command with the UNMAP bit set to
one). An unmap request with a number of logical blocks that is not a
multiple of this value may result in unmap
operations on fewer LBAs than requested. An OPTIMAL UNMAP GRANULARITY
field set to _h indicates
that the device server does not report an optimal unmap granularity.

So if you send a writesame16+unmap-bit  that honours the "optimal
unmap request" you have a pretty good chance
that the blocks will be unmapped. If they are not they will at least
be overwritten as zero.


On Thu, Jul 18, 2013 at 11:43 AM, Peter Lieven  wrote:
>
> Am 18.07.2013 um 16:35 schrieb Paolo Bonzini :
>
>> Il 18/07/2013 16:32, Peter Lieven ha scritto:
>
 (Mis)alignment and granularity can be handled later.  We can ignore them
 for now.  Later, if we decide the best way to support them is a flag,
 we'll add it.  Let's not put the cart before the horse.

 BTW, I expect alignment!=0 to be really, really rare.
>>> To explain my concerns:
>>>
>>> I know that my target has internal page size of 15MB. I will check what
>>> happens
>>> if I deallocate this 15MB in chunks of lets say 1MB. If the page gets
>>> unprovisioned
>>> after the last chunk is unmapped it would be fine :-)
>>
>> You're talking of granularity here, not (mis)alignment.
>
> you are right. for the target i am talking about this is 30720 512-byte 
> blocks for the granularity (pagesize) and 0 for the alignment.
> i will see what happens if I write same w/unmap the whole 30720 blocks in 
> smaller blocks ;-) otherwise i will have to add support for honoring this 
> values in qemu-img convert
> as a follow up.
>
> Peter
>



Re: [Qemu-devel] [PATCH V4 4/4] Change default to qcow2 for sync mode none.

2013-07-18 Thread Eric Blake
On 07/18/2013 12:03 PM, Ian Main wrote:
>>
>> Or we could simplify life by making 'format' mandatory for drive-backup;
>> it was optional for 'drive-mirror' due to incremental implementation,
>> but for 'drive-backup', we still have the opportunity to do things right
>> from the first release.
> 
> Ah, I did make a doc change, I must have forgotten to add it.
> 
> I'm all for making format mandatory if that is ok with everyone.  Maybe
> that is the best solution.

We can always change our mind in 1.7 to make it optional if we change
our minds, but I'd definitely like to see patches that make 'format'
mandatory for DriveBackup for 1.6 - simpler all around.  Converting
mandatory to optional is discoverable via introspection; while
converting optional to mandatory is an API break.  And since we can
argue that optional formats is relatively risky, it's better to have our
initial release be conservative by requiring the field until (and
unless) someone comes up with a use case why leaving it unspecified
makes a difference.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH] [RFC] aio/async: Add timed bottom-halves

2013-07-18 Thread Alex Bligh

Stefan,

--On 17 July 2013 11:02:30 +0800 Stefan Hajnoczi  wrote:


The steps to achieving this:

1. Drop alarm timers from qemu-timer.c and calculate g_poll() timeout
   instead for the main loop.

2. Introduce a per-AioContext aio_ctx_clock that can be used with
   qemu_new_timer() to create a QEMUTimer that expires during
   aio_poll().

3. Calculate g_poll() timeout for aio_ctx_clock in aio_poll().


I've pretty much written this.

Two issues:

1. I very much enjoyed deleting all the alarm timers code. However, it was
  doing something vaguely useful, which was calling qemu_notify_event
  when the timer expired. Under the new regime above, if the AioContext
  clock's timers expires within aio_poll, life is good as the timeout
  to g_poll will expire. However, this won't apply if any of the other
  three static clock's timers expire. Also, it won't work within the
  mainloop poll. Also, we lose the ability to respond to timers in a sub
  millisecond way.

  Options:

  a) restore alarm timers (at least for the time being). Make all
 alarm timers do qemu_notify_event. However, only run the AioContext
 clock's timers within aio_poll. This is the least intrusive change.

  b) calculate the timeout in aio_poll with respect to the minimum
 deadline across all clocks, not just AioContext's clock. Use the
 same logic in mainloop.

  I'd go for (b), except for the millisecond accuracy thing. So my
  temptation (sadly) is (a).

2. If we do delete alarm timers, I'll need to delete the -clock option.

WDYT?

--
Alex Bligh



[Qemu-devel] [PATCH V4 4/4] Change default to qcow2 for sync mode none.

2013-07-18 Thread Ian Main
qcow2 supports backing files so it makes sense to default to qcow2
for MIRROR_SYNC_MODE_NONE so that we can use the source as a backing
drive and export it via nbd.  Defaulting FULL and TOP to SYNC_MODE_NONE
breaks tests but that could be fixed if we wanted it.

Signed-off-by: Ian Main 
---
 blockdev.c   | 5 -
 qapi-schema.json | 1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index 000dea6..a56ba08 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1462,7 +1462,10 @@ void qmp_drive_backup(const char *device, const char 
*target,
 }
 
 if (!has_format) {
-format = mode == NEW_IMAGE_MODE_EXISTING ? NULL : bs->drv->format_name;
+format = NULL;
+if (mode != NEW_IMAGE_MODE_EXISTING) {
+format = sync == MIRROR_SYNC_MODE_NONE ? "qcow2" : 
bs->drv->format_name;
+}
 }
 if (format) {
 drv = bdrv_find_format(format);
diff --git a/qapi-schema.json b/qapi-schema.json
index b3f6c2a..e2c86f9 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1806,6 +1806,7 @@
 #
 # @format: #optional the format of the new destination, default is to
 #  probe if @mode is 'existing', else the format of the source
+#  drive.  If @sync is 'none' then the default is qcow2.
 #
 # @sync: what parts of the disk image should be copied to the destination
 #(all the disk, only the sectors allocated in the topmost image, or
-- 
1.8.1.4




Re: [Qemu-devel] [PATCH 4/4] qemu-img: conditionally discard target on convert

2013-07-18 Thread Peter Lieven

Am 18.07.2013 um 16:35 schrieb Paolo Bonzini :

> Il 18/07/2013 16:32, Peter Lieven ha scritto:
 
>>> (Mis)alignment and granularity can be handled later.  We can ignore them
>>> for now.  Later, if we decide the best way to support them is a flag,
>>> we'll add it.  Let's not put the cart before the horse.
>>> 
>>> BTW, I expect alignment!=0 to be really, really rare.
>> To explain my concerns:
>> 
>> I know that my target has internal page size of 15MB. I will check what
>> happens
>> if I deallocate this 15MB in chunks of lets say 1MB. If the page gets
>> unprovisioned
>> after the last chunk is unmapped it would be fine :-)
> 
> You're talking of granularity here, not (mis)alignment.

you are right. for the target i am talking about this is 30720 512-byte blocks 
for the granularity (pagesize) and 0 for the alignment.
i will see what happens if I write same w/unmap the whole 30720 blocks in 
smaller blocks ;-) otherwise i will have to add support for honoring this 
values in qemu-img convert
as a follow up.

Peter




Re: [Qemu-devel] [PATCH v2 0/6] Clean up bogus default boot order

2013-07-18 Thread Anthony Liguori
Markus Armbruster  writes:

> Ping?

This needs a rebase.

Regards,

Anthony Liguori

>
> Markus Armbruster  writes:
>
>> This is on top of "[PATCH v4 00/12] Boot order tests", so it's
>> protected by these tests.
>>
>> The first five patches are admittedly related to the stated purpose of
>> this series pretty much only by "I can't stand perpetuating this
>> stupid crap".  Max Filippov and Peter Maydell already cleaned up
>> Xtensa and ARM the same way.
>>
>> v2:
>> * Straightforward rebase
>> * PATCH 6 improved commit message (Alex)
>>
>> Markus Armbruster (6):
>>   pc: Don't prematurely explode QEMUMachineInitArgs
>>   pc: Don't explode QEMUMachineInitArgs into local variables needlessly
>>   sun4: Don't prematurely explode QEMUMachineInitArgs
>>   ppc: Don't explode QEMUMachineInitArgs into local variables
>> needlessly
>>   ppc: Don't duplicate QEMUMachineInitArgs in PPCE500Params
>>   hw: Clean up bogus default boot order
>>
>>  hw/alpha/dp264.c |   1 -
>>  hw/arm/collie.c  |   1 -
>>  hw/arm/exynos4_boards.c  |   2 -
>>  hw/arm/gumstix.c |   2 -
>>  hw/arm/highbank.c|   1 -
>>  hw/arm/integratorcp.c|   1 -
>>  hw/arm/kzm.c |   1 -
>>  hw/arm/mainstone.c   |   1 -
>>  hw/arm/musicpal.c|   1 -
>>  hw/arm/nseries.c |   6 +-
>>  hw/arm/omap_sx1.c|   2 -
>>  hw/arm/palm.c|   1 -
>>  hw/arm/realview.c|   4 -
>>  hw/arm/spitz.c   |   4 -
>>  hw/arm/stellaris.c   |   2 -
>>  hw/arm/tosa.c|   1 -
>>  hw/arm/versatilepb.c |   2 -
>>  hw/arm/vexpress.c|   2 -
>>  hw/arm/xilinx_zynq.c |   1 -
>>  hw/arm/z2.c  |   1 -
>>  hw/core/null-machine.c   |   1 -
>>  hw/cris/axis_dev88.c |   1 -
>>  hw/i386/pc_piix.c|  89 +++--
>>  hw/i386/pc_q35.c |  28 +++
>>  hw/i386/xen_machine_pv.c |   1 -
>>  hw/lm32/lm32_boards.c|   2 -
>>  hw/lm32/milkymist.c  |   1 -
>>  hw/m68k/an5206.c |   1 -
>>  hw/m68k/dummy_m68k.c |   1 -
>>  hw/m68k/mcf5208.c|   1 -
>>  hw/microblaze/petalogix_ml605_mmu.c  |   1 -
>>  hw/microblaze/petalogix_s3adsp1800_mmu.c |   1 -
>>  hw/mips/mips_fulong2e.c  |   1 -
>>  hw/mips/mips_jazz.c  |   2 -
>>  hw/mips/mips_malta.c |   1 -
>>  hw/mips/mips_mipssim.c   |   1 -
>>  hw/mips/mips_r4k.c   |   1 -
>>  hw/openrisc/openrisc_sim.c   |   1 -
>>  hw/ppc/e500.c|  35 +
>>  hw/ppc/e500.h|  13 +--
>>  hw/ppc/e500plat.c|  15 +---
>>  hw/ppc/mac_newworld.c|   4 +-
>>  hw/ppc/mac_oldworld.c|   4 +-
>>  hw/ppc/mpc8544ds.c   |  15 +---
>>  hw/ppc/ppc405_boards.c   |   2 -
>>  hw/ppc/ppc440_bamboo.c   |   1 -
>>  hw/ppc/prep.c|   4 +-
>>  hw/ppc/spapr.c   |   4 +-
>>  hw/ppc/virtex_ml507.c|   1 -
>>  hw/s390x/s390-virtio-ccw.c   |   1 -
>>  hw/s390x/s390-virtio.c   |   1 -
>>  hw/sh4/r2d.c |   1 -
>>  hw/sh4/shix.c|   1 -
>>  hw/sparc/leon3.c |   1 -
>>  hw/sparc/sun4m.c | 131 
>> ---
>>  hw/sparc64/sun4u.c   |  58 +-
>>  hw/unicore32/puv3.c  |   1 -
>>  hw/xtensa/xtensa_lx60.c  |   2 -
>>  hw/xtensa/xtensa_sim.c   |   1 -
>>  include/hw/boards.h  |   7 +-
>>  vl.c |   4 +-
>>  61 files changed, 130 insertions(+), 349 deletions(-)




Re: [Qemu-devel] [PATCH V4 4/4] Change default to qcow2 for sync mode none.

2013-07-18 Thread Ian Main
On Thu, Jul 18, 2013 at 11:27:21AM -0600, Eric Blake wrote:
> On 07/17/2013 02:04 PM, Ian Main wrote:
> > qcow2 supports backing files so it makes sense to default to qcow2
> > for MIRROR_SYNC_MODE_NONE so that we can use the source as a backing
> > drive and export it via nbd.  Defaulting FULL and TOP to SYNC_MODE_NONE
> > breaks tests but that could be fixed if we wanted it.
> > 
> > Signed-off-by: Ian Main 
> > ---
> >  blockdev.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/blockdev.c b/blockdev.c
> > index 000dea6..805b0e5 100644
> > --- a/blockdev.c
> > +++ b/blockdev.c
> > @@ -1462,7 +1462,7 @@ void qmp_drive_backup(const char *device, const char 
> > *target,
> >  }
> >  
> >  if (!has_format) {
> > -format = mode == NEW_IMAGE_MODE_EXISTING ? NULL : 
> > bs->drv->format_name;
> > +format = mode == NEW_IMAGE_MODE_EXISTING ? NULL : "qcow2";
> 
> Is this the right thing to do?  Or should we do:
> 
> if (!has_format) {
> if (mode == NEW_IMAGE_MODE_EXISTING) {
> format = NULL;
> } else {
> format = bs->drv->format_name ?: "qcow2";
> }
> }
> 
> That is, I think we should default to doing a backup in the format given
> by the original (what if the original is qed, which also supports
> backing files), and only use qcow2 when there is no guidance whatsoever.
> 
> But in practice, I don't care - format probing is a security hole, so
> libvirt should always be passing a format, at which point the entire
> !has_format condition should be skipped when called by libvirt.

Heh, actually that is basically what I have now, as with the doc change
I forgot to git add it.  Doh!  I'll repost.. I'm also not opposed to
format being non-optional.

Ian



Re: [Qemu-devel] [PATCH V4 4/4] Change default to qcow2 for sync mode none.

2013-07-18 Thread Ian Main
On Thu, Jul 18, 2013 at 11:32:52AM -0600, Eric Blake wrote:
> On 07/18/2013 11:27 AM, Eric Blake wrote:
> 
> >>  if (!has_format) {
> >> -format = mode == NEW_IMAGE_MODE_EXISTING ? NULL : 
> >> bs->drv->format_name;
> >> +format = mode == NEW_IMAGE_MODE_EXISTING ? NULL : "qcow2";
> > 
> > Is this the right thing to do?  Or should we do:
> > 
> > if (!has_format) {
> > if (mode == NEW_IMAGE_MODE_EXISTING) {
> > format = NULL;
> > } else {
> > format = bs->drv->format_name ?: "qcow2";
> > }
> > }
> > 
> > That is, I think we should default to doing a backup in the format given
> > by the original (what if the original is qed, which also supports
> > backing files), and only use qcow2 when there is no guidance whatsoever.
> > 
> > But in practice, I don't care
> 
> Well, I _DO_ care about one thing - make sure that the qapi-schema.json
> page accurately documents how this variable is defaulted for callers
> that don't care about the implications of omitting a format.
> 
> Or we could simplify life by making 'format' mandatory for drive-backup;
> it was optional for 'drive-mirror' due to incremental implementation,
> but for 'drive-backup', we still have the opportunity to do things right
> from the first release.

Ah, I did make a doc change, I must have forgotten to add it.

I'm all for making format mandatory if that is ok with everyone.  Maybe
that is the best solution.

Ian



Re: [Qemu-devel] [PATCH v8 0/3] Throttle-down guest to help with live migration convergence

2013-07-18 Thread Vinod, Chegu
Hi Peter,

Thanks for catching this.
Tthis was perhaps accidentally left out during merge to Juan's migration.next 
tree.  
I have informed Juan and he said he would take care of it.

Vinod

-Original Message-
From: Peter Lieven [mailto:lieven-li...@dlhnet.de] 
Sent: Wednesday, July 17, 2013 11:25 PM
To: Vinod, Chegu
Cc: ebl...@redhat.com; anth...@codemonkey.ws; quint...@redhat.com; 
owass...@redhat.com; qemu-devel@nongnu.org; pbonz...@redhat.com
Subject: Re: [Qemu-devel] [PATCH v8 0/3] Throttle-down guest to help with live 
migration convergence

Hi all,

is it possible that not v8 of this patch got merged?
Without checking line-by-line I see at least that this here

+# @auto-converge: If enabled, QEMU will automatically throttle down the guest
+#  to speed up convergence of RAM migration. (since 1.6)
+#

is missing in qapi-schema.json.

BR,
Peter



On 24.06.2013 11:49, Chegu Vinod wrote:
> Busy enterprise workloads hosted on large sized VM's tend to dirty 
> memory faster than the transfer rate achieved via live guest migration.
> Despite some good recent improvements (& using dedicated 10Gig NICs 
> between hosts) the live migration does NOT converge.
>
> If a user chooses to force convergence of their migration via a new 
> migration capability "auto-converge" then this change will auto-detect 
> lack of convergence scenario and trigger a slow down of the workload 
> by explicitly disallowing the VCPUs from spending much time in the VM 
> context.
>
> The migration thread tries to catchup and this eventually leads to 
> convergence in some "deterministic" amount of time. Yes it does impact 
> the performance of all the VCPUs but in my observation that lasts only 
> for a short duration of time. i.e. end up entering stage 3 (downtime 
> phase) soon after that. No external trigger is required.
>
> Thanks to Juan and Paolo for their useful suggestions.
>
> ---
> Changes from v7:
> - added a missing else to patch 3/3.
>
> Changes from v6:
> - incorporated feedback from Paolo.
> - rebased to latest qemu.git and removing RFC
>
> Changes from v5:
> - incorporated feedback from Paolo & Igor.
> - rebased to latest qemu.git
>
> Changes from v4:
> - incorporated feedback from Paolo.
> - split into 3 patches.
>
> Changes from v3:
> - incorporated feedback from Paolo and Eric
> - rebased to latest qemu.git
>
> Changes from v2:
> - incorporated feedback from Orit, Juan and Eric
> - stop the throttling thread at the start of stage 3
> - rebased to latest qemu.git
>
> Changes from v1:
> - rebased to latest qemu.git
> - added auto-converge capability(default off) - suggested by Anthony Liguori &
>  Eric Blake.
>
> Signed-off-by: Chegu Vinod 
> ---
>
> Chegu Vinod (3):
>Introduce async_run_on_cpu()
>Add 'auto-converge' migration capability
>Force auto-convegence of live migration
>
>   arch_init.c   |   85 
> +
>   cpus.c|   29 ++
>   include/migration/migration.h |2 +
>   include/qemu-common.h |1 +
>   include/qom/cpu.h |   10 +
>   migration.c   |9 
>   qapi-schema.json  |5 ++-
>   7 files changed, 140 insertions(+), 1 deletions(-)
>
>




Re: [Qemu-devel] Watchdog device in Qemu user mode

2013-07-18 Thread kbastian

Hi Andreas,

thanks alot. I only used user mode because I thought it is easier to  
implement.

I will take a look into system mode.

Best regards

Bastian




Re: [Qemu-devel] [PATCH V4 4/4] Change default to qcow2 for sync mode none.

2013-07-18 Thread Eric Blake
On 07/18/2013 11:27 AM, Eric Blake wrote:

>>  if (!has_format) {
>> -format = mode == NEW_IMAGE_MODE_EXISTING ? NULL : 
>> bs->drv->format_name;
>> +format = mode == NEW_IMAGE_MODE_EXISTING ? NULL : "qcow2";
> 
> Is this the right thing to do?  Or should we do:
> 
> if (!has_format) {
> if (mode == NEW_IMAGE_MODE_EXISTING) {
> format = NULL;
> } else {
> format = bs->drv->format_name ?: "qcow2";
> }
> }
> 
> That is, I think we should default to doing a backup in the format given
> by the original (what if the original is qed, which also supports
> backing files), and only use qcow2 when there is no guidance whatsoever.
> 
> But in practice, I don't care

Well, I _DO_ care about one thing - make sure that the qapi-schema.json
page accurately documents how this variable is defaulted for callers
that don't care about the implications of omitting a format.

Or we could simplify life by making 'format' mandatory for drive-backup;
it was optional for 'drive-mirror' due to incremental implementation,
but for 'drive-backup', we still have the opportunity to do things right
from the first release.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] Watchdog device in Qemu user mode

2013-07-18 Thread Andreas Färber
Hi Bastian,

Am 18.07.2013 19:09, schrieb kbast...@mail.uni-paderborn.de:
> The processor is usually embedded and
> programs run without operating systems or with custom ones designed for
> hard realtime. Would it be much work to port from user mode to system mode?
> I am using a c-compiler to create testprograms and these programs are
> serving the watchdog. This results in segmentation faults when I try to
> execute loads or stores on addresses from the watchdog.

You need a softmmu a.k.a. system emulation for that. linux-user is only
for emulating programs that would actually run under Linux operating
system. You'd use the -bios command line option to load bare metal software.

Having no link to your code it's hard to judge how much work it might
be, but in general system emulation is easier - you need to write a
simple board file to load your binary, the instruction emulation remains
unchanged, maybe one or two stub functions for interrupts and MMU faults.

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



Re: [Qemu-devel] [PATCH V4 4/4] Change default to qcow2 for sync mode none.

2013-07-18 Thread Eric Blake
On 07/17/2013 02:04 PM, Ian Main wrote:
> qcow2 supports backing files so it makes sense to default to qcow2
> for MIRROR_SYNC_MODE_NONE so that we can use the source as a backing
> drive and export it via nbd.  Defaulting FULL and TOP to SYNC_MODE_NONE
> breaks tests but that could be fixed if we wanted it.
> 
> Signed-off-by: Ian Main 
> ---
>  blockdev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 000dea6..805b0e5 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -1462,7 +1462,7 @@ void qmp_drive_backup(const char *device, const char 
> *target,
>  }
>  
>  if (!has_format) {
> -format = mode == NEW_IMAGE_MODE_EXISTING ? NULL : 
> bs->drv->format_name;
> +format = mode == NEW_IMAGE_MODE_EXISTING ? NULL : "qcow2";

Is this the right thing to do?  Or should we do:

if (!has_format) {
if (mode == NEW_IMAGE_MODE_EXISTING) {
format = NULL;
} else {
format = bs->drv->format_name ?: "qcow2";
}
}

That is, I think we should default to doing a backup in the format given
by the original (what if the original is qed, which also supports
backing files), and only use qcow2 when there is no guidance whatsoever.

But in practice, I don't care - format probing is a security hole, so
libvirt should always be passing a format, at which point the entire
!has_format condition should be skipped when called by libvirt.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] Watchdog device in Qemu user mode

2013-07-18 Thread Peter Maydell
On 18 July 2013 18:09,   wrote:
> thanks for the fast response. The processor is usually embedded and programs
> run without operating systems or with custom ones designed for hard
> realtime.

If this is your use case you should just implement system mode
and not worry about implementing linux-user mode at all.

thanks
-- PMM



Re: [Qemu-devel] [PATCH V4 1/4] Implement sync modes for drive-backup.

2013-07-18 Thread Eric Blake
On 07/17/2013 02:04 PM, Ian Main wrote:
> This patch adds sync-modes to the drive-backup interface and
> implements the FULL, NONE and TOP modes of synchronization.
> 
> FULL performs as before copying the entire contents of the drive
> while preserving the point-in-time using CoW.
> NONE only copies new writes to the target drive.
> TOP copies changes to the topmost drive image and preserves the
> point-in-time using CoW.
> 

> +++ b/qapi-schema.json
> @@ -1807,6 +1807,10 @@
>  # @format: #optional the format of the new destination, default is to
>  #  probe if @mode is 'existing', else the format of the source
>  #
> +# @sync: what parts of the disk image should be copied to the destination
> +#(all the disk, only the sectors allocated in the topmost image, or
> +#only new I/O).
> +#
>  # @mode: #optional whether and how QEMU should create a new image, default is
>  #'absolute-paths'.

This hunk looks bogus; the 'DriveBackup' type already documents @sync as
of commit b53169e, which makes me suspect this hunk got misapplied
during rebasing.

>  #
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index e075df4..f3f6b3d 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -957,6 +957,7 @@ Arguments:
>  
>  Example:
>  -> { "execute": "drive-backup", "arguments": { "device": "drive0",
> +   "sync": "full",
> "target": "backup.img" } }

Ouch - commit b53169e made 'sync' mandatory, but forgot to update the
example to match; perhaps this hunk ought to be applied independently?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH] Bug Fix:Segmentation fault when use usb-ehci device

2013-07-18 Thread Andreas Färber
Hi,

Am 18.07.2013 17:27, schrieb Mike Qiu:
> Hi all
> 
> Any comments ?

You should've CCed the USB maintainer whose file you are touching for
review rather than just ppc people, see ./MAINTAINERS.

There's some typos in the commit message, but the change looks okay to
me - although there were discussions to catch this on the memory API
side of things instead.

Regards,
Andreas

> 
> Thanks
> Mike
> 2013/7/16 11:50, Mike Qiu wrote:
>> For usb-ehci in qemu, its caps just has read() operation,
>> the write() operation does not exist.
>>
>> This cause a Segmentation fault when use usb-ehci device in ppc64
>> platform.
>>
>> here is gdb output:
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0x3fffa7fcef20 (LWP 6793)]
>> 0x103f5244 in memory_region_oldmmio_write_accessor
>> (opaque=0x113e9e78, addr=9, value=0x3fffa7fce088,
>> size=1, shift=0, mask=255) at /home/Mike/qemu-impreza/memory.c:384
>> 384  mr->ops->old_mmio.write[ctz32(size)](mr->opaque, addr, tmp);
>> (gdb) p *mr->ops
>> $1 = {read = @0x10716f68: 0x1020699c , write = 0,
>>endianness = DEVICE_LITTLE_ENDIAN, valid = {min_access_size = 1,
>>max_access_size = 4, unaligned = false, accepts = 0}, impl =
>>{min_access_size = 1, max_access_size = 1, unaligned = false},
>>old_mmio = {read = {0, 0, 0}, write = {0, 0, 0}}}
>>
>> Becasue function write() of mr->ops has not been implement, in
>> function memory_region_dispatch_write(), it call
>> oldmmio write accessor, but at the same time old_mmio still not
>> been implement by default.
>>
>> That is the root cause of the Segmentation fault.
>>
>> To solve this problem, add empty function: ehci_caps_write()
>>
>> Signed-off-by: Mike Qiu 
>> ---
>>  hw/usb/hcd-ehci.c |7 +++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
>> index 67e4b24..6c8a439 100644
>> --- a/hw/usb/hcd-ehci.c
>> +++ b/hw/usb/hcd-ehci.c
>> @@ -1072,6 +1072,12 @@ static void ehci_port_write(void *ptr, hwaddr addr,
>>  trace_usb_ehci_portsc_change(addr + s->portscbase, addr >> 2, *portsc, 
>> old);
>>  }
>>
>> +static void ehci_caps_write(void *ptr, hwaddr addr, uint64_t val,
>> +   unsigned size)
>> +{
>> +/* nothing */
>> +}
>> +
>>  static void ehci_opreg_write(void *ptr, hwaddr addr,
>>   uint64_t val, unsigned size)
>>  {
>> @@ -2380,6 +2386,7 @@ static void ehci_frame_timer(void *opaque)
>>
>>  static const MemoryRegionOps ehci_mmio_caps_ops = {
>>  .read = ehci_caps_read,
>> +.write = ehci_caps_write,
>>  .valid.min_access_size = 1,
>>  .valid.max_access_size = 4,
>>  .impl.min_access_size = 1,
> 
> 


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



Re: [Qemu-devel] Watchdog device in Qemu user mode

2013-07-18 Thread kbastian

Hi Peter,

thanks for the fast response. The processor is usually embedded and  
programs run without operating systems or with custom ones designed  
for hard realtime. Would it be much work to port from user mode to  
system mode?
I am using a c-compiler to create testprograms and these programs are  
serving the watchdog. This results in segmentation faults when I try  
to execute loads or stores on addresses from the watchdog.


Best regards

Bastian




Re: [Qemu-devel] [PATCH] util/iov: Fix -O1 uninitialized variable warning

2013-07-18 Thread Richard Henderson
On 07/18/2013 09:36 AM, Peter Maydell wrote:
> On 18 July 2013 17:14, Richard Henderson  wrote:
>> At -O2, code in the form
>>
>>   if (p) A; B; if (p) C;
>>
>> may be rearranged via "jump threading" into
>>
>>   if (p) { A; B; C; } else { B; }
>>
>> But at -O1 this doesn't happen and we -Werror out here on
>> the "may be used uninitialized" orig_len.  Perform this transform
>> by hand so that -O1 remains a viable debugging alternative.
>>
>> Signed-off-by: Richard Henderson 
> 
> This is the same issue fixed by this (reviewed but
> never applied) patch from June, isn't it?
> 
> http://patchwork.ozlabs.org/patch/251410/

Yes, it is.


r~




Re: [Qemu-devel] Watchdog device in Qemu user mode

2013-07-18 Thread Peter Maydell
On 18 July 2013 17:44,   wrote:
> Hey everybody,
>
> I am fairly new to qemu development. So here is my question:
> I would like to write a virtual watchdog device, which
> is part of the emulated cpu. This is part of an Infineon
> Tricore Implementation I am writing. For simplicities sake
> I use qemu user mode, to test my translation.
> How could I implement such a watchdog device into qemu?
> Can you give me some clues? Is it even possible to use a
> device in user mode?

To a first approximation, no, you can't use devices in
user-mode. (Is the device really accessible to a normal
Linux process running on this hardware? Usually the
kernel would make it only kernel-accessible with the MMU.)

-- PMM



[Qemu-devel] Watchdog device in Qemu user mode

2013-07-18 Thread kbastian

Hey everybody,

I am fairly new to qemu development. So here is my question:
I would like to write a virtual watchdog device, which is part of the  
emulated cpu. This is part of an Infineon Tricore Implementation I am  
writing. For simplicities sake I use qemu user mode, to test my  
translation. How could I implement such a watchdog device into qemu?  
Can you give me some clues? Is it even possible to use a device in  
user mode?


Best regards

Bastian








Re: [Qemu-devel] [PATCH v2 0/6] Clean up bogus default boot order

2013-07-18 Thread Markus Armbruster
Ping?

Markus Armbruster  writes:

> This is on top of "[PATCH v4 00/12] Boot order tests", so it's
> protected by these tests.
>
> The first five patches are admittedly related to the stated purpose of
> this series pretty much only by "I can't stand perpetuating this
> stupid crap".  Max Filippov and Peter Maydell already cleaned up
> Xtensa and ARM the same way.
>
> v2:
> * Straightforward rebase
> * PATCH 6 improved commit message (Alex)
>
> Markus Armbruster (6):
>   pc: Don't prematurely explode QEMUMachineInitArgs
>   pc: Don't explode QEMUMachineInitArgs into local variables needlessly
>   sun4: Don't prematurely explode QEMUMachineInitArgs
>   ppc: Don't explode QEMUMachineInitArgs into local variables
> needlessly
>   ppc: Don't duplicate QEMUMachineInitArgs in PPCE500Params
>   hw: Clean up bogus default boot order
>
>  hw/alpha/dp264.c |   1 -
>  hw/arm/collie.c  |   1 -
>  hw/arm/exynos4_boards.c  |   2 -
>  hw/arm/gumstix.c |   2 -
>  hw/arm/highbank.c|   1 -
>  hw/arm/integratorcp.c|   1 -
>  hw/arm/kzm.c |   1 -
>  hw/arm/mainstone.c   |   1 -
>  hw/arm/musicpal.c|   1 -
>  hw/arm/nseries.c |   6 +-
>  hw/arm/omap_sx1.c|   2 -
>  hw/arm/palm.c|   1 -
>  hw/arm/realview.c|   4 -
>  hw/arm/spitz.c   |   4 -
>  hw/arm/stellaris.c   |   2 -
>  hw/arm/tosa.c|   1 -
>  hw/arm/versatilepb.c |   2 -
>  hw/arm/vexpress.c|   2 -
>  hw/arm/xilinx_zynq.c |   1 -
>  hw/arm/z2.c  |   1 -
>  hw/core/null-machine.c   |   1 -
>  hw/cris/axis_dev88.c |   1 -
>  hw/i386/pc_piix.c|  89 +++--
>  hw/i386/pc_q35.c |  28 +++
>  hw/i386/xen_machine_pv.c |   1 -
>  hw/lm32/lm32_boards.c|   2 -
>  hw/lm32/milkymist.c  |   1 -
>  hw/m68k/an5206.c |   1 -
>  hw/m68k/dummy_m68k.c |   1 -
>  hw/m68k/mcf5208.c|   1 -
>  hw/microblaze/petalogix_ml605_mmu.c  |   1 -
>  hw/microblaze/petalogix_s3adsp1800_mmu.c |   1 -
>  hw/mips/mips_fulong2e.c  |   1 -
>  hw/mips/mips_jazz.c  |   2 -
>  hw/mips/mips_malta.c |   1 -
>  hw/mips/mips_mipssim.c   |   1 -
>  hw/mips/mips_r4k.c   |   1 -
>  hw/openrisc/openrisc_sim.c   |   1 -
>  hw/ppc/e500.c|  35 +
>  hw/ppc/e500.h|  13 +--
>  hw/ppc/e500plat.c|  15 +---
>  hw/ppc/mac_newworld.c|   4 +-
>  hw/ppc/mac_oldworld.c|   4 +-
>  hw/ppc/mpc8544ds.c   |  15 +---
>  hw/ppc/ppc405_boards.c   |   2 -
>  hw/ppc/ppc440_bamboo.c   |   1 -
>  hw/ppc/prep.c|   4 +-
>  hw/ppc/spapr.c   |   4 +-
>  hw/ppc/virtex_ml507.c|   1 -
>  hw/s390x/s390-virtio-ccw.c   |   1 -
>  hw/s390x/s390-virtio.c   |   1 -
>  hw/sh4/r2d.c |   1 -
>  hw/sh4/shix.c|   1 -
>  hw/sparc/leon3.c |   1 -
>  hw/sparc/sun4m.c | 131 
> ---
>  hw/sparc64/sun4u.c   |  58 +-
>  hw/unicore32/puv3.c  |   1 -
>  hw/xtensa/xtensa_lx60.c  |   2 -
>  hw/xtensa/xtensa_sim.c   |   1 -
>  include/hw/boards.h  |   7 +-
>  vl.c |   4 +-
>  61 files changed, 130 insertions(+), 349 deletions(-)



Re: [Qemu-devel] [PATCH v4 00/12] Boot order tests

2013-07-18 Thread Markus Armbruster
Ping?

Markus Armbruster  writes:

> v4:
> * Old PATCH 1-6 got merged, only tests left; cover letter changed
>   accordingly
> * PATCH 1,6 unchanged
> * PATCH 2 fix system-reset race (Anthony)
> * New PATCH 4,11 to make libqos/fw_cfg usable for the ppc, Sun4 tests
> * New PATCH 3 shanghaied from Anthony, to make PATCH 4 compile without
>   make clean.
> * PATCH 5,10,12 use libqos/fw_cfg (Anthony)
> * PATCH 7-9 rebased, minor tweaks
> v3:
> * Rebased, with only trivial conflicts
> * PATCH 08 cosmetic improvements
> * More test cases: new PATCH 09-16
> v2:
> * New PATCH 7 to make testing -boot once possible
> * Old PATCH 5 reworked and extended became PATCH 8
> * Writing more tests uncovered -no-fd-bootchk weirdness, cleaned up in
>   new PATCH 5+6
>
> Andreas Färber (1):
>   boot-order-test: Add tests for PowerMacs
>
> Anthony Liguori (1):
>   libqos: include dependencies
>
> Markus Armbruster (10):
>   qtest: Don't reset on qtest chardev connect
>   boot-order-test: New; covering just PC for now
>   libqos: Add support for memory-mapped fw_cfg
>   boot-order-test: Cover -boot once in ppc tests
>   boot-order-test: Better separate target-specific and generic parts
>   boot-order-test: Code motion for better readability
>   boot-order-test: Add tests for PowerPC PREP
>   boot-order-test: Add tests for Sun4m
>   libqos: Generalize I/O-mapped fw_cfg
>   boot-order-test: Add tests for Sun4u
>
>  qtest.c  |   7 +-
>  tests/Makefile   |   7 +-
>  tests/boot-order-test.c  | 209 
> +++
>  tests/fw_cfg-test.c  |   2 +-
>  tests/libqos/fw_cfg-pc.c |  40 -
>  tests/libqos/fw_cfg-pc.h |  20 -
>  tests/libqos/fw_cfg.c|  55 +
>  tests/libqos/fw_cfg.h|   9 ++
>  tests/libqos/malloc-pc.c |   2 +-
>  9 files changed, 287 insertions(+), 64 deletions(-)
>  create mode 100644 tests/boot-order-test.c
>  delete mode 100644 tests/libqos/fw_cfg-pc.c
>  delete mode 100644 tests/libqos/fw_cfg-pc.h



Re: [Qemu-devel] seccomp: remove unused syscalls - for 1.6

2013-07-18 Thread Paolo Bonzini
Il 18/07/2013 18:35, Eduardo Otubo ha scritto:
> 
> 
> On 07/18/2013 01:28 PM, Anthony Liguori wrote:
>> Eduardo Otubo  writes:
>>
>>> Hello all,
>>>
>>> In this small patch series I basically:
>>
>> Cover letter should be marked [PATCH 0/2].  Otherwise it defeats
>> filtering.
>>
>> Would like to see a Reviewed-by from someone before applying this.
> 
> I'm running some tests with qemu && xen, I'll post a v3 by the end of
> the day. I'll format the cover letter in the correct way next time.

I feel that, at some point, grep and code review must trump experiments...

Paul, how did you guys handle this in other projects?

Paolo




Re: [Qemu-devel] seccomp: remove unused syscalls - for 1.6

2013-07-18 Thread Eduardo Otubo



On 07/18/2013 01:28 PM, Anthony Liguori wrote:

Eduardo Otubo  writes:


Hello all,

In this small patch series I basically:


Cover letter should be marked [PATCH 0/2].  Otherwise it defeats
filtering.

Would like to see a Reviewed-by from someone before applying this.


I'm running some tests with qemu && xen, I'll post a v3 by the end of 
the day. I'll format the cover letter in the correct way next time.


Thanks,



Regards,

Anthony Liguori



   v2 update:
   - set libseccomp 2.1.0 as requirement on configure script.
   - removed setrlimit and added sendfile64 to the whitelist.

1) Remove the ifdef's for the (not so) new libseccomp version that does 
a
   best effort and translates x86_32 syscalls into x86_64 when possible.

   2) Remove unused syscalls on the seccomp whitelist. For that removal, I've 
been
   running several instances of Qemu using a script written on top of
   virt-test[0]. After some weeks testing I could come up with this small list,
   and safely remove them without breaking anything.

[0] - https://github.com/autotest/virt-test/wiki




--
Eduardo Otubo
IBM Linux Technology Center




Re: [Qemu-devel] [PATCH] util/iov: Fix -O1 uninitialized variable warning

2013-07-18 Thread Peter Maydell
On 18 July 2013 17:14, Richard Henderson  wrote:
> At -O2, code in the form
>
>   if (p) A; B; if (p) C;
>
> may be rearranged via "jump threading" into
>
>   if (p) { A; B; C; } else { B; }
>
> But at -O1 this doesn't happen and we -Werror out here on
> the "may be used uninitialized" orig_len.  Perform this transform
> by hand so that -O1 remains a viable debugging alternative.
>
> Signed-off-by: Richard Henderson 

This is the same issue fixed by this (reviewed but
never applied) patch from June, isn't it?

http://patchwork.ozlabs.org/patch/251410/

thanks
-- PMM



Re: [Qemu-devel] [RFC 0/3] Determinitic behaviour with icount.

2013-07-18 Thread Paolo Bonzini
Il 18/07/2013 18:31, Frederic Konrad ha scritto:
> On 18/07/2013 17:35, Paolo Bonzini wrote:
>> Il 18/07/2013 17:06, Peter Maydell ha scritto:
>>> On 18 July 2013 16:02,   wrote:
 As I said in the last email, we have issues with determinism with
 icount.
 We are wondering if determinism is really ensured with icount?
>>> My opinion is that it *should* be deterministic but it would
>>> be unsurprising if the determinism had got broken along the way.
>> First of all, it can only be deterministic if the guest satisfies (at
>> least) all the following condition:
>>
>> 1) only uses timer that QEMU bases on vm_clock (which means that you
>> should use "-rtc clock=vm"---sorry Fred, didn't think about this in the
>> previous answer);
> 
> Oops sorry, I didn't mentioned that, but we used rtc clock=vm for our
> tests.
>> 2) never does any network operation nor any asynchronous disk I/O
>> operation
>>
>> 3) never halts the VCPU waiting for an interrupt
>>
>>
>> Point 1 is obvious.
>>
>>
>> To explain points 2, let's consider what happens if a block device uses
>> synchronous vs. asynchronous I/O.
>>
>> With synchronous I/O, each block device operation will complete
>> immediately.  All clocks are stalled during the operation.
>>
>> With asynchronous I/O, each block device operation will be done while
>> the CPU is running.  If the CPU is polling a completion flag, the number
>> of instructions executed (thus icount) depends on how long it takes to
>> do I/O.
> 
> So I suppose this can happen even if there are any network card or block
> device.
> 
> We probably need to disable it until we finally save and replay IO, to
> get this thing working.

Are you aware of the work that was done on fault tolerance (Kemari)?
Orit is working on resurrecting it.

Paolo



Re: [Qemu-devel] [RFC 0/3] Determinitic behaviour with icount.

2013-07-18 Thread Frederic Konrad

On 18/07/2013 17:35, Paolo Bonzini wrote:

Il 18/07/2013 17:06, Peter Maydell ha scritto:

On 18 July 2013 16:02,   wrote:

As I said in the last email, we have issues with determinism with icount.
We are wondering if determinism is really ensured with icount?

My opinion is that it *should* be deterministic but it would
be unsurprising if the determinism had got broken along the way.

First of all, it can only be deterministic if the guest satisfies (at
least) all the following condition:

1) only uses timer that QEMU bases on vm_clock (which means that you
should use "-rtc clock=vm"---sorry Fred, didn't think about this in the
previous answer);


Oops sorry, I didn't mentioned that, but we used rtc clock=vm for our tests.

2) never does any network operation nor any asynchronous disk I/O operation

3) never halts the VCPU waiting for an interrupt


Point 1 is obvious.


To explain points 2, let's consider what happens if a block device uses
synchronous vs. asynchronous I/O.

With synchronous I/O, each block device operation will complete
immediately.  All clocks are stalled during the operation.

With asynchronous I/O, each block device operation will be done while
the CPU is running.  If the CPU is polling a completion flag, the number
of instructions executed (thus icount) depends on how long it takes to
do I/O.


So I suppose this can happen even if there are any network card or block 
device.


We probably need to disable it until we finally save and replay IO, to 
get this thing

working.




To explain point 3 (which is the only one that _might_ be fixable),
let's see what happens if the VCPU halts waiting for an interrupt.  If
that is the case, and you haven't done any asynchronous I/O, there
should be active vm_clock timers, and you have another possible source
of non-deterministic behavior.

The current QEMU behavior is (and has always been) to start tracking
rt_clock.  This is obviously not deterministic.  Note that with the
switch to separate threads for iothread/VCPU, the algorithm to do this
has become much better.  Let's look at a couple possibilities:

2) jump to the next vm_clock deadline.  This sounds appealing, but it is
still nondeterministic in the general case when the guest *is* doing
asynchronous I/O too.  How many vm_clock timers do you run before I/O
finishes?  Furthermore, the vm_clock might move too fast.  Think of an
RTC clock whose alarm registers are 0/0/0 so it fires at midnight; if it
is the only active vm_clock timer, you end up in 2107 even before the
kernel boots!


Yes I didn't think about that :).


3) do not process vm_clock timers at all unless there is no pending I/O
(block/network); if there is none, track rt_clock as in current
behavior.  I just made it up, but it sounds promising and similar to
synchronous I/O.  It should not be extremely hard to implement, and it
can remove this kind of nondeterminism.  But it won't fix the case when
the CPU is polling.


Thanks, I need to take a look at all this.


Paolo

ps: I'm not an expert on icount at all, I'm only reasoning of the
possible interactions with the main loop.


Both icount and reverse execution need an instruction counter. icount use a
count-down mechanism but reverse execution need a continuous counter. For now
we have build a separate counter and we think that these two counters can be
merged. However we would like feedback about this before modifying this.

I definitely think that there should only be one counter, not two.

thanks
-- PMM






Re: [Qemu-devel] [PATCH v2 0/8] Guest memory allocation fixes & cleanup

2013-07-18 Thread Markus Armbruster
Ping?

Markus Armbruster  writes:

> All I wanted to do is exit(1) instead of abort() on guest memory
> allocation failure [07/08].  But that lead me into a minor #ifdef bog,
> and here's what I brought back.  Enjoy!
>
> Testing:
> * Christian Borntraeger reports v1 works fine under LPAR (new S390
>   KVM, i.e. generic allocation) and as second guest under z/VM (old
>   S390 KVM, i.e. legacy S390 allocation).  Thanks for testing, and for
>   catching a stupid mistake.  v2 differs from v1 only in code that
>   isn't reachable on S390.
>
> Changes since v1:
> * 5/8: Fix assertion in qemu_ram_remap() (Paolo)
> * All other patches unchanged except for Acked-by in commit messages
> Changes since RFC:
> * 1-3+8/8 unchanged except for commit message tweaks
> * 4+6/8 rewritten to address Paolo's review
> * 5/8 rewritten: don't fix dead code, just assert it's dead
> * 7/8 fix mistakes caught by Richard Henderson and Peter Maydell
>
> Markus Armbruster (8):
>   exec: Fix Xen RAM allocation with unusual options
>   exec: Clean up fall back when -mem-path allocation fails
>   exec: Reduce ifdeffery around -mem-path
>   exec: Simplify the guest physical memory allocation hook
>   exec: Drop incorrect & dead S390 code in qemu_ram_remap()
>   exec: Clean up unnecessary S390 ifdeffery
>   exec: Don't abort when we can't allocate guest memory
>   pc_sysfw: Fix ISA BIOS init for ridiculously big flash
>
>  exec.c  | 121 
> ++--
>  hw/block/pc_sysfw.c |   5 +-
>  include/exec/cpu-all.h  |   2 -
>  include/exec/exec-all.h |   2 +
>  include/sysemu/kvm.h|   5 --
>  kvm-all.c   |  13 --
>  target-s390x/kvm.c  |  23 +++--
>  util/oslib-posix.c  |   4 +-
>  util/oslib-win32.c  |   5 +-
>  9 files changed, 78 insertions(+), 102 deletions(-)



[Qemu-devel] large size PCI windows vs ivshmem vs windows guests

2013-07-18 Thread Michael S. Tsirkin
Currently seabios simply looks for 64 bit BARs and sums them up
to get the size of the region.
The result is reported in the CRS method in ACPI tables.
But this means that e.g. hotplug is broken - if we try to
add a huge ivshmem device there won't be place for it
in the 32 bit PCI hole, and a 64 bit one isn't declared.

So we really should declare some 64 bit windows,
but the question is - what's a good value for CRS?

It would appear that we should be able to declare
a very large window in CRS and be done with it.
Works fine for linux, but it turns out adding the following in CRS is
enough to make e.g. win7 x86 crash on boot:

diff --git a/src/acpi-dsdt-pci-crs.dsl b/src/acpi-dsdt-pci-crs.dsl
index d421891..405859d 100644
--- a/src/acpi-dsdt-pci-crs.dsl
+++ b/src/acpi-dsdt-pci-crs.dsl
@@ -36,6 +36,13 @@ Scope(\_SB.PCI0) {
 0x, // Address Translation Offset
 0x0002, // Address Length
 ,, , AddressRangeMemory, TypeStatic)
+QWordMemory(ResourceProducer, PosDecode, MinFixed, MaxFixed, 
Cacheable, ReadWrite,
+0x,  // Address Space Granularity
+0x80,// Address Range Minimum
+0x80,// Address Range Maximum
+0x,  // Address Translation Offset
+0x01,// Address Length
+,, , AddressRangeMemory, TypeStatic)
 DWordMemory(ResourceProducer, PosDecode, MinFixed, MaxFixed, 
NonCacheable, ReadWrite,
 0x, // Address Space Granularity
 0xE000, // Address Range Minimum


The issue is the Length field - if it's 0x008000 or
less win7 32 bit boots (an interesting tidbit - it seems
possible to declare more than one 2G resource,
but each one needs to be at most 2G - did they use some
signed integer somewhere?).

I'd like to propose the following:
- declare a safe value (2G) in CRS by default
you won't be able to use ivshmem with huge sizes
we can detect common cases and warn users
- add a flag controlling size of 64 bit window declared

I am guessing most people use <2G shared memory so won't
be affected.

Comments?

-- 
MST



Re: [Qemu-devel] [PATCH v2 0/2] libqtest leak fix & cleanup

2013-07-18 Thread Markus Armbruster
Ping?

Markus Armbruster  writes:

> v2: qtest_start() function comment
>
> Markus Armbruster (2):
>   libqtest: Plug fd and memory leaks in qtest_quit()
>   libqtest: New qtest_end() to go with qtest_start()
>
>  tests/fdc-test.c|  2 +-
>  tests/hd-geo-test.c |  8 
>  tests/ide-test.c|  2 +-
>  tests/libqtest.c|  4 
>  tests/libqtest.h| 12 
>  5 files changed, 22 insertions(+), 6 deletions(-)



Re: [Qemu-devel] seccomp: remove unused syscalls - for 1.6

2013-07-18 Thread Anthony Liguori
Eduardo Otubo  writes:

> Hello all,
>
> In this small patch series I basically:

Cover letter should be marked [PATCH 0/2].  Otherwise it defeats
filtering.

Would like to see a Reviewed-by from someone before applying this.

Regards,

Anthony Liguori

>
>   v2 update:
>   - set libseccomp 2.1.0 as requirement on configure script.
>   - removed setrlimit and added sendfile64 to the whitelist.
>
>   1) Remove the ifdef's for the (not so) new libseccomp version that does 
> a
>   best effort and translates x86_32 syscalls into x86_64 when possible.
>
>   2) Remove unused syscalls on the seccomp whitelist. For that removal, I've 
> been
>   running several instances of Qemu using a script written on top of
>   virt-test[0]. After some weeks testing I could come up with this small list,
>   and safely remove them without breaking anything.
>
> [0] - https://github.com/autotest/virt-test/wiki



Re: [Qemu-devel] [RFC 1/3] icount: base rt_clock on icount.

2013-07-18 Thread Paolo Bonzini
Il 18/07/2013 18:23, Frederic Konrad ha scritto:
> On 18/07/2013 17:36, Paolo Bonzini wrote:
>> Il 18/07/2013 17:02, fred.kon...@greensocs.com ha scritto:
>>> From: KONRAD Frederic 
>>>
>>> This bases rt_clock on icount, as vm_clock.
>>> So vm_clock = rt_clock.
>>>
>>> Signed-off-by: KONRAD Frederic 
>>> ---
>>>   qemu-timer.c | 6 +-
>>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/qemu-timer.c b/qemu-timer.c
>>> index b2d95e2..6c607e5 100644
>>> --- a/qemu-timer.c
>>> +++ b/qemu-timer.c
>>> @@ -401,7 +401,11 @@ int64_t qemu_get_clock_ns(QEMUClock *clock)
>>> switch(clock->type) {
>>>   case QEMU_CLOCK_REALTIME:
>>> -return get_clock();
>>> +if (use_icount) {
>>> +return cpu_get_icount();
>>> +} else {
>>> +return get_clock();
>>> +}
>>>   default:
>>>   case QEMU_CLOCK_VIRTUAL:
>>>   if (use_icount) {
>>>
>> rt_clock is very little used in general.  You should use "-rtc clock=vm"
>> if you want to base the RTC on vm_clock.
>>
>> Paolo
> 
> True but it seems used in some place:
> 
> For example: ui/console.c:
> ds->gui_timer = qemu_new_timer_ms(rt_clock, gui_update, ds);
> 
> Maybe it can cause trouble no?

In theory it is only used in places where it shouldn't cause trouble
(those that do should use the similarly named rtc_clock variable).

Paolo




Re: [Qemu-devel] [RFC 1/3] icount: base rt_clock on icount.

2013-07-18 Thread Frederic Konrad

On 18/07/2013 17:36, Paolo Bonzini wrote:

Il 18/07/2013 17:02, fred.kon...@greensocs.com ha scritto:

From: KONRAD Frederic 

This bases rt_clock on icount, as vm_clock.
So vm_clock = rt_clock.

Signed-off-by: KONRAD Frederic 
---
  qemu-timer.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/qemu-timer.c b/qemu-timer.c
index b2d95e2..6c607e5 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -401,7 +401,11 @@ int64_t qemu_get_clock_ns(QEMUClock *clock)
  
  switch(clock->type) {

  case QEMU_CLOCK_REALTIME:
-return get_clock();
+if (use_icount) {
+return cpu_get_icount();
+} else {
+return get_clock();
+}
  default:
  case QEMU_CLOCK_VIRTUAL:
  if (use_icount) {


rt_clock is very little used in general.  You should use "-rtc clock=vm"
if you want to base the RTC on vm_clock.

Paolo


True but it seems used in some place:

For example: ui/console.c:
ds->gui_timer = qemu_new_timer_ms(rt_clock, gui_update, ds);

Maybe it can cause trouble no?

Fred



[Qemu-devel] [PATCH] util/iov: Fix -O1 uninitialized variable warning

2013-07-18 Thread Richard Henderson
At -O2, code in the form

  if (p) A; B; if (p) C;

may be rearranged via "jump threading" into

  if (p) { A; B; C; } else { B; }

But at -O1 this doesn't happen and we -Werror out here on
the "may be used uninitialized" orig_len.  Perform this transform
by hand so that -O1 remains a viable debugging alternative.

Signed-off-by: Richard Henderson 
---
 util/iov.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/util/iov.c b/util/iov.c
index cc6e837..a92eb3a 100644
--- a/util/iov.c
+++ b/util/iov.c
@@ -146,7 +146,7 @@ ssize_t iov_send_recv(int sockfd, struct iovec *iov, 
unsigned iov_cnt,
 {
 ssize_t total = 0;
 ssize_t ret;
-size_t orig_len, tail;
+size_t tail;
 unsigned niov;
 
 while (bytes > 0) {
@@ -174,21 +174,22 @@ ssize_t iov_send_recv(int sockfd, struct iovec *iov, 
unsigned iov_cnt,
 for (niov = 0; niov < iov_cnt && iov[niov].iov_len <= tail; ++niov) {
 tail -= iov[niov].iov_len;
 }
+
 if (tail) {
 /* second, fixup the last element, and remember the original
  * length */
+size_t orig_len = iov[niov].iov_len;
 assert(niov < iov_cnt);
-assert(iov[niov].iov_len > tail);
-orig_len = iov[niov].iov_len;
+assert(orig_len > tail);
 iov[niov++].iov_len = tail;
-}
 
-ret = do_send_recv(sockfd, iov, niov, do_send);
+ret = do_send_recv(sockfd, iov, niov, do_send);
 
-/* Undo the changes above before checking for errors */
-if (tail) {
 iov[niov-1].iov_len = orig_len;
+} else {
+ret = do_send_recv(sockfd, iov, niov, do_send);
 }
+
 if (offset) {
 iov[0].iov_base -= offset;
 iov[0].iov_len += offset;
-- 
1.8.1.4




Re: [Qemu-devel] [RFC 2/6] OptsVisitor: introduce list modes for interval flattening

2013-07-18 Thread Laszlo Ersek
On 07/18/13 16:56, Paolo Bonzini wrote:
> Il 18/07/2013 15:59, Laszlo Ersek ha scritto:
>> The new modes are equal-rank, exclusive sub-modes of LM_IN_PROGRESS. Teach
>> opts_next_list(), opts_type_int() and opts_type_uint64() to handle them.
> 
> Perhaps you could use a bitmap then:
> 
>  LM_NONE = 0
>  LM_STARTED = 1
>  LM_IN_PROGRESS = 2
>  LM_SIGNED_INTERVAL = LM_IN_PROGRESS | 4
>  LM_UNSIGNED_INTERVAL = LM_IN_PROGRESS | 8
> 
> I think the only change would be that this hunk:
> 
>> @@ -211,7 +238,10 @@ opts_end_list(Visitor *v, Error **errp)
>>  {
>>  OptsVisitor *ov = DO_UPCAST(OptsVisitor, visitor, v);
>>  
>> -assert(ov->list_mode == LM_STARTED || ov->list_mode == LM_IN_PROGRESS);
>> +assert(ov->list_mode == LM_STARTED ||
>> +   ov->list_mode == LM_IN_PROGRESS ||
>> +   ov->list_mode == LM_SIGNED_INTERVAL ||
>> +   ov->list_mode == LM_UNSIGNED_INTERVAL);
>>  ov->repeated_opts = NULL;
>>  ov->list_mode = LM_NONE;
>>  }
> 
> could be changed to
> 
>   assert(ov->list_mode == LM_STARTED ||
>(ov->list_mode & LM_IN_PROGRESS));

If you don't insist (please don't :)), I wouldn't like to do this.

(a) I wanted to represent and query each individual mode explicitly
(helps with code search),

(b) the "sub-mode" nature is a theoretical thing. It only applies to the
two interval modes. This small orthogonality is limited, it doesn't
cause "combinatorial explosion" (I didn't have to double the states or
such). Most importantly, I specifically wanted one state in general to
exclude any other state. Bitmaps beget thinking about the meaning of
arbitrary variations, like 1|4, 0|2 etc; I intended to prevent such
thoughts right in the type.

Thanks
Laszlo



  1   2   3   >