date:20180814

Re: [Qemu-devel] [Qemu-block] [PATCH v0 0/2] Postponed actions

2018-08-14 Thread Denis Plotnikov





On 13.08.2018 19:30, Kevin Wolf wrote:

Am 13.08.2018 um 10:32 hat Denis Plotnikov geschrieben:

Ping ping!

On 16.07.2018 21:59, John Snow wrote:



On 07/16/2018 11:01 AM, Denis Plotnikov wrote:

Ping!



I never saw a reply to Stefan's question on July 2nd, did you reply
off-list?

--js

Yes, I did. I talked to Stefan why the patch set appeared.


The rest of us still don't know the answer. I had the same question.

Kevin

Yes, that's my fault. I should have post it earlier.

I reviewed the problem once again and come up with the following 
explanation.
Indeed, if the global lock has been taken by the main thread the vCPU 
threads won't be able to execute mmio ide.

But, if the main thread will release the lock then nothing will prevent
vCPU threads form execution what they want, e.g writing to the block device.

In case of running the mirroring it is possible. Let's take a look
at the following snippet of mirror_run. This is a part the mirroring 
completion part.


bdrv_drained_begin(bs);
cnt = bdrv_get_dirty_count(s->dirty_bitmap);
>>  if (cnt > 0 || mirror_flush(s) < 0) {
bdrv_drained_end(bs);
continue;
}

(X) assert(QLIST_EMPTY(&bs->tracked_requests));

mirror_flush here can yield the current coroutine so nothing more can be 
executed.
We could end up with the situation when the main loop have to revolve to 
poll for another timer/bh to process. While revolving it releases the 
global lock. If the global lock is waited for by a vCPU (any other) 
thread, the waiting thread will get the lock and make what it intends.


This is something that I can observe:

mirror_flush yields coroutine, the main thread revolves and locks 
because a vCPU was waiting for the lock. Now the vCPU thread owns the 
lock and the main thread waits for the lock releasing.

The vCPU thread does cmd_write_dma and releases the lock. Then, the main
thread gets the lock and continues to run eventually proceeding with the 
coroutine yeiled.
If the vCPU requests aren't completed by the moment we will assert at 
(X). If the vCPU requests are completed we won't even notice that we had 
some writes while in the drained section.


Denis



On 29.06.2018 15:40, Denis Plotnikov wrote:

There are cases when a request to a block driver state shouldn't have
appeared producing dangerous race conditions.
This misbehaviour is usually happens with storage devices emulated
without eventfd for guest to host notifications like IDE.

The issue arises when the context is in the "drained" section
and doesn't expect the request to come, but request comes from the
device not using iothread and which context is processed by the main
loop.

The main loop apart of the iothread event loop isn't blocked by the
"drained" section.
The request coming and processing while in "drained" section can spoil
the
block driver state consistency.

This behavior can be observed in the following KVM-based case:

1. Setup a VM with an IDE disk.
2. Inside a VM start a disk writing load for the IDE device
     e.g: dd if= of= bs=X count=Y oflag=direct
3. On the host create a mirroring block job for the IDE device
     e.g: drive_mirror  
4. On the host finish the block job
     e.g: block_job_complete 
    Having done the 4th action, you could get an assert:
assert(QLIST_EMPTY(&bs->tracked_requests)) from mirror_run.
On my setup, the assert is 1/3 reproducible.

The patch series introduces the mechanism to postpone the requests
until the BDS leaves "drained" section for the devices not using
iothreads.
Also, it modifies the asynchronous block backend infrastructure to use
that mechanism to release the assert bug for IDE devices.

Denis Plotnikov (2):
     async: add infrastructure for postponed actions
     block: postpone the coroutine executing if the BDS's is drained

    block/block-backend.c | 58 ++-
    include/block/aio.h   | 63 +++
    util/async.c  | 33 +++
    3 files changed, 142 insertions(+), 12 deletions(-)





--
Best,
Denis


--
Best,
Denis

Re: [Qemu-devel] [PATCH 39/56] json: Leave rejecting invalid interpolation to parser

2018-08-14 Thread Markus Armbruster

Eric Blake  writes:

> On 08/08/2018 07:03 AM, Markus Armbruster wrote:
>> Both lexer and parser reject invalid interpolation specifications.
>> The parser's check is useless.
>>
>> The lexer ends the token right after the first bad character.  This
>> tends to lead to suboptimal error reporting.  For instance, input
>>
>>  [ %11d ]
>
> With no context of other characters on this line, it took me a while
> to notice that this was '1' (one) not 'l' (ell), even though my
> current font renders the two quite distinctly. (And now you know why
> base32 avoids 0/1 in its alphabet, to minimize confusion with O/l -
> see RFC 4648).  A better example might be %22d.

I made up an example that could plausibly escape review.  I guess the
realism is too... realistic.  Also, not really helpful.

>> produces the tokens
>>
>>  JSON_LSQUARE  [
>>  JSON_ERROR%1
>>  JSON_INTEGER  1
>
> And that's in spite of the context you have here, which makes it
> obvious that the parser saw an integer.
>
>>  JSON_KEYWORD  d
>>  JSON_RSQUARE  ]
>>
>> The parser then yields an error, an object and two more errors:
>>
>>  error: Invalid JSON syntax
>>  object: 1
>>  error: JSON parse error, invalid keyword
>>  error: JSON parse error, expecting value
>>
>> Change the lexer to accept [A-Za-z0-9]*[duipsf].  It now produces
>
> That regex doesn't match the code.

Left over from a previous, unpublished iteration.  I'll fix it.

>>
>>  JSON_LSQUARE  [
>>  JSON_INTERPOLATION %11d
>>  JSON_RSQUARE  ]
>>
>> and the parser reports just
>>
>>  JSON parse error, invalid interpolation '%11d'
>>
>> Signed-off-by: Markus Armbruster 
>> ---
>>   qobject/json-lexer.c  | 52 +--
>>   qobject/json-parser.c |  1 +
>>   2 files changed, 11 insertions(+), 42 deletions(-)
>>
>> diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
>> index 0ea1eae4aa..7a82aab88b 100644
>> --- a/qobject/json-lexer.c
>> +++ b/qobject/json-lexer.c
>> @@ -93,7 +93,8 @@
>>*   (apostrophe) instead of %x22 (quotation mark), and can't contain
>>*   unescaped apostrophe, but can contain unescaped quotation mark.
>>* - Interpolation:
>> - *   interpolation = %((l|ll|I64)[du]|[ipsf])
>> + *   The lexer accepts [A-Za-z0-9]*, and leaves rejecting invalid ones
>> + *   to the parser.
>
> This comment is more apropos.  But is it worth spelling "The lexer
> accepts %[A-Za-z0-9]*",

Yes.

> and/or documenting that recognizing
> interpolation during lexing is now optional (thanks to the previous
> patch)?

Should be done right when I make it optional, in PATCH 37, perhaps like
this:

@@ -92,7 +92,7 @@
  *   Like double-quoted strings, except they're delimited by %x27
  *   (apostrophe) instead of %x22 (quotation mark), and can't contain
  *   unescaped apostrophe, but can contain unescaped quotation mark.
- * - Interpolation:
+ * - Interpolation, if enabled:
  *   interpolation = %((l|ll|I64)[du]|[ipsf])
  *
  * Note:

>> @@ -278,6 +238,14 @@ static const uint8_t json_lexer[][256] =  {
>>   ['\n'] = IN_WHITESPACE,
>>   },
>>   +/* interpolation */
>> +[IN_INTERPOL] = {
>> +TERMINAL(JSON_INTERPOL),
>> +['A' ... 'Z'] = IN_INTERPOL,
>> +['a' ... 'z'] = IN_INTERPOL,
>> +['0' ... '9'] = IN_INTERPOL,
>> +},
>> +
>
> Note that we still treat code like "'foo': %#x" as an invalid
> interpolation request (it would be a valid printf format), while at
> the same time passing "%1" on to the parser (which is not a valid
> printf format, so -Wformat would have flagged it).  In fact, "%d1"
> which is valid for printf as "%d" followed by a literal "1" gets
> thrown to the parser as one chunk - but it is not valid in JSON to
> have %d adjacent to another integer any more than it is to reject
> "%d1" as an unknown sequence.  I don't think that catering to the
> remaining printf metacharacters (' ', '#', '\'', '-', '*', '+') nor
> demanding that things end on a letter is worth the effort, since it
> just makes the lexer more complicated when your goal was to do as
> little as possible to get things thrown over to the parser.

Yes.

The goal is to catch all mistakes safely, and as many as practical at
compile time.

Invalid interpolation specifications get rejected at run time, by
parse_interpolation().  We can't check them at compile time.

We can't check the variadic arguments match the interpolation
specifications at run time.  We can enlist gcc to check at compile time.
The only remaining hole is '%' in strings.  I intend to plug it.

> So even though your lexing is now somewhat different from printf, I
> don't see that as a serious drawback (the uses we care about are still
> -Wformat clean, and the uses we don't like are properly handled in the
> parser).

Exactly.

> Perhaps you want to enhance the testsuite (earlier in the series) to
> cover "%22d", "%d1", "%1" as various unaccepted interpolation requests
> (if you di

Re: [Qemu-devel] [PATCH 40/56] json: Replace %I64d, %I64u by %PRId64, %PRIu64

2018-08-14 Thread Markus Armbruster

Eric Blake  writes:

> On 08/08/2018 07:03 AM, Markus Armbruster wrote:
>> Support for %I64d got addded in commit 2c0d4b36e7f "json: fix PRId64
>
> s/addded/added/

Fixing...

>> on Win32".  We had to hard-code I64d because we used the lexer's
>> finite state machine to check interpolations.  No more, so clean this
>> up.
>>
>> Additional conversion specifications would be easy enough to implement
>> when needed.
>>
>> Signed-off-by: Markus Armbruster 
>> ---
>>   qobject/json-parser.c | 10 ++
>>   tests/check-qjson.c   | 10 ++
>>   2 files changed, 16 insertions(+), 4 deletions(-)
>>
>
> Reviewed-by: Eric Blake 
>
>>  return QOBJECT(qnum_from_int(va_arg(*ap, long)));
>> -} else if (!strcmp(token->str, "%lld") ||
>> -   !strcmp(token->str, "%I64d")) {
>> +} else if (!strcmp(token->str, "%lld")) {
>>  return QOBJECT(qnum_from_int(va_arg(*ap, long long)));
>> +} else if (!strcmp(token->str, "%" PRId64)) {
>> +return QOBJECT(qnum_from_int(va_arg(*ap, int64_t)));
>
> I had a double-take to make sure this still works on mingw. The trick
> used to be that the lexer had to parse the union of all forms
> understood by any libc (making Linux understand %I64d even though only
> mingw would generate it) then the parser had to accept all forms
> allowed through by the lexer.  Now the lexer accepts all forms with no
> effort (because it is no longer validates), and the parser is made
> stricter (%I64d no longer works on Linux, where we have two redundant
> 'if' clauses; but mingw has two distinct 'if' clauses and works as
> desired).

Exactly.  Thanks for checking!

[Qemu-devel] [PATCH] luks: Allow share-rw=on

2018-08-14 Thread Fam Zheng

Format drivers such as qcow2 don't allow sharing the same image between
two QEMU instances in order to prevent image corruptions, because of
metadata cache. LUKS driver don't modify metadata except for when
creating image, so it is safe to relax the permission. This makes
share-rw=on property work on virtual devices.

Suggested-by: Daniel P. Berrangé 
Signed-off-by: Fam Zheng 
---
 block/crypto.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/crypto.c b/block/crypto.c
index 146d81c90a..33ee01bebd 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -627,7 +627,9 @@ BlockDriver bdrv_crypto_luks = {
 .bdrv_probe = block_crypto_probe_luks,
 .bdrv_open  = block_crypto_open_luks,
 .bdrv_close = block_crypto_close,
-.bdrv_child_perm= bdrv_format_default_perms,
+/* This driver doesn't modify LUKS metadata except when creating image.
+ * Allow share-rw=on as a special case. */
+.bdrv_child_perm= bdrv_filter_default_perms,
 .bdrv_co_create = block_crypto_co_create_luks,
 .bdrv_co_create_opts = block_crypto_co_create_opts_luks,
 .bdrv_co_truncate   = block_crypto_co_truncate,
-- 
2.17.1

Re: [Qemu-devel] [PATCH] luks: Allow share-rw=on

2018-08-14 Thread Daniel P . Berrangé

On Tue, Aug 14, 2018 at 03:25:51PM +0800, Fam Zheng wrote:
> Format drivers such as qcow2 don't allow sharing the same image between
> two QEMU instances in order to prevent image corruptions, because of
> metadata cache. LUKS driver don't modify metadata except for when
> creating image, so it is safe to relax the permission. This makes
> share-rw=on property work on virtual devices.
> 
> Suggested-by: Daniel P. Berrangé 
> Signed-off-by: Fam Zheng 
> ---
>  block/crypto.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/block/crypto.c b/block/crypto.c
> index 146d81c90a..33ee01bebd 100644
> --- a/block/crypto.c
> +++ b/block/crypto.c
> @@ -627,7 +627,9 @@ BlockDriver bdrv_crypto_luks = {
>  .bdrv_probe = block_crypto_probe_luks,
>  .bdrv_open  = block_crypto_open_luks,
>  .bdrv_close = block_crypto_close,
> -.bdrv_child_perm= bdrv_format_default_perms,
> +/* This driver doesn't modify LUKS metadata except when creating image.
> + * Allow share-rw=on as a special case. */
> +.bdrv_child_perm= bdrv_filter_default_perms,
>  .bdrv_co_create = block_crypto_co_create_luks,
>  .bdrv_co_create_opts = block_crypto_co_create_opts_luks,
>  .bdrv_co_truncate   = block_crypto_co_truncate,

Reviewed-by: Daniel P. Berrangé 


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH 3/4] target/i386/translate: use thread-local storage in !user-mode

2018-08-14 Thread Emilio G. Cota

On Tue, Aug 14, 2018 at 08:31:02 +0200, Paolo Bonzini wrote:
> On 14/08/2018 03:38, Emilio G. Cota wrote:
> > Needed for MTTCG.
> > 
> > Signed-off-by: Emilio G. Cota 
> 
> Why not always use TLS, even in user-mode?

To avoid TLS waste; user-mode uses a single TCGContext,
so a single copy of these variables is all that's needed because
code generation is serialized with a lock.

If in user-mode we just have a few threads it'd be no big deal,
but apps can easily spawn thousands of threads.

E.

Re: [Qemu-devel] [PATCH v2] nvme: correct locking around completion

2018-08-14 Thread Paolo Bonzini

On 14/08/2018 08:45, Fam Zheng wrote:
> On Tue, 08/14 08:27, Paolo Bonzini wrote:
>> nvme_poll_queues is already protected by q->lock, and
>> AIO callbacks are invoked outside the AioContext lock.
>> So remove the acquire/release pair in nvme_handle_event.
>>
>> Signed-off-by: Paolo Bonzini 
>> ---
>>  block/nvme.c | 2 --
>>  1 file changed, 2 deletions(-)
>>
>> diff --git a/block/nvme.c b/block/nvme.c
>> index 6f71122bf5..42116907ed 100644
>> --- a/block/nvme.c
>> +++ b/block/nvme.c
>> @@ -489,10 +489,8 @@ static void nvme_handle_event(EventNotifier *n)
>>  BDRVNVMeState *s = container_of(n, BDRVNVMeState, irq_notifier);
>>  
>>  trace_nvme_handle_event(s);
>> -aio_context_acquire(s->aio_context);
>>  event_notifier_test_and_clear(n);
>>  nvme_poll_queues(s);
>> -aio_context_release(s->aio_context);
>>  }
>>  
>>  static bool nvme_add_io_queue(BlockDriverState *bs, Error **errp)
>> -- 
>> 2.17.1
>>
> 
> This patch and the other
> 
> [PATCH v2] nvme: simplify code around completion
> 
> only differ in subject. Which one to ignore? :)

"Correct locking" is better.  I thought I had hit Ctrl-C in time. :)

Paolo

Re: [Qemu-devel] [PATCH 1/3] qsp: QEMU's Synchronization Profiler

2018-08-14 Thread Paolo Bonzini

On 13/08/2018 19:11, Emilio G. Cota wrote:
> +struct qsp_report rep;

Don't like camelcase?  But that's really all that I have to remark on
this lovely series.

Paolo

Re: [Qemu-devel] [PATCH v2] file-posix: Skip effectiveless OFD lock operations

2018-08-14 Thread Fam Zheng

On Mon, 08/13 15:42, Kevin Wolf wrote:
> Am 13.08.2018 um 04:39 hat Fam Zheng geschrieben:
> > If we know we've already locked the bytes, don't do it again; similarly
> > don't unlock a byte if we haven't locked it. This doesn't change the
> > behavior, but fixes a corner case explained below.
> > 
> > Libvirt had an error handling bug that an image can get its (ownership,
> > file mode, SELinux) permissions changed (RHBZ 1584982) by mistake behind
> > QEMU. Specifically, an image in use by Libvirt VM has:
> > 
> > $ ls -lhZ b.img
> > -rw-r--r--. qemu qemu system_u:object_r:svirt_image_t:s0:c600,c690 b.img
> > 
> > Trying to attach it a second time won't work because of image locking.
> > And after the error, it becomes:
> > 
> > $ ls -lhZ b.img
> > -rw-r--r--. root root system_u:object_r:virt_image_t:s0 b.img
> > 
> > Then, we won't be able to do OFD lock operations with the existing fd.
> > In other words, the code such as in blk_detach_dev:
> > 
> > blk_set_perm(blk, 0, BLK_PERM_ALL, &error_abort);
> > 
> > can abort() QEMU, out of environmental changes.
> > 
> > This patch is an easy fix to this and the change is regardlessly
> > reasonable, so do it.
> > 
> > Signed-off-by: Fam Zheng 
> 
> Thanks, applied to the block branch.

Self-NACK. This breaks raw_abort_perm_update(). The extra bytes locked by
raw_check_perm() are not tracked by s->perm (it is only updated in
raw_set_perm()), thus will not get released. This patch is "misusing" s->perm
and s->shared_perm.

I'll revise the implementation and send another version together with dropping
s->lock_fd.

Fam

Re: [Qemu-devel] [PATCH 2/3] monitor: show sync profiling info with 'info sync'

2018-08-14 Thread Paolo Bonzini

On 13/08/2018 19:11, Emilio G. Cota wrote:
> diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
> index 70639f656a..56a3249bad 100644
> --- a/hmp-commands-info.hx
> +++ b/hmp-commands-info.hx
> @@ -297,6 +297,22 @@ STEXI
>  @item info opcount
>  @findex info opcount
>  Show dynamic compiler opcode counters
> +ETEXI
> +
> +{
> +.name   = "sync",
> +.args_type  = "max:i?",
> +.params = "[max]",
> +.help   = "show sync profiling info for "
> +   "up to max entries (default: 10). "
> +   "Entries are sorted by wait time.",
> +.cmd= hmp_info_sync,
> +},
> +
> +STEXI

Would it make sense to add a flag to sort by average wait time, and one
to coalesce all mutexes for the same call site?

Can be done separately of course.

Paolo

Re: [Qemu-devel] RAMBlocks and memory_region_init_ram_nomigrate

2018-08-14 Thread Dr. David Alan Gilbert

* Frank Yang (l...@google.com) wrote:
> Ah got it, thanks for the replies / info!
> 
> We're using a modified QEMU 2.12, and I don't see the migratable-only loops
> and field, so it either got missed in the rebase or was added after 2.12.

It went in after 2.12; it's b895de50 (followed by some cleanups to get
some we missed).

Dave

> Frank
> 
> On Mon, Aug 13, 2018 at 9:45 AM Dr. David Alan Gilbert 
> wrote:
> 
> > * Paolo Bonzini (pbonz...@redhat.com) wrote:
> > > On 13/08/2018 18:16, Frank Yang wrote:
> > > > Hi Paolo,
> > > >
> > > > I see that migration/ram.c saves RAMBlocks associated with memory
> > > > regions initialized with nomigrate. Is this intended?
> > >
> > > Probably the name and size of the RAMBlocks must match but the contents
> > > need not (but honestly I haven't looked at the code to find the answer).
> > >  CCing the qemu mailing list (always a good idea) and a couple people
> > > that might know.
> >
> > All the migration code should now be using RAMBLOCK_FOREACH_MIGRATABLE and
> > qemu_ram_is_migratable whenever it's iterating the ramblock list,
> > so that *shouldn't* happen these days.
> > Of course we could have messed it up somewhere; what are you seeing?
> >
> > Dave
> > > Paolo
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> >
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH v7 6/9] qcow2: Increase the default upper limit on the L2 cache size

2018-08-14 Thread Kevin Wolf

Am 13.08.2018 um 18:42 hat Leonid Bloch geschrieben:
> > I don't actually think it's so bad to keep the cache permanently
> > allocated, but I wouldn't object to a lower default for non-Linux hosts
> > either. 1 MB may still be a little too low, 4 MB (covers up to 32 GB)
> > might be more adequate. My typical desktop VMs are larger than 8 GB, but
> > smaller than 32 GB.
> 
> And for a Windows VM just the OS installation takes above 40 GB. While we
> probably are not running Windows VMs for our own needs, it is very common
> that a customer of, for example, some cloud service uses QEMU (unknowingly)
> for a full-blown Windows. So 100 GB+ images which are quite heavily used is
> not a rare scenario. 256 GB - yeah, that would be on the higher end.

The OS installation is mostly sequential access, though. You only need
that much cache when you have completely random I/O across the whole
image. Otherwise the LRU based approach of the cache is good enough to
keep those tables cached that are actually in use.

The maximum cache size is maybe for huge databases or indeed random I/O
benchmarks, both of which are important to support (on Linux at least),
but probably not the most common use case.

> So 16 MB would indeed be a reasonable default for the *max.* L2 cache now,
> although below that would be too little, I think. 32 MB - if we want some
> future-proofing.

I think we all agree that 32 MB + cache-clean-interval is okay.

It's just that for non-Linux guests, cache-clean-interval doesn't work.
However, we probably care less about those large random I/O cases there,
so a smaller cache size like 1 MB can do on non-Linux.

Kevin

Re: [Qemu-devel] [PATCH v2] file-posix: Skip effectiveless OFD lock operations

2018-08-14 Thread Kevin Wolf

Am 14.08.2018 um 10:12 hat Fam Zheng geschrieben:
> On Mon, 08/13 15:42, Kevin Wolf wrote:
> > Am 13.08.2018 um 04:39 hat Fam Zheng geschrieben:
> > > If we know we've already locked the bytes, don't do it again; similarly
> > > don't unlock a byte if we haven't locked it. This doesn't change the
> > > behavior, but fixes a corner case explained below.
> > > 
> > > Libvirt had an error handling bug that an image can get its (ownership,
> > > file mode, SELinux) permissions changed (RHBZ 1584982) by mistake behind
> > > QEMU. Specifically, an image in use by Libvirt VM has:
> > > 
> > > $ ls -lhZ b.img
> > > -rw-r--r--. qemu qemu system_u:object_r:svirt_image_t:s0:c600,c690 
> > > b.img
> > > 
> > > Trying to attach it a second time won't work because of image locking.
> > > And after the error, it becomes:
> > > 
> > > $ ls -lhZ b.img
> > > -rw-r--r--. root root system_u:object_r:virt_image_t:s0 b.img
> > > 
> > > Then, we won't be able to do OFD lock operations with the existing fd.
> > > In other words, the code such as in blk_detach_dev:
> > > 
> > > blk_set_perm(blk, 0, BLK_PERM_ALL, &error_abort);
> > > 
> > > can abort() QEMU, out of environmental changes.
> > > 
> > > This patch is an easy fix to this and the change is regardlessly
> > > reasonable, so do it.
> > > 
> > > Signed-off-by: Fam Zheng 
> > 
> > Thanks, applied to the block branch.
> 
> Self-NACK. This breaks raw_abort_perm_update(). The extra bytes locked by
> raw_check_perm() are not tracked by s->perm (it is only updated in
> raw_set_perm()), thus will not get released. This patch is "misusing" s->perm
> and s->shared_perm.
> 
> I'll revise the implementation and send another version together with dropping
> s->lock_fd.

Oops! I'm dequeuing the patch for now. Also, getting rid of s->lock_fd
sounds good!

I wonder if we can give this some test coverage, too, so that we'll
notice the breakage earlier next time. Maybe we can check from Python
which bits are locked?

Kevin

Re: [Qemu-devel] [PATCH 41/56] json: Nicer recovery from invalid leading zero

2018-08-14 Thread Markus Armbruster

Eric Blake  writes:

> On 08/08/2018 07:03 AM, Markus Armbruster wrote:
>> For input 0123, the lexer produces the tokens
>>
>>  JSON_ERROR01
>>  JSON_INTEGER  23
>>
>> Reporting an error is correct; 0123 is invalid according to RFC 7159.
>> But the error recovery isn't nice.
>>
>> Make the finite state machine eat digits before going into the error
>> state.  The lexer now produces
>>
>>  JSON_ERROR0123
>>
>> Signed-off-by: Markus Armbruster 
>> ---
>>   qobject/json-lexer.c | 7 ++-
>>   1 file changed, 6 insertions(+), 1 deletion(-)
>>
>
>> @@ -158,10 +159,14 @@ static const uint8_t json_lexer[][256] =  {
>>   /* Zero */
>>   [IN_ZERO] = {
>>   TERMINAL(JSON_INTEGER),
>> -['0' ... '9'] = IN_ERROR,
>> +['0' ... '9'] = IN_BAD_ZERO,
>>   ['.'] = IN_MANTISSA,
>>   },
>>   +[IN_BAD_ZERO] = {
>> +['0' ... '9'] = IN_BAD_ZERO,
>> +},
>> +
>
> Should IN_BAD_ZERO also consume '.' and/or 'e' (after all, '01e2 is a
> valid C constant, but not a valid JSON literal)?  But I think your
> choice here is fine (again, add too much, and then the lexer has to
> track a lot of state; whereas this minimal addition catches the most
> obvious things with little effort).

My patch is of marginal value to begin with.  It improves error recovery
only for the "integer with redundant leading zero" case.  I guess that's
more common than "floating-point with redundant leading zero".

An obvious way to extend it to "number with redundant leading zero"
would be cloning the lexer states related to numbers.  Clean, but six
more states.  Meh.

Another way is to dumb down the lexer not to care about leading zero,
and catch it in parse_literal() instead.  Basically duplicates the lexer
state machine up to where leading zero is recognized.  Not much code,
but meh.

Yet another way is to have the lexer eat "digit salad" after redundant
leading zero:

[IN_BAD_ZERO] = {
['0' ... '9'] = IN_BAD_ZERO,
['.'] = IN_BAD_ZERO,
['e'] = IN_BAD_ZERO,
['-'] = IN_BAD_ZERO,
['+'] = IN_BAD_ZERO,
},

Eats even crap like 01e...  But then the same crap without leading zero
should also be eaten.  We'd have to add state transitions to IN_BAD_ZERO
to the six states related to numbers.  Perhaps clever use of gcc's range
initialization lets us do this compactly.  Otherwise, meh again.

Opinions?

I'd also be fine with dropping this patch.

> Reviewed-by: Eric Blake 

Thanks!

Re: [Qemu-devel] [PATCH] luks: Allow share-rw=on

2018-08-14 Thread Kevin Wolf

Am 14.08.2018 um 09:27 hat Daniel P. Berrangé geschrieben:
> On Tue, Aug 14, 2018 at 03:25:51PM +0800, Fam Zheng wrote:
> > Format drivers such as qcow2 don't allow sharing the same image between
> > two QEMU instances in order to prevent image corruptions, because of
> > metadata cache. LUKS driver don't modify metadata except for when
> > creating image, so it is safe to relax the permission. This makes
> > share-rw=on property work on virtual devices.
> > 
> > Suggested-by: Daniel P. Berrangé 
> > Signed-off-by: Fam Zheng 
> > ---
> >  block/crypto.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/block/crypto.c b/block/crypto.c
> > index 146d81c90a..33ee01bebd 100644
> > --- a/block/crypto.c
> > +++ b/block/crypto.c
> > @@ -627,7 +627,9 @@ BlockDriver bdrv_crypto_luks = {
> >  .bdrv_probe = block_crypto_probe_luks,
> >  .bdrv_open  = block_crypto_open_luks,
> >  .bdrv_close = block_crypto_close,
> > -.bdrv_child_perm= bdrv_format_default_perms,
> > +/* This driver doesn't modify LUKS metadata except when creating image.
> > + * Allow share-rw=on as a special case. */
> > +.bdrv_child_perm= bdrv_filter_default_perms,
> >  .bdrv_co_create = block_crypto_co_create_luks,
> >  .bdrv_co_create_opts = block_crypto_co_create_opts_luks,
> >  .bdrv_co_truncate   = block_crypto_co_truncate,
> 
> Reviewed-by: Daniel P. Berrangé 

Thanks, applied to the block branch.

Kevin

Re: [Qemu-devel] [PATCH v2] file-posix: Skip effectiveless OFD lock operations

2018-08-14 Thread Fam Zheng

On Tue, 08/14 10:22, Kevin Wolf wrote:
> Am 14.08.2018 um 10:12 hat Fam Zheng geschrieben:
> > On Mon, 08/13 15:42, Kevin Wolf wrote:
> > > Am 13.08.2018 um 04:39 hat Fam Zheng geschrieben:
> > > > If we know we've already locked the bytes, don't do it again; similarly
> > > > don't unlock a byte if we haven't locked it. This doesn't change the
> > > > behavior, but fixes a corner case explained below.
> > > > 
> > > > Libvirt had an error handling bug that an image can get its (ownership,
> > > > file mode, SELinux) permissions changed (RHBZ 1584982) by mistake behind
> > > > QEMU. Specifically, an image in use by Libvirt VM has:
> > > > 
> > > > $ ls -lhZ b.img
> > > > -rw-r--r--. qemu qemu system_u:object_r:svirt_image_t:s0:c600,c690 
> > > > b.img
> > > > 
> > > > Trying to attach it a second time won't work because of image locking.
> > > > And after the error, it becomes:
> > > > 
> > > > $ ls -lhZ b.img
> > > > -rw-r--r--. root root system_u:object_r:virt_image_t:s0 b.img
> > > > 
> > > > Then, we won't be able to do OFD lock operations with the existing fd.
> > > > In other words, the code such as in blk_detach_dev:
> > > > 
> > > > blk_set_perm(blk, 0, BLK_PERM_ALL, &error_abort);
> > > > 
> > > > can abort() QEMU, out of environmental changes.
> > > > 
> > > > This patch is an easy fix to this and the change is regardlessly
> > > > reasonable, so do it.
> > > > 
> > > > Signed-off-by: Fam Zheng 
> > > 
> > > Thanks, applied to the block branch.
> > 
> > Self-NACK. This breaks raw_abort_perm_update(). The extra bytes locked by
> > raw_check_perm() are not tracked by s->perm (it is only updated in
> > raw_set_perm()), thus will not get released. This patch is "misusing" 
> > s->perm
> > and s->shared_perm.
> > 
> > I'll revise the implementation and send another version together with 
> > dropping
> > s->lock_fd.
> 
> Oops! I'm dequeuing the patch for now. Also, getting rid of s->lock_fd
> sounds good!
> 
> I wonder if we can give this some test coverage, too, so that we'll
> notice the breakage earlier next time. Maybe we can check from Python
> which bits are locked?

I can write a unit test around open/close/reopen in C, where it is convenient to
check the lock status with F_OFD_GETLK/F_OFD_GETLK before/after the operations.

Fam

Re: [Qemu-devel] [PATCH 0/4] target/arm: Fix int64_to_float16 double-rounding

2018-08-14 Thread Alex Bennée



Richard Henderson  writes:

> In 88808a022c0, I tried to fix an overflow problem that affected float16
> scaling by coverting first to float64 and then rounding after that.
>
> However, Laurent reported that -0x3ff401 converted to float16
> resulted in 0xfbfe instead of the expected 0xfbff.  This is caused by
> the inexact conversion to float64.
>
> Rather than build more logic into target/arm to compensate, just add
> a function that takes a scaling parameter so that the whole thing is
> done all at once with only one rounding.
>
> I don't have a failing test case for the float-to-int paths, but it
> seemed best to apply the same solution.

Can't we add the constants to the fcvt test case?

>
>
> r~
>
>
> Richard Henderson (4):
>   softfloat: Add scaling int-to-float routines
>   softfloat: Add scaling float-to-int routines
>   target/arm: Use the int-to-float-scale softfloat routines
>   target/arm: Use the float-to-int-scale softfloat routines
>
>  include/fpu/softfloat.h | 169 
>  fpu/softfloat.c | 579 +++-
>  target/arm/helper.c | 130 -
>  3 files changed, 628 insertions(+), 250 deletions(-)


--
Alex Bennée

Re: [Qemu-devel] [PATCH] hw/pci: add pci capability to pci-pci bridge

2018-08-14 Thread Liu, Jing2


Hi Laszlo,
Sorry for late reply. I missed these mails because of wrong filter.
And thanks very much for comments. My reply as belows.

On 8/7/2018 8:19 PM, Laszlo Ersek wrote:

On 08/07/18 09:04, Jing Liu wrote:

[...]

@@ -46,6 +46,12 @@ struct PCIBridgeDev {
  uint32_t flags;
  
  OnOffAuto msi;

+
+/* additional resources to reserve on firmware init */
+uint64_t io_reserve;
+uint64_t mem_reserve;
+uint64_t pref32_reserve;
+uint64_t pref64_reserve;
  };
  typedef struct PCIBridgeDev PCIBridgeDev;


(1) Please evaluate the following idea:


This is really good and I'm only not sure if DEFINE_PROP_ like,
DEFINE_PROP_UINT32("bus-reserve", GenPCIERootPort,
res_reserve.bus_reserve, -1), is a right/good way?
I mean the third parameter "_f".

[...]


- gen_rp_props should be updated accordingly. The property names should
stay the same, but they should reference fields of the "res_reserve" field.

(2) Then, in a separate patch, you can add another "res_reserve" field
to "PCIBridgeDev". (As a consequence, "bus_reserve" will also be added,
which I think is the right thing to do.)

I consider whether we need limit the bus_reserve of pci-pci bridge. For 
old codes, we could add "unlimited" numbers of devices (but less than 
than max) onto pci-pci bridge.



(3) There is a class that is derived from TYPE_PCI_BRIDGE_DEV:
TYPE_PCI_BRIDGE_SEAT_DEV. Is it correct to add the new properties /
fields to the latter as well?


Umm, I looked into the codes that doesn't have hotplug property?
So do we need add resource reserve for it?

  
@@ -95,6 +101,13 @@ static void pci_bridge_dev_realize(PCIDevice *dev, Error **errp)

  error_free(local_err);
  }
  
+err = pci_bridge_qemu_reserve_cap_init(dev, 0, 0,

+  bridge_dev->io_reserve, bridge_dev->mem_reserve,
+  bridge_dev->pref32_reserve, bridge_dev->pref64_reserve, errp);
+if (err) {
+goto cap_error;
+}
+
  if (shpc_present(dev)) {
  /* TODO: spec recommends using 64 bit prefetcheable BAR.
   * Check whether that works well. */
@@ -103,6 +116,8 @@ static void pci_bridge_dev_realize(PCIDevice *dev, Error 
**errp)
  }
  return;
  
+cap_error:

+msi_uninit(dev);


(4) This error handler doesn't look entirely correct; we can reach it
without having initialized MSI. (MSI setup is conditional; and even if
we attempt it, it is permitted to fail with "msi=auto".) Therefore,
msi_uninit() should be wrapped into "if (msi_present())", same as you
see in pci_bridge_dev_exitfn().


sure, can add a pre-check. But I don't understand why we need that,
msi_uninit() will check msi_present()?




  msi_error:
  slotid_cap_cleanup(dev);
  slotid_error:
@@ -162,6 +177,11 @@ static Property pci_bridge_dev_properties[] = {
  ON_OFF_AUTO_AUTO),
  DEFINE_PROP_BIT(PCI_BRIDGE_DEV_PROP_SHPC, PCIBridgeDev, flags,
  PCI_BRIDGE_DEV_F_SHPC_REQ, true),
+DEFINE_PROP_SIZE("io-reserve", PCIBridgeDev, io_reserve, -1),
+DEFINE_PROP_SIZE("mem-reserve", PCIBridgeDev, mem_reserve, -1),
+DEFINE_PROP_SIZE("pref32-reserve", PCIBridgeDev, pref32_reserve, -1),
+DEFINE_PROP_SIZE("pref64-reserve", PCIBridgeDev, pref64_reserve, -1),
+
  DEFINE_PROP_END_OF_LIST(),
  };
  



(5) Similarly to (2), this property list should cover "bus-reserve" too.
If we are exposing the same capability in PCI config space, then I think
we should let the user control all fields of it as well.

(6) Please clarify the capability ID in the subject line of the patch.

OK, will add the details.


Thanks very much.
Jing


For example:

   hw/pci: support resource reservation capability on "pci-bridge"

Thanks
Laszlo

Re: [Qemu-devel] [PATCH] qemu-img.c: increase spacing between commands in documentation

2018-08-14 Thread Kevin Wolf

Am 13.08.2018 um 20:19 hat Eric Blake geschrieben:
> On 08/13/2018 11:56 AM, Max Reitz wrote:
> > 
> > Ah, hm, so much for that.  Hm...  I don't quite know what to think of
> > this.  It does indeed improve legibility.  But the question is whether
> > --help should be as condensed as possible, and if the user finds it hard
> > to read, whether they should not just open the man page...
> > 
> > Then again, our --help text already has 102 lines.
> > 
> > Let me just think about it for a bit longer. :-)
> > (And see whether others have opinions.)
> 
> And I've already expressed my opinion that it is already rather long, where
> making it longer is not necessarily making it smarter.

I think if we want to improve the help text, we should split it up.

$ qemu-img --help
qemu-img version 2.12.94 (v3.0.0-rc4-5-g4fe9c3e13d-dirty)
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
usage: qemu-img [standard options] command [command options]
QEMU disk image utility

'-h', '--help'   display this help and exit
'-V', '--version'output version information and exit
'-T', '--trace'  [[enable=]][,events=][,file=]
 specify tracing options

Commands:

amend   Change options of an existing disk image
bench   Run benchmarks on a given disk image
check   Check the disk image for consistency or repair it
commit  Merge the disk image into its backing file
...

Run 'qemu-img  --help' for details.

See  for how to report bugs.
More information on the QEMU project at .

$ qemu-img check --help
usage: qemu-img check [--object objectdef] [--image-opts] [-q] [-f fmt] 
[--output=ofmt] [-r [leaks | all]] [-T src_cache] [-U] filename

Command parameters:
  -f'fmt' is the disk image format. It is guessed automatically
in most cases.

  -quse Quiet mode - do not print any output (except errors).

  -rtries to repair any inconsistencies that are found during the
check. '-r leaks' repairs only cluster leaks, whereas '-r
all' fixes all kinds of errors, with a higher risk of
choosing the wrong fix or hiding corruption that has already
occurred.

  -T'src_cache' is the cache mode used to read input disk images,
the valid options are the same as for the 'cache' option.

  --object  'objectdef' is a QEMU user creatable object definition. See
the qemu(1) manual page for a description of the object
properties. The most common object type is a 'secret', which
is used to supply passwords and/or encryption keys.

  --output  'ofmt' is 'human' for human-readable output (default) or
'json' for JSON output.

Examples:

...

Kevin

Re: [Qemu-devel] [Bug 1785698] Re: Solaris build error: unknown type name ‘gcry_error_t’

2018-08-14 Thread Peter Maydell

On 13 August 2018 at 19:58, Michele Denber <1785...@bugs.launchpad.net> wrote:
> I don't know why qemu is picky about
> POSIX, but there you have it.

We do assume a posix shell and that that shell is /bin/sh.
We may have bugs where we assume non-posix behaviour
from it, since almost all users are going to be on systems
where /bin/sh is bash or dash or whatever the BSD /bin/sh is.

(dtc is a sort-of-third-party module, not part of QEMU
proper.)

thanks
-- PMM

Re: [Qemu-devel] [PATCH] virtio: add support for in-order feature

2018-08-14 Thread Ilya Maximets

On 13.08.2018 18:35, Michael S. Tsirkin wrote:
> On Mon, Aug 13, 2018 at 06:28:06PM +0300, Ilya Maximets wrote:
>> On 13.08.2018 12:56, Michael S. Tsirkin wrote:
>>> On Mon, Aug 13, 2018 at 10:55:23AM +0300, Ilya Maximets wrote:
 On 10.08.2018 22:19, Michael S. Tsirkin wrote:
> On Fri, Aug 10, 2018 at 02:04:47PM +0300, Ilya Maximets wrote:
>> On 10.08.2018 12:34, Michael S. Tsirkin wrote:
>>> On Fri, Aug 10, 2018 at 11:28:47AM +0300, Ilya Maximets wrote:
 On 10.08.2018 01:51, Michael S. Tsirkin wrote:
> On Thu, Aug 09, 2018 at 07:54:37PM +0300, Ilya Maximets wrote:
>> New feature bit for in-order feature of the upcoming
>> virtio 1.1. It's already supported by DPDK vhost-user
>> and virtio implementations. These changes required to
>> allow feature negotiation.
>>
>> Signed-off-by: Ilya Maximets 
>> ---
>>
>> I just wanted to test this new feature in DPDK but failed
>> to found required patch for QEMU side. So, I implemented it.
>> At least it will be helpful for someone like me, who wants
>> to evaluate VIRTIO_F_IN_ORDER with DPDK.
>>
>>  hw/net/vhost_net.c |  1 +
>>  include/hw/virtio/virtio.h | 12 +++-
>>  include/standard-headers/linux/virtio_config.h |  7 +++
>>  3 files changed, 15 insertions(+), 5 deletions(-)
>>
>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>> index e037db6..86879c5 100644
>> --- a/hw/net/vhost_net.c
>> +++ b/hw/net/vhost_net.c
>> @@ -78,6 +78,7 @@ static const int user_feature_bits[] = {
>>  VIRTIO_NET_F_MRG_RXBUF,
>>  VIRTIO_NET_F_MTU,
>>  VIRTIO_F_IOMMU_PLATFORM,
>> +VIRTIO_F_IN_ORDER,
>>  
>>  /* This bit implies RARP isn't sent by QEMU out of band */
>>  VIRTIO_NET_F_GUEST_ANNOUNCE,
>> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
>> index 9c1fa07..a422025 100644
>> --- a/include/hw/virtio/virtio.h
>> +++ b/include/hw/virtio/virtio.h
>> @@ -254,16 +254,18 @@ typedef struct virtio_input_conf 
>> virtio_input_conf;
>>  typedef struct VirtIOSCSIConf VirtIOSCSIConf;
>>  typedef struct VirtIORNGConf VirtIORNGConf;
>>  
>> -#define DEFINE_VIRTIO_COMMON_FEATURES(_state, _field) \
>> +#define DEFINE_VIRTIO_COMMON_FEATURES(_state, _field) \
>>  DEFINE_PROP_BIT64("indirect_desc", _state, _field,\
>>VIRTIO_RING_F_INDIRECT_DESC, true), \
>>  DEFINE_PROP_BIT64("event_idx", _state, _field,\
>>VIRTIO_RING_F_EVENT_IDX, true), \
>>  DEFINE_PROP_BIT64("notify_on_empty", _state, _field,  \
>> -  VIRTIO_F_NOTIFY_ON_EMPTY, true), \
>> -DEFINE_PROP_BIT64("any_layout", _state, _field, \
>> -  VIRTIO_F_ANY_LAYOUT, true), \
>> -DEFINE_PROP_BIT64("iommu_platform", _state, _field, \
>> +  VIRTIO_F_NOTIFY_ON_EMPTY, true),\
>> +DEFINE_PROP_BIT64("any_layout", _state, _field,   \
>> +  VIRTIO_F_ANY_LAYOUT, true), \
>> +DEFINE_PROP_BIT64("in_order", _state, _field, \
>> +  VIRTIO_F_IN_ORDER, true),   \
>> +DEFINE_PROP_BIT64("iommu_platform", _state, _field,   \
>>VIRTIO_F_IOMMU_PLATFORM, false)
>
> Is in_order really right for all virtio devices?

 I see nothing device specific in this feature. It just specifies
 some restrictions on the descriptors handling. All virtio devices
 could use it to have performance benefits. Also, upcoming packed
 rings should give a good performance boost in case of enabled
 in-order feature. And packed rings RFC [1] implements
 VIRTIO_F_RING_PACKED for all virtio devices. So, I see no issues
 in enabling in-order negotiation for all of them.

 What do you think?

 [1] https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg01028.html

 Best regards, Ilya Maximets.
>>>
>>> If guest assumes in-order use of buffers but device uses them out of
>>> order then guest will crash. So there's a missing piece where
>>> you actually make devices use buffers in order when the flag is set.
>>
>> I thought that feature negotiation is the mechanism that should
>> protect us from situations like that. Isn't it?
>> If device negotiates in-order feature, when it MUST (as described
>> in spec) use buffers in the same order in which they have been
>> available.
>
> Exactly. And

Re: [Qemu-devel] [PATCH 3/5] block: Simplify bdrv_reopen_abort()

2018-08-14 Thread Kevin Wolf

Am 29.06.2018 um 13:37 hat Alberto Garcia geschrieben:
> If a bdrv_reopen_multiple() call fails, then the explicit_options
> QDict has to be deleted for every entry in the reopen queue. This must
> happen regardless of whether that entry's bdrv_reopen_prepare() call
> succeeded or not.
> 
> This patch simplifies the cleanup code a bit.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block.c | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 5aaed424b9..48e8f4814c 100644
> --- a/block.c
> +++ b/block.c
> @@ -3060,9 +3060,10 @@ int bdrv_reopen_multiple(AioContext *ctx, 
> BlockReopenQueue *bs_queue, Error **er
>  
>  cleanup:
>  QSIMPLEQ_FOREACH_SAFE(bs_entry, bs_queue, entry, next) {
> -if (ret && bs_entry->prepared) {
> -bdrv_reopen_abort(&bs_entry->state);
> -} else if (ret) {
> +if (ret) {
> +if (bs_entry->prepared) {
> +bdrv_reopen_abort(&bs_entry->state);
> +}
>  qobject_unref(bs_entry->state.explicit_options);
>  }
>  qobject_unref(bs_entry->state.options);
> @@ -3351,8 +3352,6 @@ void bdrv_reopen_abort(BDRVReopenState *reopen_state)
>  drv->bdrv_reopen_abort(reopen_state);
>  }
>  
> -qobject_unref(reopen_state->explicit_options);
> -
>  bdrv_abort_perm_update(reopen_state->bs);
>  }

I think this is only correct if we make bdrv_reopen_prepare/commit/abort
private. At the moment they are public interfaces, so we must have
thought that some callers might want to integrate them into other
transactions.

Kevin

Re: [Qemu-devel] [PATCH] hw/pci: add pci capability to pci-pci bridge

2018-08-14 Thread Liu, Jing2


Hi Marcel,

On 8/12/2018 3:11 PM, Marcel Apfelbaum wrote:

Hi Laszlo,


[...]

  hw/pci-bridge/pci_bridge_dev.c | 20 
  1 file changed, 20 insertions(+)
+cap_error:
+    msi_uninit(dev);

(4) This error handler doesn't look entirely correct; we can reach it
without having initialized MSI. (MSI setup is conditional; and even if
we attempt it, it is permitted to fail with "msi=auto".) Therefore,
msi_uninit() should be wrapped into "if (msi_present())", same as you
see in pci_bridge_dev_exitfn().


You are right.  msi_present should be checked.


I looked at the codes calling this function. It need be added to be strong.
But could I ask why we need check twice? msi_unint() help check again.





  msi_error:
  slotid_cap_cleanup(dev);
  slotid_error:

I tried to understand the error handling a bit better. I'm confused.


[...]

Second, msi_uninit() and shpc_cleanup() are internally inconsistent
between each other. The former removes the respective capability from
config space -- with pci_del_capability() --,


Right


  but the latter only has a
comment, "/* TODO: cleanup config space changes? */".


But also disables the QEMU_PCI_CAP_SLOTID (but no code checks it anyway)
I agree it should call pci_del_capability to delete the SHPC capability,
maybe is some "debt" from early development stages.


  The same comment
is present in the slotid_cap_cleanup() function. Given this
inconsistency,


Here we also miss a call to pci_del_capability.


I don't know what to suggest for the new capability's
teardown, in pci_bridge_dev_exitfn()  

Aha, yes, the teardown needs to be added here.
Will add that in next version.

-- should we just ignore it (as

suggested by this patch)?


No, we should remove it properly.

I think it is not considered a "big" issue since adding/removing PCI 
capabilities
is not an operation that deals with resources, we only edit an array 
(the config space)

that will not be used anyway if the device init sequence failed.

That does not mean the code should not be clean.

Do we need set up another serial patches (separated from this
one) to add pci_del_capability() for slotid_cap_cleanup() and 
shpc_cleanup()?


Thanks,
Jing

[...]

Re: [Qemu-devel] [PATCH 5/5] block: Simplify append_open_options()

2018-08-14 Thread Kevin Wolf

Am 29.06.2018 um 13:37 hat Alberto Garcia geschrieben:
> This function returns a BDS's driver-specific options, excluding also
> those from its children. Since we have just removed all children
> options from bs->options there's no need to do this last step.
> 
> We allow references to children, though ("backing": "node0"), so those
> we still have to remove.
> 
> Signed-off-by: Alberto Garcia 

Hmm, yes, but if I compare this with the example you gave in an earlier
patch:

  $ qemu-img create -f qcow2 hd0.qcow2 10M
  $ qemu-img create -f qcow2 -b hd0.qcow2 hd1.qcow2
  $ qemu-img create -f qcow2 -b hd1.qcow2 hd2.qcow2

  $ $QEMU -drive file=hd2.qcow2,node-name=hd2,backing.node-name=hd1

This opens a chain of images hd0 <- hd1 <- hd2. Now let's remove hd1
using block_stream:

  (qemu) block_stream hd2 0 hd0.qcow2

After this hd2 contains backing.node-name=hd1, which is no longer
correct because hd1 doesn't exist anymore.

Doesn't backing=hd1 actually result in the same kind of inconsistency?
That is, bs->option will still have backing=hd1, but in reality we
reference hd0 now?

Should we get rid of child references in bs->(explicit_)options as well?

Kevin

Re: [Qemu-devel] [PATCH] hw/pci: add pci capability to pci-pci bridge

2018-08-14 Thread Liu, Jing2





On 8/12/2018 3:11 PM, Marcel Apfelbaum wrote:
[...]


I don't know what to suggest for the new capability's
teardown, in pci_bridge_dev_exitfn()  -- should we just ignore it (as
suggested by this patch)?


No, we should remove it properly.

I think it is not considered a "big" issue since adding/removing PCI 
capabilities
is not an operation that deals with resources, we only edit an array 
(the config space)

that will not be used anyway if the device init sequence failed.

That does not mean the code should not be clean.

I just search some PCI_CAP_ID_* to see what they do for exitfn/uninit.
It's weird that some more capabilities aren't being deleted, like 
pci_ich9_uninit().


Thanks,
Jing



Thanks,
Marcel



Thanks
Laszlo

[Qemu-devel] [PATCH] qemu-img: fix regression copying secrets during convert

2018-08-14 Thread Daniel P . Berrangé

When the convert command is creating an output file that needs
secrets, we need to ensure those secrets are passed to both the
blk_new_open and bdrv_create API calls.

This is done by qemu-img extracting all opts matching the name
suffix "key-secret". Unfortunately the code doing this was run after the
call to bdrv_create(), which meant the QemuOpts it was extracting
secrets from was now empty.

Previously this worked by luks as a bug meant the "key-secret"
parameters were not purged from the QemuOpts. This bug was fixed in

  commit b76b4f604521e59f857d6177bc55f6f2e41fd392
  Author: Kevin Wolf 
  Date:   Thu Jan 11 16:18:08 2018 +0100

qcow2: Use visitor for options in qcow2_create()

Exposing the latent bug in qemu-img. This fix simply moves the copying
of secrets to before the bdrv_create() call.

Signed-off-by: Daniel P. Berrangé 
---
 qemu-img.c | 33 +++--
 1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 1acddf693c..66c388986e 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -345,21 +345,6 @@ static int img_add_key_secrets(void *opaque,
 return 0;
 }
 
-static BlockBackend *img_open_new_file(const char *filename,
-   QemuOpts *create_opts,
-   const char *fmt, int flags,
-   bool writethrough, bool quiet,
-   bool force_share)
-{
-QDict *options = NULL;
-
-options = qdict_new();
-qemu_opt_foreach(create_opts, img_add_key_secrets, options, &error_abort);
-
-return img_open_file(filename, options, fmt, flags, writethrough, quiet,
- force_share);
-}
-
 
 static BlockBackend *img_open(bool image_opts,
   const char *filename,
@@ -2018,6 +2003,7 @@ static int img_convert(int argc, char **argv)
 BlockDriverState *out_bs;
 QemuOpts *opts = NULL, *sn_opts = NULL;
 QemuOptsList *create_opts = NULL;
+QDict *open_opts = NULL;
 char *options = NULL;
 Error *local_err = NULL;
 bool writethrough, src_writethrough, quiet = false, image_opts = false,
@@ -2362,6 +2348,16 @@ static int img_convert(int argc, char **argv)
 }
 }
 
+/*
+ * The later open call will need any decryption secrets, and
+ * bdrv_create() will purge "opts", so extract them now before
+ * they are lost.
+ */
+if (!skip_create) {
+open_opts = qdict_new();
+qemu_opt_foreach(opts, img_add_key_secrets, open_opts, &error_abort);
+}
+
 if (!skip_create) {
 /* Create the new image */
 ret = bdrv_create(drv, out_filename, opts, &local_err);
@@ -2388,8 +2384,9 @@ static int img_convert(int argc, char **argv)
  * That has to wait for bdrv_create to be improved
  * to allow filenames in option syntax
  */
-s.target = img_open_new_file(out_filename, opts, out_fmt,
- flags, writethrough, quiet, false);
+s.target = img_open_file(out_filename, open_opts, out_fmt,
+ flags, writethrough, quiet, false);
+open_opts = NULL; /* blk_new_open will have freed it */
 }
 if (!s.target) {
 ret = -1;
@@ -2461,9 +2458,9 @@ out:
 qemu_progress_print(100, 0);
 }
 qemu_progress_end();
-qemu_opts_del(opts);
 qemu_opts_free(create_opts);
 qemu_opts_del(sn_opts);
+qobject_unref(open_opts);
 blk_unref(s.target);
 if (s.src) {
 for (bs_i = 0; bs_i < s.src_num; bs_i++) {
-- 
2.17.1

Re: [Qemu-devel] [PATCH 2/3] monitor: show sync profiling info with 'info sync'

2018-08-14 Thread Dr. David Alan Gilbert

* Emilio G. Cota (c...@braap.org) wrote:
> Signed-off-by: Emilio G. Cota 
> ---
>  monitor.c|  7 +++
>  hmp-commands-info.hx | 16 
>  2 files changed, 23 insertions(+)
> 
> diff --git a/monitor.c b/monitor.c
> index 77861e96af..66d8d85b97 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -1454,6 +1454,13 @@ static void hmp_info_opcount(Monitor *mon, const QDict 
> *qdict)
>  }
>  #endif
>  
> +static void hmp_info_sync(Monitor *mon, const QDict *qdict)
> +{
> +int64_t max = qdict_get_try_int(qdict, "max", 10);
> +
> +qsp_report((FILE *)mon, monitor_fprintf, max);
> +}
> +
>  static void hmp_info_history(Monitor *mon, const QDict *qdict)
>  {
>  int i;
> diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
> index 70639f656a..56a3249bad 100644
> --- a/hmp-commands-info.hx
> +++ b/hmp-commands-info.hx
> @@ -297,6 +297,22 @@ STEXI
>  @item info opcount
>  @findex info opcount
>  Show dynamic compiler opcode counters
> +ETEXI
> +
> +{
> +.name   = "sync",
> +.args_type  = "max:i?",
> +.params = "[max]",
> +.help   = "show sync profiling info for "
> +   "up to max entries (default: 10). "
> +   "Entries are sorted by wait time.",
> +.cmd= hmp_info_sync,
> +},
> +
> +STEXI
> +@item info sync
> +@findex sync
> +Show sync profiling info.
>  ETEXI
>  
>  {

As long as this is just for devs I'm OK with this from the HMP side;
however, if you want to automate the display or wire it to other
tools then you should probably wire it up via QMP.

Dave

> -- 
> 2.17.1
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

[Qemu-devel] [PATCH] mirror: Fail gracefully for source == target

2018-08-14 Thread Kevin Wolf

blockdev-mirror with the same node for source and target segfaults
today: A node is in its own backing chain, so mirror_start_job() decides
that this is an active commit. When adding the intermediate nodes with
block_job_add_bdrv(), it starts the iteration through the subchain with
the backing file of source, though, so it never reaches target and
instead runs into NULL at the base.

While we could fix that by starting with source itself, there is no
point in allowing mirroring a node into itself and I wouldn't be
surprised if this caused more problems later.

So just check for this scenario and error out.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Kevin Wolf 
---
 block/mirror.c | 5 +
 tests/qemu-iotests/041 | 6 ++
 tests/qemu-iotests/041.out | 4 ++--
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index b48c3f8cf5..dd5ca02b09 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1499,6 +1499,11 @@ static void mirror_start_job(const char *job_id, 
BlockDriverState *bs,
 buf_size = DEFAULT_MIRROR_BUF_SIZE;
 }
 
+if (bs == target) {
+error_setg(errp, "Can't mirror node into itself");
+return;
+}
+
 /* In the case of active commit, add dummy driver to provide consistent
  * reads on the top, while disabling it in the intermediate nodes, and make
  * the backing chain writable. */
diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index c20ac7da87..9336ab6ff5 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -234,6 +234,12 @@ class TestSingleBlockdev(TestSingleDrive):
 result = self.vm.qmp("blockdev-add", **args)
 self.assert_qmp(result, 'return', {})
 
+def test_mirror_to_self(self):
+result = self.vm.qmp(self.qmp_cmd, job_id='job0',
+ device=self.qmp_target, sync='full',
+ target=self.qmp_target)
+self.assert_qmp(result, 'error/class', 'GenericError')
+
 test_large_cluster = None
 test_image_not_found = None
 test_small_buffer2 = None
diff --git a/tests/qemu-iotests/041.out b/tests/qemu-iotests/041.out
index c28b392b87..e071d0b261 100644
--- a/tests/qemu-iotests/041.out
+++ b/tests/qemu-iotests/041.out
@@ -1,5 +1,5 @@
-.
+
 --
-Ran 85 tests
+Ran 88 tests
 
 OK
-- 
2.13.6

Re: [Qemu-devel] [PATCH v2 1/4] hw/char: Implement nRF51 SoC UART

2018-08-14 Thread Julia Suvorova via Qemu-devel


On 13.08.2018 12:47, Stefan Hajnoczi wrote:

On Mon, Aug 13, 2018 at 10:08 AM Julia Suvorova  wrote:

On 10.08.2018 09:02, Stefan Hajnoczi wrote:

On Wed, Aug 8, 2018 at 10:07 PM, Julia Suvorova  wrote:

+static uint64_t uart_read(void *opaque, hwaddr addr, unsigned int size)
+{
+NRF51UARTState *s = NRF51_UART(opaque);
+uint64_t r;
+
+if (!s->enabled) {
+return 0;
+}
+
+switch (addr) {
+case A_UART_RXD:
+r = s->rx_fifo[s->rx_fifo_pos];
+if (s->rx_started && s->rx_fifo_len) {
+qemu_chr_fe_accept_input(&s->chr);


Should this be called after popping a byte from the rx fifo?  That way
.can_receive() will return true again.


Could you explain more, please?


This calls into the chardev's ->chr_accept_input() function.  That
function may do anything it wants.

At this point we haven't popped a byte from our rx fifo yet, so if
->chr_accept_input() calls back into the chardev frontend (us!) it
sees that we cannot receive.  That's strange since we just told the
backend we want to accept input!

I haven't checked if there is any code path where this can happen, but
it's safer to first update internal state before letting the outside
world know that we can accept more input.


Thanks, I got it. I'll change the order.


+static void nrf51_uart_realize(DeviceState *dev, Error **errp)
+{
+NRF51UARTState *s = NRF51_UART(dev);
+
+qemu_chr_fe_set_handlers(&s->chr, uart_can_receive, uart_receive,
+ uart_event, NULL, s, NULL, true);
+}


unrealize() should set the handlers to NULL.  That way the device can
be removed without leaving callbacks registered.


I don't know the reason, but almost all char devices do not implement
this function. Maybe, because when you quit qemu, qemu_chr_cleanup() is called.


It's an assumption that on-board devices cannot be hot unplugged and
that the machine type stays alive until QEMU terminates.

Making this assumption saves 1 call to qemu_chr_fe_set_handlers().
The cost is that we cannot safely stop the system-on-chip because its
devices don't clean up properly.

Since cleanup is so trivial here I think it's worthwhile.


Ok, I'll implement unrealize with a qemu_chr_fe_deinit() in it.

Best regards, Julia Suvorova.

[Qemu-devel] [PATCH] hw/rdma: Abort send-op if fail to create addr handler

2018-08-14 Thread Yuval Shaia

Function create_ah might return NULL, let's exit with an error.

Signed-off-by: Yuval Shaia 
---
 hw/rdma/rdma_backend.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
index 35726bda2e..59d02eb567 100644
--- a/hw/rdma/rdma_backend.c
+++ b/hw/rdma/rdma_backend.c
@@ -402,6 +402,10 @@ void rdma_backend_post_send(RdmaBackendDev *backend_dev,
 if (qp_type == IBV_QPT_UD) {
 wr.wr.ud.ah = create_ah(backend_dev, qp->ibpd,
 backend_dev->backend_gid_idx, dgid);
+if (!wr.wr.ud.ah) {
+comp_handler(IBV_WC_GENERAL_ERR, VENDOR_ERR_FAIL_BACKEND, ctx, 0);
+goto out_dealloc_cqe_ctx;
+}
 wr.wr.ud.remote_qpn = dqpn;
 wr.wr.ud.remote_qkey = dqkey;
 }
-- 
2.17.1

Re: [Qemu-devel] [PATCH v9 6/6] tpm: add ACPI memory clear interface

2018-08-14 Thread Marc-André Lureau

Hi
On Mon, Aug 13, 2018 at 11:09 AM Igor Mammedov  wrote:
>
> On Fri, 10 Aug 2018 17:32:23 +0200
> Marc-André Lureau  wrote:
>
> > This allows to pass the last failing test from the Windows HLK TPM 2.0
> > TCG PPI 1.3 tests.
> >
> > The interface is described in the "TCG Platform Reset Attack
> > Mitigation Specification", chapter 6 "ACPI _DSM Function". According
> > to Laszlo, it's not so easy to implement in OVMF, he suggested to do
> > it in qemu instead.
> >
> > Signed-off-by: Marc-André Lureau 
> > ---
> >  hw/tpm/tpm_ppi.h |  2 ++
> >  hw/i386/acpi-build.c | 46 
> >  hw/tpm/tpm_crb.c |  1 +
> >  hw/tpm/tpm_ppi.c | 23 ++
> >  hw/tpm/tpm_tis.c |  1 +
> >  docs/specs/tpm.txt   |  2 ++
> >  hw/tpm/trace-events  |  3 +++
> >  7 files changed, 78 insertions(+)
> >
> > diff --git a/hw/tpm/tpm_ppi.h b/hw/tpm/tpm_ppi.h
> > index f6458bf87e..3239751e9f 100644
> > --- a/hw/tpm/tpm_ppi.h
> > +++ b/hw/tpm/tpm_ppi.h
> > @@ -23,4 +23,6 @@ typedef struct TPMPPI {
> >  bool tpm_ppi_init(TPMPPI *tpmppi, struct MemoryRegion *m,
> >hwaddr addr, Object *obj, Error **errp);
> >
> > +void tpm_ppi_reset(TPMPPI *tpmppi);
> > +
> >  #endif /* TPM_TPM_PPI_H */
> > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > index c5e9a6e11d..271c7240dc 100644
> > --- a/hw/i386/acpi-build.c
> > +++ b/hw/i386/acpi-build.c
> > @@ -1824,6 +1824,13 @@ build_tpm_ppi(TPMIf *tpm, Aml *dev)
> >  pprq = aml_name("PPRQ");
> >  pprm = aml_name("PPRM");
> >
> > +aml_append(dev,
> > +   aml_operation_region("TPP3", AML_SYSTEM_MEMORY,
> > +aml_int(TPM_PPI_ADDR_BASE + 0x200),
> > +0x1));
> > +field = aml_field("TPP3", AML_BYTE_ACC, AML_NOLOCK, AML_PRESERVE);
> > +aml_append(field, aml_named_field("MOVV", 8));
> > +aml_append(dev, field);
> >  /*
> >   * DerefOf in Windows is broken with SYSTEM_MEMORY.  Use a dynamic
> >   * operation region inside of a method for getting FUNC[op].
> > @@ -2166,7 +2173,46 @@ build_tpm_ppi(TPMIf *tpm, Aml *dev)
> >  aml_append(ifctx, aml_return(aml_buffer(1, zerobyte)));
> >  }
> >  aml_append(method, ifctx);
> > +
> > +ifctx = aml_if(
> > +aml_equal(uuid,
> > +  aml_touuid("376054ED-CC13-4675-901C-4756D7F2D45D")));
> > +{
> > +/* standard DSM query function */
> > +ifctx2 = aml_if(aml_equal(function, zero));
> > +{
> > +uint8_t byte_list[1] = { 0x03 };
> > +aml_append(ifctx2, aml_return(aml_buffer(1, byte_list)));
> > +}
> > +aml_append(ifctx, ifctx2);
> > +
> > +/*
> > + * TCG Platform Reset Attack Mitigation Specification 1.0 Ch.6
> > + *
> > + * Arg 2 (Integer): Function Index = 1
> > + * Arg 3 (Package): Arguments = Package: Type: Integer
> > + *  Operation Value of the Request
> > + * Returns: Type: Integer
> > + *  0: Success
> > + *  1: General Failure
> > + */
> > +ifctx2 = aml_if(aml_equal(function, one));
> > +{
> > +aml_append(ifctx2,
> > +   aml_store(aml_derefof(aml_index(arguments, 
> > zero)),
> > + op));
> > +{
> > +aml_append(ifctx2, aml_store(op, aml_name("MOVV")));
> > +
> > +/* 0: success */
> > +aml_append(ifctx2, aml_return(zero));
> > +}
> > +}
> > +aml_append(ifctx, ifctx2);
> > +}
> > +aml_append(method, ifctx);
> >  }
> > +
> >  aml_append(dev, method);
> >  }
> >
> > diff --git a/hw/tpm/tpm_crb.c b/hw/tpm/tpm_crb.c
> > index b243222fd6..48f6a716ad 100644
> > --- a/hw/tpm/tpm_crb.c
> > +++ b/hw/tpm/tpm_crb.c
> > @@ -233,6 +233,7 @@ static void tpm_crb_reset(void *dev)
> >  {
> >  CRBState *s = CRB(dev);
> >
> > +tpm_ppi_reset(&s->ppi);
> >  tpm_backend_reset(s->tpmbe);
> >
> >  memset(s->regs, 0, sizeof(s->regs));
> > diff --git a/hw/tpm/tpm_ppi.c b/hw/tpm/tpm_ppi.c
> > index 8b46b9dd4b..dbfe342ca2 100644
> > --- a/hw/tpm/tpm_ppi.c
> > +++ b/hw/tpm/tpm_ppi.c
> > @@ -16,8 +16,30 @@
> >  #include "qapi/error.h"
> >  #include "cpu.h"
> >  #include "sysemu/memory_mapping.h"
> > +#include "sysemu/reset.h"
> >  #include "migration/vmstate.h"
> >  #include "tpm_ppi.h"
> > +#include "trace.h"
> > +
> > +void tpm_ppi_reset(TPMPPI *tpmppi)
> > +{
> > +char *ptr = memory_region_get_ram_ptr(&tpmppi->ram);
> > +
> > +if (ptr[0x200] & 0x1) {
> > +GuestPhysBlockList guest_phys_blocks;
> > +GuestPhysBlock *block;
> > +
> > +guest_phys_blocks_init(&guest_phys_blocks);
> > +guest_p

Re: [Qemu-devel] [PATCH v2 1/4] hw/char: Implement nRF51 SoC UART

2018-08-14 Thread Peter Maydell

On 13 August 2018 at 10:47, Stefan Hajnoczi  wrote:
> It's an assumption that on-board devices cannot be hot unplugged and
> that the machine type stays alive until QEMU terminates.
>
> Making this assumption saves 1 call to qemu_chr_fe_set_handlers().
> The cost is that we cannot safely stop the system-on-chip because its
> devices don't clean up properly.
>
> Since cleanup is so trivial here I think it's worthwhile.

I would be more in favour of adding an unrealize function to
random always-present-never-unpluggable devices if we had
better documentation of exactly what the QOM lifecycle is
and what needs to be done in init/realize/unrealize/etc.
As it is, I don't know myself and therefore can't review
whether devices with unrealize methods get it right. And
obviously the code is completely untestable because it can
never run...

thanks
-- PMM

Re: [Qemu-devel] [PATCH v3 3/3] Change other funcitons referring to feature_word_info[]

2018-08-14 Thread Robert Hoo

On Fri, 2018-08-10 at 10:17 -0500, Eric Blake wrote:
> On 08/10/2018 09:06 AM, Robert Hoo wrote:
> 
> In the subject: s/funcitons/functions/
> 
> Also, it may be worth using a topic prefix (most of our commit messages 
> resemble:
> 
> topic: Description of patch
> 
> to make it easier to spot patches by topic).
> 
> > Add an util function feature_word_description(), which help construct the 
> > string
> 
> s/an util/a util/
> s/help/helps/
> 
Thanks Eric, will fix these typos in v2.
> > describing the feature word (both CPUID and MSR types).
> > 
> > report_unavailable_features(): add MSR_FEATURE_WORD type support.
> > x86_cpu_get_feature_words(): limit to CPUID_FEATURE_WORD only.
> > x86_cpu_get_supported_feature_word(): add MSR_FEATURE_WORD type support.
> > x86_cpu_adjust_feat_level(): assert the requested feature must be
> > CPUID_FEATURE_WORD type.
> > 
> > Signed-off-by: Robert Hoo 
> > ---
> >   target/i386/cpu.c | 77 
> > +--
> >   1 file changed, 58 insertions(+), 19 deletions(-)
> > 
> > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > index f7c70d9..21ed599 100644
> > --- a/target/i386/cpu.c
> > +++ b/target/i386/cpu.c
> > @@ -3024,21 +3024,51 @@ static const TypeInfo host_x86_cpu_type_info = {
> >   
> >   #endif
> >   
> > +/*
> > +*caller should have input str no less than 64 byte length.
> > +*/
> > +#define FEATURE_WORD_DESCPTION_LEN 64
> 
> s/DESCPTION/DESCRIPTION/
> 
> > +static int feature_word_description(char str[], FeatureWordInfo *f,
> > +uint32_t bit)
> > +{
> 
> Prone to buffer overflow if the caller doesn't pass in the length. 
> Would it be better to just g_strdup_printf() into a malloc'd result 
> instead of trying to snprintf()'ing into a buffer that you hope the 
> caller sized large enough?
> 
Oh, wasn't aware of such good util functions. Sure I'd like to use them.
Is there some web page introducing them?

Eduardo/Paolo, do you have more comments?
> > +int ret;
> > +
> > +assert(f->type == CPUID_FEATURE_WORD ||
> > +f->type == MSR_FEATURE_WORD);
> > +switch (f->type) {
> > +case CPUID_FEATURE_WORD:
> > +{
> > +const char *reg = get_register_name_32(f->cpuid.reg);
> > +assert(reg);
> > +ret = snprintf(str, FEATURE_WORD_DESCPTION_LEN,
> > +"CPUID.%02XH:%s%s%s [bit %d]",
> > +f->cpuid.eax, reg,
> > +f->feat_names[bit] ? "." : "",
> > +f->feat_names[bit] ? f->feat_names[bit] : "", bit);
> > +break;
> > +}
> > +case MSR_FEATURE_WORD:
> > +ret = snprintf(str, FEATURE_WORD_DESCPTION_LEN,
> > +"MSR(%xH).%s [bit %d]",
> > +f->msr.index,
> > +f->feat_names[bit] ? f->feat_names[bit] : "", bit);
> > +break;
> > +}
>

Re: [Qemu-devel] [PATCH] chardev: Add websocket support

2018-08-14 Thread Julia Suvorova via Qemu-devel


On 13.08.2018 15:02, Paolo Bonzini wrote:

Thanks Julia, just a few cleanups to simplify the prototypes of some
functions.


Thanks for the review, I'll do the refactoring.

Best regards, Julia Suvorova.

Re: [Qemu-devel] [PATCH 3/5] block: Simplify bdrv_reopen_abort()

2018-08-14 Thread Alberto Garcia

On Tue 14 Aug 2018 11:07:42 AM CEST, Kevin Wolf wrote:
> Am 29.06.2018 um 13:37 hat Alberto Garcia geschrieben:
>> If a bdrv_reopen_multiple() call fails, then the explicit_options
>> QDict has to be deleted for every entry in the reopen queue. This must
>> happen regardless of whether that entry's bdrv_reopen_prepare() call
>> succeeded or not.
>> 
>> This patch simplifies the cleanup code a bit.
>> 
>> Signed-off-by: Alberto Garcia 
>> ---
>>  block.c | 9 -
>>  1 file changed, 4 insertions(+), 5 deletions(-)
>> 
>> diff --git a/block.c b/block.c
>> index 5aaed424b9..48e8f4814c 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -3060,9 +3060,10 @@ int bdrv_reopen_multiple(AioContext *ctx, 
>> BlockReopenQueue *bs_queue, Error **er
>>  
>>  cleanup:
>>  QSIMPLEQ_FOREACH_SAFE(bs_entry, bs_queue, entry, next) {
>> -if (ret && bs_entry->prepared) {
>> -bdrv_reopen_abort(&bs_entry->state);
>> -} else if (ret) {
>> +if (ret) {
>> +if (bs_entry->prepared) {
>> +bdrv_reopen_abort(&bs_entry->state);
>> +}
>>  qobject_unref(bs_entry->state.explicit_options);
>>  }
>>  qobject_unref(bs_entry->state.options);
>> @@ -3351,8 +3352,6 @@ void bdrv_reopen_abort(BDRVReopenState *reopen_state)
>>  drv->bdrv_reopen_abort(reopen_state);
>>  }
>>  
>> -qobject_unref(reopen_state->explicit_options);
>> -
>>  bdrv_abort_perm_update(reopen_state->bs);
>>  }
>
> I think this is only correct if we make bdrv_reopen_prepare/commit/abort
> private. At the moment they are public interfaces, so we must have
> thought that some callers might want to integrate them into other
> transactions.

But that's not a problem as long as the new callers unref
state.explicit_options when bdrv_reopen_prepare() fails. They need to do
it with state.options anyway.

Berto

Re: [Qemu-devel] [PATCH v4 00/14] fp-test + hardfloat

2018-08-14 Thread Alex Bennée

Emilio G. Cota  writes:

> On Mon, Jun 11, 2018 at 21:48:46 -0400, Emilio G. Cota wrote:
>> Sending this respin (little more than a rebase) in case there's
>> reviewer bandwidth available until the soft-freeze in 3 weeks.
>
> *ping*
>
> Would be great to get this in for 3.1.

I would like this merged by 3.1 as well. However I think there is still
some work to be done on the testing side. IIRC the fptest case works
with whitelists and I'd like to understand more about why we can't use
the whole test corpus? Is it testing features we don't have on all
architectures or just because it wouldn't pass because of holes in our
current softfloat?

Our experience of SVE has shown that despite the fairly extensive
testing we did there are still a bunch of corner cases we missed.
Hopefully the last few patches have fixed that but I guess it pays to be
exhaustive.

We now have the check-tcg infrastructure in place so it would be nice to
have proper native tests in place for each architecture. My experience
of the fcvt.c test case however is you end up using inline assembler to
ensure you exercise the right guest opcodes which makes it hard to
generalise for lots of architectures. I had written a bunch of patches
against the fptest to get it built under check-tcg but it was painful:

  * a bit hacky to build as unit test and as tcg test
  * needed a lot of boilerplate for each new operation

> BTW I have a paper under submission that covers this work among
> other improvements to QEMU. Reading this paper might ease a
> subsequent review of this patch series; ask me privately if
> you want to get a copy.
>
> Thanks,
>
>   Emilio

--
Alex Bennée

Re: [Qemu-devel] [PATCH] chardev: Add websocket support

2018-08-14 Thread Julia Suvorova via Qemu-devel


On 13.08.2018 15:21, Daniel P. Berrangé wrote:

On Mon, Aug 13, 2018 at 01:20:37PM +0300, Julia Suvorova via Qemu-devel wrote:

New option "websock" added to allow using websocket protocol for
chardev socket backend.
Example:
 -chardev socket,websock,id=...

Signed-off-by: Julia Suvorova 
---
  chardev/char-socket.c | 75 ---
  chardev/char.c|  3 ++
  qapi/char.json|  3 ++
  3 files changed, 70 insertions(+), 11 deletions(-)





@@ -699,6 +706,45 @@ cont:
  }
  
  
+static void tcp_chr_websock_handshake(QIOTask *task, gpointer user_data)

+{
+Chardev *chr = user_data;
+
+if (qio_task_propagate_error(task, NULL)) {
+tcp_chr_disconnect(chr);
+} else {
+tcp_chr_connect(chr);
+}
+}
+
+
+static void tcp_chr_websock_init(Chardev *chr)
+{
+SocketChardev *s = SOCKET_CHARDEV(chr);
+QIOChannelWebsock *wioc;
+gchar *name;
+
+if (s->is_listen) {
+wioc = qio_channel_websock_new_server(s->ioc);
+} else {
+/* Websocket client is not yet implemented */
+return;
+}
+if (wioc == NULL) {
+tcp_chr_disconnect(chr);
+return;
+}
+
+name = g_strdup_printf("chardev-websock-server-%s", chr->label);
+qio_channel_set_name(QIO_CHANNEL(wioc), name);
+g_free(name);
+object_unref(OBJECT(s->ioc));
+s->ioc = QIO_CHANNEL(wioc);
+
+qio_channel_websock_handshake(wioc, tcp_chr_websock_handshake, chr, NULL);
+}
+
+
  static void tcp_chr_tls_handshake(QIOTask *task,
gpointer user_data)
  {
@@ -710,6 +756,8 @@ static void tcp_chr_tls_handshake(QIOTask *task,
  } else {
  if (s->do_telnetopt) {
  tcp_chr_telnet_init(chr);
+} else if (s->is_websock) {
+tcp_chr_websock_init(chr);
  } else {
  tcp_chr_connect(chr);
  }
@@ -799,12 +847,12 @@ static int tcp_chr_new_client(Chardev *chr, 
QIOChannelSocket *sioc)
  
  if (s->tls_creds) {

  tcp_chr_tls_init(chr);
+} else if (s->do_telnetopt) {
+tcp_chr_telnet_init(chr);
+} else if (s->is_websock) {
+tcp_chr_websock_init(chr);
  } else {
-if (s->do_telnetopt) {
-tcp_chr_telnet_init(chr);
-} else {
-tcp_chr_connect(chr);
-}
+tcp_chr_connect(chr);



I don't think the ordering of init calls is correct when we have multiple 
options
enabled.

With tls=y & telnet=y we initialize TLS, then initialize telnet.

With tls=y & telnet=y & websock=y, we  initialize TLS, then initialize telnet 
and
then websock.

This means we're setting up a telnet session, and then running websocket over 
the top
of it.  This is wrong because there's no telnet client that exists which would 
be
able to use that.

The purpose of TLS & websock options in combination with telnet is to allow a 
normal
telnet session to be tunnelled over a TLS+websocket proxy.

So we must initailize TLS, then websock, then telnet


Okay, it's reasonable if all three properties are on. I'll change the order.

Best regards, Julia Suvorova.

[Qemu-devel] [RFC PATCH] vl: fix migration when watchdog expires

2018-08-14 Thread Jay Zhou

I got the following error when migrating a VM with watchdog
device:

{"timestamp": {"seconds": 1533884471, "microseconds": 668099},
"event": "WATCHDOG", "data": {"action": "reset"}}
{"timestamp": {"seconds": 1533884471, "microseconds": 677658},
"event": "RESET", "data": {"guest": true}}
{"timestamp": {"seconds": 1533884471, "microseconds": 677874},
"event": "STOP"}
qemu-system-x86_64: invalid runstate transition: 'prelaunch' -> 'postmigrate'
Aborted

The run state transition is RUN_STATE_FINISH_MIGRATE to RUN_STATE_PRELAUNCH,
then the migration thread aborted when it tries to set RUN_STATE_POSTMIGRATE.
There is a race between the main loop thread and the migration thread I think.

Signed-off-by: Jay Zhou 
---
 vl.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/vl.c b/vl.c
index 16b913f..298ce91 100644
--- a/vl.c
+++ b/vl.c
@@ -637,6 +637,7 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 { RUN_STATE_PRELAUNCH, RUN_STATE_RUNNING },
 { RUN_STATE_PRELAUNCH, RUN_STATE_FINISH_MIGRATE },
 { RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE },
+{ RUN_STATE_PRELAUNCH, RUN_STATE_POSTMIGRATE },
 
 { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING },
 { RUN_STATE_FINISH_MIGRATE, RUN_STATE_PAUSED },
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH 5/5] block: Simplify append_open_options()

2018-08-14 Thread Alberto Garcia

On Tue 14 Aug 2018 11:17:50 AM CEST, Kevin Wolf wrote:
> Am 29.06.2018 um 13:37 hat Alberto Garcia geschrieben:
>> This function returns a BDS's driver-specific options, excluding also
>> those from its children. Since we have just removed all children
>> options from bs->options there's no need to do this last step.
>> 
>> We allow references to children, though ("backing": "node0"), so those
>> we still have to remove.
>> 
>> Signed-off-by: Alberto Garcia 
>
> Hmm, yes, but if I compare this with the example you gave in an earlier
> patch:
>
>   $ qemu-img create -f qcow2 hd0.qcow2 10M
>   $ qemu-img create -f qcow2 -b hd0.qcow2 hd1.qcow2
>   $ qemu-img create -f qcow2 -b hd1.qcow2 hd2.qcow2
>
>   $ $QEMU -drive file=hd2.qcow2,node-name=hd2,backing.node-name=hd1
>
> This opens a chain of images hd0 <- hd1 <- hd2. Now let's remove hd1
> using block_stream:
>
>   (qemu) block_stream hd2 0 hd0.qcow2
>
> After this hd2 contains backing.node-name=hd1, which is no longer
> correct because hd1 doesn't exist anymore.
>
> Doesn't backing=hd1 actually result in the same kind of inconsistency?
> That is, bs->option will still have backing=hd1, but in reality we
> reference hd0 now?
>
> Should we get rid of child references in bs->(explicit_)options as
> well?

I don't think so, at least not in general.

The main difference is that if you set backing.node-name=foo then it
means that 'node-name=foo' is an option of the child, the option doesn't
belong in the parent at all. So once it's transferred to the child
there's no reason why it shoud remain in the parent. It's redundant and
can interfere with the reopen operation (e.g. you change the option in
the child but when you reopen the parent it will try to revert the child
option to the previous value).

In the case of 'backing=foo' that's clearly an option of the parent,
there's no other place to put it. We could probably remove it as well,
but we have to see carefully that no parent needs to keep that
information. I think we want to keep child references because in general
we don't allow them to be changed in reopen.

Example: you cannot pass 'file=bar' on reopen unless 'bar' was already
the existing value of 'file' (i.e. you're not changing anything). We
need to look for the previous value in bs->options in order to know
that.

After x-blockdev-reopen is ready, 'backing' will perhaps be the
first/only exception, because we'll be able to change it and the
previous value doesn't matter.

But that's part of the patches I'm working on.

Berto

Re: [Qemu-devel] [PATCH 3/5] block: Simplify bdrv_reopen_abort()

2018-08-14 Thread Kevin Wolf

Am 14.08.2018 um 12:15 hat Alberto Garcia geschrieben:
> On Tue 14 Aug 2018 11:07:42 AM CEST, Kevin Wolf wrote:
> > Am 29.06.2018 um 13:37 hat Alberto Garcia geschrieben:
> >> If a bdrv_reopen_multiple() call fails, then the explicit_options
> >> QDict has to be deleted for every entry in the reopen queue. This must
> >> happen regardless of whether that entry's bdrv_reopen_prepare() call
> >> succeeded or not.
> >> 
> >> This patch simplifies the cleanup code a bit.
> >> 
> >> Signed-off-by: Alberto Garcia 
> >> ---
> >>  block.c | 9 -
> >>  1 file changed, 4 insertions(+), 5 deletions(-)
> >> 
> >> diff --git a/block.c b/block.c
> >> index 5aaed424b9..48e8f4814c 100644
> >> --- a/block.c
> >> +++ b/block.c
> >> @@ -3060,9 +3060,10 @@ int bdrv_reopen_multiple(AioContext *ctx, 
> >> BlockReopenQueue *bs_queue, Error **er
> >>  
> >>  cleanup:
> >>  QSIMPLEQ_FOREACH_SAFE(bs_entry, bs_queue, entry, next) {
> >> -if (ret && bs_entry->prepared) {
> >> -bdrv_reopen_abort(&bs_entry->state);
> >> -} else if (ret) {
> >> +if (ret) {
> >> +if (bs_entry->prepared) {
> >> +bdrv_reopen_abort(&bs_entry->state);
> >> +}
> >>  qobject_unref(bs_entry->state.explicit_options);
> >>  }
> >>  qobject_unref(bs_entry->state.options);
> >> @@ -3351,8 +3352,6 @@ void bdrv_reopen_abort(BDRVReopenState *reopen_state)
> >>  drv->bdrv_reopen_abort(reopen_state);
> >>  }
> >>  
> >> -qobject_unref(reopen_state->explicit_options);
> >> -
> >>  bdrv_abort_perm_update(reopen_state->bs);
> >>  }
> >
> > I think this is only correct if we make bdrv_reopen_prepare/commit/abort
> > private. At the moment they are public interfaces, so we must have
> > thought that some callers might want to integrate them into other
> > transactions.
> 
> But that's not a problem as long as the new callers unref
> state.explicit_options when bdrv_reopen_prepare() fails. They need to do
> it with state.options anyway.

Actually that makes sense because the code creating the object isn't
even in bdrv_reopen_prepare(), but in bdrv_reopen_queue_child(). So it's
the caller that owns the object and should free them too. I just
confused the two functions.

Kevin

Re: [Qemu-devel] [PATCH v8 03/87] target/mips: Avoid case statements formulated by ranges - part 2

2018-08-14 Thread Aleksandar Markovic

> From: Aleksandar Markovic 
> Sent: Monday, August 13, 2018 7:52 PM
> 
> Subject: [PATCH v8 03/87] target/mips: Avoid case statements formulated by 
> ranges - part 2
> 
> From: Aleksandar Rikalo 
> 
> Remove "range style" case statements to make code analysis easier.
> This patch handles cases when the values in the range in question
> were not properly defined.
> 
> Signed-off-by: Aleksandar Markovic 
> Signed-off-by: Stefan Markovic 
> ---
>  target/mips/translate.c | 78 
> -
>  1 file changed, 71 insertions(+), 7 deletions(-)

Reviewed-by: Aleksandar Markovic

Re: [Qemu-devel] [PATCH 5/5] block: Simplify append_open_options()

2018-08-14 Thread Kevin Wolf

Am 14.08.2018 um 12:52 hat Alberto Garcia geschrieben:
> On Tue 14 Aug 2018 11:17:50 AM CEST, Kevin Wolf wrote:
> > Am 29.06.2018 um 13:37 hat Alberto Garcia geschrieben:
> >> This function returns a BDS's driver-specific options, excluding also
> >> those from its children. Since we have just removed all children
> >> options from bs->options there's no need to do this last step.
> >> 
> >> We allow references to children, though ("backing": "node0"), so those
> >> we still have to remove.
> >> 
> >> Signed-off-by: Alberto Garcia 
> >
> > Hmm, yes, but if I compare this with the example you gave in an earlier
> > patch:
> >
> >   $ qemu-img create -f qcow2 hd0.qcow2 10M
> >   $ qemu-img create -f qcow2 -b hd0.qcow2 hd1.qcow2
> >   $ qemu-img create -f qcow2 -b hd1.qcow2 hd2.qcow2
> >
> >   $ $QEMU -drive file=hd2.qcow2,node-name=hd2,backing.node-name=hd1
> >
> > This opens a chain of images hd0 <- hd1 <- hd2. Now let's remove hd1
> > using block_stream:
> >
> >   (qemu) block_stream hd2 0 hd0.qcow2
> >
> > After this hd2 contains backing.node-name=hd1, which is no longer
> > correct because hd1 doesn't exist anymore.
> >
> > Doesn't backing=hd1 actually result in the same kind of inconsistency?
> > That is, bs->option will still have backing=hd1, but in reality we
> > reference hd0 now?
> >
> > Should we get rid of child references in bs->(explicit_)options as
> > well?
> 
> I don't think so, at least not in general.
> 
> The main difference is that if you set backing.node-name=foo then it
> means that 'node-name=foo' is an option of the child, the option doesn't
> belong in the parent at all. So once it's transferred to the child
> there's no reason why it shoud remain in the parent. It's redundant and
> can interfere with the reopen operation (e.g. you change the option in
> the child but when you reopen the parent it will try to revert the child
> option to the previous value).

That's a rather reopen-centric point of view, though.

Redundant information isn't nice already, but what really makes it a
problem is that it's potentially incorrect information because it isn't
kept up to date. There is no point in keeping incorrect information, so
I agree that deleting it is best.

> In the case of 'backing=foo' that's clearly an option of the parent,
> there's no other place to put it. We could probably remove it as well,
> but we have to see carefully that no parent needs to keep that
> information. I think we want to keep child references because in general
> we don't allow them to be changed in reopen.
> 
> Example: you cannot pass 'file=bar' on reopen unless 'bar' was already
> the existing value of 'file' (i.e. you're not changing anything). We
> need to look for the previous value in bs->options in order to know
> that.

My point is the backing=foo has the same problem: Storing it in
bs->options is not only redundant, but potentially incorrect because we
never update it when we modify the graph. There is no point in keeping
potentially incorrect information.

When you actually use that incorrect information, you've got a bug.
Reopen with file=bar doesn't have to check whether 'file=bar' is in
bs->options (because that may be outdated information), but whether the
BdrvChild with the name 'file' points to a node called 'bar'.

Getting rid of the potentially incorrect information will make it more
obvious what you have to check to make things work correctly.

> After x-blockdev-reopen is ready, 'backing' will perhaps be the
> first/only exception, because we'll be able to change it and the
> previous value doesn't matter.

I certainly hope that it will not be the only kind of node references
that you can change. Adding/removing filter nodes requires to be able to
change more or less any kind of node references. But we'll see.

Kevin

Re: [Qemu-devel] [RFC PATCH] vl: fix migration when watchdog expires

2018-08-14 Thread Paolo Bonzini

On 14/08/2018 12:48, Jay Zhou wrote:
> I got the following error when migrating a VM with watchdog
> device:
> 
> {"timestamp": {"seconds": 1533884471, "microseconds": 668099},
> "event": "WATCHDOG", "data": {"action": "reset"}}
> {"timestamp": {"seconds": 1533884471, "microseconds": 677658},
> "event": "RESET", "data": {"guest": true}}
> {"timestamp": {"seconds": 1533884471, "microseconds": 677874},
> "event": "STOP"}
> qemu-system-x86_64: invalid runstate transition: 'prelaunch' -> 'postmigrate'
> Aborted
> 
> The run state transition is RUN_STATE_FINISH_MIGRATE to RUN_STATE_PRELAUNCH,
> then the migration thread aborted when it tries to set RUN_STATE_POSTMIGRATE.
> There is a race between the main loop thread and the migration thread I think.

In that case I think you shouldn't go to POSTMIGRATE at all, because the
VM has been reset.

Alternatively, when the watchdog fires in RUN_STATE_FINISH_MIGRATE
state, it might delay the action until after the "cont" command is
invoked on the source, but I'm not sure what's the best way to achieve
that...

Paolo

Re: [Qemu-devel] [PATCH v7 6/9] qcow2: Increase the default upper limit on the L2 cache size

2018-08-14 Thread Leonid Bloch


On 8/14/18 11:18 AM, Kevin Wolf wrote:

Am 13.08.2018 um 18:42 hat Leonid Bloch geschrieben:

I don't actually think it's so bad to keep the cache permanently
allocated, but I wouldn't object to a lower default for non-Linux hosts
either. 1 MB may still be a little too low, 4 MB (covers up to 32 GB)
might be more adequate. My typical desktop VMs are larger than 8 GB, but
smaller than 32 GB.


And for a Windows VM just the OS installation takes above 40 GB. While we
probably are not running Windows VMs for our own needs, it is very common
that a customer of, for example, some cloud service uses QEMU (unknowingly)
for a full-blown Windows. So 100 GB+ images which are quite heavily used is
not a rare scenario. 256 GB - yeah, that would be on the higher end.


The OS installation is mostly sequential access, though. You only need
that much cache when you have completely random I/O across the whole
image. Otherwise the LRU based approach of the cache is good enough to
keep those tables cached that are actually in use.


Sorry, by "OS installation" I meant the installed size of the OS, which 
should be available for fast and frequent access, not the installation 
process itself. Obviously for one-time tasks like the installation 
process it's not worth it, unless one installs all the time, instead of 
using ready images, for some reason. :)




The maximum cache size is maybe for huge databases or indeed random I/O
benchmarks, both of which are important to support (on Linux at least),
but probably not the most common use case.


So 16 MB would indeed be a reasonable default for the *max.* L2 cache now,
although below that would be too little, I think. 32 MB - if we want some
future-proofing.


I think we all agree that 32 MB + cache-clean-interval is okay.

It's just that for non-Linux guests, cache-clean-interval doesn't work.
However, we probably care less about those large random I/O cases there,
so a smaller cache size like 1 MB can do on non-Linux.

Kevin

Re: [Qemu-devel] [PATCH] qemu-img: fix regression copying secrets during convert

2018-08-14 Thread Kevin Wolf

Am 14.08.2018 um 11:35 hat Daniel P. Berrangé geschrieben:
> When the convert command is creating an output file that needs
> secrets, we need to ensure those secrets are passed to both the
> blk_new_open and bdrv_create API calls.
> 
> This is done by qemu-img extracting all opts matching the name
> suffix "key-secret". Unfortunately the code doing this was run after the
> call to bdrv_create(), which meant the QemuOpts it was extracting
> secrets from was now empty.
> 
> Previously this worked by luks as a bug meant the "key-secret"
> parameters were not purged from the QemuOpts. This bug was fixed in
> 
>   commit b76b4f604521e59f857d6177bc55f6f2e41fd392
>   Author: Kevin Wolf 
>   Date:   Thu Jan 11 16:18:08 2018 +0100
> 
> qcow2: Use visitor for options in qcow2_create()
> 
> Exposing the latent bug in qemu-img. This fix simply moves the copying
> of secrets to before the bdrv_create() call.
> 
> Signed-off-by: Daniel P. Berrangé 

Cc: qemu-sta...@nongnu.org

> @@ -2461,9 +2458,9 @@ out:
>  qemu_progress_print(100, 0);
>  }
>  qemu_progress_end();
> -qemu_opts_del(opts);

Why don't we need to free opts any more?

>  qemu_opts_free(create_opts);
>  qemu_opts_del(sn_opts);
> +qobject_unref(open_opts);
>  blk_unref(s.target);
>  if (s.src) {
>  for (bs_i = 0; bs_i < s.src_num; bs_i++) {

Kevin

Re: [Qemu-devel] [PATCH v2 02/22] config: CONFIG_SERIAL* is already in pci.mak

2018-08-14 Thread Juan Quintela

Paolo Bonzini  wrote:
> On 08/08/2018 13:48, Juan Quintela wrote:
>> Signed-off-by: Juan Quintela 
>> ---
>>  default-configs/alpha-softmmu.mak   | 2 --
>>  default-configs/arm-softmmu.mak | 2 --
>>  default-configs/hppa-softmmu.mak| 3 ---
>>  default-configs/i386-softmmu.mak| 2 --
>>  default-configs/mips-softmmu-common.mak | 2 --
>>  default-configs/ppc-softmmu.mak | 1 -
>>  default-configs/ppcemb-softmmu.mak  | 2 --
>>  default-configs/sh4-softmmu.mak | 2 --
>>  default-configs/sh4eb-softmmu.mak   | 2 --
>>  default-configs/sparc64-softmmu.mak | 2 --
>>  default-configs/x86_64-softmmu.mak  | 2 --
>>  11 files changed, 22 deletions(-)
>
> I don't think CONFIG_SERIAL_ISA should be in pci.mak though

agreed.

>, and
> CONFIG_SERIAL is a dependency of both CONFIG_SERIAL and
> CONFIG_SERIAL_PCI.

I guess you here mean CONFIG_SERIAL_ISA or CONFIG_SERIAL_PCI.  That is
not enough.  CONFIG_SERIAL really means CONFIG_SERIAL_COMMON, and things
like riscv* require it (i.e. basically all the weird(*) boards have a
serial port)

> Perhaps introduce a superio.mak with all the legacy
> ISA devices?

What are the surperio.mak or isa.mak.  And once there, what are the isa
devices?  Weird things that came to my mind that can't be disabled:

- mmmouse
- serial-isa
- fdc
- vmport
- vga-isa (this is another weird thing)
- ipmi-isa?
- what else?

Later, Juan.

(*): weird here means that they don't have PCI available.

Re: [Qemu-devel] [PATCH v7 6/9] qcow2: Increase the default upper limit on the L2 cache size

2018-08-14 Thread Kevin Wolf

Am 14.08.2018 um 13:34 hat Leonid Bloch geschrieben:
> On 8/14/18 11:18 AM, Kevin Wolf wrote:
> > Am 13.08.2018 um 18:42 hat Leonid Bloch geschrieben:
> > > > I don't actually think it's so bad to keep the cache permanently
> > > > allocated, but I wouldn't object to a lower default for non-Linux hosts
> > > > either. 1 MB may still be a little too low, 4 MB (covers up to 32 GB)
> > > > might be more adequate. My typical desktop VMs are larger than 8 GB, but
> > > > smaller than 32 GB.
> > > 
> > > And for a Windows VM just the OS installation takes above 40 GB. While we
> > > probably are not running Windows VMs for our own needs, it is very common
> > > that a customer of, for example, some cloud service uses QEMU 
> > > (unknowingly)
> > > for a full-blown Windows. So 100 GB+ images which are quite heavily used 
> > > is
> > > not a rare scenario. 256 GB - yeah, that would be on the higher end.
> > 
> > The OS installation is mostly sequential access, though. You only need
> > that much cache when you have completely random I/O across the whole
> > image. Otherwise the LRU based approach of the cache is good enough to
> > keep those tables cached that are actually in use.
> 
> Sorry, by "OS installation" I meant the installed size of the OS, which
> should be available for fast and frequent access, not the installation
> process itself. Obviously for one-time tasks like the installation process
> it's not worth it, unless one installs all the time, instead of using ready
> images, for some reason. :)

But you never use everything that is present in an OS installation of
40 GB (is it really _that_ huge these days?), and you don't read OS
files non-stop. The most frequently used parts of the OS are actually in
the guest RAM.

I don't think you'll really notice the difference in qcow2 unless you
have a really I/O intensive workload - and that is not usually for OS
files, but for user data. For only occasional accesses, the additional
64k for the metadata table wouldn't play a big role.

Kevin

Re: [Qemu-devel] [PATCH 5/5] block: Simplify append_open_options()

2018-08-14 Thread Alberto Garcia

On Tue 14 Aug 2018 01:14:42 PM CEST, Kevin Wolf wrote:
>> The main difference is that if you set backing.node-name=foo then it
>> means that 'node-name=foo' is an option of the child, the option
>> doesn't belong in the parent at all. So once it's transferred to the
>> child there's no reason why it shoud remain in the parent. It's
>> redundant and can interfere with the reopen operation (e.g. you
>> change the option in the child but when you reopen the parent it will
>> try to revert the child option to the previous value).
>
> That's a rather reopen-centric point of view, though.
>
> Redundant information isn't nice already, but what really makes it a
> problem is that it's potentially incorrect information because it
> isn't kept up to date. There is no point in keeping incorrect
> information, so I agree that deleting it is best.

Yes, it's indeed reopen-centric :-) I agree with you though, the reopen
problem is a practical example of the consequences of keeping both
options, but the main problem is that they can be different and
therefore wrong.

>> In the case of 'backing=foo' that's clearly an option of the parent,
>> there's no other place to put it. We could probably remove it as
>> well, but we have to see carefully that no parent needs to keep that
>> information. I think we want to keep child references because in
>> general we don't allow them to be changed in reopen.
>> 
>> Example: you cannot pass 'file=bar' on reopen unless 'bar' was
>> already the existing value of 'file' (i.e. you're not changing
>> anything). We need to look for the previous value in bs->options in
>> order to know that.
>
> My point is the backing=foo has the same problem: Storing it in
> bs->options is not only redundant, but potentially incorrect because
> we never update it when we modify the graph. There is no point in
> keeping potentially incorrect information.

I tend to agree, but there's one reason why it's not exactly the same
case: with "backing=foo" we know not only that the backing node name is
'foo', but that it was added using a reference rather than with
backing.node-name=foo. I'm not sure if that's information that we need
for anything, however (probably not).

> When you actually use that incorrect information, you've got a bug.
> Reopen with file=bar doesn't have to check whether 'file=bar' is in
> bs->options (because that may be outdated information), but whether
> the BdrvChild with the name 'file' points to a node called 'bar'.

Again, this is correct if we open the BDS with

  file.node-name=bar

and then we allow reopening it with

  file=bar

Different set of options, but the result still makes sense. For this we
need a specific check to verify that this is correct. For the current
behavior we don't need anything now: we only allow the exact same
option.

>> After x-blockdev-reopen is ready, 'backing' will perhaps be the
>> first/only exception, because we'll be able to change it and the
>> previous value doesn't matter.
>
> I certainly hope that it will not be the only kind of node references
> that you can change. Adding/removing filter nodes requires to be able
> to change more or less any kind of node references. But we'll see.

No, but anything that we allow changing has to be done on a case-by-case
basis. You can't simply allow changing any child reference, the specific
driver has to be modified in order to allow that.

Until the driver is changed, the current behavior prevails: if an option
was modified, we return an error.

Berto

Re: [Qemu-devel] [RFC PATCH] vl: fix migration when watchdog expires

2018-08-14 Thread Dr. David Alan Gilbert

* Paolo Bonzini (pbonz...@redhat.com) wrote:
> On 14/08/2018 12:48, Jay Zhou wrote:
> > I got the following error when migrating a VM with watchdog
> > device:
> > 
> > {"timestamp": {"seconds": 1533884471, "microseconds": 668099},
> > "event": "WATCHDOG", "data": {"action": "reset"}}
> > {"timestamp": {"seconds": 1533884471, "microseconds": 677658},
> > "event": "RESET", "data": {"guest": true}}
> > {"timestamp": {"seconds": 1533884471, "microseconds": 677874},
> > "event": "STOP"}
> > qemu-system-x86_64: invalid runstate transition: 'prelaunch' -> 
> > 'postmigrate'
> > Aborted
> > 
> > The run state transition is RUN_STATE_FINISH_MIGRATE to RUN_STATE_PRELAUNCH,
> > then the migration thread aborted when it tries to set 
> > RUN_STATE_POSTMIGRATE.
> > There is a race between the main loop thread and the migration thread I 
> > think.
> 
> In that case I think you shouldn't go to POSTMIGRATE at all, because the
> VM has been reset.

Migration has the VM stopped; it's not expecting the state to change at
that point.

> Alternatively, when the watchdog fires in RUN_STATE_FINISH_MIGRATE
> state, it might delay the action until after the "cont" command is
> invoked on the source, but I'm not sure what's the best way to achieve
> that...

Jay: Which watchdog were you using?

 a) Should the watchdog expire when the VM is stopped; I think it
shouldn't - hw/acpi/tco.c uses a virtual timer as does i6300esb; so
is the bug here that the watchdog being used didn't use a virtual
timer?

 b) If the watchdog expires just before the VM gets stopped, is there
a race which could hit this?  Possibly.

 c) Could main_loop_should_exit guard all the 'request's by
something that checks whether the VM is stopped?

Dave

> Paolo
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH v2 02/22] config: CONFIG_SERIAL* is already in pci.mak

2018-08-14 Thread Paolo Bonzini

On 14/08/2018 13:40, Juan Quintela wrote:
>> CONFIG_SERIAL is a dependency of both CONFIG_SERIAL and
>> CONFIG_SERIAL_PCI.
> 
> I guess you here mean CONFIG_SERIAL_ISA or CONFIG_SERIAL_PCI.  That is
> not enough.  CONFIG_SERIAL really means CONFIG_SERIAL_COMMON, and things
> like riscv* require it

Right, I would put

   CONFIG_SERIAL=y
   CONFIG_SERIAL_ISA=y

in superio.mak and

   CONFIG_SERIAL=y
   CONFIG_SERIAL_PCI=y

in pci.mak.

>> Perhaps introduce a superio.mak with all the legacy
>> ISA devices?
> What are the surperio.mak or isa.mak.  And once there, what are the isa
> devices?  Weird things that came to my mind that can't be disabled:
> 
> - mmmouse

msmouse is a character device backend, not a device model

> - serial-isa
> - fdc
> - vmport
> - vga-isa (this is another weird thing)
> - ipmi-isa?
> - what else?

SuperIO is RTC, I8254, I8257, I8259, ISA serial, parallel, floppy.  They
can all be disabled (well, no guarantee that it compiles but the option
is there).

Paolo

Re: [Qemu-devel] [RFC PATCH] vl: fix migration when watchdog expires

2018-08-14 Thread Paolo Bonzini

On 14/08/2018 13:52, Dr. David Alan Gilbert wrote:
>  a) Should the watchdog expire when the VM is stopped; I think it
> shouldn't - hw/acpi/tco.c uses a virtual timer as does i6300esb; so
> is the bug here that the watchdog being used didn't use a virtual
> timer?

All watchdogs do.

>  b) If the watchdog expires just before the VM gets stopped, is there
> a race which could hit this?  Possibly.

Yes, I think it is a race that happens just before vm_stop, but I don't
understand why the "qemu_clock_enable" in pause_all_vcpus does not
prevent it.

It should be possible to write a deterministic testcase with qtest...

Paolo

Re: [Qemu-devel] [PATCH v2 02/22] config: CONFIG_SERIAL* is already in pci.mak

2018-08-14 Thread Peter Maydell

On 14 August 2018 at 12:52, Paolo Bonzini  wrote:
> On 14/08/2018 13:40, Juan Quintela wrote:
>>> CONFIG_SERIAL is a dependency of both CONFIG_SERIAL and
>>> CONFIG_SERIAL_PCI.
>>
>> I guess you here mean CONFIG_SERIAL_ISA or CONFIG_SERIAL_PCI.  That is
>> not enough.  CONFIG_SERIAL really means CONFIG_SERIAL_COMMON, and things
>> like riscv* require it
>
> Right, I would put
>
>CONFIG_SERIAL=y
>CONFIG_SERIAL_ISA=y
>
> in superio.mak and
>
>CONFIG_SERIAL=y
>CONFIG_SERIAL_PCI=y
>
> in pci.mak.

What about the boards that use the serial.c code but do not
have PCI, ISA or a superio chip? That is, all the boards/devices
that call serial_mm_init() directly to create a memory-mapped
16550.

(If anybody feels like trying to do a complicated refactoring,
serial_mm_init() is a pretty ugly legacy API around some
non-QOMified core 16550 code...)

thanks
-- PMM

Re: [Qemu-devel] [PATCH 5/5] block: Simplify append_open_options()

2018-08-14 Thread Kevin Wolf

Am 14.08.2018 um 13:48 hat Alberto Garcia geschrieben:
> >> In the case of 'backing=foo' that's clearly an option of the parent,
> >> there's no other place to put it. We could probably remove it as
> >> well, but we have to see carefully that no parent needs to keep that
> >> information. I think we want to keep child references because in
> >> general we don't allow them to be changed in reopen.
> >> 
> >> Example: you cannot pass 'file=bar' on reopen unless 'bar' was
> >> already the existing value of 'file' (i.e. you're not changing
> >> anything). We need to look for the previous value in bs->options in
> >> order to know that.
> >
> > My point is the backing=foo has the same problem: Storing it in
> > bs->options is not only redundant, but potentially incorrect because
> > we never update it when we modify the graph. There is no point in
> > keeping potentially incorrect information.
> 
> I tend to agree, but there's one reason why it's not exactly the same
> case: with "backing=foo" we know not only that the backing node name is
> 'foo', but that it was added using a reference rather than with
> backing.node-name=foo. I'm not sure if that's information that we need
> for anything, however (probably not).

Isn't the proper way to check this foo->inherits_from?

But generally, it shouldn't make a difference for most purposes which
way a node was created.

> > When you actually use that incorrect information, you've got a bug.
> > Reopen with file=bar doesn't have to check whether 'file=bar' is in
> > bs->options (because that may be outdated information), but whether
> > the BdrvChild with the name 'file' points to a node called 'bar'.
> 
> Again, this is correct if we open the BDS with
> 
>   file.node-name=bar
> 
> and then we allow reopening it with
> 
>   file=bar
> 
> Different set of options, but the result still makes sense. For this we
> need a specific check to verify that this is correct. For the current
> behavior we don't need anything now: we only allow the exact same
> option.

That's yet another case and another reason why we want to check
BdrvChild instead. If we take BlockdevOptions for blockdev-reopen, you
need to put something there for nodes that you created inline
originally, and just putting the node name there probably makes the most
sense.

Anyway, the case that I had in mind is the case where you removed a
backing file with a block job:

base <- mid <- top

You stream top into mid, so at the end of the job, mid goes away and
base is the backing file of top. But since you opened top with
backing=mid, that's still what you get when you look at bs->options.

base <- top
(backing=mid)

If you now reopen top, and bdrv_reopen looks at bs->options to check
whether the operation is valid, you must specify backing=mid instead of
the correct backing=base so that reopen thinks it's unchanged.

> >> After x-blockdev-reopen is ready, 'backing' will perhaps be the
> >> first/only exception, because we'll be able to change it and the
> >> previous value doesn't matter.
> >
> > I certainly hope that it will not be the only kind of node references
> > that you can change. Adding/removing filter nodes requires to be able
> > to change more or less any kind of node references. But we'll see.
> 
> No, but anything that we allow changing has to be done on a case-by-case
> basis. You can't simply allow changing any child reference, the specific
> driver has to be modified in order to allow that.
> 
> Until the driver is changed, the current behavior prevails: if an option
> was modified, we return an error.

Makes sense to err on the safe side, though I expect that most drivers
don't need to do much more than just allowing the switch.

Maybe, if we want to keep things a bit safer, what we can do is check
that the same node is addressed when you skip all filters.

Kevin

[Qemu-devel] [PATCH v3 5/8] block/mirror: fix and improve do_sync_target_write

2018-08-14 Thread Vladimir Sementsov-Ogievskiy

Use bdrv_dirty_bitmap_next_dirty_area() instead of
bdrv_dirty_iter_next_area(), because of the following problems of
bdrv_dirty_iter_next_area():

1. Using HBitmap iterators we should carefully handle unaligned offset,
as first call to hbitmap_iter_next() may return a value less than
original offset (actually, it will be original offset rounded down to
bitmap granularity). This handling is not done in
do_sync_target_write().

2. bdrv_dirty_iter_next_area() handles unaligned max_offset
incorrectly:

look at the code:
if (max_offset == iter->bitmap->size) {
/* If max_offset points to the image end, round it up by the
 * bitmap granularity */
gran_max_offset = ROUND_UP(max_offset, granularity);
} else {
gran_max_offset = max_offset;
}

ret = hbitmap_iter_next(&iter->hbi, false);
if (ret < 0 || ret + granularity > gran_max_offset) {
return false;
}

and assume that max_offset != iter->bitmap->size but still unaligned.
if 0 < ret < max_offset we found dirty area, but the function can
return false in this case (if ret + granularity > max_offset).

3. bdrv_dirty_iter_next_area() uses inefficient loop to find the end of
the dirty area. Let's use more efficient hbitmap_next_zero instead
(bdrv_dirty_bitmap_next_dirty_area() do so)

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/mirror.c | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index b48c3f8cf5..d2806812c8 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1167,25 +1167,22 @@ static void do_sync_target_write(MirrorBlockJob *job, 
MirrorMethod method,
  uint64_t offset, uint64_t bytes,
  QEMUIOVector *qiov, int flags)
 {
-BdrvDirtyBitmapIter *iter;
 QEMUIOVector target_qiov;
-uint64_t dirty_offset;
-int dirty_bytes;
+uint64_t dirty_offset = offset;
+uint64_t dirty_bytes;
 
 if (qiov) {
 qemu_iovec_init(&target_qiov, qiov->niov);
 }
 
-iter = bdrv_dirty_iter_new(job->dirty_bitmap);
-bdrv_set_dirty_iter(iter, offset);
-
 while (true) {
 bool valid_area;
 int ret;
 
 bdrv_dirty_bitmap_lock(job->dirty_bitmap);
-valid_area = bdrv_dirty_iter_next_area(iter, offset + bytes,
-   &dirty_offset, &dirty_bytes);
+valid_area = bdrv_dirty_bitmap_next_dirty_area(
+job->dirty_bitmap, &dirty_offset,
+MIN(offset + bytes, dirty_offset + INT_MAX), &dirty_bytes);
 if (!valid_area) {
 bdrv_dirty_bitmap_unlock(job->dirty_bitmap);
 break;
@@ -1241,9 +1238,10 @@ static void do_sync_target_write(MirrorBlockJob *job, 
MirrorMethod method,
 break;
 }
 }
+
+dirty_offset += dirty_bytes;
 }
 
-bdrv_dirty_iter_free(iter);
 if (qiov) {
 qemu_iovec_destroy(&target_qiov);
 }
-- 
2.11.1

[Qemu-devel] [PATCH v3 0/8] dirty-bitmap: rewrite bdrv_dirty_iter_next_area

2018-08-14 Thread Vladimir Sementsov-Ogievskiy

Hi all.

1. bdrv_dirty_iter_next_area don't use hbitmap_next_zero and uses
inefficient loop instead. Let's improve it.

2. bdrv_dirty_iter_next_area don't handle unaligned offset and
max_offset correctly. I'm not sure that it is a real bug. But if it is,
we need these series in 3.0.

Details are in 05 commit message.

v3:
01: - change interface to start/end, and -1 as special end-marker instead of 0
- "not found" for invalid regions instead of assert
02: rebase on 01 changes
03: - fix mistake in hbitmap_iter_init arguments (mistake in
  hbitmap_next_zero arguments is fixed automatically due to 01 changes)
04: new


v2:

01: - improve comment
- s/bytes/count/
- fix forgotten function call in test
- introduce orig_size field here for HBitmap,
  make checking in hbitmap_next_zero more effective and safe
02: new
03: - orig_size already introduced in 01
- fix hbitmap_next_dirty_area to not return value less than
  offset on unaligned requests

Vladimir Sementsov-Ogievskiy (8):
  dirty-bitmap: improve bdrv_dirty_bitmap_next_zero
  tests: add tests for hbitmap_next_zero with specified end parameter
  dirty-bitmap: add bdrv_dirty_bitmap_next_dirty_area
  tests: add tests for hbitmap_next_dirty_area
  block/mirror: fix and improve do_sync_target_write
  Revert "block/dirty-bitmap: Add bdrv_dirty_iter_next_area"
  Revert "test-hbitmap: Add non-advancing iter_next tests"
  Revert "hbitmap: Add @advance param to hbitmap_iter_next()"

 include/block/dirty-bitmap.h |   8 +-
 include/qemu/hbitmap.h   |  30 ++--
 block/backup.c   |   4 +-
 block/dirty-bitmap.c |  69 +++--
 block/mirror.c   |  16 ++--
 nbd/server.c |   2 +-
 tests/test-hbitmap.c | 176 ---
 util/hbitmap.c   |  73 +++---
 8 files changed, 258 insertions(+), 120 deletions(-)

-- 
2.11.1

[Qemu-devel] [PATCH v3 6/8] Revert "block/dirty-bitmap: Add bdrv_dirty_iter_next_area"

2018-08-14 Thread Vladimir Sementsov-Ogievskiy

This reverts commit 72d10a94213a954ad569095cb4491f2ae0853c40.

The function is unused now.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/dirty-bitmap.h |  2 --
 block/dirty-bitmap.c | 55 
 2 files changed, 57 deletions(-)

diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index cb9162fa7e..20e1d86cb1 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -83,8 +83,6 @@ void bdrv_set_dirty_bitmap_locked(BdrvDirtyBitmap *bitmap,
 void bdrv_reset_dirty_bitmap_locked(BdrvDirtyBitmap *bitmap,
 int64_t offset, int64_t bytes);
 int64_t bdrv_dirty_iter_next(BdrvDirtyBitmapIter *iter);
-bool bdrv_dirty_iter_next_area(BdrvDirtyBitmapIter *iter, uint64_t max_offset,
-   uint64_t *offset, int *bytes);
 void bdrv_set_dirty_iter(BdrvDirtyBitmapIter *hbi, int64_t offset);
 int64_t bdrv_get_dirty_count(BdrvDirtyBitmap *bitmap);
 int64_t bdrv_get_meta_dirty_count(BdrvDirtyBitmap *bitmap);
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 4af20a1beb..d9333175b3 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -528,61 +528,6 @@ int64_t bdrv_dirty_iter_next(BdrvDirtyBitmapIter *iter)
 return hbitmap_iter_next(&iter->hbi, true);
 }
 
-/**
- * Return the next consecutively dirty area in the dirty bitmap
- * belonging to the given iterator @iter.
- *
- * @max_offset: Maximum value that may be returned for
- *  *offset + *bytes
- * @offset: Will contain the start offset of the next dirty area
- * @bytes:  Will contain the length of the next dirty area
- *
- * Returns: True if a dirty area could be found before max_offset
- *  (which means that *offset and *bytes then contain valid
- *  values), false otherwise.
- *
- * Note that @iter is never advanced if false is returned.  If an area
- * is found (which means that true is returned), it will be advanced
- * past that area.
- */
-bool bdrv_dirty_iter_next_area(BdrvDirtyBitmapIter *iter, uint64_t max_offset,
-   uint64_t *offset, int *bytes)
-{
-uint32_t granularity = bdrv_dirty_bitmap_granularity(iter->bitmap);
-uint64_t gran_max_offset;
-int64_t ret;
-int size;
-
-if (max_offset == iter->bitmap->size) {
-/* If max_offset points to the image end, round it up by the
- * bitmap granularity */
-gran_max_offset = ROUND_UP(max_offset, granularity);
-} else {
-gran_max_offset = max_offset;
-}
-
-ret = hbitmap_iter_next(&iter->hbi, false);
-if (ret < 0 || ret + granularity > gran_max_offset) {
-return false;
-}
-
-*offset = ret;
-size = 0;
-
-assert(granularity <= INT_MAX);
-
-do {
-/* Advance iterator */
-ret = hbitmap_iter_next(&iter->hbi, true);
-size += granularity;
-} while (ret + granularity <= gran_max_offset &&
- hbitmap_iter_next(&iter->hbi, false) == ret + granularity &&
- size <= INT_MAX - granularity);
-
-*bytes = MIN(size, max_offset - *offset);
-return true;
-}
-
 /* Called within bdrv_dirty_bitmap_lock..unlock */
 void bdrv_set_dirty_bitmap_locked(BdrvDirtyBitmap *bitmap,
   int64_t offset, int64_t bytes)
-- 
2.11.1

[Qemu-devel] [PATCH v3 7/8] Revert "test-hbitmap: Add non-advancing iter_next tests"

2018-08-14 Thread Vladimir Sementsov-Ogievskiy

This reverts commit 269576848ec3d57d2d958cf5ac69b08c44adf816.

The functionality is unused. Drop tests.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/test-hbitmap.c | 36 
 1 file changed, 12 insertions(+), 24 deletions(-)

diff --git a/tests/test-hbitmap.c b/tests/test-hbitmap.c
index af5142b481..5e43056970 100644
--- a/tests/test-hbitmap.c
+++ b/tests/test-hbitmap.c
@@ -30,18 +30,6 @@ typedef struct TestHBitmapData {
 } TestHBitmapData;
 
 
-static int64_t check_hbitmap_iter_next(HBitmapIter *hbi)
-{
-int next0, next1;
-
-next0 = hbitmap_iter_next(hbi, false);
-next1 = hbitmap_iter_next(hbi, true);
-
-g_assert_cmpint(next0, ==, next1);
-
-return next0;
-}
-
 /* Check that the HBitmap and the shadow bitmap contain the same data,
  * ignoring the same "first" bits.
  */
@@ -58,7 +46,7 @@ static void hbitmap_test_check(TestHBitmapData *data,
 
 i = first;
 for (;;) {
-next = check_hbitmap_iter_next(&hbi);
+next = hbitmap_iter_next(&hbi, true);
 if (next < 0) {
 next = data->size;
 }
@@ -447,25 +435,25 @@ static void test_hbitmap_iter_granularity(TestHBitmapData 
*data,
 /* Note that hbitmap_test_check has to be invoked manually in this test.  
*/
 hbitmap_test_init(data, 131072 << 7, 7);
 hbitmap_iter_init(&hbi, data->hb, 0);
-g_assert_cmpint(check_hbitmap_iter_next(&hbi), <, 0);
+g_assert_cmpint(hbitmap_iter_next(&hbi, true), <, 0);
 
 hbitmap_test_set(data, ((L2 + L1 + 1) << 7) + 8, 8);
 hbitmap_iter_init(&hbi, data->hb, 0);
-g_assert_cmpint(check_hbitmap_iter_next(&hbi), ==, (L2 + L1 + 1) << 7);
-g_assert_cmpint(check_hbitmap_iter_next(&hbi), <, 0);
+g_assert_cmpint(hbitmap_iter_next(&hbi, true), ==, (L2 + L1 + 1) << 7);
+g_assert_cmpint(hbitmap_iter_next(&hbi, true), <, 0);
 
 hbitmap_iter_init(&hbi, data->hb, (L2 + L1 + 2) << 7);
 g_assert_cmpint(hbitmap_iter_next(&hbi, true), <, 0);
 
 hbitmap_test_set(data, (131072 << 7) - 8, 8);
 hbitmap_iter_init(&hbi, data->hb, 0);
-g_assert_cmpint(check_hbitmap_iter_next(&hbi), ==, (L2 + L1 + 1) << 7);
-g_assert_cmpint(check_hbitmap_iter_next(&hbi), ==, 131071 << 7);
-g_assert_cmpint(check_hbitmap_iter_next(&hbi), <, 0);
+g_assert_cmpint(hbitmap_iter_next(&hbi, true), ==, (L2 + L1 + 1) << 7);
+g_assert_cmpint(hbitmap_iter_next(&hbi, true), ==, 131071 << 7);
+g_assert_cmpint(hbitmap_iter_next(&hbi, true), <, 0);
 
 hbitmap_iter_init(&hbi, data->hb, (L2 + L1 + 2) << 7);
-g_assert_cmpint(check_hbitmap_iter_next(&hbi), ==, 131071 << 7);
-g_assert_cmpint(check_hbitmap_iter_next(&hbi), <, 0);
+g_assert_cmpint(hbitmap_iter_next(&hbi, true), ==, 131071 << 7);
+g_assert_cmpint(hbitmap_iter_next(&hbi, true), <, 0);
 }
 
 static void hbitmap_test_set_boundary_bits(TestHBitmapData *data, ssize_t diff)
@@ -905,7 +893,7 @@ static void test_hbitmap_serialize_zeroes(TestHBitmapData 
*data,
 for (i = 0; i < num_positions; i++) {
 hbitmap_deserialize_zeroes(data->hb, positions[i], min_l1, true);
 hbitmap_iter_init(&iter, data->hb, 0);
-next = check_hbitmap_iter_next(&iter);
+next = hbitmap_iter_next(&iter, true);
 if (i == num_positions - 1) {
 g_assert_cmpint(next, ==, -1);
 } else {
@@ -931,10 +919,10 @@ static void test_hbitmap_iter_and_reset(TestHBitmapData 
*data,
 
 hbitmap_iter_init(&hbi, data->hb, BITS_PER_LONG - 1);
 
-check_hbitmap_iter_next(&hbi);
+hbitmap_iter_next(&hbi, true);
 
 hbitmap_reset_all(data->hb);
-check_hbitmap_iter_next(&hbi);
+hbitmap_iter_next(&hbi, true);
 }
 
 static void test_hbitmap_next_zero_check_range(TestHBitmapData *data,
-- 
2.11.1

[Qemu-devel] [PATCH v3 1/8] dirty-bitmap: improve bdrv_dirty_bitmap_next_zero

2018-08-14 Thread Vladimir Sementsov-Ogievskiy

Add bytes parameter to the function, to limit searched range.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/dirty-bitmap.h |  3 ++-
 include/qemu/hbitmap.h   | 10 --
 block/backup.c   |  2 +-
 block/dirty-bitmap.c |  5 +++--
 nbd/server.c |  2 +-
 tests/test-hbitmap.c |  2 +-
 util/hbitmap.c   | 25 -
 7 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index 259bd27c40..27f5299c4e 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -98,7 +98,8 @@ bool bdrv_has_changed_persistent_bitmaps(BlockDriverState 
*bs);
 BdrvDirtyBitmap *bdrv_dirty_bitmap_next(BlockDriverState *bs,
 BdrvDirtyBitmap *bitmap);
 char *bdrv_dirty_bitmap_sha256(const BdrvDirtyBitmap *bitmap, Error **errp);
-int64_t bdrv_dirty_bitmap_next_zero(BdrvDirtyBitmap *bitmap, uint64_t start);
+int64_t bdrv_dirty_bitmap_next_zero(BdrvDirtyBitmap *bitmap, uint64_t start,
+int64_t end);
 BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap_locked(BlockDriverState *bs,
   BdrvDirtyBitmap *bitmap,
   Error **errp);
diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
index ddca52c48e..fe4dfde27a 100644
--- a/include/qemu/hbitmap.h
+++ b/include/qemu/hbitmap.h
@@ -295,10 +295,16 @@ unsigned long hbitmap_iter_skip_words(HBitmapIter *hbi);
 /* hbitmap_next_zero:
  * @hb: The HBitmap to operate on
  * @start: The bit to start from.
+ * @end: End of range to search in. If @end is -1, search up to the bitmap
+ *   end.
  *
- * Find next not dirty bit.
+ * Find next not dirty bit within range [@start, @end), or from
+ * @start to the bitmap end if @end is -1. If not found, return -1.
+ *
+ * @end may be greater than original bitmap size, in this case, search up to
+ * the bitmap end.
  */
-int64_t hbitmap_next_zero(const HBitmap *hb, uint64_t start);
+int64_t hbitmap_next_zero(const HBitmap *hb, uint64_t start, int64_t end);
 
 /* hbitmap_create_meta:
  * Create a "meta" hbitmap to track dirtiness of the bits in this HBitmap.
diff --git a/block/backup.c b/block/backup.c
index 8630d32926..9bfd3f7189 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -458,7 +458,7 @@ static void 
backup_incremental_init_copy_bitmap(BackupBlockJob *job)
 break;
 }
 
-offset = bdrv_dirty_bitmap_next_zero(job->sync_bitmap, offset);
+offset = bdrv_dirty_bitmap_next_zero(job->sync_bitmap, offset, -1);
 if (offset == -1) {
 hbitmap_set(job->copy_bitmap, cluster, end - cluster);
 break;
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index c9b8a6fd52..037ae62726 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -785,9 +785,10 @@ char *bdrv_dirty_bitmap_sha256(const BdrvDirtyBitmap 
*bitmap, Error **errp)
 return hbitmap_sha256(bitmap->bitmap, errp);
 }
 
-int64_t bdrv_dirty_bitmap_next_zero(BdrvDirtyBitmap *bitmap, uint64_t offset)
+int64_t bdrv_dirty_bitmap_next_zero(BdrvDirtyBitmap *bitmap, uint64_t offset,
+int64_t end)
 {
-return hbitmap_next_zero(bitmap->bitmap, offset);
+return hbitmap_next_zero(bitmap->bitmap, offset, end);
 }
 
 void bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src,
diff --git a/nbd/server.c b/nbd/server.c
index ea5fe0eb33..07920d123b 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1952,7 +1952,7 @@ static unsigned int bitmap_to_extents(BdrvDirtyBitmap 
*bitmap, uint64_t offset,
 assert(begin < overall_end && nb_extents);
 while (begin < overall_end && i < nb_extents) {
 if (dirty) {
-end = bdrv_dirty_bitmap_next_zero(bitmap, begin);
+end = bdrv_dirty_bitmap_next_zero(bitmap, begin, -1);
 } else {
 bdrv_set_dirty_iter(it, begin);
 end = bdrv_dirty_iter_next(it);
diff --git a/tests/test-hbitmap.c b/tests/test-hbitmap.c
index 5e67ac1d3a..6b6a40bddd 100644
--- a/tests/test-hbitmap.c
+++ b/tests/test-hbitmap.c
@@ -939,7 +939,7 @@ static void test_hbitmap_iter_and_reset(TestHBitmapData 
*data,
 
 static void test_hbitmap_next_zero_check(TestHBitmapData *data, int64_t start)
 {
-int64_t ret1 = hbitmap_next_zero(data->hb, start);
+int64_t ret1 = hbitmap_next_zero(data->hb, start, -1);
 int64_t ret2 = start;
 for ( ; ret2 < data->size && hbitmap_get(data->hb, ret2); ret2++) {
 ;
diff --git a/util/hbitmap.c b/util/hbitmap.c
index bcd304041a..1687372504 100644
--- a/util/hbitmap.c
+++ b/util/hbitmap.c
@@ -53,6 +53,9 @@
  */
 
 struct HBitmap {
+/* Size of the bitmap, as requested in hbitmap_alloc. */
+uint64_t orig_size;
+
 /* Number of total bits in the bottom level.  */
 uint64_t size;
 
@@ -192,16 +195,26 @@ void h

[Qemu-devel] [PATCH v3 8/8] Revert "hbitmap: Add @advance param to hbitmap_iter_next()"

2018-08-14 Thread Vladimir Sementsov-Ogievskiy

This reverts commit a33fbb4f8b64226becf502a123733776ce319b24.

The functionality is unused.

Note: in addition to automatic revert, drop second parameter in
hbitmap_iter_next() call from hbitmap_next_dirty_area() too.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/qemu/hbitmap.h |  5 +
 block/backup.c |  2 +-
 block/dirty-bitmap.c   |  2 +-
 tests/test-hbitmap.c   | 26 +-
 util/hbitmap.c | 12 
 5 files changed, 20 insertions(+), 27 deletions(-)

diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
index 7800317bf3..6d205167ce 100644
--- a/include/qemu/hbitmap.h
+++ b/include/qemu/hbitmap.h
@@ -345,14 +345,11 @@ void hbitmap_free_meta(HBitmap *hb);
 /**
  * hbitmap_iter_next:
  * @hbi: HBitmapIter to operate on.
- * @advance: If true, advance the iterator.  Otherwise, the next call
- *   of this function will return the same result (if that
- *   position is still dirty).
  *
  * Return the next bit that is set in @hbi's associated HBitmap,
  * or -1 if all remaining bits are zero.
  */
-int64_t hbitmap_iter_next(HBitmapIter *hbi, bool advance);
+int64_t hbitmap_iter_next(HBitmapIter *hbi);
 
 /**
  * hbitmap_iter_next_word:
diff --git a/block/backup.c b/block/backup.c
index 9bfd3f7189..f033148f21 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -421,7 +421,7 @@ static int coroutine_fn 
backup_run_incremental(BackupBlockJob *job)
 HBitmapIter hbi;
 
 hbitmap_iter_init(&hbi, job->copy_bitmap, 0);
-while ((cluster = hbitmap_iter_next(&hbi, true)) != -1) {
+while ((cluster = hbitmap_iter_next(&hbi)) != -1) {
 do {
 if (yield_and_check(job)) {
 return 0;
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index d9333175b3..d0d602ff52 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -525,7 +525,7 @@ void bdrv_dirty_iter_free(BdrvDirtyBitmapIter *iter)
 
 int64_t bdrv_dirty_iter_next(BdrvDirtyBitmapIter *iter)
 {
-return hbitmap_iter_next(&iter->hbi, true);
+return hbitmap_iter_next(&iter->hbi);
 }
 
 /* Called within bdrv_dirty_bitmap_lock..unlock */
diff --git a/tests/test-hbitmap.c b/tests/test-hbitmap.c
index 5e43056970..17a3c807de 100644
--- a/tests/test-hbitmap.c
+++ b/tests/test-hbitmap.c
@@ -46,7 +46,7 @@ static void hbitmap_test_check(TestHBitmapData *data,
 
 i = first;
 for (;;) {
-next = hbitmap_iter_next(&hbi, true);
+next = hbitmap_iter_next(&hbi);
 if (next < 0) {
 next = data->size;
 }
@@ -435,25 +435,25 @@ static void test_hbitmap_iter_granularity(TestHBitmapData 
*data,
 /* Note that hbitmap_test_check has to be invoked manually in this test.  
*/
 hbitmap_test_init(data, 131072 << 7, 7);
 hbitmap_iter_init(&hbi, data->hb, 0);
-g_assert_cmpint(hbitmap_iter_next(&hbi, true), <, 0);
+g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
 
 hbitmap_test_set(data, ((L2 + L1 + 1) << 7) + 8, 8);
 hbitmap_iter_init(&hbi, data->hb, 0);
-g_assert_cmpint(hbitmap_iter_next(&hbi, true), ==, (L2 + L1 + 1) << 7);
-g_assert_cmpint(hbitmap_iter_next(&hbi, true), <, 0);
+g_assert_cmpint(hbitmap_iter_next(&hbi), ==, (L2 + L1 + 1) << 7);
+g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
 
 hbitmap_iter_init(&hbi, data->hb, (L2 + L1 + 2) << 7);
-g_assert_cmpint(hbitmap_iter_next(&hbi, true), <, 0);
+g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
 
 hbitmap_test_set(data, (131072 << 7) - 8, 8);
 hbitmap_iter_init(&hbi, data->hb, 0);
-g_assert_cmpint(hbitmap_iter_next(&hbi, true), ==, (L2 + L1 + 1) << 7);
-g_assert_cmpint(hbitmap_iter_next(&hbi, true), ==, 131071 << 7);
-g_assert_cmpint(hbitmap_iter_next(&hbi, true), <, 0);
+g_assert_cmpint(hbitmap_iter_next(&hbi), ==, (L2 + L1 + 1) << 7);
+g_assert_cmpint(hbitmap_iter_next(&hbi), ==, 131071 << 7);
+g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
 
 hbitmap_iter_init(&hbi, data->hb, (L2 + L1 + 2) << 7);
-g_assert_cmpint(hbitmap_iter_next(&hbi, true), ==, 131071 << 7);
-g_assert_cmpint(hbitmap_iter_next(&hbi, true), <, 0);
+g_assert_cmpint(hbitmap_iter_next(&hbi), ==, 131071 << 7);
+g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
 }
 
 static void hbitmap_test_set_boundary_bits(TestHBitmapData *data, ssize_t diff)
@@ -893,7 +893,7 @@ static void test_hbitmap_serialize_zeroes(TestHBitmapData 
*data,
 for (i = 0; i < num_positions; i++) {
 hbitmap_deserialize_zeroes(data->hb, positions[i], min_l1, true);
 hbitmap_iter_init(&iter, data->hb, 0);
-next = hbitmap_iter_next(&iter, true);
+next = hbitmap_iter_next(&iter);
 if (i == num_positions - 1) {
 g_assert_cmpint(next, ==, -1);
 } else {
@@ -919,10 +919,10 @@ static void test_hbitmap_iter_and_reset(TestHBitmapData 
*data,
 
 hbitmap_iter_init(&hbi, data->hb, BITS_PER_LONG - 1);
 
-hbitmap_iter_next(&hbi, true);
+hbitmap_it

[Qemu-devel] [PATCH v3 4/8] tests: add tests for hbitmap_next_dirty_area

2018-08-14 Thread Vladimir Sementsov-Ogievskiy

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/test-hbitmap.c | 106 +++
 1 file changed, 106 insertions(+)

diff --git a/tests/test-hbitmap.c b/tests/test-hbitmap.c
index dddb67c3c5..af5142b481 100644
--- a/tests/test-hbitmap.c
+++ b/tests/test-hbitmap.c
@@ -1016,6 +1016,105 @@ static void test_hbitmap_next_zero_4(TestHBitmapData 
*data, const void *unused)
 test_hbitmap_next_zero_do(data, 4);
 }
 
+static void test_hbitmap_next_dirty_area_check(TestHBitmapData *data,
+   uint64_t offset,
+   uint64_t count)
+{
+uint64_t off1, off2;
+uint64_t len1 = 0, len2 = 0;
+bool ret1, ret2;
+int64_t end;
+
+off1 = offset;
+end = count ? offset + count : 0;
+ret1 = hbitmap_next_dirty_area(data->hb, &off1, end, &len1);
+
+end = count ? offset + count : data->size;
+
+for (off2 = offset; off2 < end && !hbitmap_get(data->hb, off2); off2++) {
+;
+}
+
+for (len2 = 1; off2 + len2 < end && hbitmap_get(data->hb, off2 + len2);
+ len2++)
+{
+}
+
+ret2 = off2 < end;
+if (!ret2) {
+/* leave unchanged */
+off2 = offset;
+len2 = 0;
+}
+
+g_assert_cmpint(ret1, ==, ret2);
+g_assert_cmpint(off1, ==, off2);
+g_assert_cmpint(len1, ==, len2);
+}
+
+static void test_hbitmap_next_dirty_area_do(TestHBitmapData *data,
+int granularity)
+{
+hbitmap_test_init(data, L3, granularity);
+test_hbitmap_next_dirty_area_check(data, 0, 0);
+test_hbitmap_next_dirty_area_check(data, 0, 1);
+test_hbitmap_next_dirty_area_check(data, L3 - 1, 1);
+
+hbitmap_set(data->hb, L2, 1);
+test_hbitmap_next_dirty_area_check(data, 0, 1);
+test_hbitmap_next_dirty_area_check(data, 0, L2);
+test_hbitmap_next_dirty_area_check(data, 0, 0);
+test_hbitmap_next_dirty_area_check(data, L2 - 1, 0);
+test_hbitmap_next_dirty_area_check(data, L2 - 1, 1);
+test_hbitmap_next_dirty_area_check(data, L2 - 1, 2);
+test_hbitmap_next_dirty_area_check(data, L2 - 1, 3);
+test_hbitmap_next_dirty_area_check(data, L2, 0);
+test_hbitmap_next_dirty_area_check(data, L2, 1);
+test_hbitmap_next_dirty_area_check(data, L2 + 1, 1);
+
+hbitmap_set(data->hb, L2 + 5, L1);
+test_hbitmap_next_dirty_area_check(data, 0, 0);
+test_hbitmap_next_dirty_area_check(data, L2 - 2, 8);
+test_hbitmap_next_dirty_area_check(data, L2 + 1, 5);
+test_hbitmap_next_dirty_area_check(data, L2 + 1, 3);
+test_hbitmap_next_dirty_area_check(data, L2 + 4, L1);
+test_hbitmap_next_dirty_area_check(data, L2 + 5, L1);
+test_hbitmap_next_dirty_area_check(data, L2 + 7, L1);
+test_hbitmap_next_dirty_area_check(data, L2 + L1, L1);
+test_hbitmap_next_dirty_area_check(data, L2, 0);
+test_hbitmap_next_dirty_area_check(data, L2 + 1, 0);
+
+hbitmap_set(data->hb, L2 * 2, L3 - L2 * 2);
+test_hbitmap_next_dirty_area_check(data, 0, 0);
+test_hbitmap_next_dirty_area_check(data, L2, 0);
+test_hbitmap_next_dirty_area_check(data, L2 + 1, 0);
+test_hbitmap_next_dirty_area_check(data, L2 + 5 + L1 - 1, 0);
+test_hbitmap_next_dirty_area_check(data, L2 + 5 + L1, 5);
+test_hbitmap_next_dirty_area_check(data, L2 * 2 - L1, L1 + 1);
+test_hbitmap_next_dirty_area_check(data, L2 * 2, L2);
+
+hbitmap_set(data->hb, 0, L3);
+test_hbitmap_next_dirty_area_check(data, 0, 0);
+}
+
+static void test_hbitmap_next_dirty_area_0(TestHBitmapData *data,
+   const void *unused)
+{
+test_hbitmap_next_dirty_area_do(data, 0);
+}
+
+static void test_hbitmap_next_dirty_area_1(TestHBitmapData *data,
+   const void *unused)
+{
+test_hbitmap_next_dirty_area_do(data, 1);
+}
+
+static void test_hbitmap_next_dirty_area_4(TestHBitmapData *data,
+   const void *unused)
+{
+test_hbitmap_next_dirty_area_do(data, 4);
+}
+
 int main(int argc, char **argv)
 {
 g_test_init(&argc, &argv, NULL);
@@ -1082,6 +1181,13 @@ int main(int argc, char **argv)
 hbitmap_test_add("/hbitmap/next_zero/next_zero_4",
  test_hbitmap_next_zero_4);
 
+hbitmap_test_add("/hbitmap/next_dirty_area/next_dirty_area_0",
+ test_hbitmap_next_dirty_area_0);
+hbitmap_test_add("/hbitmap/next_dirty_area/next_dirty_area_1",
+ test_hbitmap_next_dirty_area_1);
+hbitmap_test_add("/hbitmap/next_dirty_area/next_dirty_area_4",
+ test_hbitmap_next_dirty_area_4);
+
 g_test_run();
 
 return 0;
-- 
2.11.1

[Qemu-devel] [PATCH v3 3/8] dirty-bitmap: add bdrv_dirty_bitmap_next_dirty_area

2018-08-14 Thread Vladimir Sementsov-Ogievskiy

The function alters bdrv_dirty_iter_next_area(), which is wrong and
less efficient (see further commit
"block/mirror: fix and improve do_sync_target_write" for description).

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/dirty-bitmap.h |  3 +++
 include/qemu/hbitmap.h   | 15 +++
 block/dirty-bitmap.c |  7 +++
 util/hbitmap.c   | 38 ++
 4 files changed, 63 insertions(+)

diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index 27f5299c4e..cb9162fa7e 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -100,6 +100,9 @@ BdrvDirtyBitmap *bdrv_dirty_bitmap_next(BlockDriverState 
*bs,
 char *bdrv_dirty_bitmap_sha256(const BdrvDirtyBitmap *bitmap, Error **errp);
 int64_t bdrv_dirty_bitmap_next_zero(BdrvDirtyBitmap *bitmap, uint64_t start,
 int64_t end);
+bool bdrv_dirty_bitmap_next_dirty_area(BdrvDirtyBitmap *bitmap,
+   uint64_t *offset, uint64_t end,
+   uint64_t *length);
 BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap_locked(BlockDriverState *bs,
   BdrvDirtyBitmap *bitmap,
   Error **errp);
diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
index fe4dfde27a..7800317bf3 100644
--- a/include/qemu/hbitmap.h
+++ b/include/qemu/hbitmap.h
@@ -306,6 +306,21 @@ unsigned long hbitmap_iter_skip_words(HBitmapIter *hbi);
  */
 int64_t hbitmap_next_zero(const HBitmap *hb, uint64_t start, int64_t end);
 
+/* hbitmap_next_dirty_area:
+ * @hb: The HBitmap to operate on
+ * @offset: in-out parameter.
+ *  in: the offset to start from
+ *  out: (if area found) start of found area
+ * @end: end of requested region. (*@offset + *@length) will be <= @end
+ * @length: length of found area
+ *
+ * If dirty area found within [@offset, @end), returns true and sets @offset
+ * and @length appropriately. Otherwise returns true and leaves @offset and
+ * @length unchanged.
+ */
+bool hbitmap_next_dirty_area(const HBitmap *hb, uint64_t *offset,
+ uint64_t end, uint64_t *length);
+
 /* hbitmap_create_meta:
  * Create a "meta" hbitmap to track dirtiness of the bits in this HBitmap.
  * The caller owns the created bitmap and must call hbitmap_free_meta(hb) to
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 037ae62726..4af20a1beb 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -791,6 +791,13 @@ int64_t bdrv_dirty_bitmap_next_zero(BdrvDirtyBitmap 
*bitmap, uint64_t offset,
 return hbitmap_next_zero(bitmap->bitmap, offset, end);
 }
 
+bool bdrv_dirty_bitmap_next_dirty_area(BdrvDirtyBitmap *bitmap,
+   uint64_t *offset, uint64_t end,
+   uint64_t *length)
+{
+return hbitmap_next_dirty_area(bitmap->bitmap, offset, end, length);
+}
+
 void bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src,
  Error **errp)
 {
diff --git a/util/hbitmap.c b/util/hbitmap.c
index 1687372504..bf88c1223e 100644
--- a/util/hbitmap.c
+++ b/util/hbitmap.c
@@ -244,6 +244,44 @@ int64_t hbitmap_next_zero(const HBitmap *hb, uint64_t 
start, int64_t end)
 return res;
 }
 
+bool hbitmap_next_dirty_area(const HBitmap *hb, uint64_t *offset,
+ uint64_t end, uint64_t *length)
+{
+HBitmapIter hbi;
+int64_t off1, off0;
+uint32_t granularity = 1UL << hb->granularity;
+
+if (end == 0) {
+end = hb->orig_size;
+}
+
+hbitmap_iter_init(&hbi, hb, *offset);
+off1 = hbitmap_iter_next(&hbi, true);
+
+if (off1 < 0 || off1 >= end) {
+return false;
+}
+
+if (off1 + granularity >= end) {
+if (off1 > *offset) {
+*offset = off1;
+}
+*length = end - *offset;
+return true;
+}
+
+off0 = hbitmap_next_zero(hb, off1 + granularity, end);
+if (off0 < 0) {
+off0 = end;
+}
+
+if (off1 > *offset) {
+*offset = off1;
+}
+*length = off0 - *offset;
+return true;
+}
+
 bool hbitmap_empty(const HBitmap *hb)
 {
 return hb->count == 0;
-- 
2.11.1

[Qemu-devel] [PATCH v3 2/8] tests: add tests for hbitmap_next_zero with specified end parameter

2018-08-14 Thread Vladimir Sementsov-Ogievskiy

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/test-hbitmap.c | 32 
 1 file changed, 28 insertions(+), 4 deletions(-)

diff --git a/tests/test-hbitmap.c b/tests/test-hbitmap.c
index 6b6a40bddd..dddb67c3c5 100644
--- a/tests/test-hbitmap.c
+++ b/tests/test-hbitmap.c
@@ -937,31 +937,49 @@ static void test_hbitmap_iter_and_reset(TestHBitmapData 
*data,
 check_hbitmap_iter_next(&hbi);
 }
 
-static void test_hbitmap_next_zero_check(TestHBitmapData *data, int64_t start)
+static void test_hbitmap_next_zero_check_range(TestHBitmapData *data,
+   int64_t start,
+   int64_t count)
 {
-int64_t ret1 = hbitmap_next_zero(data->hb, start, -1);
+int64_t ret1 = hbitmap_next_zero(data->hb, start,
+ count == -1 ? -1 : start + count);
 int64_t ret2 = start;
-for ( ; ret2 < data->size && hbitmap_get(data->hb, ret2); ret2++) {
+int64_t end = count == -1 ? data->size : start + count;
+
+for ( ; ret2 < end && hbitmap_get(data->hb, ret2); ret2++) {
 ;
 }
-if (ret2 == data->size) {
+if (ret2 == end) {
 ret2 = -1;
 }
 
 g_assert_cmpint(ret1, ==, ret2);
 }
 
+static void test_hbitmap_next_zero_check(TestHBitmapData *data, int64_t start)
+{
+test_hbitmap_next_zero_check_range(data, start, 0);
+}
+
 static void test_hbitmap_next_zero_do(TestHBitmapData *data, int granularity)
 {
 hbitmap_test_init(data, L3, granularity);
 test_hbitmap_next_zero_check(data, 0);
 test_hbitmap_next_zero_check(data, L3 - 1);
+test_hbitmap_next_zero_check_range(data, 0, 1);
+test_hbitmap_next_zero_check_range(data, L3 - 1, 1);
 
 hbitmap_set(data->hb, L2, 1);
 test_hbitmap_next_zero_check(data, 0);
 test_hbitmap_next_zero_check(data, L2 - 1);
 test_hbitmap_next_zero_check(data, L2);
 test_hbitmap_next_zero_check(data, L2 + 1);
+test_hbitmap_next_zero_check_range(data, 0, 1);
+test_hbitmap_next_zero_check_range(data, 0, L2);
+test_hbitmap_next_zero_check_range(data, L2 - 1, 1);
+test_hbitmap_next_zero_check_range(data, L2 - 1, 2);
+test_hbitmap_next_zero_check_range(data, L2, 1);
+test_hbitmap_next_zero_check_range(data, L2 + 1, 1);
 
 hbitmap_set(data->hb, L2 + 5, L1);
 test_hbitmap_next_zero_check(data, 0);
@@ -970,6 +988,10 @@ static void test_hbitmap_next_zero_do(TestHBitmapData 
*data, int granularity)
 test_hbitmap_next_zero_check(data, L2 + 5);
 test_hbitmap_next_zero_check(data, L2 + L1 - 1);
 test_hbitmap_next_zero_check(data, L2 + L1);
+test_hbitmap_next_zero_check_range(data, L2, 6);
+test_hbitmap_next_zero_check_range(data, L2 + 1, 3);
+test_hbitmap_next_zero_check_range(data, L2 + 4, L1);
+test_hbitmap_next_zero_check_range(data, L2 + 5, L1);
 
 hbitmap_set(data->hb, L2 * 2, L3 - L2 * 2);
 test_hbitmap_next_zero_check(data, L2 * 2 - L1);
@@ -977,6 +999,8 @@ static void test_hbitmap_next_zero_do(TestHBitmapData 
*data, int granularity)
 test_hbitmap_next_zero_check(data, L2 * 2 - 1);
 test_hbitmap_next_zero_check(data, L2 * 2);
 test_hbitmap_next_zero_check(data, L3 - 1);
+test_hbitmap_next_zero_check_range(data, L2 * 2 - L1, L1 + 1);
+test_hbitmap_next_zero_check_range(data, L2 * 2, L2);
 
 hbitmap_set(data->hb, 0, L3);
 test_hbitmap_next_zero_check(data, 0);
-- 
2.11.1

Re: [Qemu-devel] [PATCH v8 42/87] target/mips: Implement emulation of nanoMIPS LLWP/SCWP pair

2018-08-14 Thread Aleksandar Markovic

> From: Aleksandar Markovic 
> Sent: Monday, August 13, 2018 7:53 PM
> 
> Subject: [PATCH v8 42/87] target/mips: Implement emulation of nanoMIPS 
> LLWP/SCWP > pair
> 
> From: Aleksandar Rikalo 
> 
> Implement support for nanoMIPS LLWP/SCWP instruction pair.
> 
> Signed-off-by: Dimitrije Nikolic 
> Signed-off-by: Aleksandar Markovic 
> Signed-off-by: Stefan Markovic 
> ---
>  linux-user/mips/cpu_loop.c | 25 
>  target/mips/cpu.h  |  2 ++
>  target/mips/translate.c| 74 
> ++
>  3 files changed, 96 insertions(+), 5 deletions(-)
> 

I was told that all applicable tests now pass, so:

Reviewed-by: Aleksandar Markovic 

Many thanks to Richard Henderson for guidance and patience.


> diff --git a/linux-user/mips/cpu_loop.c b/linux-user/mips/cpu_loop.c
> index 084ad6a..1d3dc9e 100644
> --- a/linux-user/mips/cpu_loop.c
> +++ b/linux-user/mips/cpu_loop.c
> @@ -397,10 +397,13 @@ static int do_store_exclusive(CPUMIPSState *env)
>  target_ulong addr;
>  target_ulong page_addr;
>  target_ulong val;
> +uint32_t val_wp = 0;
> +uint32_t llnewval_wp = 0;
>  int flags;
>  int segv = 0;
>  int reg;
>  int d;
> +int wp;
> 
>  addr = env->lladdr;
>  page_addr = addr & TARGET_PAGE_MASK;
> @@ -412,19 +415,31 @@ static int do_store_exclusive(CPUMIPSState *env)
>  } else {
>  reg = env->llreg & 0x1f;
>  d = (env->llreg & 0x20) != 0;
> -if (d) {
> -segv = get_user_s64(val, addr);
> +wp = (env->llreg & 0x40) != 0;
> +if (!wp) {
> +if (d) {
> +segv = get_user_s64(val, addr);
> +} else {
> +segv = get_user_s32(val, addr);
> +}
>  } else {
>  segv = get_user_s32(val, addr);
> +segv |= get_user_s32(val_wp, addr);
> +llnewval_wp = env->llnewval_wp;
>  }
>  if (!segv) {
> -if (val != env->llval) {
> +if (val != env->llval && val_wp == llnewval_wp) {
>  env->active_tc.gpr[reg] = 0;
>  } else {
> -if (d) {
> -segv = put_user_u64(env->llnewval, addr);
> +if (!wp) {
> +if (d) {
> +segv = put_user_u64(env->llnewval, addr);
> +} else {
> +segv = put_user_u32(env->llnewval, addr);
> +}
>  } else {
>  segv = put_user_u32(env->llnewval, addr);
> +segv |= put_user_u32(env->llnewval_wp, addr + 4);
>  }
>  if (!segv) {
>  env->active_tc.gpr[reg] = 1;
> diff --git a/target/mips/cpu.h b/target/mips/cpu.h
> index 8a8782b..bf9c634 100644
> --- a/target/mips/cpu.h
> +++ b/target/mips/cpu.h
> @@ -506,6 +506,8 @@ struct CPUMIPSState {
>  uint64_t lladdr;
>  target_ulong llval;
>  target_ulong llnewval;
> +uint64_t llval_wp;
> +uint32_t llnewval_wp;
>  target_ulong llreg;
>  uint64_t CP0_LLAddr_rw_bitmask;
>  int CP0_LLAddr_shift;
> diff --git a/target/mips/translate.c b/target/mips/translate.c
> index 9f27aab..70785f2 100644
> --- a/target/mips/translate.c
> +++ b/target/mips/translate.c
> @@ -2401,6 +2401,31 @@ static void gen_ld(DisasContext *ctx, uint32_t opc,
>  tcg_temp_free(t0);
>  }
> 
> +static void gen_llwp(DisasContext *ctx, uint32_t base, int16_t offset,
> +uint32_t reg1, uint32_t reg2)
> +{
> +TCGv taddr = tcg_temp_new();
> +TCGv_i64 tval = tcg_temp_new_i64();
> +TCGv tmp1 = tcg_temp_new();
> +TCGv tmp2 = tcg_temp_new();
> +
> +gen_base_offset_addr(ctx, taddr, base, offset);
> +tcg_gen_qemu_ld64(tval, taddr, ctx->mem_idx);
> +#ifdef TARGET_WORDS_BIGENDIAN
> +tcg_gen_extr_i64_tl(tmp2, tmp1, tval);
> +#else
> +tcg_gen_extr_i64_tl(tmp1, tmp2, tval);
> +#endif
> +gen_store_gpr(tmp1, reg1);
> +tcg_temp_free(tmp1);
> +gen_store_gpr(tmp2, reg2);
> +tcg_temp_free(tmp2);
> +tcg_gen_st_i64(tval, cpu_env, offsetof(CPUMIPSState, llval_wp));
> +tcg_temp_free_i64(tval);
> +tcg_gen_st_tl(taddr, cpu_env, offsetof(CPUMIPSState, lladdr));
> +tcg_temp_free(taddr);
> +}
> +
>  /* Store */
>  static void gen_st (DisasContext *ctx, uint32_t opc, int rt,
>  int base, int offset)
> @@ -2497,6 +2522,51 @@ static void gen_st_cond (DisasContext *ctx, uint32_t 
> opc, > int rt,
>  tcg_temp_free(t0);
>  }
> 
> +static void gen_scwp(DisasContext *ctx, uint32_t base, int16_t offset,
> +uint32_t reg1, uint32_t reg2)
> +{
> +TCGv taddr = tcg_temp_local_new();
> +TCGv lladdr = tcg_temp_local_new();
> +TCGv_i64 tval = tcg_temp_new_i64();
> +TCGv_i64 llval = tcg_temp_new_i64();
> +TCGv_i64 val = tcg_temp_new_i64();
> +TCGv tmp1 = tcg_temp_new();
> +TCGv tmp2 = tcg_temp_new();
> +

Re: [Qemu-devel] [PATCH v8 08/87] target/mips: Add support for availability control via bit XNP

2018-08-14 Thread Aleksandar Markovic

> From: Aleksandar Markovic 
> Sent: Monday, August 13, 2018 7:52 PM
> 
> Subject: [PATCH v8 08/87] target/mips: Add support for availability control 
> via bit XNP
> 
> From: Aleksandar Rikalo 
> 
> Add a field in hflags for XNP bit, and a function check_xnp().
> 
> Signed-off-by: Aleksandar Markovic 
> Signed-off-by: Stefan Markovic 
> ---
>  target/mips/cpu.h   |  3 ++-
>  target/mips/internal.h  |  5 -
>  target/mips/translate.c | 12 
>  3 files changed, 18 insertions(+), 2 deletions(-)

Reviewed-by: Aleksandar Markovic

Re: [Qemu-devel] [PATCH v7 6/9] qcow2: Increase the default upper limit on the L2 cache size

2018-08-14 Thread Leonid Bloch


On 8/14/18 2:44 PM, Kevin Wolf wrote:

Am 14.08.2018 um 13:34 hat Leonid Bloch geschrieben:

On 8/14/18 11:18 AM, Kevin Wolf wrote:

Am 13.08.2018 um 18:42 hat Leonid Bloch geschrieben:

I don't actually think it's so bad to keep the cache permanently
allocated, but I wouldn't object to a lower default for non-Linux hosts
either. 1 MB may still be a little too low, 4 MB (covers up to 32 GB)
might be more adequate. My typical desktop VMs are larger than 8 GB, but
smaller than 32 GB.


And for a Windows VM just the OS installation takes above 40 GB. While we
probably are not running Windows VMs for our own needs, it is very common
that a customer of, for example, some cloud service uses QEMU (unknowingly)
for a full-blown Windows. So 100 GB+ images which are quite heavily used is
not a rare scenario. 256 GB - yeah, that would be on the higher end.


The OS installation is mostly sequential access, though. You only need
that much cache when you have completely random I/O across the whole
image. Otherwise the LRU based approach of the cache is good enough to
keep those tables cached that are actually in use.


Sorry, by "OS installation" I meant the installed size of the OS, which
should be available for fast and frequent access, not the installation
process itself. Obviously for one-time tasks like the installation process
it's not worth it, unless one installs all the time, instead of using ready
images, for some reason. :)


But you never use everything that is present in an OS installation of
40 GB (is it really _that_ huge these days?), and you don't read OS
files non-stop. The most frequently used parts of the OS are actually in
the guest RAM.


Yes, Windows 8.1, with all the desktop bloat - just above 40 GB. :]
I did a proper benchmarking indeed only on heavy I/O load, where full 
cache did show above 50% improvement, although just regular usage felt 
faster as well, but maybe it's just psychosomatic. :)


Leonid.



I don't think you'll really notice the difference in qcow2 unless you
have a really I/O intensive workload - and that is not usually for OS
files, but for user data. For only occasional accesses, the additional
64k for the metadata table wouldn't play a big role.

Kevin

Re: [Qemu-devel] [PATCH] qemu-img: fix regression copying secrets during convert

2018-08-14 Thread Daniel P . Berrangé

On Tue, Aug 14, 2018 at 01:38:24PM +0200, Kevin Wolf wrote:
> Am 14.08.2018 um 11:35 hat Daniel P. Berrangé geschrieben:
> > When the convert command is creating an output file that needs
> > secrets, we need to ensure those secrets are passed to both the
> > blk_new_open and bdrv_create API calls.
> > 
> > This is done by qemu-img extracting all opts matching the name
> > suffix "key-secret". Unfortunately the code doing this was run after the
> > call to bdrv_create(), which meant the QemuOpts it was extracting
> > secrets from was now empty.
> > 
> > Previously this worked by luks as a bug meant the "key-secret"
> > parameters were not purged from the QemuOpts. This bug was fixed in
> > 
> >   commit b76b4f604521e59f857d6177bc55f6f2e41fd392
> >   Author: Kevin Wolf 
> >   Date:   Thu Jan 11 16:18:08 2018 +0100
> > 
> > qcow2: Use visitor for options in qcow2_create()
> > 
> > Exposing the latent bug in qemu-img. This fix simply moves the copying
> > of secrets to before the bdrv_create() call.
> > 
> > Signed-off-by: Daniel P. Berrangé 
> 
> Cc: qemu-sta...@nongnu.org
> 
> > @@ -2461,9 +2458,9 @@ out:
> >  qemu_progress_print(100, 0);
> >  }
> >  qemu_progress_end();
> > -qemu_opts_del(opts);
> 
> Why don't we need to free opts any more?

Urgh, that's a mistake, I deleted the wrong line. v2 arriving soon

> 
> >  qemu_opts_free(create_opts);
> >  qemu_opts_del(sn_opts);
> > +qobject_unref(open_opts);
> >  blk_unref(s.target);
> >  if (s.src) {
> >  for (bs_i = 0; bs_i < s.src_num; bs_i++) {

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

[Qemu-devel] [PATCH v2] qemu-img: fix regression copying secrets during convert

2018-08-14 Thread Daniel P . Berrangé

When the convert command is creating an output file that needs
secrets, we need to ensure those secrets are passed to both the
blk_new_open and bdrv_create API calls.

This is done by qemu-img extracting all opts matching the name
suffix "key-secret". Unfortunately the code doing this was run after the
call to bdrv_create(), which meant the QemuOpts it was extracting
secrets from was now empty.

Previously this worked by luks as a bug meant the "key-secret"
parameters were not purged from the QemuOpts. This bug was fixed in

  commit b76b4f604521e59f857d6177bc55f6f2e41fd392
  Author: Kevin Wolf 
  Date:   Thu Jan 11 16:18:08 2018 +0100

qcow2: Use visitor for options in qcow2_create()

Exposing the latent bug in qemu-img. This fix simply moves the copying
of secrets to before the bdrv_create() call.

Signed-off-by: Daniel P. Berrangé 
---
 qemu-img.c | 32 +++-
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 1acddf693c..b12f4cd19b 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -345,21 +345,6 @@ static int img_add_key_secrets(void *opaque,
 return 0;
 }
 
-static BlockBackend *img_open_new_file(const char *filename,
-   QemuOpts *create_opts,
-   const char *fmt, int flags,
-   bool writethrough, bool quiet,
-   bool force_share)
-{
-QDict *options = NULL;
-
-options = qdict_new();
-qemu_opt_foreach(create_opts, img_add_key_secrets, options, &error_abort);
-
-return img_open_file(filename, options, fmt, flags, writethrough, quiet,
- force_share);
-}
-
 
 static BlockBackend *img_open(bool image_opts,
   const char *filename,
@@ -2018,6 +2003,7 @@ static int img_convert(int argc, char **argv)
 BlockDriverState *out_bs;
 QemuOpts *opts = NULL, *sn_opts = NULL;
 QemuOptsList *create_opts = NULL;
+QDict *open_opts = NULL;
 char *options = NULL;
 Error *local_err = NULL;
 bool writethrough, src_writethrough, quiet = false, image_opts = false,
@@ -2362,6 +2348,16 @@ static int img_convert(int argc, char **argv)
 }
 }
 
+/*
+ * The later open call will need any decryption secrets, and
+ * bdrv_create() will purge "opts", so extract them now before
+ * they are lost.
+ */
+if (!skip_create) {
+open_opts = qdict_new();
+qemu_opt_foreach(opts, img_add_key_secrets, open_opts, &error_abort);
+}
+
 if (!skip_create) {
 /* Create the new image */
 ret = bdrv_create(drv, out_filename, opts, &local_err);
@@ -2388,8 +2384,9 @@ static int img_convert(int argc, char **argv)
  * That has to wait for bdrv_create to be improved
  * to allow filenames in option syntax
  */
-s.target = img_open_new_file(out_filename, opts, out_fmt,
- flags, writethrough, quiet, false);
+s.target = img_open_file(out_filename, open_opts, out_fmt,
+ flags, writethrough, quiet, false);
+open_opts = NULL; /* blk_new_open will have freed it */
 }
 if (!s.target) {
 ret = -1;
@@ -2464,6 +2461,7 @@ out:
 qemu_opts_del(opts);
 qemu_opts_free(create_opts);
 qemu_opts_del(sn_opts);
+qobject_unref(open_opts);
 blk_unref(s.target);
 if (s.src) {
 for (bs_i = 0; bs_i < s.src_num; bs_i++) {
-- 
2.17.1

[Qemu-devel] [PATCH 07/10] target/arm: Implement ESR_EL2/HSR for AArch32 and no-EL2

2018-08-14 Thread Peter Maydell

The AArch32 HSR is the equivalent of AArch64 ESR_EL2;
we can implement it by marking our existing ESR_EL2 regdef
as STATE_BOTH. It also needs to be "RES0 from EL3 if
EL2 not implemented", so add the missing stanza to
el3_no_el2_cp_reginfo.

Signed-off-by: Peter Maydell 
---
 target/arm/helper.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index d6e98e9d606..80855302089 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -3763,6 +3763,10 @@ static const ARMCPRegInfo el3_no_el2_cp_reginfo[] = {
   .cp = 15, .opc1 = 4, .crn = 1, .crm = 1, .opc2 = 4,
   .access = PL2_RW,
   .type = ARM_CP_CONST, .resetvalue = 0 },
+{ .name = "ESR_EL2", .state = ARM_CP_STATE_BOTH,
+  .opc0 = 3, .opc1 = 4, .crn = 5, .crm = 2, .opc2 = 0,
+  .access = PL2_RW,
+  .type = ARM_CP_CONST, .resetvalue = 0 },
 { .name = "CPTR_EL2", .state = ARM_CP_STATE_BOTH,
   .opc0 = 3, .opc1 = 4, .crn = 1, .crm = 1, .opc2 = 2,
   .access = PL2_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
@@ -3926,7 +3930,7 @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
   .opc0 = 3, .opc1 = 4, .crn = 4, .crm = 0, .opc2 = 1,
   .access = PL2_RW,
   .fieldoffset = offsetof(CPUARMState, elr_el[2]) },
-{ .name = "ESR_EL2", .state = ARM_CP_STATE_AA64,
+{ .name = "ESR_EL2", .state = ARM_CP_STATE_BOTH,
   .opc0 = 3, .opc1 = 4, .crn = 5, .crm = 2, .opc2 = 0,
   .access = PL2_RW, .fieldoffset = offsetof(CPUARMState, cp15.esr_el[2]) },
 { .name = "FAR_EL2", .state = ARM_CP_STATE_BOTH,
-- 
2.18.0

[Qemu-devel] [PATCH 00/10] target/arm: Some pieces of support for 32-bit Hyp mode

2018-08-14 Thread Peter Maydell

Now we have virtualization support in the GICv2 emulation,
I thought I'd have a look at how much we were still missing
for being able to enable EL2 support for AArch32.
This set of patches fixes some minor missing pieces:
 * various small bugs in cp15 registers or places where
   we were missing the 32-bit version of a 64-bit register
 * a bugfix for MSR/MRS (banked), which were not allowing
   Hyp mode to access ELR_Hyp
 * implementation of the ERET instruction for A32/T32
 * support for taking exceptions to Hyp mode (the largest
   of these missing bits)

This isn't complete, but I thought I'd push these patches
out for review. My test setup is that I have another
couple of patches, one which fixes up hw/arm/boot.c to
boot AArch32 kernels in Hyp mode if it exists, and one
which sets ARM_FEATURE_EL2 on our A15 model. With those I
can get an outer kernel to boot with KVM support and try
to run an inner guest kernel. The inner kernel boots OK
but gets random segfaults in its userspace -- I haven't
tracked down why this is yet...

Some bits that are definitely missing:
 * ATS1HR, ATS1HW address translation ops
 * I need to check that the trap semantics for AArch32
   regs line up with their AArch64 counterparts

I also noticed that we fail to implement really quite a lot
of the HCR_EL2 trap semantics for either AArch64 or AArch32,
to the extent that I'm surprised that nested guests work
under AArch64 :-)

This patchset is based on top of my target-arm.for-3.1
branch.

thanks
-- PMM

Peter Maydell (10):
  target/arm: Correct typo in HAMAIR1 regdef name
  target/arm: Add missing .cp = 15 to HMAIR1 and HAMAIR1 regdefs
  target/arm: Implement RAZ/WI HACTLR2
  target/arm: Implement AArch32 HVBAR
  target/arm: Implement AArch32 HCR and HCR2
  target/arm: Implement AArch32 Hyp FARs
  target/arm: Implement ESR_EL2/HSR for AArch32 and no-EL2
  target/arm: Permit accesses to ELR_Hyp from Hyp mode via MSR/MRS
(banked)
  target/arm: Implement AArch32 ERET instruction
  target/arm: Implement support for taking exceptions to Hyp mode

 target/arm/helper.c| 226 ++---
 target/arm/op_helper.c |  22 ++--
 target/arm/translate.c |  41 +++-
 3 files changed, 236 insertions(+), 53 deletions(-)

-- 
2.18.0

[Qemu-devel] [PATCH 08/10] target/arm: Permit accesses to ELR_Hyp from Hyp mode via MSR/MRS (banked)

2018-08-14 Thread Peter Maydell

The MSR (banked) and MRS (banked) instructions allow accesses to ELR_Hyp
from either Monitor or Hyp mode. Our translate time check
was overly strict and only permitted access from Monitor mode.

The runtime check wo do in msr_mrs_banked_exc_checks() had the
correct code in it, but never got there because of the earlier
"currmode == tgtmode" check. Special case ELR_Hyp.

Signed-off-by: Peter Maydell 
---
 target/arm/op_helper.c | 22 +++---
 target/arm/translate.c | 10 +++---
 2 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index d550978b5b9..952b8d122b7 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -611,6 +611,14 @@ static void msr_mrs_banked_exc_checks(CPUARMState *env, 
uint32_t tgtmode,
  */
 int curmode = env->uncached_cpsr & CPSR_M;
 
+if (regno == 17) {
+/* ELR_Hyp: a special case because access from tgtmode is OK */
+if (curmode != ARM_CPU_MODE_HYP && curmode != ARM_CPU_MODE_MON) {
+goto undef;
+}
+return;
+}
+
 if (curmode == tgtmode) {
 goto undef;
 }
@@ -638,17 +646,9 @@ static void msr_mrs_banked_exc_checks(CPUARMState *env, 
uint32_t tgtmode,
 }
 
 if (tgtmode == ARM_CPU_MODE_HYP) {
-switch (regno) {
-case 17: /* ELR_Hyp */
-if (curmode != ARM_CPU_MODE_HYP && curmode != ARM_CPU_MODE_MON) {
-goto undef;
-}
-break;
-default:
-if (curmode != ARM_CPU_MODE_MON) {
-goto undef;
-}
-break;
+/* SPSR_Hyp, r13_hyp: accessible from Monitor mode only */
+if (curmode != ARM_CPU_MODE_MON) {
+goto undef;
 }
 }
 
diff --git a/target/arm/translate.c b/target/arm/translate.c
index f845da7c638..3f5751d4826 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -4506,10 +4506,14 @@ static bool msr_banked_access_decode(DisasContext *s, 
int r, int sysm, int rn,
 }
 break;
 case ARM_CPU_MODE_HYP:
-/* Note that we can forbid accesses from EL2 here because they
- * must be from Hyp mode itself
+/*
+ * SPSR_hyp and r13_hyp can only be accessed from Monitor mode
+ * (and so we can forbid accesses from EL2 or below). elr_hyp
+ * can be accessed also from Hyp mode, so forbid accesses from
+ * EL0 or EL1.
  */
-if (!arm_dc_feature(s, ARM_FEATURE_EL2) || s->current_el < 3) {
+if (!arm_dc_feature(s, ARM_FEATURE_EL2) || s->current_el < 2 ||
+(s->current_el < 3 && *regno != 17)) {
 goto undef;
 }
 break;
-- 
2.18.0

[Qemu-devel] [PATCH 05/10] target/arm: Implement AArch32 HCR and HCR2

2018-08-14 Thread Peter Maydell

The AArch32 HCR and HCR2 registers alias HCR_EL2
bits [31:0] and [63:32]; implement them.

Signed-off-by: Peter Maydell 
---
 target/arm/helper.c | 39 +++
 1 file changed, 35 insertions(+), 4 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index b6412fe9d1f..9701e413859 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -3754,11 +3754,15 @@ static const ARMCPRegInfo el3_no_el2_cp_reginfo[] = {
   .opc0 = 3, .opc1 = 4, .crn = 12, .crm = 0, .opc2 = 0,
   .access = PL2_RW,
   .readfn = arm_cp_read_zero, .writefn = arm_cp_write_ignore },
-{ .name = "HCR_EL2", .state = ARM_CP_STATE_AA64,
+{ .name = "HCR_EL2", .state = ARM_CP_STATE_BOTH,
   .type = ARM_CP_NO_RAW,
   .opc0 = 3, .opc1 = 4, .crn = 1, .crm = 1, .opc2 = 0,
   .access = PL2_RW,
-  .readfn = arm_cp_read_zero, .writefn = arm_cp_write_ignore },
+  .type = ARM_CP_CONST, .resetvalue = 0 },
+{ .name = "HCR2", .state = ARM_CP_STATE_AA32,
+  .cp = 15, .opc1 = 4, .crn = 1, .crm = 1, .opc2 = 4,
+  .access = PL2_RW,
+  .type = ARM_CP_CONST, .resetvalue = 0 },
 { .name = "CPTR_EL2", .state = ARM_CP_STATE_BOTH,
   .opc0 = 3, .opc1 = 4, .crn = 1, .crm = 1, .opc2 = 2,
   .access = PL2_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
@@ -3872,10 +3876,26 @@ static void hcr_write(CPUARMState *env, const 
ARMCPRegInfo *ri, uint64_t value)
  * HCR_PTW forbids certain page-table setups
  * HCR_DC Disables stage1 and enables stage2 translation
  */
-if ((raw_read(env, ri) ^ value) & (HCR_VM | HCR_PTW | HCR_DC)) {
+if ((env->cp15.hcr_el2 ^ value) & (HCR_VM | HCR_PTW | HCR_DC)) {
 tlb_flush(CPU(cpu));
 }
-raw_write(env, ri, value);
+env->cp15.hcr_el2 = value;
+}
+
+static void hcr_writehigh(CPUARMState *env, const ARMCPRegInfo *ri,
+  uint64_t value)
+{
+/* Handle HCR2 write, i.e. write to high half of HCR_EL2 */
+value = deposit64(env->cp15.hcr_el2, 32, 32, value);
+hcr_write(env, NULL, value);
+}
+
+static void hcr_writelow(CPUARMState *env, const ARMCPRegInfo *ri,
+ uint64_t value)
+{
+/* Handle HCR write, i.e. write to low half of HCR_EL2 */
+value = deposit64(env->cp15.hcr_el2, 0, 32, value);
+hcr_write(env, NULL, value);
 }
 
 static const ARMCPRegInfo el2_cp_reginfo[] = {
@@ -3883,6 +3903,17 @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
   .opc0 = 3, .opc1 = 4, .crn = 1, .crm = 1, .opc2 = 0,
   .access = PL2_RW, .fieldoffset = offsetof(CPUARMState, cp15.hcr_el2),
   .writefn = hcr_write },
+{ .name = "HCR", .state = ARM_CP_STATE_AA32,
+  .type = ARM_CP_ALIAS,
+  .cp = 15, .opc1 = 4, .crn = 1, .crm = 1, .opc2 = 0,
+  .access = PL2_RW, .fieldoffset = offsetof(CPUARMState, cp15.hcr_el2),
+  .writefn = hcr_writelow },
+{ .name = "HCR2", .state = ARM_CP_STATE_AA32,
+  .type = ARM_CP_ALIAS,
+  .cp = 15, .opc1 = 4, .crn = 1, .crm = 1, .opc2 = 4,
+  .access = PL2_RW,
+  .fieldoffset = offsetofhigh32(CPUARMState, cp15.hcr_el2),
+  .writefn = hcr_writehigh },
 { .name = "ELR_EL2", .state = ARM_CP_STATE_AA64,
   .type = ARM_CP_ALIAS,
   .opc0 = 3, .opc1 = 4, .crn = 4, .crm = 0, .opc2 = 1,
-- 
2.18.0

[Qemu-devel] [PATCH 06/10] target/arm: Implement AArch32 Hyp FARs

2018-08-14 Thread Peter Maydell

The AArch32 virtualization extensions support these fault address
registers:
 * HDFAR: aliased with AArch64 FAR_EL2[31:0] and AArch32 DFAR(S)
 * HIFAR: aliased with AArch64 FAR_EL2[63:32] and AArch32 IFAR(S)

Implement the accessors for these. This fixes in passing a bug
where we weren't implementing the "RES0 from EL3 if EL2 not
implemented" behaviour for AArch64 FAR_EL2.

Signed-off-by: Peter Maydell 
---
 target/arm/helper.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 9701e413859..d6e98e9d606 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -3847,6 +3847,13 @@ static const ARMCPRegInfo el3_no_el2_cp_reginfo[] = {
 { .name = "HSTR_EL2", .state = ARM_CP_STATE_BOTH,
   .opc0 = 3, .opc1 = 4, .crn = 1, .crm = 1, .opc2 = 3,
   .access = PL2_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
+{ .name = "FAR_EL2", .state = ARM_CP_STATE_BOTH,
+  .opc0 = 3, .opc1 = 4, .crn = 6, .crm = 0, .opc2 = 0,
+  .access = PL2_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
+{ .name = "HIFAR", .state = ARM_CP_STATE_AA32,
+  .type = ARM_CP_CONST,
+  .cp = 15, .opc1 = 4, .crn = 6, .crm = 0, .opc2 = 2,
+  .access = PL2_RW, .resetvalue = 0 },
 REGINFO_SENTINEL
 };
 
@@ -3922,9 +3929,14 @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
 { .name = "ESR_EL2", .state = ARM_CP_STATE_AA64,
   .opc0 = 3, .opc1 = 4, .crn = 5, .crm = 2, .opc2 = 0,
   .access = PL2_RW, .fieldoffset = offsetof(CPUARMState, cp15.esr_el[2]) },
-{ .name = "FAR_EL2", .state = ARM_CP_STATE_AA64,
+{ .name = "FAR_EL2", .state = ARM_CP_STATE_BOTH,
   .opc0 = 3, .opc1 = 4, .crn = 6, .crm = 0, .opc2 = 0,
   .access = PL2_RW, .fieldoffset = offsetof(CPUARMState, cp15.far_el[2]) },
+{ .name = "HIFAR", .state = ARM_CP_STATE_AA32,
+  .type = ARM_CP_ALIAS,
+  .cp = 15, .opc1 = 4, .crn = 6, .crm = 0, .opc2 = 2,
+  .access = PL2_RW,
+  .fieldoffset = offsetofhigh32(CPUARMState, cp15.far_el[2]) },
 { .name = "SPSR_EL2", .state = ARM_CP_STATE_AA64,
   .type = ARM_CP_ALIAS,
   .opc0 = 3, .opc1 = 4, .crn = 4, .crm = 0, .opc2 = 0,
-- 
2.18.0

[Qemu-devel] [PATCH 09/10] target/arm: Implement AArch32 ERET instruction

2018-08-14 Thread Peter Maydell

ARMv7VE introduced the ERET instruction, which is necessary to
return from an exception taken to Hyp mode. Implement this.
In A32 encoding it is a completely new encoding; in T32 it
is an adjustment of the behaviour of the existing
"SUBS PC, LR, #" instruction.

Signed-off-by: Peter Maydell 
---
 target/arm/translate.c | 31 +--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index 3f5751d4826..5ecc24f12fb 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -8887,6 +8887,25 @@ static void disas_arm_insn(DisasContext *s, unsigned int 
insn)
 tcg_temp_free_i32(tmp2);
 store_reg(s, rd, tmp);
 break;
+case 0x6: /* ERET */
+if (op1 != 3) {
+goto illegal_op;
+}
+if (!arm_dc_feature(s, ARM_FEATURE_V7VE)) {
+goto illegal_op;
+}
+if ((insn & 0x000fff0f) != 0x000e) {
+/* UNPREDICTABLE; we choose to UNDEF */
+goto illegal_op;
+}
+
+if (s->current_el == 2) {
+tmp = load_cpu_field(elr_el[2]);
+} else {
+tmp = load_reg(s, 14);
+}
+gen_exception_return(s, tmp);
+break;
 case 7:
 {
 int imm16 = extract32(insn, 0, 4) | (extract32(insn, 8, 12) << 4);
@@ -11144,8 +11163,16 @@ static void disas_thumb2_insn(DisasContext *s, 
uint32_t insn)
 if (rn != 14 || rd != 15) {
 goto illegal_op;
 }
-tmp = load_reg(s, rn);
-tcg_gen_subi_i32(tmp, tmp, insn & 0xff);
+if (s->current_el == 2) {
+/* ERET from Hyp uses ELR_Hyp, not LR */
+if (insn & 0xff) {
+goto illegal_op;
+}
+tmp = load_cpu_field(elr_el[2]);
+} else {
+tmp = load_reg(s, rn);
+tcg_gen_subi_i32(tmp, tmp, insn & 0xff);
+}
 gen_exception_return(s, tmp);
 break;
 case 6: /* MRS */
-- 
2.18.0

[Qemu-devel] [PATCH 03/10] target/arm: Implement RAZ/WI HACTLR2

2018-08-14 Thread Peter Maydell

The AArch32 HACTLR2 register maps to bits [63:32] of ACTLR_EL2.
We implement ACTLR_EL2 as RAZ/WI, so make HACTLR2 also RAZ/WI.
(We put the regdef next to ACTLR_EL2 as a reminder in case we
ever make ACTLR_EL2 something other than RAZ/WI).

Signed-off-by: Peter Maydell 
---
 target/arm/helper.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 466c8ae492e..14fd78f587a 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -5436,6 +5436,11 @@ void register_cp_regs_for_features(ARMCPU *cpu)
   .opc0 = 3, .opc1 = 4, .crn = 1, .crm = 0, .opc2 = 1,
   .access = PL2_RW, .type = ARM_CP_CONST,
   .resetvalue = 0 },
+/* HACTLR2 maps to ACTLR_EL2[63:32] */
+{ .name = "HACTLR2", .state = ARM_CP_STATE_AA32,
+  .cp = 15, .opc1 = 4, .crn = 1, .crm = 0, .opc2 = 3,
+  .access = PL2_RW, .type = ARM_CP_CONST,
+  .resetvalue = 0 },
 { .name = "ACTLR_EL3", .state = ARM_CP_STATE_AA64,
   .opc0 = 3, .opc1 = 6, .crn = 1, .crm = 0, .opc2 = 1,
   .access = PL3_RW, .type = ARM_CP_CONST,
-- 
2.18.0

[Qemu-devel] [PATCH 02/10] target/arm: Add missing .cp = 15 to HMAIR1 and HAMAIR1 regdefs

2018-08-14 Thread Peter Maydell

ARMCPRegInfo structs will not default to .cp = 15 if they
are ARM_CP_STATE_BOTH, but not if they are ARM_CP_STATE_AA32
(because a coprocessor number of 0 is valid for AArch32).
We forgot to explicitly set .cp = 15 for the HMAIR1 and
HAMAIR1 regdefs, which meant they would UNDEF when the guest
tried to access them under cp15.

Signed-off-by: Peter Maydell 
---
A quick grep suggests these are the only ones we got wrong.
---
 target/arm/helper.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 2c5e02c0b1a..466c8ae492e 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -3767,14 +3767,14 @@ static const ARMCPRegInfo el3_no_el2_cp_reginfo[] = {
   .access = PL2_RW, .type = ARM_CP_CONST,
   .resetvalue = 0 },
 { .name = "HMAIR1", .state = ARM_CP_STATE_AA32,
-  .opc1 = 4, .crn = 10, .crm = 2, .opc2 = 1,
+  .cp = 15, .opc1 = 4, .crn = 10, .crm = 2, .opc2 = 1,
   .access = PL2_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 { .name = "AMAIR_EL2", .state = ARM_CP_STATE_BOTH,
   .opc0 = 3, .opc1 = 4, .crn = 10, .crm = 3, .opc2 = 0,
   .access = PL2_RW, .type = ARM_CP_CONST,
   .resetvalue = 0 },
 { .name = "HAMAIR1", .state = ARM_CP_STATE_AA32,
-  .opc1 = 4, .crn = 10, .crm = 3, .opc2 = 1,
+  .cp = 15, .opc1 = 4, .crn = 10, .crm = 3, .opc2 = 1,
   .access = PL2_RW, .type = ARM_CP_CONST,
   .resetvalue = 0 },
 { .name = "AFSR0_EL2", .state = ARM_CP_STATE_BOTH,
@@ -3917,7 +3917,7 @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
   .access = PL2_RW, .fieldoffset = offsetof(CPUARMState, cp15.mair_el[2]),
   .resetvalue = 0 },
 { .name = "HMAIR1", .state = ARM_CP_STATE_AA32,
-  .opc1 = 4, .crn = 10, .crm = 2, .opc2 = 1,
+  .cp = 15, .opc1 = 4, .crn = 10, .crm = 2, .opc2 = 1,
   .access = PL2_RW, .type = ARM_CP_ALIAS,
   .fieldoffset = offsetofhigh32(CPUARMState, cp15.mair_el[2]) },
 { .name = "AMAIR_EL2", .state = ARM_CP_STATE_BOTH,
@@ -3926,7 +3926,7 @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
   .resetvalue = 0 },
 /* HAMAIR1 is mapped to AMAIR_EL2[63:32] */
 { .name = "HAMAIR1", .state = ARM_CP_STATE_AA32,
-  .opc1 = 4, .crn = 10, .crm = 3, .opc2 = 1,
+  .cp = 15, .opc1 = 4, .crn = 10, .crm = 3, .opc2 = 1,
   .access = PL2_RW, .type = ARM_CP_CONST,
   .resetvalue = 0 },
 { .name = "AFSR0_EL2", .state = ARM_CP_STATE_BOTH,
-- 
2.18.0

[Qemu-devel] [PATCH 2/2] block: drop empty .bdrv_close handlers

2018-08-14 Thread Vladimir Sementsov-Ogievskiy

.bdrv_close handler is optional after previous commit, no needs to keep
empty functions more.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/blkreplay.c| 5 -
 block/commit.c   | 5 -
 block/copy-on-read.c | 6 --
 block/mirror.c   | 5 -
 block/null.c | 6 --
 block/raw-format.c   | 5 -
 6 files changed, 32 deletions(-)

diff --git a/block/blkreplay.c b/block/blkreplay.c
index 766150ade6..b5d9efdeca 100755
--- a/block/blkreplay.c
+++ b/block/blkreplay.c
@@ -43,10 +43,6 @@ fail:
 return ret;
 }
 
-static void blkreplay_close(BlockDriverState *bs)
-{
-}
-
 static int64_t blkreplay_getlength(BlockDriverState *bs)
 {
 return bdrv_getlength(bs->file->bs);
@@ -135,7 +131,6 @@ static BlockDriver bdrv_blkreplay = {
 .instance_size  = 0,
 
 .bdrv_open  = blkreplay_open,
-.bdrv_close = blkreplay_close,
 .bdrv_child_perm= bdrv_filter_default_perms,
 .bdrv_getlength = blkreplay_getlength,
 
diff --git a/block/commit.c b/block/commit.c
index e1814d9693..eb414579bd 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -239,10 +239,6 @@ static void 
bdrv_commit_top_refresh_filename(BlockDriverState *bs, QDict *opts)
 bs->backing->bs->filename);
 }
 
-static void bdrv_commit_top_close(BlockDriverState *bs)
-{
-}
-
 static void bdrv_commit_top_child_perm(BlockDriverState *bs, BdrvChild *c,
const BdrvChildRole *role,
BlockReopenQueue *reopen_queue,
@@ -260,7 +256,6 @@ static BlockDriver bdrv_commit_top = {
 .bdrv_co_preadv = bdrv_commit_top_preadv,
 .bdrv_co_block_status   = bdrv_co_block_status_from_backing,
 .bdrv_refresh_filename  = bdrv_commit_top_refresh_filename,
-.bdrv_close = bdrv_commit_top_close,
 .bdrv_child_perm= bdrv_commit_top_child_perm,
 };
 
diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index a19164f9eb..64dcc424b5 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -45,11 +45,6 @@ static int cor_open(BlockDriverState *bs, QDict *options, 
int flags,
 }
 
 
-static void cor_close(BlockDriverState *bs)
-{
-}
-
-
 #define PERM_PASSTHROUGH (BLK_PERM_CONSISTENT_READ \
   | BLK_PERM_WRITE \
   | BLK_PERM_RESIZE)
@@ -143,7 +138,6 @@ BlockDriver bdrv_copy_on_read = {
 .format_name= "copy-on-read",
 
 .bdrv_open  = cor_open,
-.bdrv_close = cor_close,
 .bdrv_child_perm= cor_child_perm,
 
 .bdrv_getlength = cor_getlength,
diff --git a/block/mirror.c b/block/mirror.c
index d2806812c8..e949aa1fc5 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1424,10 +1424,6 @@ static void 
bdrv_mirror_top_refresh_filename(BlockDriverState *bs, QDict *opts)
 bs->backing->bs->filename);
 }
 
-static void bdrv_mirror_top_close(BlockDriverState *bs)
-{
-}
-
 static void bdrv_mirror_top_child_perm(BlockDriverState *bs, BdrvChild *c,
const BdrvChildRole *role,
BlockReopenQueue *reopen_queue,
@@ -1454,7 +1450,6 @@ static BlockDriver bdrv_mirror_top = {
 .bdrv_co_flush  = bdrv_mirror_top_flush,
 .bdrv_co_block_status   = bdrv_co_block_status_from_backing,
 .bdrv_refresh_filename  = bdrv_mirror_top_refresh_filename,
-.bdrv_close = bdrv_mirror_top_close,
 .bdrv_child_perm= bdrv_mirror_top_child_perm,
 };
 
diff --git a/block/null.c b/block/null.c
index 5d610fdfba..d442d3e901 100644
--- a/block/null.c
+++ b/block/null.c
@@ -97,10 +97,6 @@ static int null_file_open(BlockDriverState *bs, QDict 
*options, int flags,
 return ret;
 }
 
-static void null_close(BlockDriverState *bs)
-{
-}
-
 static int64_t null_getlength(BlockDriverState *bs)
 {
 BDRVNullState *s = bs->opaque;
@@ -263,7 +259,6 @@ static BlockDriver bdrv_null_co = {
 
 .bdrv_file_open = null_file_open,
 .bdrv_parse_filename= null_co_parse_filename,
-.bdrv_close = null_close,
 .bdrv_getlength = null_getlength,
 
 .bdrv_co_preadv = null_co_preadv,
@@ -283,7 +278,6 @@ static BlockDriver bdrv_null_aio = {
 
 .bdrv_file_open = null_file_open,
 .bdrv_parse_filename= null_aio_parse_filename,
-.bdrv_close = null_close,
 .bdrv_getlength = null_getlength,
 
 .bdrv_aio_preadv= null_aio_preadv,
diff --git a/block/raw-format.c b/block/raw-format.c
index 2fd69cdb08..6f6dc99b2c 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -459,10 +459,6 @@ static int raw_open(BlockDriverState *bs, QDict *options, 
int flags,
 return 0;
 }
 
-static void raw_close(BlockDriverState *bs)
-{
-}
-
 static int raw_probe(const uint8_t *buf,

[Qemu-devel] [PATCH 04/10] target/arm: Implement AArch32 HVBAR

2018-08-14 Thread Peter Maydell

Implement the AArch32 HVBAR register; we can do this just by
making the existing VBAR_EL2 regdefs be STATE_BOTH.

Signed-off-by: Peter Maydell 
---
 target/arm/helper.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 14fd78f587a..b6412fe9d1f 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -3750,7 +3750,7 @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
 
 /* Used to describe the behaviour of EL2 regs when EL2 does not exist.  */
 static const ARMCPRegInfo el3_no_el2_cp_reginfo[] = {
-{ .name = "VBAR_EL2", .state = ARM_CP_STATE_AA64,
+{ .name = "VBAR_EL2", .state = ARM_CP_STATE_BOTH,
   .opc0 = 3, .opc1 = 4, .crn = 12, .crm = 0, .opc2 = 0,
   .access = PL2_RW,
   .readfn = arm_cp_read_zero, .writefn = arm_cp_write_ignore },
@@ -3899,7 +3899,7 @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
   .opc0 = 3, .opc1 = 4, .crn = 4, .crm = 0, .opc2 = 0,
   .access = PL2_RW,
   .fieldoffset = offsetof(CPUARMState, banked_spsr[BANK_HYP]) },
-{ .name = "VBAR_EL2", .state = ARM_CP_STATE_AA64,
+{ .name = "VBAR_EL2", .state = ARM_CP_STATE_BOTH,
   .opc0 = 3, .opc1 = 4, .crn = 12, .crm = 0, .opc2 = 0,
   .access = PL2_RW, .writefn = vbar_write,
   .fieldoffset = offsetof(CPUARMState, cp15.vbar_el[2]),
-- 
2.18.0

[Qemu-devel] [PATCH 10/10] target/arm: Implement support for taking exceptions to Hyp mode

2018-08-14 Thread Peter Maydell

Implement the necessary support code for taking exceptions
to Hyp mode in AArch32.

Signed-off-by: Peter Maydell 
---
 target/arm/helper.c | 146 +---
 1 file changed, 123 insertions(+), 23 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 80855302089..167203ac664 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -8013,6 +8013,123 @@ void aarch64_sync_64_to_32(CPUARMState *env)
 env->regs[15] = env->pc;
 }
 
+static void take_aarch32_exception(CPUARMState *env, int new_mode,
+   uint32_t mask, uint32_t offset,
+   uint32_t newpc)
+{
+/* Change the CPU state so as to actually take the exception. */
+switch_mode(env, new_mode);
+/*
+ * For exceptions taken to AArch32 we must clear the SS bit in both
+ * PSTATE and in the old-state value we save to SPSR_, so zero it 
now.
+ */
+env->uncached_cpsr &= ~PSTATE_SS;
+env->spsr = cpsr_read(env);
+/* Clear IT bits.  */
+env->condexec_bits = 0;
+/* Switch to the new mode, and to the correct instruction set.  */
+env->uncached_cpsr = (env->uncached_cpsr & ~CPSR_M) | new_mode;
+/* Set new mode endianness */
+env->uncached_cpsr &= ~CPSR_E;
+if (env->cp15.sctlr_el[arm_current_el(env)] & SCTLR_EE) {
+env->uncached_cpsr |= CPSR_E;
+}
+env->daif |= mask;
+
+if (new_mode == ARM_CPU_MODE_HYP) {
+env->thumb = (env->cp15.sctlr_el[2] & SCTLR_TE) != 0;
+env->elr_el[2] = env->regs[15];
+} else {
+/*
+ * this is a lie, as there was no c1_sys on V4T/V5, but who cares
+ * and we should just guard the thumb mode on V4
+ */
+if (arm_feature(env, ARM_FEATURE_V4T)) {
+env->thumb =
+(A32_BANKED_CURRENT_REG_GET(env, sctlr) & SCTLR_TE) != 0;
+}
+env->regs[14] = env->regs[15] + offset;
+}
+env->regs[15] = newpc;
+}
+
+static void arm_cpu_do_interrupt_aarch32_hyp(CPUState *cs)
+{
+/*
+ * Handle exception entry to Hyp mode; this is sufficiently
+ * different to entry to other AArch32 modes that we handle it
+ * separately here.
+ *
+ * The vector table entry used is always the 0x14 Hyp mode entry point,
+ * unless this is an UNDEF/HVC/abort taken from Hyp to Hyp.
+ * The offset applied to the preferred return address is always zero
+ * (see DDI0487C.a section G1.12.3).
+ * PSTATE A/I/F masks are set based only on the SCR.EA/IRQ/FIQ values.
+ */
+uint32_t addr, mask;
+ARMCPU *cpu = ARM_CPU(cs);
+CPUARMState *env = &cpu->env;
+
+switch (cs->exception_index) {
+case EXCP_UDEF:
+addr = 0x04;
+break;
+case EXCP_SWI:
+addr = 0x14;
+break;
+case EXCP_BKPT:
+/* Fall through to prefetch abort.  */
+case EXCP_PREFETCH_ABORT:
+env->cp15.ifar_s = env->exception.vaddress;
+qemu_log_mask(CPU_LOG_INT, "...with HIFAR 0x%x\n",
+  (uint32_t)env->exception.vaddress);
+addr = 0x0c;
+break;
+case EXCP_DATA_ABORT:
+env->cp15.dfar_s = env->exception.vaddress;
+qemu_log_mask(CPU_LOG_INT, "...with HDFAR 0x%x\n",
+  (uint32_t)env->exception.vaddress);
+addr = 0x10;
+break;
+case EXCP_IRQ:
+addr = 0x18;
+break;
+case EXCP_FIQ:
+addr = 0x1c;
+break;
+case EXCP_HVC:
+addr = 0x08;
+break;
+case EXCP_HYP_TRAP:
+addr = 0x14;
+default:
+cpu_abort(cs, "Unhandled exception 0x%x\n", cs->exception_index);
+}
+
+if (cs->exception_index != EXCP_IRQ && cs->exception_index != EXCP_FIQ) {
+env->cp15.esr_el[2] = env->exception.syndrome;
+}
+
+if (arm_current_el(env) != 2 && addr < 0x14) {
+addr = 0x14;
+}
+
+mask = 0;
+if (!(env->cp15.scr_el3 & SCR_EA)) {
+mask |= CPSR_A;
+}
+if (!(env->cp15.scr_el3 & SCR_IRQ)) {
+mask |= CPSR_I;
+}
+if (!(env->cp15.scr_el3 & SCR_IRQ)) {
+mask |= CPSR_F;
+}
+
+addr += env->cp15.hvbar;
+
+take_aarch32_exception(env, ARM_CPU_MODE_HYP, mask, 0, addr);
+}
+
 static void arm_cpu_do_interrupt_aarch32(CPUState *cs)
 {
 ARMCPU *cpu = ARM_CPU(cs);
@@ -8048,6 +8165,11 @@ static void arm_cpu_do_interrupt_aarch32(CPUState *cs)
 env->cp15.mdscr_el1 = deposit64(env->cp15.mdscr_el1, 2, 4, moe);
 }
 
+if (env->exception.target_el == 2) {
+arm_cpu_do_interrupt_aarch32_hyp(cs);
+return;
+}
+
 /* TODO: Vectored interrupt controller.  */
 switch (cs->exception_index) {
 case EXCP_UDEF:
@@ -8155,29 +8277,7 @@ static void arm_cpu_do_interrupt_aarch32(CPUState *cs)
 env->cp15.scr_el3 &= ~SCR_NS;
 }
 
-switch_mode (env, new_mode);
-/* For exceptions taken to AArch32 we must clear the SS bit in both
- * PSTATE and in the old-s

[Qemu-devel] [PATCH 01/10] target/arm: Correct typo in HAMAIR1 regdef name

2018-08-14 Thread Peter Maydell

We implement the HAMAIR1 register as RAZ/WI; we had a typo in the
regdef, though, and were incorrectly naming it HMAIR1 (which is
a different register which we also implement as RAZ/WI).

Signed-off-by: Peter Maydell 
---
 target/arm/helper.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 8b07bf214ec..2c5e02c0b1a 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -3773,7 +3773,7 @@ static const ARMCPRegInfo el3_no_el2_cp_reginfo[] = {
   .opc0 = 3, .opc1 = 4, .crn = 10, .crm = 3, .opc2 = 0,
   .access = PL2_RW, .type = ARM_CP_CONST,
   .resetvalue = 0 },
-{ .name = "HMAIR1", .state = ARM_CP_STATE_AA32,
+{ .name = "HAMAIR1", .state = ARM_CP_STATE_AA32,
   .opc1 = 4, .crn = 10, .crm = 3, .opc2 = 1,
   .access = PL2_RW, .type = ARM_CP_CONST,
   .resetvalue = 0 },
@@ -3925,7 +3925,7 @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
   .access = PL2_RW, .type = ARM_CP_CONST,
   .resetvalue = 0 },
 /* HAMAIR1 is mapped to AMAIR_EL2[63:32] */
-{ .name = "HMAIR1", .state = ARM_CP_STATE_AA32,
+{ .name = "HAMAIR1", .state = ARM_CP_STATE_AA32,
   .opc1 = 4, .crn = 10, .crm = 3, .opc2 = 1,
   .access = PL2_RW, .type = ARM_CP_CONST,
   .resetvalue = 0 },
-- 
2.18.0

Re: [Qemu-devel] [RFC PATCH] vl: fix migration when watchdog expires

2018-08-14 Thread Zhoujian (jay)

> -Original Message-
> From: Dr. David Alan Gilbert [mailto:dgilb...@redhat.com]
> Sent: Tuesday, August 14, 2018 7:52 PM
> To: Paolo Bonzini 
> Cc: Zhoujian (jay) ; qemu-devel@nongnu.org;
> quint...@redhat.com; wangxin (U) 
> Subject: Re: [RFC PATCH] vl: fix migration when watchdog expires
> 
> * Paolo Bonzini (pbonz...@redhat.com) wrote:
> > On 14/08/2018 12:48, Jay Zhou wrote:
> > > I got the following error when migrating a VM with watchdog
> > > device:
> > >
> > > {"timestamp": {"seconds": 1533884471, "microseconds": 668099},
> > > "event": "WATCHDOG", "data": {"action": "reset"}}
> > > {"timestamp": {"seconds": 1533884471, "microseconds": 677658},
> > > "event": "RESET", "data": {"guest": true}}
> > > {"timestamp": {"seconds": 1533884471, "microseconds": 677874},
> > > "event": "STOP"}
> > > qemu-system-x86_64: invalid runstate transition: 'prelaunch' ->
> 'postmigrate'
> > > Aborted
> > >
> > > The run state transition is RUN_STATE_FINISH_MIGRATE to
> > > RUN_STATE_PRELAUNCH, then the migration thread aborted when it tries to
> set RUN_STATE_POSTMIGRATE.
> > > There is a race between the main loop thread and the migration thread I
> think.
> >
> > In that case I think you shouldn't go to POSTMIGRATE at all, because
> > the VM has been reset.
> 
> Migration has the VM stopped; it's not expecting the state to change at that
> point.
> 
> > Alternatively, when the watchdog fires in RUN_STATE_FINISH_MIGRATE
> > state, it might delay the action until after the "cont" command is
> > invoked on the source, but I'm not sure what's the best way to achieve
> > that...
> 
> Jay: Which watchdog were you using?

Hi Dave,
it is i6300esb, which uses QEMU_CLOCK_VIRTUAL.

> 
>  a) Should the watchdog expire when the VM is stopped; I think it shouldn't -
> hw/acpi/tco.c uses a virtual timer as does i6300esb; so is the bug here that
> the watchdog being used didn't use a virtual timer?
> 
>  b) If the watchdog expires just before the VM gets stopped, is there a race
> which could hit this?  Possibly.

This is the case I met I think.

Regards,
Jay Zhou

> 
>  c) Could main_loop_should_exit guard all the 'request's by something that
> checks whether the VM is stopped?
> 
> Dave
> 
> 
> > Paolo
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

[Qemu-devel] [PATCH 0/2] block: make .bdrv_close optional

2018-08-14 Thread Vladimir Sementsov-Ogievskiy

Several block drivers have to define empty .bdrv_close handler. Let's
instead make it optional.

Vladimir Sementsov-Ogievskiy (2):
  block: make .bdrv_close optional
  block: drop empty .bdrv_close handlers

 block.c  | 4 +++-
 block/blkreplay.c| 5 -
 block/commit.c   | 5 -
 block/copy-on-read.c | 6 --
 block/mirror.c   | 5 -
 block/null.c | 6 --
 block/raw-format.c   | 5 -
 block/snapshot.c | 4 +++-
 8 files changed, 6 insertions(+), 34 deletions(-)

-- 
2.11.1

[Qemu-devel] [PATCH 1/2] block: make .bdrv_close optional

2018-08-14 Thread Vladimir Sementsov-Ogievskiy

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block.c  | 4 +++-
 block/snapshot.c | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 39f373e035..9694018a68 100644
--- a/block.c
+++ b/block.c
@@ -3349,7 +3349,9 @@ static void bdrv_close(BlockDriverState *bs)
 bdrv_drain(bs); /* in case flush left pending I/O */
 
 if (bs->drv) {
-bs->drv->bdrv_close(bs);
+if (bs->drv->bdrv_close) {
+bs->drv->bdrv_close(bs);
+}
 bs->drv = NULL;
 }
 
diff --git a/block/snapshot.c b/block/snapshot.c
index f9903bc94e..3218a542df 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -218,7 +218,9 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
 qobject_unref(file_options);
 qdict_put_str(options, "file", bdrv_get_node_name(file));
 
-drv->bdrv_close(bs);
+if (drv->bdrv_close) {
+drv->bdrv_close(bs);
+}
 bdrv_unref_child(bs, bs->file);
 bs->file = NULL;
 
-- 
2.11.1

Re: [Qemu-devel] [PATCH v2 02/22] config: CONFIG_SERIAL* is already in pci.mak

2018-08-14 Thread Paolo Bonzini

On 14/08/2018 14:06, Peter Maydell wrote:
> On 14 August 2018 at 12:52, Paolo Bonzini  wrote:
>> On 14/08/2018 13:40, Juan Quintela wrote:
 CONFIG_SERIAL is a dependency of both CONFIG_SERIAL and
 CONFIG_SERIAL_PCI.
>>>
>>> I guess you here mean CONFIG_SERIAL_ISA or CONFIG_SERIAL_PCI.  That is
>>> not enough.  CONFIG_SERIAL really means CONFIG_SERIAL_COMMON, and things
>>> like riscv* require it
>>
>> Right, I would put
>>
>>CONFIG_SERIAL=y
>>CONFIG_SERIAL_ISA=y
>>
>> in superio.mak and
>>
>>CONFIG_SERIAL=y
>>CONFIG_SERIAL_PCI=y
>>
>> in pci.mak.
> 
> What about the boards that use the serial.c code but do not
> have PCI, ISA or a superio chip? That is, all the boards/devices
> that call serial_mm_init() directly to create a memory-mapped
> 16550.

They just add

   CONFIG_SERIAL=y

to the .mak file.

Paolo

Re: [Qemu-devel] [PATCH] qemu-img.c: increase spacing between commands in documentation

2018-08-14 Thread Eric Blake


On 08/14/2018 03:40 AM, Kevin Wolf wrote:

And I've already expressed my opinion that it is already rather long, where
making it longer is not necessarily making it smarter.


I think if we want to improve the help text, we should split it up.

$ qemu-img --help
qemu-img version 2.12.94 (v3.0.0-rc4-5-g4fe9c3e13d-dirty)
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
usage: qemu-img [standard options] command [command options]
QEMU disk image utility

 '-h', '--help'   display this help and exit
 '-V', '--version'output version information and exit
 '-T', '--trace'  [[enable=]][,events=][,file=]
  specify tracing options

Commands:

 amend   Change options of an existing disk image
 bench   Run benchmarks on a given disk image
 check   Check the disk image for consistency or repair it
 commit  Merge the disk image into its backing file
 ...

Run 'qemu-img  --help' for details.

See  for how to report bugs.
More information on the QEMU project at .


Indeed, that matches the approach that 'cvs --help' and 'git --help' 
have taken. I could live with a split along those lines as being 
something smarter.  When you want to learn the options for 'create', you 
may have to ask two different --help commands to learn everything you 
need ['--help' didn't give me enough, but told me to use 'create 
--help'], but at least you are not inundated with answers irrelevant to 
the question you are asking, so you don't have to scroll through a wall 
of text.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH] mirror: Fail gracefully for source == target

2018-08-14 Thread Eric Blake


On 08/14/2018 04:58 AM, Kevin Wolf wrote:

blockdev-mirror with the same node for source and target segfaults
today: A node is in its own backing chain, so mirror_start_job() decides
that this is an active commit. When adding the intermediate nodes with
block_job_add_bdrv(), it starts the iteration through the subchain with
the backing file of source, though, so it never reaches target and
instead runs into NULL at the base.

While we could fix that by starting with source itself, there is no
point in allowing mirroring a node into itself and I wouldn't be
surprised if this caused more problems later.

So just check for this scenario and error out.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Kevin Wolf 
---
  block/mirror.c | 5 +
  tests/qemu-iotests/041 | 6 ++
  tests/qemu-iotests/041.out | 4 ++--
  3 files changed, 13 insertions(+), 2 deletions(-)



Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH v2] qemu-img: fix regression copying secrets during convert

2018-08-14 Thread Kevin Wolf

Am 14.08.2018 um 14:39 hat Daniel P. Berrangé geschrieben:
> When the convert command is creating an output file that needs
> secrets, we need to ensure those secrets are passed to both the
> blk_new_open and bdrv_create API calls.
> 
> This is done by qemu-img extracting all opts matching the name
> suffix "key-secret". Unfortunately the code doing this was run after the
> call to bdrv_create(), which meant the QemuOpts it was extracting
> secrets from was now empty.
> 
> Previously this worked by luks as a bug meant the "key-secret"
> parameters were not purged from the QemuOpts. This bug was fixed in
> 
>   commit b76b4f604521e59f857d6177bc55f6f2e41fd392
>   Author: Kevin Wolf 
>   Date:   Thu Jan 11 16:18:08 2018 +0100
> 
> qcow2: Use visitor for options in qcow2_create()
> 
> Exposing the latent bug in qemu-img. This fix simply moves the copying
> of secrets to before the bdrv_create() call.
> 
> Signed-off-by: Daniel P. Berrangé 

Thanks, applied to the block branch.

Kevin

Re: [Qemu-devel] [PATCH v2 02/22] config: CONFIG_SERIAL* is already in pci.mak

2018-08-14 Thread Peter Maydell

On 14 August 2018 at 13:56, Paolo Bonzini  wrote:
> On 14/08/2018 14:06, Peter Maydell wrote:
>> What about the boards that use the serial.c code but do not
>> have PCI, ISA or a superio chip? That is, all the boards/devices
>> that call serial_mm_init() directly to create a memory-mapped
>> 16550.
>
> They just add
>
>CONFIG_SERIAL=y
>
> to the .mak file.

...but the patch that has kicked off this thread is *removing*
CONFIG_SERIAL=y from the various .mak files...

thanks
-- PMM

Re: [Qemu-devel] [RFC PATCH] vl: fix migration when watchdog expires

2018-08-14 Thread Zhoujian (jay)

> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Tuesday, August 14, 2018 8:02 PM
> To: Dr. David Alan Gilbert 
> Cc: Zhoujian (jay) ; qemu-devel@nongnu.org;
> quint...@redhat.com; wangxin (U) 
> Subject: Re: [RFC PATCH] vl: fix migration when watchdog expires
> 
> On 14/08/2018 13:52, Dr. David Alan Gilbert wrote:
> >  a) Should the watchdog expire when the VM is stopped; I think it
> > shouldn't - hw/acpi/tco.c uses a virtual timer as does i6300esb; so is
> > the bug here that the watchdog being used didn't use a virtual timer?
> 
> All watchdogs do.
> 
> >  b) If the watchdog expires just before the VM gets stopped, is there
> > a race which could hit this?  Possibly.
> 
> Yes, I think it is a race that happens just before vm_stop, but I don't
> understand why the "qemu_clock_enable" in pause_all_vcpus does not prevent it.

Hi Paolo,
The sequence is like this I think

 |
 |  <-  watchdog expired, which set reset_requested to 
SHUTDOWN_CAUSE_GUEST_RESET
 | 
 |  <-  migration thread sets to RUN_STATE_FINISH_MIGRATE, it will 
disable QEMU_CLOCK_VIRTUAL clock,
 |  but it is done after the setting of reset_requested
 |
 |  <-  main loop thread sets to RUN_STATE_PRELAUNCH since it 
detected a reset request
 |
 |  <-  migration thread sets to RUN_STATE_POSTMIGRATE


Regards,
Jay Zhou

> 
> It should be possible to write a deterministic testcase with qtest...
> 
> Paolo

Re: [Qemu-devel] [PATCH v2 02/22] config: CONFIG_SERIAL* is already in pci.mak

2018-08-14 Thread Paolo Bonzini

On 14/08/2018 14:57, Peter Maydell wrote:
> On 14 August 2018 at 13:56, Paolo Bonzini  wrote:
>> On 14/08/2018 14:06, Peter Maydell wrote:
>>> What about the boards that use the serial.c code but do not
>>> have PCI, ISA or a superio chip? That is, all the boards/devices
>>> that call serial_mm_init() directly to create a memory-mapped
>>> 16550.
>>
>> They just add
>>
>>CONFIG_SERIAL=y
>>
>> to the .mak file.
> 
> ...but the patch that has kicked off this thread is *removing*
> CONFIG_SERIAL=y from the various .mak files...

Probably because they were including pci.mak.  Indeed CONFIG_SERIAL=y
should remain in the
ARM/HPPA/Microblaze/MIPS/Mozie/NiOS2/OpenRISC/PPC/RiscV/SH/SPARC64/Xtensa
.mak files.

Another possibility would be to split .mak files per-machine, and
include the per-machine file in the toplevel.

Thanks,

Paolo

Re: [Qemu-devel] [RFC PATCH] vl: fix migration when watchdog expires

2018-08-14 Thread Paolo Bonzini

On 14/08/2018 15:03, Zhoujian (jay) wrote:
>> -Original Message-
>> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
>> Sent: Tuesday, August 14, 2018 8:02 PM
>> To: Dr. David Alan Gilbert 
>> Cc: Zhoujian (jay) ; qemu-devel@nongnu.org;
>> quint...@redhat.com; wangxin (U) 
>> Subject: Re: [RFC PATCH] vl: fix migration when watchdog expires
>>
>> On 14/08/2018 13:52, Dr. David Alan Gilbert wrote:
>>>  a) Should the watchdog expire when the VM is stopped; I think it
>>> shouldn't - hw/acpi/tco.c uses a virtual timer as does i6300esb; so is
>>> the bug here that the watchdog being used didn't use a virtual timer?
>>
>> All watchdogs do.
>>
>>>  b) If the watchdog expires just before the VM gets stopped, is there
>>> a race which could hit this?  Possibly.
>>
>> Yes, I think it is a race that happens just before vm_stop, but I don't
>> understand why the "qemu_clock_enable" in pause_all_vcpus does not prevent 
>> it.
> 
> Hi Paolo,
> The sequence is like this I think
> 
>  |
>  |  <-  watchdog expired, which set reset_requested to 
> SHUTDOWN_CAUSE_GUEST_RESET
>  | 
>  |  <-  migration thread sets to RUN_STATE_FINISH_MIGRATE, it 
> will disable QEMU_CLOCK_VIRTUAL clock,
>  |  but it is done after the setting of reset_requested

So the fix would be to process the reset request here?  (In do_vm_stop
or pause_all_vcpus).  The code is currently in main_loop_should_exit().

Paolo

>  |  <-  main loop thread sets to RUN_STATE_PRELAUNCH since it 
> detected a reset request
>  |
>  |  <-  migration thread sets to RUN_STATE_POSTMIGRATE
> 
> 
> Regards,
> Jay Zhou
> 
>>
>> It should be possible to write a deterministic testcase with qtest...
>>
>> Paolo

Re: [Qemu-devel] [PATCH v0 0/7] Background snapshots

2018-08-14 Thread Mike Rapoport

On Mon, Aug 13, 2018 at 08:00:19PM +0100, Dr. David Alan Gilbert wrote:
> cc'ing in Mike*2
> * Denis Plotnikov (dplotni...@virtuozzo.com) wrote:
> > 
> > 
> > On 26.07.2018 12:23, Peter Xu wrote:
> > > On Thu, Jul 26, 2018 at 10:51:33AM +0200, Paolo Bonzini wrote:
> > > > On 25/07/2018 22:04, Andrea Arcangeli wrote:
> > > > > 
> > > > > It may look like the uffd-wp model is wish-feature similar to an
> > > > > optimization, but without the uffd-wp model when the WP fault is
> > > > > triggered by kernel code, the sigsegv model falls apart and requires
> > > > > all kind of ad-hoc changes just for this single feature. Plus uffd-wp
> > > > > has other benefits: it makes it all reliable in terms of not
> > > > > increasing the number of vmas in use during the snapshot. Finally it
> > > > > makes it faster too with no mmap_sem for reading and no sigsegv
> > > > > signals.
> > > > > 
> > > > > The non cooperative features got merged first because there was much
> > > > > activity on the kernel side on that front, but this is just an ideal
> > > > > time to nail down the remaining issues in uffd-wp I think. That I
> > > > > believe is time better spent than trying to emulate it with sigsegv
> > > > > and changing all drivers to send new events down to qemu specific to
> > > > > the sigsegv handling. We considered this before doing uffd for
> > > > > postcopy too but overall it's unreliable and more work (no single
> > > > > change was then needed to KVM code with uffd to handle postcopy and
> > > > > here it should be the same).
> > > > 
> > > > I totally agree.  The hard part in userfaultfd was the changes to the
> > > > kernel get_user_pages API, but the payback was huge because _all_ kernel
> > > > uses (KVM, vhost-net, syscalls, etc.) just work with userfaultfd.  Going
> > > > back to mprotect would be a huge mistake.
> > > 
> > > Thanks for explaining the bits.  I'd say I wasn't aware of the
> > > difference before I started the investigation (and only until now I
> > > noticed that major difference between mprotect and userfaultfd).  I'm
> > > really glad that it's much clear (at least for me) on which way we
> > > should choose.
> > > 
> > > Now I'm thinking whether we can move the userfault write protect work
> > > forward.  The latest discussion I saw so far is in 2016, when someone
> > > from Huawei tried to use the write protect feature for that old
> > > version of live snapshot but reported issue:
> > > 
> > >https://lists.gnu.org/archive/html/qemu-devel/2016-12/msg01127.html
> > > 
> > > Is that the latest status for userfaultfd wr-protect?
> > > 
> > > If so, I'm thinking whether I can try to re-verify the work (I tried
> > > his QEMU repository but I failed to compile somehow, so I plan to
> > > write some even simpler code to try) to see whether I can get the same
> > > KVM error he encountered.
> > > 
> > > Thoughts?
> > 
> > Just to sum up all being said before.
> > 
> > Using mprotect is a bad idea because VM's memory can be accessed from the
> > number of places (KVM, vhost, ...) which need their own special care
> > of tracking memory accesses and notifying QEMU which makes the mprotect
> > using unacceptable.
> > 
> > Protected memory accesses tracking can be done via userfaultfd's WP mode
> > which isn't available right now.
> > 
> > So, the reasonable conclusion is to wait until the WP mode is available and
> > build the background snapshot on top of userfaultfd-wp.
> > But, works on adding the WP-mode is pending for a quite a long time already.
> > 
> > Is there any way to estimate when it could be available?
> 
> I think a question is whether anyone is actively working on it; I
> suspect really it's on a TODO list rather than moving at the moment.

I thought Andrea was working on it :)
 
> What I don't really understand is what stage the last version got upto.
> 
> Dave
> 
> > > 
> > > Regards,
> > > 
> > 
> > -- 
> > Best,
> > Denis
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> 

-- 
Sincerely yours,
Mike.

Re: [Qemu-devel] [PATCH 41/56] json: Nicer recovery from invalid leading zero

2018-08-14 Thread Eric Blake


On 08/14/2018 03:24 AM, Markus Armbruster wrote:

Should IN_BAD_ZERO also consume '.' and/or 'e' (after all, '01e2 is a
valid C constant, but not a valid JSON literal)?  But I think your
choice here is fine (again, add too much, and then the lexer has to
track a lot of state; whereas this minimal addition catches the most
obvious things with little effort).


My patch is of marginal value to begin with.  It improves error recovery
only for the "integer with redundant leading zero" case.  I guess that's
more common than "floating-point with redundant leading zero".


Yes, that was my thought.



An obvious way to extend it to "number with redundant leading zero"
would be cloning the lexer states related to numbers.  Clean, but six
more states.  Meh.

Another way is to dumb down the lexer not to care about leading zero,
and catch it in parse_literal() instead.  Basically duplicates the lexer
state machine up to where leading zero is recognized.  Not much code,
but meh.

Yet another way is to have the lexer eat "digit salad" after redundant
leading zero:

 [IN_BAD_ZERO] = {
 ['0' ... '9'] = IN_BAD_ZERO,
 ['.'] = IN_BAD_ZERO,
 ['e'] = IN_BAD_ZERO,
 ['-'] = IN_BAD_ZERO,
 ['+'] = IN_BAD_ZERO,
 },

Eats even crap like 01e...  But then the same crap without leading zero
should also be eaten.  We'd have to add state transitions to IN_BAD_ZERO
to the six states related to numbers.  Perhaps clever use of gcc's range
initialization lets us do this compactly.  Otherwise, meh again.

Opinions?


Of the various options, the patch as written seems to be the most 
minimal. Maybe tweak the commit message to call out other cases that we 
discussed as not worth doing, because the likelihood of such input is 
dramatically less.


About the only other improvement that might be worth making is adding:

[IN_BAD_ZERO] = {
  ['x'] = IN_BAD_ZERO,
  ['X'] = IN_BAD_ZERO,
  ['a' ... 'f'] = IN_BAD_ZERO,
  ['A' ... 'F'] = IN_BAD_ZERO,
}

to catch people typing JSON by hand and thinking that a hex constant 
will work.  That way, '0x1a' gets parsed as a single JSON_ERROR token, 
rather than four separate tokens of JSON_INTEGER, JSON_KEYWORD, 
JSON_INTEGER, JSON_KEYWORD (where you get the semantic error due to 
JSON_KEYWORD unexpected).




I'd also be fine with dropping this patch.


No, I like it, especially if you also like my suggestion to make error 
reporting on hex numbers resemble error reporting on octal numbers.





Reviewed-by: Eric Blake 


Thanks!



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH 5/5] block: Simplify append_open_options()

2018-08-14 Thread Alberto Garcia

On Tue 14 Aug 2018 02:08:26 PM CEST, Kevin Wolf wrote:
>> > When you actually use that incorrect information, you've got a bug.
>> > Reopen with file=bar doesn't have to check whether 'file=bar' is in
>> > bs->options (because that may be outdated information), but whether
>> > the BdrvChild with the name 'file' points to a node called 'bar'.
>> 
>> Again, this is correct if we open the BDS with
>> 
>>   file.node-name=bar
>> 
>> and then we allow reopening it with
>> 
>>   file=bar
>> 
>> Different set of options, but the result still makes sense. For this
>> we need a specific check to verify that this is correct. For the
>> current behavior we don't need anything now: we only allow the exact
>> same option.
>
> That's yet another case and another reason why we want to check
> BdrvChild instead. If we take BlockdevOptions for blockdev-reopen, you
> need to put something there for nodes that you created inline
> originally, and just putting the node name there probably makes the
> most sense.
>
> Anyway, the case that I had in mind is the case where you removed a
> backing file with a block job:
>
> base <- mid <- top
>
> You stream top into mid, so at the end of the job, mid goes away and
> base is the backing file of top. But since you opened top with
> backing=mid, that's still what you get when you look at bs->options.
>
> base <- top
> (backing=mid)
>
> If you now reopen top, and bdrv_reopen looks at bs->options to check
> whether the operation is valid, you must specify backing=mid instead
> of the correct backing=base so that reopen thinks it's unchanged.

Yes, that's correct, I'm aware of that problem.

>> >> After x-blockdev-reopen is ready, 'backing' will perhaps be the
>> >> first/only exception, because we'll be able to change it and the
>> >> previous value doesn't matter.
>> >
>> > I certainly hope that it will not be the only kind of node references
>> > that you can change. Adding/removing filter nodes requires to be able
>> > to change more or less any kind of node references. But we'll see.
>> 
>> No, but anything that we allow changing has to be done on a case-by-case
>> basis. You can't simply allow changing any child reference, the specific
>> driver has to be modified in order to allow that.
>> 
>> Until the driver is changed, the current behavior prevails: if an option
>> was modified, we return an error.
>
> Makes sense to err on the safe side, though I expect that most drivers
> don't need to do much more than just allowing the switch.
>
> Maybe, if we want to keep things a bit safer, what we can do is check
> that the same node is addressed when you skip all filters.

The proper fix could be something alone those lines, yes. I'll give it a
try and see what happens.

This would be a separate patch, though, the ones that I sent shouldn't
need changes.

Berto

Re: [Qemu-devel] [PATCH 0/2] block: make .bdrv_close optional

2018-08-14 Thread Kevin Wolf

Am 14.08.2018 um 14:43 hat Vladimir Sementsov-Ogievskiy geschrieben:
> Several block drivers have to define empty .bdrv_close handler. Let's
> instead make it optional.

Thanks, applied to the block branch.

Kevin

Re: [Qemu-devel] [PATCH 43/56] qjson: Fix qobject_from_json() & friends for multiple values

2018-08-14 Thread Eric Blake


On 08/08/2018 07:03 AM, Markus Armbruster wrote:

qobject_from_json() & friends use the consume_json() callback to
receive either a value or an error from the parser.

When they are fed a string that contains more than either one JSON
value or one JSON syntax error, consume_json() gets called multiple
times.

When the last call receives a value, qobject_from_json() returns that
value.  Any other values are leaked.

When any call receives an error, qobject_from_json() sets the first
error received.  Any other errors are thrown away.

When values follow errors, qobject_from_json() returns both a value
and sets an error.  That's bad.  Impact:

* block.c's parse_json_protocol() ignores and leaks the value.  It's
   used to to parse pseudo-filenames starting with "json:".  The
   pseudo-filenames can come from the user or from image meta-data such
   as a QCOW2 image's backing file name.


Fortunately, I don't think this falls in the category of a 
denial-of-service attack worthy of a CVE (memory leaks in long-running 
processes are bad, but to be an escalation attack, you'd have to 
convince someone with more rights to repeatedly reload your malicious 
image to cause them to suffer from the leak - but why would they load 
your image more than once? You can reload it yourself, but then you are 
only killing your own qemu so there is no escalation of privilege).





* vl.c's parse_display_qapi() ignores and leaks the error.  It's used
   to parse the argument of command line option -display.

* vl.c's main() case QEMU_OPTION_blockdev ignores the error and leaves
   it in @err.  main() will then pass a pointer to a non-null Error *
   to net_init_clients(), which is forbidden.  It can lead to assertion
   failure or other misbehavior.

* check-qjson.c's multiple_values() demonstrates the badness.

* The other callers are not affected since they only pass strings with
   exactly one JSON value or, in the case of negative tests, one
   error.

The impact on the _nofail() functions is relatively harmless.  They
abort when any call receives an error.  Else they return the last
value, and leak the others, if any.

Fix consume_json() as follows.  On the first call, save value and
error as before.  On subsequent calls, if any, don't save them.  If
the first call saved a value, the next call, if any, replaces the
value by an "Expecting at most one JSON value" error.  Take care not
to leak values or errors that aren't saved.

Signed-off-by: Markus Armbruster 
---
  qobject/qjson.c | 15 ++-
  tests/check-qjson.c | 10 +++---
  2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/qobject/qjson.c b/qobject/qjson.c
index 7395556069..7f69036487 100644
--- a/qobject/qjson.c
+++ b/qobject/qjson.c
@@ -33,8 +33,21 @@ static void consume_json(void *opaque, QObject *json, Error 
*err)
  {
  JSONParsingState *s = opaque;
  
+assert(!json != !err);

+assert(!s->result || !s->err);
+
+if (s->result) {


Reached whether the second item encountered was also an object (json != 
NULL) or an error (err != NULL), but seems reasonable.



+qobject_unref(s->result);
+s->result = NULL;
+error_setg(&s->err, "Expecting at most one JSON value");
+}
+if (s->err) {
+qobject_unref(json);
+error_free(err);
+return;
+}


Worth spelling this clause as error_propagate(&s->err, err)?  Oh, I see 
why you can't - you have to do the early return in the case that (json 
!= NULL) so that you don't assign s->result again if this was a parse of 
a second object.



  s->result = json;
-error_propagate(&s->err, err);
+s->err = err;
  }
  


Took me a couple of reads to verify the logic makes sense, but it looks 
right.


Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH] hw/pci: add pci capability to pci-pci bridge

2018-08-14 Thread Laszlo Ersek

On 08/14/18 10:39, Liu, Jing2 wrote:
> Hi Laszlo,
> Sorry for late reply. I missed these mails because of wrong filter.
> And thanks very much for comments. My reply as belows.
> 
> On 8/7/2018 8:19 PM, Laszlo Ersek wrote:
>> On 08/07/18 09:04, Jing Liu wrote:
> [...]
>>> @@ -46,6 +46,12 @@ struct PCIBridgeDev {
>>>   uint32_t flags;
>>>     OnOffAuto msi;
>>> +
>>> +    /* additional resources to reserve on firmware init */
>>> +    uint64_t io_reserve;
>>> +    uint64_t mem_reserve;
>>> +    uint64_t pref32_reserve;
>>> +    uint64_t pref64_reserve;
>>>   };
>>>   typedef struct PCIBridgeDev PCIBridgeDev;
>>
>> (1) Please evaluate the following idea:
>>
> This is really good and I'm only not sure if DEFINE_PROP_ like,
> DEFINE_PROP_UINT32("bus-reserve", GenPCIERootPort,
> res_reserve.bus_reserve, -1), is a right/good way?
> I mean the third parameter "_f".

Yes, I think you can refer to members within structure fields like that.

> 
> [...]
>>
>> - gen_rp_props should be updated accordingly. The property names should
>> stay the same, but they should reference fields of the "res_reserve"
>> field.
>>
>> (2) Then, in a separate patch, you can add another "res_reserve" field
>> to "PCIBridgeDev". (As a consequence, "bus_reserve" will also be added,
>> which I think is the right thing to do.)
>>
> I consider whether we need limit the bus_reserve of pci-pci bridge. For
> old codes, we could add "unlimited" numbers of devices (but less than
> than max) onto pci-pci bridge.

In theory the bus_reserve hint makes sense too. There likely isn't a
great use case for it in practice, but that's not reason enough IMO to
diverge in the implementation (the factored-out structure and the
properties). It's certainly not wrong to offer the property, IMO.


> 
>> (3) There is a class that is derived from TYPE_PCI_BRIDGE_DEV:
>> TYPE_PCI_BRIDGE_SEAT_DEV. Is it correct to add the new properties /
>> fields to the latter as well?
>>
> Umm, I looked into the codes that doesn't have hotplug property?
> So do we need add resource reserve for it?

No clue. I don't know anything about TYPE_PCI_BRIDGE_SEAT_DEV; I'll have
to defer to Marcel and others on that.

> 
>>>   @@ -95,6 +101,13 @@ static void pci_bridge_dev_realize(PCIDevice
>>> *dev, Error **errp)
>>>   error_free(local_err);
>>>   }
>>>   +    err = pci_bridge_qemu_reserve_cap_init(dev, 0, 0,
>>> +  bridge_dev->io_reserve, bridge_dev->mem_reserve,
>>> +  bridge_dev->pref32_reserve, bridge_dev->pref64_reserve,
>>> errp);
>>> +    if (err) {
>>> +    goto cap_error;
>>> +    }
>>> +
>>>   if (shpc_present(dev)) {
>>>   /* TODO: spec recommends using 64 bit prefetcheable BAR.
>>>    * Check whether that works well. */
>>> @@ -103,6 +116,8 @@ static void pci_bridge_dev_realize(PCIDevice
>>> *dev, Error **errp)
>>>   }
>>>   return;
>>>   +cap_error:
>>> +    msi_uninit(dev);
>>
>> (4) This error handler doesn't look entirely correct; we can reach it
>> without having initialized MSI. (MSI setup is conditional; and even if
>> we attempt it, it is permitted to fail with "msi=auto".) Therefore,
>> msi_uninit() should be wrapped into "if (msi_present())", same as you
>> see in pci_bridge_dev_exitfn().
> 
> sure, can add a pre-check. But I don't understand why we need that,
> msi_uninit() will check msi_present()?

Thanks for pointing that out.

So, I grepped the source code for msi_uninit(). It seems that only one
location gates it with msi_present() -- in pci_bridge_dev_exitfn() --,
while 13 other locations just call it unconditionally.

I don't know what the consistent approach is here. I'll have to defer to
the PCI maintainers; their taste will decide. Technically your code
looks correct, then.

Thanks
Laszlo

Re: [Qemu-devel] [PATCH v2 02/22] config: CONFIG_SERIAL* is already in pci.mak

2018-08-14 Thread Juan Quintela

Peter Maydell  wrote:
> On 14 August 2018 at 12:52, Paolo Bonzini  wrote:
>> On 14/08/2018 13:40, Juan Quintela wrote:
 CONFIG_SERIAL is a dependency of both CONFIG_SERIAL and
 CONFIG_SERIAL_PCI.
>>>
>>> I guess you here mean CONFIG_SERIAL_ISA or CONFIG_SERIAL_PCI.  That is
>>> not enough.  CONFIG_SERIAL really means CONFIG_SERIAL_COMMON, and things
>>> like riscv* require it
>>
>> Right, I would put
>>
>>CONFIG_SERIAL=y
>>CONFIG_SERIAL_ISA=y
>>
>> in superio.mak and
>>
>>CONFIG_SERIAL=y
>>CONFIG_SERIAL_PCI=y
>>
>> in pci.mak.
>
> What about the boards that use the serial.c code but do not
> have PCI, ISA or a superio chip? That is, all the boards/devices
> that call serial_mm_init() directly to create a memory-mapped
> 16550.

That is what I mean that changing CONFIG_SERIAL to
$(call lor,$(CONFIG_SERIAL_ISA), $(CONFIG_SERIAL_PCI))

is not enough.


> (If anybody feels like trying to do a complicated refactoring,
> serial_mm_init() is a pretty ugly legacy API around some
> non-QOMified core 16550 code...)

It is really, really interesting, we have:

- serial_init()
  used is sh4, mips and imx serial

- serial_mm_init()
  loads of places

- serial_hds_isa_init()
  sparc64, ppc, other mips, and pc

And now you can start getting creative with qom.

O:-)

Later, Juan.

Re: [Qemu-devel] [RFC PATCH] vl: fix migration when watchdog expires

2018-08-14 Thread Zhoujian (jay)

> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Tuesday, August 14, 2018 9:07 PM
> To: Zhoujian (jay) ; Dr. David Alan Gilbert
> 
> Cc: qemu-devel@nongnu.org; quint...@redhat.com; wangxin (U)
> 
> Subject: Re: [RFC PATCH] vl: fix migration when watchdog expires
> 
> On 14/08/2018 15:03, Zhoujian (jay) wrote:
> >> -Original Message-
> >> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> >> Sent: Tuesday, August 14, 2018 8:02 PM
> >> To: Dr. David Alan Gilbert 
> >> Cc: Zhoujian (jay) ; qemu-devel@nongnu.org;
> >> quint...@redhat.com; wangxin (U) 
> >> Subject: Re: [RFC PATCH] vl: fix migration when watchdog expires
> >>
> >> On 14/08/2018 13:52, Dr. David Alan Gilbert wrote:
> >>>  a) Should the watchdog expire when the VM is stopped; I think it
> >>> shouldn't - hw/acpi/tco.c uses a virtual timer as does i6300esb; so
> >>> is the bug here that the watchdog being used didn't use a virtual timer?
> >>
> >> All watchdogs do.
> >>
> >>>  b) If the watchdog expires just before the VM gets stopped, is
> >>> there a race which could hit this?  Possibly.
> >>
> >> Yes, I think it is a race that happens just before vm_stop, but I
> >> don't understand why the "qemu_clock_enable" in pause_all_vcpus does not
> prevent it.
> >
> > Hi Paolo,
> > The sequence is like this I think
> >
> >  |
> >  |  <-  watchdog expired, which set reset_requested to
> SHUTDOWN_CAUSE_GUEST_RESET
> >  |
> >  |  <-  migration thread sets to RUN_STATE_FINISH_MIGRATE, it
> will disable QEMU_CLOCK_VIRTUAL clock,
> >  |  but it is done after the setting of reset_requested
> 
> So the fix would be to process the reset request here?  (In do_vm_stop or
> pause_all_vcpus).  The code is currently in main_loop_should_exit().

I think this makes sense, process the reset request after the QEMU_CLOCK_VIRTUAL
disabled, will have a try.

Regards,
Jay Zhou

> 
> Paolo
> 
> >  |  <-  main loop thread sets to RUN_STATE_PRELAUNCH since it
> detected a reset request
> >  |
> >  |  <-  migration thread sets to RUN_STATE_POSTMIGRATE
> >
> >
> > Regards,
> > Jay Zhou
> >
> >>
> >> It should be possible to write a deterministic testcase with qtest...
> >>
> >> Paolo

Re: [Qemu-devel] [PATCH 0/5] Some bs->options fixes

2018-08-14 Thread Kevin Wolf

Am 29.06.2018 um 13:36 hat Alberto Garcia geschrieben:
> Hi everyone,
> 
> this is part of the blockdev-reopen work that I'm doing, but since
> I'll be away during most of July I thought that I could send already a
> couple of patches that I think are ready and don't need anything else
> from the rest of the series.
> 
> There's two main fixes here:
> 
>   1) bs->options are not kept up to date after an image is reopened
>  and no longer reflect its state.
> 
>   2) bs->options and bs->explicit_options also contain the options of
>  a BDS's children, so there's data that is duplicated and will be
>  inconsistent as soon as you change the children's options
>  directly.
> 
> The fix for (2) involves removing all children options from both
> QDicts. In the cases of node name references ("backing": "node-name")
> those remain in the QDict (they're technically parent options). I
> think we don't really need them and it should be possible to get rid
> of them, but it's a little more complicated (we need them during
> bdrv_reopen() to ensure that the user didn't try to change any of
> them).

Thanks, applied to the block branch.

Kevin

[Qemu-devel] [PATCH v2] megasas: fix sglist leak

2018-08-14 Thread Marc-André Lureau

tests/cdrom-test -p /x86_64/cdrom/boot/megasas

Produces the following ASAN leak.

==25700==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 16 byte(s) in 1 object(s) allocated from:
#0 0x7f06f8faac48 in malloc (/lib64/libasan.so.5+0xeec48)
#1 0x7f06f87a73c5 in g_malloc (/lib64/libglib-2.0.so.0+0x523c5)
#2 0x55a729f17738 in pci_dma_sglist_init 
/home/elmarco/src/qq/include/hw/pci/pci.h:818
#3 0x55a729f2a706 in megasas_map_dcmd 
/home/elmarco/src/qq/hw/scsi/megasas.c:698
#4 0x55a729f39421 in megasas_handle_dcmd 
/home/elmarco/src/qq/hw/scsi/megasas.c:1574
#5 0x55a729f3f70d in megasas_handle_frame 
/home/elmarco/src/qq/hw/scsi/megasas.c:1955
#6 0x55a729f40939 in megasas_mmio_write 
/home/elmarco/src/qq/hw/scsi/megasas.c:2119
#7 0x55a729f41102 in megasas_port_write 
/home/elmarco/src/qq/hw/scsi/megasas.c:2170
#8 0x55a729220e60 in memory_region_write_accessor 
/home/elmarco/src/qq/memory.c:527
#9 0x55a7292212b3 in access_with_adjusted_size 
/home/elmarco/src/qq/memory.c:594
#10 0x55a72922cf70 in memory_region_dispatch_write 
/home/elmarco/src/qq/memory.c:1473
#11 0x55a7290f5907 in flatview_write_continue 
/home/elmarco/src/qq/exec.c:3255
#12 0x55a7290f5ceb in flatview_write /home/elmarco/src/qq/exec.c:3294
#13 0x55a7290f6457 in address_space_write /home/elmarco/src/qq/exec.c:3384
#14 0x55a7290f64a8 in address_space_rw /home/elmarco/src/qq/exec.c:3395
#15 0x55a72929ecb0 in kvm_handle_io 
/home/elmarco/src/qq/accel/kvm/kvm-all.c:1729
#16 0x55a7292a0db5 in kvm_cpu_exec 
/home/elmarco/src/qq/accel/kvm/kvm-all.c:1969
#17 0x55a7291c4212 in qemu_kvm_cpu_thread_fn 
/home/elmarco/src/qq/cpus.c:1215
#18 0x55a72a966a6c in qemu_thread_start 
/home/elmarco/src/qq/util/qemu-thread-posix.c:504
#19 0x7f06ed486593 in start_thread (/lib64/libpthread.so.0+0x7593)

Move the qemu_sglist_destroy() from megasas_complete_command() to
megasas_unmap_frame(), so map/unmap are balanced.

Signed-off-by: Marc-André Lureau 
---
 hw/scsi/megasas.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
index ba1afa3c1e..7bee734718 100644
--- a/hw/scsi/megasas.c
+++ b/hw/scsi/megasas.c
@@ -465,6 +465,7 @@ static void megasas_unmap_frame(MegasasState *s, MegasasCmd 
*cmd)
 cmd->pa = 0;
 cmd->pa_size = 0;
 clear_bit(cmd->index, s->frame_map);
+qemu_sglist_destroy(&cmd->qsg);
 }
 
 /*
@@ -580,7 +581,6 @@ static void megasas_complete_frame(MegasasState *s, 
uint64_t context)
 
 static void megasas_complete_command(MegasasCmd *cmd)
 {
-qemu_sglist_destroy(&cmd->qsg);
 cmd->iov_size = 0;
 cmd->iov_offset = 0;
 
-- 
2.18.0.547.g1d89318c48

1 2 3 4 >

1 - 100 of 313 matches

Mail list logo